Creators of STATISTICA Data Analysis Software and Solutions
Generalized Additive Models (GAM)
Contents: Additive Models; Generalized Linear Models; Distributions and Link Functions; Generalized Additive Models; Estimating the Nonparametric Function of Predictors through Scatterplot Smoothers; A Specific Example: The Generalized Additive Logistic Model; Fitting Generalized Additive Models; Interpreting the Results; Degrees of Freedom; A Word of Caution
The methods available in Generalized Additive Models are implementations of techniques developed and popularized by Hastie and Tibshirani (1990). A detailed description of these and related techniques, the algorithms used to fit these models, and discussions of recent research in this area of statistical modeling can also be found in Schimek (2000).
Additive Models
The methods described in this section represent a generalization of multiple regression (which is a special case of general linear models). Specifically, in linear regression, a linear least-squares fit is computed for a set of predictor or X variables, to predict a dependent Y variable. The well-known linear regression equation with m predictors, to predict a dependent variable Y, can be stated as:
Y = b0 + b1*X1 + ... + bm*Xm
where Y stands for the (predicted values of the) dependent variable, X1 through Xm represent the m values for the predictor variables, and b0 and b1 through bm are the regression coefficients estimated by multiple regression. A generalization of the multiple regression model would be to maintain the additive nature of the model, but to replace the simple terms of the linear equation bi*Xi with fi(Xi), where fi is a non-parametric function of the predictor Xi. In other words, instead of a single coefficient for each variable (additive term) in the model, in additive models an unspecified (non-parametric) function is estimated for each predictor, to achieve the best prediction of the dependent variable values.
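The difference between the two equations can be sketched in a few lines of Python. This is only an illustration: the helper name additive_predict, the coefficient values, and the two non-linear terms are assumptions made up for the example, not estimates from any data.

```python
import math

def additive_predict(intercept, funcs, x):
    """Return intercept + f1(x1) + ... + fm(xm) for a single case x."""
    return intercept + sum(f(xi) for f, xi in zip(funcs, x))

# Linear regression is the special case fi(Xi) = bi * Xi.
b0, b = 1.0, [2.0, -0.5]
linear_terms = [lambda x, bi=bi: bi * x for bi in b]

# Hypothetical non-linear terms, chosen only for illustration.
smooth_terms = [math.sin, lambda x: x ** 2]

case = [0.5, 2.0]
linear_pred = additive_predict(b0, linear_terms, case)
additive_pred = additive_predict(b0, smooth_terms, case)
```

Both predictions are sums of one term per predictor; only the form of each term changes. In an actual additive model the functions fi are not written down in advance but estimated from the data.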
Generalized Linear Models
To summarize the basic idea, the generalized linear model differs from the general linear model (of which multiple regression is a special case) in two major respects: First, the distribution of the dependent or response variable can be (explicitly) non-normal, and does not have to be continuous, e.g., it can be binomial; second, the dependent variable values are predicted from a linear combination of predictor variables, which are "connected" to the dependent variable via a link function. The general linear model for a single dependent variable can be considered a special case of the generalized linear model: In the general linear model the dependent variable values are expected to follow the normal distribution, and the link function is a simple identity function (i.e., the linear combination of values for the predictor variables is not transformed).
To illustrate, in the general linear model a response variable Y is linearly associated with values on the X variables, while the relationship in the generalized linear model is assumed to be
Y = g(b0 + b1*X1 + ... + bm*Xm)
where g(…) is a function. Formally, the inverse function of g(…), say gi(…), is called the link function, so that:
gi(muY) = b0 + b1*X1 + ... + bm*Xm
where muY stands for the expected value of Y.
Distributions and Link Functions
Generalized Additive Models allows you to choose from a wide variety of distributions for the dependent variable, and link functions for the effects of the predictor variables on the dependent variable (see McCullagh and Nelder, 1989; Hastie and Tibshirani, 1990; see also GLZ Introductory Overview - Computational Approach for a discussion of link functions and distributions):
Normal, Gamma, and Poisson distributions:
Log link: f(z) = log(z)
Inverse link: f(z) = 1/z
Identity link: f(z) = z
Binomial distributions:
Logit link: f(z) = log(z/(1-z))
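The four link functions listed above, together with their inverses, can be written down directly. The sketch below is an illustration only: the dictionary LINKS and the helper round_trip are hypothetical names introduced here, not part of any particular software package.

```python
import math

# Each entry maps a link name to (link, inverse_link).
# The link takes the expected value to the scale of the predictor combination;
# the inverse link takes it back.
LINKS = {
    "log": (math.log, math.exp),
    "inverse": (lambda z: 1.0 / z, lambda eta: 1.0 / eta),
    "identity": (lambda z: z, lambda eta: eta),
    "logit": (lambda z: math.log(z / (1.0 - z)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}

def round_trip(name, mu):
    """Apply a link and then its inverse; the result should equal mu."""
    link, inverse = LINKS[name]
    return inverse(link(mu))

# Check on a value that is valid for all four links (0 < mu < 1).
recovered = {name: round_trip(name, 0.3) for name in LINKS}
```

Applying a link and then its inverse returns the original expected value, which is what allows predictions made on the transformed scale to be converted back to the scale of Y.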
Generalized Additive Models
We can combine the notion of additive models with generalized linear models, to derive the notion of generalized additive models, as (with Σi denoting summation over the predictors i = 1, ..., m):
gi(muY) = Σi(fi(Xi))
In other words, the purpose of generalized additive models is to maximize the quality of prediction of a dependent variable Y from various distributions, by estimating unspecified (non-parametric) functions of the predictor variables which are "connected" to the dependent variable via a link function.
Estimating the Nonparametric Function of Predictors through Scatterplot Smoothers
A unique aspect of generalized additive models is the use of non-parametric functions fi of the predictor variables Xi. Specifically, instead of some kind of simple or complex parametric functions, Hastie and Tibshirani (1990) discuss various general scatterplot smoothers that can be applied to the X variable values, with the target criterion to maximize the quality of prediction of the (transformed) Y variable values. One such scatterplot smoother is the cubic smoothing splines smoother, which generally produces a smooth generalization of the relationship between the two variables in the scatterplot. Computational details regarding this smoother can be found in Hastie and Tibshirani (1990; see also Schimek, 2000).
To summarize, instead of estimating single parameters (like the regression weights in multiple regression), in generalized additive models, we find a general unspecified (non-parametric) function that relates the predicted (transformed) Y values to the predictor values.
A Specific Example: The Generalized Additive Logistic Model
Let us consider a specific example of the generalized additive models: A generalization of the logistic (logit) model for binary dependent variable values. As also described in detail in the context of Nonlinear Estimation and Generalized Linear/Nonlinear Models, the logistic regression model for binary responses can be written as follows:
y = exp(b0 + b1*x1 + ... + bm*xm)/(1 + exp(b0 + b1*x1 + ... + bm*xm))
Note that the distribution of the dependent variable is assumed to be binomial, i.e., the response variable can only assume the values 0 or 1 (e.g., in a market research study, the purchasing decision would be binomial: The customer either did or did not make a particular purchase). We can apply the logistic link function to the probability p (ranging between 0 and 1) so that:
p' = log(p/(1-p))
By applying the logistic link function, we can now rewrite the model as:
p' = b0 + b1*X1 + ... + bm*Xm
Finally, we substitute the simple single-parameter additive terms to derive the generalized additive logistic model:
p' = b0 + f1(X1) + ... + fm(Xm)
An example application of this model can be found in Hastie and Tibshirani (1990).
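To see how a prediction is obtained from the generalized additive logistic model, the following sketch computes p' as a sum of function values and converts it back to a probability with the inverse of the logit link. The functions f1 and f2 and the intercept are hypothetical and chosen only for illustration; in practice they would be estimated by the fitting procedure.

```python
import math

def inverse_logit(p_prime):
    """Map p' = log(p/(1-p)) back to the probability p."""
    return 1.0 / (1.0 + math.exp(-p_prime))

def additive_logistic_probability(b0, funcs, x):
    """p' = b0 + f1(X1) + ... + fm(Xm), returned as a probability."""
    p_prime = b0 + sum(f(xi) for f, xi in zip(funcs, x))
    return inverse_logit(p_prime)

# Hypothetical smooth terms (assumptions for illustration only).
def f1(x):
    return 0.8 * x

def f2(x):
    return -(x - 1.0) ** 2

p = additive_logistic_probability(-0.5, [f1, f2], [2.0, 1.5])

# Applying the logit link to p recovers the additive predictor p'.
p_prime_recovered = math.log(p / (1.0 - p))
```

Whatever shape the functions take, the inverse logit keeps the predicted value strictly between 0 and 1, as required for a probability.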
Fitting Generalized Additive Models
Detailed descriptions of how generalized additive models are fit to data can be found in Hastie and Tibshirani (1990), as well as Schimek (2000, p. 300). In general there are two separate iterative operations involved in the algorithm, which are usually labeled the outer and inner loop. The purpose of the outer loop is to maximize the overall fit of the model, by maximizing the overall likelihood of the data given the model (similar to the maximum likelihood estimation procedures as described in, for example, the context of Nonlinear Estimation). The purpose of the inner loop is to refine the scatterplot smoother, which is the cubic splines smoother. The smoothing is performed with respect to the partial residuals; i.e., for every predictor k, the weighted cubic spline fit is found that best represents the relationship between variable k and the (partial) residuals computed by removing the effect of all other j predictors (j ≠ k). The iterative estimation procedure will terminate when the likelihood of the data given the model cannot be improved.
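The inner loop can be illustrated with a minimal sketch, under several simplifying assumptions that should be kept in mind: it covers only the normal distribution with identity link (so no outer loop is needed), it uses an unweighted running-mean smoother in place of the weighted cubic spline smoother, and it runs a fixed number of passes instead of checking for convergence. The function names and the synthetic data are made up for the example.

```python
import math

def running_mean_smooth(x, r, half_window=2):
    """Smooth the values r against x with a nearest-neighbour running mean."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    fitted = [0.0] * len(x)
    for rank, i in enumerate(order):
        lo = max(0, rank - half_window)
        hi = min(len(x), rank + half_window + 1)
        neighbours = [r[order[j]] for j in range(lo, hi)]
        fitted[i] = sum(neighbours) / len(neighbours)
    return fitted

def backfit(X, y, n_iter=20, half_window=2):
    """X is a list of predictor columns; returns (b0, fitted values of each fk)."""
    n = len(y)
    b0 = sum(y) / n
    f = [[0.0] * n for _ in X]
    for _ in range(n_iter):
        for k, xk in enumerate(X):
            # Partial residuals: remove the intercept and the current
            # estimates of all other predictors j != k.
            partial = [y[i] - b0 - sum(f[j][i] for j in range(len(X)) if j != k)
                       for i in range(n)]
            fk = running_mean_smooth(xk, partial, half_window)
            mean_fk = sum(fk) / n
            # Centre each function so the intercept stays identifiable.
            f[k] = [v - mean_fk for v in fk]
    return b0, f

# Synthetic data: y depends on x1 through a sine curve and on x2 linearly.
x1 = [i / 10 for i in range(40)]
x2 = [((7 * i) % 40) / 10 for i in range(40)]
y = [math.sin(a) + 0.5 * b for a, b in zip(x1, x2)]

b0, f = backfit([x1, x2], y)
fitted = [b0 + f[0][i] + f[1][i] for i in range(len(y))]
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
tss = sum((yi - b0) ** 2 for yi in y)
```

Comparing rss with tss shows how much of the variation in y the two smoothed terms account for. For non-normal distributions, the outer loop described above would wrap this procedure, re-computing an adjusted dependent variable and weights on each pass.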
Interpreting the Results
Many of the standard results statistics computed by Generalized Additive Models are similar to those customarily reported by linear or nonlinear model fitting procedures. For example, predicted and residual values for the final model can be computed, and various graphs of the residuals can be displayed to help the user identify possible outliers, etc. Refer also to the description of the residual statistics computed by Generalized Linear/Nonlinear Models for details.
The main result of interest, of course, is how the predictors are related to the dependent variable. Scatterplots can be computed showing the smoothed predictor variable values plotted against the partial residuals, i.e., the residuals after removing the effect of all other predictor variables.
This plot allows you to evaluate the nature of the relationship between the predictor and the residualized (adjusted) dependent variable values (see Hastie & Tibshirani, 1990; in particular formula 6.3), and hence the nature of the influence of the respective predictor in the overall model.
Degrees of Freedom
To reiterate, the generalized additive models approach replaces the simple products of (estimated) parameter values times the predictor values with a cubic spline smoother for each predictor. When estimating a single parameter value, we lose one degree of freedom, i.e., we add one degree of freedom to the overall model. It is not clear how many degrees of freedom are lost due to estimating the cubic spline smoother for each variable. Intuitively, a smoother can either be very smooth, not following the pattern of data in the scatterplot very closely, or it can be less smooth, following the pattern of the data more closely. In the most extreme case, a simple line would be very smooth, and require us to estimate a single slope parameter, i.e., we would use one degree of freedom to fit the smoother (simple straight line); on the other hand, we could force a very "non-smooth" line to connect each actual data point, in which case we could "use up" approximately as many degrees of freedom as there are points in the plot. Generalized Additive Models allows you to specify the degrees of freedom for the cubic spline smoother; the fewer degrees of freedom you specify, the smoother is the cubic spline fit to the partial residuals, and typically, the worse is the overall fit of the model. The issue of degrees of freedom for smoothers is discussed in detail in Hastie and Tibshirani (1990).
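One common way to put a number on the degrees of freedom of a linear smoother is the trace of its smoother matrix S, where the fitted values are S times the observed values. The sketch below applies this to a running-mean smoother, used here as a simple stand-in for the cubic spline smoother; the function names are hypothetical.

```python
def running_mean_matrix(n, half_window):
    """Rows of the smoother matrix S for n points already sorted by x."""
    rows = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        weight = 1.0 / (hi - lo)
        rows.append([weight if lo <= j < hi else 0.0 for j in range(n)])
    return rows

def effective_df(S):
    """Degrees of freedom used by a linear smoother: the trace of S."""
    return sum(S[i][i] for i in range(len(S)))

n = 10
# Window of one point: the fit passes through every data point.
df_interpolating = effective_df(running_mean_matrix(n, 0))
# Window of up to five points: an intermediate amount of smoothing.
df_moderate = effective_df(running_mean_matrix(n, 2))
# Window covering all points: every fitted value is the overall mean.
df_global_mean = effective_df(running_mean_matrix(n, n))
```

The two extremes match the intuition given above: the fit that connects every point uses as many degrees of freedom as there are points, the fit that averages over all points uses a single degree of freedom, and intermediate window widths fall in between.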
A Word of Caution
Generalized additive models are very flexible, and can provide an excellent fit in the presence of nonlinear relationships and significant noise in the predictor variables. However, note that because of this flexibility, you must be extra cautious not to over-fit the data, i.e., apply an overly complex model (with many degrees of freedom) to data so as to produce a good fit that likely will not replicate in subsequent validation studies. Also, compare the quality of the fit obtained from Generalized Additive Models to the fit obtained via Generalized Linear/Nonlinear Models. In other words, evaluate whether the added complexity (generality) of generalized additive models (regression smoothers) is necessary in order to obtain a satisfactory fit to the data. Often, this is not the case, and given a comparable fit of the models, the simpler generalized linear model is preferable to the more complex generalized additive model. These issues are discussed in greater detail in Hastie and Tibshirani (1990).
Another issue to keep in mind pertains to the interpretability of results obtained from (generalized) linear models vs. generalized additive models. Linear models are easily understood, summarized, and communicated to others (e.g., in technical reports). Moreover, parameter estimates can be used to predict or classify new cases in a simple and straightforward manner. Generalized additive models are not easily interpreted, in particular when they involve complex nonlinear effects of some or all of the predictor variables (and, of course, it is in those instances where generalized additive models may yield a better fit than generalized linear models). To reiterate, it is usually preferable to rely on a simple well-understood model for predicting future cases, than on a complex model that is difficult to interpret and summarize.