Creators of STATISTICA Data Analysis Software and Services
Generalized Additive Models (GAM)
Additive Models
Generalized Linear Models
Distributions and Link Functions
Generalized Additive Models
Estimating the Non-parametric Function of Predictors via Scatterplot Smoothers
A Specific Example: The Generalized Additive Logistic Model
Fitting Generalized Additive Models
Interpreting the Results
Degrees of Freedom
A Word of Caution
The methods available in Generalized Additive Models are implementations of techniques developed and popularized by Hastie and Tibshirani (1990). A detailed description of these and related techniques, the algorithms used to fit these models, and discussions of recent research in this area of statistical modeling can also be found in Schimek (2000).
Additive Models
The methods described in this section represent a generalization of multiple regression (which is a special case of general linear models). Specifically, in linear regression, a linear least-squares fit is computed for a set of predictor or X variables, to predict a dependent Y variable. The well-known linear regression equation with m predictors, to predict a dependent variable Y, can be stated as:
Y = b0 + b1*X1 + ... + bm*Xm
where Y stands for the (predicted values of the) dependent variable, X1 through Xm represent the m values for the predictor variables, and b0 and b1 through bm are the regression coefficients estimated by multiple regression. A generalization of the multiple regression model would be to maintain the additive nature of the model, but to replace the simple terms of the linear equation bi*Xi with fi(Xi), where fi is a non-parametric function of the predictor Xi. In other words, instead of a single coefficient for each variable (additive term) in the model, in additive models an unspecified (non-parametric) function is estimated for each predictor, to achieve the best prediction of the dependent variable values.
Generalized Linear Models
To summarize the basic idea, the generalized linear model differs from the general linear model (of which multiple regression is a special case) in two major respects: First, the distribution of the dependent or response variable can be (explicitly) non-normal, and does not have to be continuous, e.g., it can be binomial; second, the dependent variable values are predicted from a linear combination of predictor variables, which are "connected" to the dependent variable through a link function. The general linear model for a single dependent variable can be considered a special case of the generalized linear model: In the general linear model the dependent variable values are expected to follow the normal distribution, and the link function is a simple identity function (i.e., the linear combination of values for the predictor variables is not transformed).
To illustrate, in the general linear model a response variable Y is linearly associated with values on the X variables, while the relationship in the generalized linear model is assumed to be
Y = g(b0 + b1*X1 + ... + bm*Xm)
where g(…) is a function. Formally, the inverse function of g(…), say gi(…), is called the link function, so that:
gi(muY) = b0 + b1*X1 + ... + bm*Xm
where muY stands for the expected value of Y.
Distributions and Link Functions
Generalized Additive Models allows you to choose from a wide variety of distributions for the dependent variable, and link functions for the effects of the predictor variables on the dependent variable (see McCullagh and Nelder, 1989; Hastie and Tibshirani, 1990; see also GLZ Introductory Overview - Computational Approach for a discussion of link functions and distributions):
Normal, Gamma, and Poisson distributions:
Log link: f(z) = log(z)
Inverse link: f(z) = 1/z
Identity link: f(z) = z
Binomial distributions:
Logit link: f(z) = log(z/(1-z))
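The link functions listed above can be sketched in a few lines of Python. This is only an illustration, not the STATISTICA implementation: the names `LINKS`, `to_linear_scale`, and `to_mean_scale` are hypothetical, and each link is paired with its inverse so that a value can be moved from the scale of the expected response to the scale of the linear predictor and back.

```python
import math

# Each entry maps a link name to a pair: (link f, inverse of f).
# The names and this table layout are illustrative assumptions.
LINKS = {
    "log": (math.log, math.exp),
    "inverse": (lambda z: 1.0 / z, lambda eta: 1.0 / eta),
    "identity": (lambda z: z, lambda eta: eta),
    "logit": (lambda z: math.log(z / (1.0 - z)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}


def to_linear_scale(name, mu):
    """Apply the link: expected value of Y -> scale of the linear predictor."""
    return LINKS[name][0](mu)


def to_mean_scale(name, eta):
    """Apply the inverse link: linear predictor -> expected value of Y."""
    return LINKS[name][1](eta)


# Applying a link and then its inverse should return the starting value.
mu = 0.25
round_trip = {name: to_mean_scale(name, to_linear_scale(name, mu))
              for name in LINKS}
```

Note that the logit link is only defined for values strictly between 0 and 1, which is why it is paired with the binomial distribution.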
Generalized Additive Models
We can combine the notion of additive models with generalized linear models, to derive the notion of generalized additive models, as:
gi(muY) = Σi fi(Xi), with the sum taken over the predictors i = 1, ..., m
In other words, the purpose of generalized additive models is to maximize the quality of prediction of a dependent variable Y from various distributions, by estimating unspecified (non-parametric) functions of the predictor variables which are "connected" to the dependent variable via a link function.
Estimating the Non-parametric Function of Predictors via Scatterplot Smoothers
A unique aspect of generalized additive models is the set of non-parametric functions fi of the predictor variables Xi. Specifically, instead of some kind of simple or complex parametric functions, Hastie and Tibshirani (1990) discuss various general scatterplot smoothers that can be applied to the X variable values, with the target criterion to maximize the quality of prediction of the (transformed) Y variable values. One such scatterplot smoother is the cubic smoothing splines smoother, which generally produces a smooth generalization of the relationship between the two variables in the scatterplot. Computational details regarding this smoother can be found in Hastie and Tibshirani (1990; see also Schimek, 2000).
To summarize, instead of estimating single parameters (like the regression weights in multiple regression), in generalized additive models we find a general unspecified (non-parametric) function that relates the predicted (transformed) Y values to the predictor values.
A Specific Example: The Generalized Additive Logistic Model
Let us consider a specific example of the generalized additive models: a generalization of the logistic (logit) model for binary dependent variable values. As also described in detail in the context of Nonlinear Estimation and Generalized Linear/Nonlinear Models, the logistic regression model for binary responses can be written as follows:
p = exp(b0 + b1*X1 + ... + bm*Xm) / (1 + exp(b0 + b1*X1 + ... + bm*Xm))
Note that the distribution of the dependent variable is assumed to be binomial, i.e., the response variable can only assume the values 0 or 1 (e.g., in a market research study, the purchasing decision would be binomial: The customer either did or did not make a particular purchase). We can apply the logistic link function to the probability p (ranging between 0 and 1) so that:
p' = log(p/(1-p))
By applying the logistic link function, we can now rewrite the model as:
p' = b0 + b1*X1 + ... + bm*Xm
Finally, we replace the simple single-parameter additive terms bi*Xi with non-parametric functions fi(Xi) to derive the generalized additive logistic model:
p' = b0 + f1(X1) + ... + fm(Xm)
An example application of this model can be found in Hastie and Tibshirani (1990).
Fitting Generalized Additive Models
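To make the last equation concrete, the following minimal sketch evaluates p' = b0 + f1(X1) + ... + fm(Xm) for one case and converts it back to a probability by inverting the logit. The functions in `funcs`, the intercept `b0`, and the predictor values in `case` are all made up for illustration; in an actual analysis the fi would be smooth functions estimated from the data.

```python
import math


def additive_logit(b0, funcs, x):
    """p' = b0 + f1(x1) + ... + fm(xm), the additive predictor on the logit scale."""
    return b0 + sum(f(xi) for f, xi in zip(funcs, x))


def probability(p_prime):
    """Invert p' = log(p/(1-p)) to recover p = 1 / (1 + exp(-p'))."""
    return 1.0 / (1.0 + math.exp(-p_prime))


# Hypothetical component functions standing in for estimated smoothers:
# a linear effect for the first predictor, a peaked effect for the second.
funcs = [lambda v: 0.1 * (v - 40.0), lambda v: -((v - 3.0) ** 2)]
b0 = -0.5
case = [45.0, 3.0]  # hypothetical predictor values for one customer

p_prime = additive_logit(b0, funcs, case)  # -0.5 + 0.5 + 0.0
p = probability(p_prime)
```

For this made-up case the additive predictor is zero on the logit scale, which corresponds to a purchase probability of one half.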
Detailed descriptions of how generalized additive models are fit to data can be found in Hastie and Tibshirani (1990), as well as Schimek (2000, p. 300). In general there are two separate iterative operations involved in the algorithm, which are usually labeled the outer and inner loop. The purpose of the outer loop is to maximize the overall fit of the model, by maximizing the overall likelihood of the data given the model (similar to the maximum likelihood estimation procedures as described in, for example, the context of Nonlinear Estimation). The purpose of the inner loop is to refine the scatterplot smoother, which is the cubic splines smoother. The smoothing is performed with respect to the partial residuals; i.e., for every predictor k, the weighted cubic spline fit is found that best represents the relationship between variable k and the (partial) residuals computed by removing the effect of all other j predictors (j ≠ k). The iterative estimation procedure terminates when the likelihood of the data given the model cannot be improved.
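The inner loop can be illustrated with a short sketch. Several simplifying assumptions apply and should be kept in mind: only the normal distribution with identity link is covered, so the outer loop is omitted; an unweighted Gaussian kernel smoother is swapped in for the weighted cubic spline smoother to keep the code free of dependencies; and the names `kernel_smooth` and `backfit`, the bandwidth, and the synthetic data are all hypothetical.

```python
import math


def kernel_smooth(x, r, bandwidth):
    """Gaussian kernel smoother: weighted mean of r around each value of x."""
    fitted = []
    for xi in x:
        w = [math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2) for xj in x]
        fitted.append(sum(wj * rj for wj, rj in zip(w, r)) / sum(w))
    return fitted


def backfit(columns, y, bandwidth=0.2, max_iter=50, tol=1e-8):
    """Fit y ~ b0 + f1(x1) + ... + fm(xm) by cycling over the predictors."""
    n, m = len(y), len(columns)
    b0 = sum(y) / n
    f = [[0.0] * n for _ in range(m)]  # current fk evaluated at the data points
    for _ in range(max_iter):
        change = 0.0
        for k, xk in enumerate(columns):
            # Partial residuals: remove the intercept and all other predictors.
            partial = [y[i] - b0 - sum(f[j][i] for j in range(m) if j != k)
                       for i in range(n)]
            new = kernel_smooth(xk, partial, bandwidth)
            mean_new = sum(new) / n
            new = [v - mean_new for v in new]  # center each fk around zero
            change += sum((a - b) ** 2 for a, b in zip(new, f[k]))
            f[k] = new
        if change < tol:  # stop once the component functions no longer move
            break
    return b0, f


# Synthetic, noise-free data with two nonlinear effects.
n = 60
x1 = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
x2 = [-1.0 + 2.0 * ((37 * i) % n) / (n - 1) for i in range(n)]  # shuffled grid
y = [a ** 2 + math.sin(3.0 * b) for a, b in zip(x1, x2)]

b0, f = backfit([x1, x2], y)
fitted = [b0 + f[0][i] + f[1][i] for i in range(n)]
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
tss = sum((yi - b0) ** 2 for yi in y)                   # total sum of squares
```

Comparing `rss` with `tss` gives a rough indication of how much of the variation in the response the two smoothed components account for. For non-normal responses, the cycle above would be nested inside an outer loop that updates working responses and weights on the scale of the link function.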
Interpreting the Results
Many of the standard results statistics computed by Generalized Additive Models are similar to those customarily reported by linear or nonlinear model fitting procedures. For instance, predicted and residual values for the final model can be computed, and various graphs of the residuals can be displayed to help the user identify possible outliers, etc. Refer also to the description of the residual statistics computed by Generalized Linear/Nonlinear Models for details.
The main result of interest, of course, is how the predictors are related to the dependent variable. Scatterplots can be computed showing the smoothed predictor variable values plotted against the partial residuals, i.e., the residuals after removing the effect of all other predictor variables.
This plot allows you to evaluate the nature of the relationship between the predictor and the residualized (adjusted) dependent variable values (see Hastie & Tibshirani, 1990; in particular formula 6.3), and hence the nature of the influence of the respective predictor in the overall model.
Degrees of Freedom
To reiterate, the generalized additive models approach replaces the simple products of (estimated) parameter values times the predictor values with a cubic spline smoother for each predictor. When estimating a single parameter value, we lose one degree of freedom, i.e., we add one degree of freedom to the overall model. It is not clear how many degrees of freedom are lost due to estimating the cubic spline smoother for each variable. Intuitively, a smoother can either be very smooth, not following the pattern of data in the scatterplot very closely, or it can be less smooth, following the pattern of the data more closely. In the most extreme case, a simple line would be very smooth, and require us to estimate a single slope parameter, i.e., we would use one degree of freedom to fit the smoother (simple straight line); on the other hand, we could force a very "non-smooth" line to connect each actual data point, in which case we could "use up" approximately as many degrees of freedom as there are points in the plot. Generalized Additive Models allows you to specify the degrees of freedom for the cubic spline smoother; the fewer degrees of freedom you specify, the smoother is the cubic spline fit to the partial residuals, and typically, the worse is the overall fit of the model. The issue of degrees of freedom for smoothers is discussed in detail in Hastie and Tibshirani (1990).
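One common way to quantify the degrees of freedom used by a linear smoother is the trace of its smoother matrix S, i.e., the matrix that maps the observed values to the fitted values. The sketch below applies this to a Gaussian kernel smoother, which is an assumption made for simplicity and stands in for the cubic spline smoother; the function name `effective_df` and the bandwidth values are hypothetical. The same two extremes described above appear: a very rough fit uses about as many degrees of freedom as there are points, and a very smooth fit uses about one (for this smoother the smooth limit is the overall mean rather than a straight line).

```python
import math


def effective_df(x, bandwidth):
    """Trace of the smoother matrix S of a Gaussian kernel smoother.

    Row i of S holds the normalized kernel weights used to fit point i, so
    the diagonal entry is K(0) / sum_j K((x[i] - x[j]) / bandwidth), K(0) = 1.
    """
    trace = 0.0
    for xi in x:
        row_sum = sum(math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2) for xj in x)
        trace += 1.0 / row_sum
    return trace


x = [i / 19 for i in range(20)]      # 20 equally spaced points on [0, 1]

df_rough = effective_df(x, 0.001)    # tiny bandwidth: fit connects every point
df_mid = effective_df(x, 0.1)        # moderate smoothing
df_smooth = effective_df(x, 100.0)   # huge bandwidth: fit collapses to the mean
```

Specifying the degrees of freedom for a smoother therefore amounts to choosing how strongly it smooths: here, the bandwidth would be tuned until the trace matches the requested value.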
A Word of Caution
Generalized additive models are very flexible, and can provide an excellent fit in the presence of nonlinear relationships and significant noise in the predictor variables. However, note that because of this flexibility, you must be extra cautious not to over-fit the data, i.e., apply an overly complex model (with many degrees of freedom) to data so as to produce a good fit that likely will not replicate in subsequent validation studies. Also, compare the quality of the fit obtained from Generalized Additive Models to the fit obtained via Generalized Linear/Nonlinear Models. In other words, evaluate whether the added complexity (generality) of generalized additive models (regression smoothers) is necessary in order to obtain a satisfactory fit to the data. Often, this is not the case, and given a comparable fit of the models, the simpler generalized linear model is preferable to the more complex generalized additive model. These issues are discussed in greater detail in Hastie and Tibshirani (1990).
Another issue to keep in mind pertains to the interpretability of results obtained from (generalized) linear models vs. generalized additive models. Linear models are easily understood, summarized, and communicated to others (e.g., in technical reports). Moreover, parameter estimates can be used to predict or classify new cases in a simple and straightforward manner. Generalized additive models are not easily interpreted, in particular when they involve complex nonlinear effects of some or all of the predictor variables (and, of course, it is in those instances where generalized additive models may yield a better fit than generalized linear models). To reiterate, it is usually preferable to rely on a simple, well-understood model for predicting future cases than on a complex model that is difficult to interpret and summarize.