Chapter 7 - Moving Beyond Linearity

Describe nonlinear relationships in a linear model framework

One explanatory variable: basis functions (polynomial, step function, polynomial spline); smoothing spline; local regression

Polynomial regression --- the basis functions are higher-order terms: y = β₀ + β₁x + β₂x² + ... + β_d x^d + ε

Step regression --- the basis functions are step (indicator) functions: y = β₀ + β₁C₁(x) + ... + β_K C_K(x) + ε

Polynomial spline regression --- the basis includes a truncated power function h(x, ξ) = (x − ξ)³₊ for each knot ξ

Smoothing spline regression --- min over g of Σᵢ (yᵢ − g(xᵢ))² + λ ∫ g''(t)² dt

Local regression --- min Σᵢ K(xᵢ, x₀)(yᵢ − β₀ − β₁xᵢ)²

Multiple explanatory variables: the generalized additive model (GAM)

7.1 Polynomial regression

In general, d is no greater than 3 or 4; otherwise the polynomial becomes too flexible and overfits.

The uncertainty of the fit is shown as a band of two standard errors (the dashed lines represent an approximate 95% confidence interval for f̂(x₀)). The pointwise variance is Var[f̂(x₀)] = ℓ₀ᵀ C ℓ₀, where C is the covariance matrix of the estimated coefficients β̂ and ℓ₀ = (1, x₀, x₀², x₀³, x₀⁴)ᵀ.
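A minimal numpy sketch of the above, on synthetic data (the data-generating function and all variable names here are illustrative assumptions): fit a degree-4 polynomial by least squares, estimate C, and form the two-standard-error band at a point x₀.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1, 1, n)
y = x ** 3 - x + rng.normal(0, 0.1, n)       # illustrative nonlinear truth plus noise

d = 4                                        # degree-4 polynomial, as in the text
X = np.vander(x, d + 1, increasing=True)     # columns 1, x, x^2, x^3, x^4
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta
sigma2 = resid @ resid / (n - d - 1)         # noise variance estimate
C = sigma2 * np.linalg.inv(X.T @ X)          # covariance matrix of beta-hat

x0 = 0.3
l0 = x0 ** np.arange(d + 1)                  # l0 = (1, x0, x0^2, x0^3, x0^4)
fhat = l0 @ beta                             # point estimate f^(x0)
se = np.sqrt(l0 @ C @ l0)                    # sqrt of Var[f^(x0)] = l0' C l0
lower, upper = fhat - 2 * se, fhat + 2 * se  # ~95% pointwise band
```

The band is pointwise: it is recomputed with a new ℓ₀ for each value of x₀.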

7.2 Step functions

If an intercept is included, use one fewer dummy variable than the number of intervals (the first interval's indicator C₀(x) is dropped as the baseline).

Unless the variable has natural breakpoints (as with a qualitative variable: high school, university, master's), step functions risk missing the trend: any relationship within an interval is lost, since the fit is constant inside each bin.
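The intercept-plus-dummies construction can be sketched with numpy (cut points and data here are illustrative assumptions): K = 3 cut points give 4 intervals, and with an intercept only 3 dummies are needed.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 300)
y = np.sin(x) + rng.normal(0, 0.3, 300)

cuts = np.array([2.5, 5.0, 7.5])             # K = 3 cutpoints -> 4 intervals
bins = np.digitize(x, cuts)                  # interval index 0..3 for each point

# With an intercept, drop the first interval's indicator as the baseline:
X = np.column_stack(
    [np.ones_like(x)] + [(bins == k).astype(float) for k in (1, 2, 3)]
)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fit is constant within each bin: baseline plus that bin's coefficient.
levels = beta[0] + np.concatenate(([0.0], beta[1:]))
```

Each fitted level equals the sample mean of y within that bin, which makes concrete why within-interval structure is lost.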

7.3 Basis functions

7.4 Spline regression

7.4.1 Piecewise polynomials

Defect: discontinuous at the knots.

7.4.2 Adding constraints to piecewise polynomials yields polynomial spline regression

Add constraint 1: continuity at each knot.

Add constraints 2, 3: continuity of the first and second derivatives (guarantees smoothness).

Each added constraint uses up one degree of freedom.

Making a degree-d polynomial smooth requires continuity of derivatives up to order d − 1.

In general, a degree-d polynomial spline regression with K knots has df = d + K + 1. In this example (d = 3, K = 1): the piecewise polynomial has (3 + 1) × 2 = 8 parameters; each knot requires 1 constraint for continuity plus d − 1 further constraints for smoothness (continuity of derivatives up to order d − 1), so df = (d + 1)(K + 1) − K(1 + (d − 1)) = d + K + 1.

If two further constraints are added to force linearity at the boundaries (beyond the largest knot and below the smallest knot), the result is natural polynomial spline regression, which is more stable at the boundaries.

7.4.3 Expression of spline regression

Each knot ξ contributes a truncated power basis function h(x, ξ) = (x − ξ)³₊,

For a polynomial spline regression with K = 2 and d = 3, a total of 6 coefficients must be estimated (df = d + K + 1 = 6):

y = β₀ + β₁x + β₂x² + β₃x³ + β₄h(x, ξ₁) + β₅h(x, ξ₂) + ε
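The truncated power basis can be built directly and fit by ordinary least squares. A numpy sketch (knot locations and data are illustrative assumptions):

```python
import numpy as np

def truncated_power_basis(x, knots, d=3):
    """Columns 1, x, ..., x^d plus (x - xi)^d_+ for each knot xi."""
    cols = [x ** j for j in range(d + 1)]
    for xi in knots:
        cols.append(np.clip(x - xi, 0, None) ** d)
    return np.column_stack(cols)

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 200)

knots = [1 / 3, 2 / 3]                     # K = 2 knots
B = truncated_power_basis(x, knots)        # d + K + 1 = 6 columns
beta, *_ = np.linalg.lstsq(B, y, rcond=None)
spline = B @ beta                          # fitted cubic spline values
```

The truncated power basis is numerically ill-conditioned for many knots; libraries use B-splines instead, but the fitted function is the same.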

7.4.4 Selecting the number and location of knots

Position selection: although placing a knot where the relationship changes abruptly would be best, this is hard to do in practice. The usual approach is to fix df first and let the software place the knots automatically at quantiles of the data.

Number selection: choosing the number of knots amounts to choosing df; the optimal df is generally determined by comparing cross-validation errors across different values of df.
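The two rules above (knots at quantiles, number chosen by cross-validation) can be sketched together; the fold scheme, data, and helper names are illustrative assumptions:

```python
import numpy as np

def spline_basis(x, knots, d=3):
    cols = [x ** j for j in range(d + 1)]
    for xi in knots:
        cols.append(np.clip(x - xi, 0, None) ** d)
    return np.column_stack(cols)

def cv_error(x, y, K, folds=5, d=3):
    """5-fold CV error of a cubic spline with K knots placed at quantiles."""
    knots = np.quantile(x, np.linspace(0, 1, K + 2)[1:-1]) if K else []
    idx = np.arange(len(x)) % folds        # interleaved folds (a simple choice)
    err = 0.0
    for f in range(folds):
        tr, te = idx != f, idx == f
        beta, *_ = np.linalg.lstsq(spline_basis(x[tr], knots, d), y[tr], rcond=None)
        err += np.mean((y[te] - spline_basis(x[te], knots, d) @ beta) ** 2)
    return err / folds

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 300))
y = np.sin(4 * np.pi * x) + rng.normal(0, 0.3, 300)

errors = {K: cv_error(x, y, K) for K in range(0, 9)}
best_K = min(errors, key=errors.get)       # K with the smallest CV error
```

Since K = 0 is just a cubic polynomial, comparing errors[0] with the rest also shows the gain from adding knots for this wiggly truth.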

7.4.5 Comparison with polynomial regression

Polynomial regression gains flexibility by adding higher-order terms, while natural polynomial spline regression gains flexibility by adding knots, which is more stable.

7.5 Smoothing splines

7.5.1 Smoothing spline overview

min over g of Σᵢ (yᵢ − g(xᵢ))² + λ ∫ g''(t)² dt

Two parts are minimized: the RSS and the penalty.

The first derivative represents the slope and the second derivative represents the change in slope. If a wiggly g is chosen to fit y perfectly (making the RSS small), the penalty term becomes large (for a given λ).

λ represents the strength of the penalty. As λ → ∞, g becomes completely smooth, i.e. a straight line: the least squares fit. λ therefore governs the trade-off between bias and variance.

g(x) is in fact a shrunken version (controlled by λ) of a natural polynomial spline regression with a knot at every sample point.

7.5.2 Selecting the smoothing parameter

Choosing λ is equivalent to choosing df; g(x) has a knot at every sample point, i.e. n (nominal) degrees of freedom.

In smoothing spline regression the penalty constrains these n parameters, so smoothness is measured by the effective degrees of freedom df_λ.

df_λ is defined via the n × n matrix S_λ that maps y to the fitted values, ĝ_λ = S_λ y, as the trace of that matrix: df_λ = tr(S_λ) = Σᵢ {S_λ}ᵢᵢ.

There is no need to select the number and positions of knots in smoothing spline regression; cross-validation is used only to determine the optimal λ.

7.6 Local regression

min Σᵢ K(xᵢ, x₀)(yᵢ − β₀ − β₁xᵢ)²

Each target point x₀ assigns weights to the other points: points close to x₀ receive large weights, and points outside the span receive weight 0.

The span plays a role analogous to λ, determining flexibility: the smaller the span, the more flexible (and the more wiggly) the fit.

Cross-validation is generally used to determine the optimal span.
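A minimal local-linear sketch of the weighted criterion above, using the common tricube kernel on the nearest span-fraction of points (the kernel choice, span value, and data are illustrative assumptions):

```python
import numpy as np

def loess_point(x0, x, y, span=0.3):
    """Weighted least-squares line at x0 using the nearest span * n points."""
    k = max(2, int(span * len(x)))
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]                        # the k nearest neighbours
    w = (1 - (d[idx] / d[idx].max()) ** 3) ** 3    # tricube weights, 0 at the edge
    X = np.column_stack([np.ones(k), x[idx]])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
    return beta[0] + beta[1] * x0                  # local line evaluated at x0

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 200)

fit = np.array([loess_point(x0, x, y, span=0.3) for x0 in x])
```

Shrinking `span` makes each local fit use fewer points, giving the wigglier behaviour described above; cross-validating over `span` selects the trade-off.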

7.7 Generalized Additive Model (GAM)

Each variable can take its own functional form, and the components are added together at the end.

7.7.1 GAM for regression problems: y = β₀ + f₁(x₁) + f₂(x₂) + ... + f_p(x_p) + ε

7.7.2 GAM for classification problems: log(p(X) / (1 − p(X))) = β₀ + f₁(x₁) + ... + f_p(x_p)
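A GAM for regression is typically fit by backfitting: each fⱼ is repeatedly refit to the partial residual with the other components held fixed. A numpy sketch with two variables and simple polynomial component smoothers (the smoother choice, iteration count, and data are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
y = np.sin(np.pi * x1) + x2 ** 2 + rng.normal(0, 0.2, n)

def smooth(x, r, d=4):
    """Component smoother: degree-d polynomial fit of the partial residual r on x."""
    f = np.polyval(np.polyfit(x, r, d), x)
    return f - f.mean()                  # centre each f_j so the intercept is identifiable

# Backfitting: cycle over variables, refitting each f_j to the partial residual.
f1 = np.zeros(n)
f2 = np.zeros(n)
beta0 = y.mean()
for _ in range(10):
    f1 = smooth(x1, y - beta0 - f2)
    f2 = smooth(x2, y - beta0 - f1)

fitted = beta0 + f1 + f2
```

Because each fⱼ depends on one variable only, the fitted components can be plotted separately, which is the main interpretability advantage of a GAM.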