Smoothing spline

For broader coverage of this topic, see Spline (mathematics).

The smoothing spline is a method of fitting a smooth curve to a set of noisy observations using a spline function.

Definition

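Let $(x_i, Y_i)$, $i = 1, \dots, n$, with $x_1 < x_2 < \dots < x_n$, be a sequence of observations, modeled by the relation $E(Y_i) = \mu(x_i)$. The smoothing spline estimate $\hat\mu$ of the function $\mu$ is defined to be the minimizer (over the class of twice differentiable functions) of[1]

$$\sum_{i=1}^{n} \bigl(Y_i - \hat\mu(x_i)\bigr)^2 + \lambda \int \hat\mu''(x)^2 \, dx.$$

Remarks:

  1. $\lambda \ge 0$ is a smoothing parameter that controls the trade-off between fidelity to the data and roughness of the estimate.
  2. The integral is evaluated over the range of the $x_i$.
  3. As $\lambda \to 0$ (no smoothing), the smoothing spline converges to the interpolating spline.
  4. As $\lambda \to \infty$ (infinite smoothing), the roughness penalty dominates and the estimate converges to a linear least-squares fit.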

Derivation of the smoothing spline

It is useful to think of fitting a smoothing spline in two steps:

  1. First, derive the values $\hat\mu(x_i)$, $i = 1, \dots, n$.
  2. From these values, derive $\hat\mu(x)$ for all $x$.

Now, treat the second step first.

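Given the vector $\hat m = (\hat\mu(x_1), \dots, \hat\mu(x_n))^T$ of fitted values, the sum-of-squares part of the spline criterion is fixed. It remains only to minimize $\int \hat\mu''(x)^2 \, dx$, and the minimizer is a natural cubic spline that interpolates the points $(x_i, \hat\mu(x_i))$. This interpolating spline is a linear operator, and can be written in the form

$$\hat\mu(x) = \sum_{i=1}^{n} \hat\mu(x_i) \, f_i(x),$$

where $f_i(x)$ are a set of spline basis functions. As a result, the roughness penalty has the form

$$\int \hat\mu''(x)^2 \, dx = \hat m^T A \, \hat m,$$

where the elements of $A$ are $A_{ij} = \int f_i''(x) \, f_j''(x) \, dx$. The basis functions, and hence the matrix $A$, depend on the configuration of the predictor variables $x_i$, but not on the responses $Y_i$ or $\hat m$.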

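Now back to the first step. The penalized sum-of-squares can be written as

$$\|Y - \hat m\|^2 + \lambda \, \hat m^T A \, \hat m,$$

where $Y = (Y_1, \dots, Y_n)^T$. Setting the gradient with respect to $\hat m$ to zero gives $-2(Y - \hat m) + 2\lambda A \hat m = 0$, so minimizing over $\hat m$ gives

$$\hat m = (I + \lambda A)^{-1} Y.$$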
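The two steps can be sketched numerically as follows. This is a minimal illustration, not a reference implementation: it assumes NumPy, and it builds the penalty matrix as $A = Q R^{-1} Q^T$, a standard band-matrix construction for natural cubic splines that the text above does not spell out. The function names `penalty_matrix` and `smoothing_spline_values` are hypothetical.

```python
import numpy as np

def penalty_matrix(x):
    """Roughness-penalty matrix A for a natural cubic spline with knots x.

    Assumption: A = Q R^{-1} Q^T, where Q (n x (n-2)) and R ((n-2) x (n-2))
    are the usual tridiagonal-band matrices built from the knot spacings.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.diff(x)                      # spacings h[i] = x[i+1] - x[i]
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(n - 2):              # column j belongs to interior knot j + 1
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    return Q @ np.linalg.solve(R, Q.T)

def smoothing_spline_values(x, y, lam):
    """Fitted values m = (I + lam * A)^{-1} y at the data sites."""
    A = penalty_matrix(x)
    return np.linalg.solve(np.eye(len(x)) + lam * A, np.asarray(y, dtype=float))
```

With `lam = 0` the fitted values reproduce the data, and for very large `lam` they approach the least-squares straight line, because $A$ annihilates exactly the linear functions. A dense solve is used here for clarity; a banded solver would be the natural choice for large $n$.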

De Boor's approach

De Boor's approach exploits the same idea of balancing the smoothness of the curve against its closeness to the given data.[2] The estimate $\hat\mu$ minimizes

$$p \sum_{i=1}^{n} \left( \frac{Y_i - \hat\mu(x_i)}{\delta_i} \right)^2 + (1 - p) \int \left( \hat\mu^{(m)}(x) \right)^2 dx,$$

where $p$ is a parameter called the smooth factor and belongs to the interval $[0, 1]$, and $\delta_i$, $i = 1, \dots, n$, are the quantities controlling the extent of smoothing (they represent the weight $\delta_i^{-2}$ of each point $Y_i$). In practice, since cubic splines are mostly used, $m$ is usually $2$. The solution for $m = 2$ was proposed by Reinsch in 1967.[3] For $m = 2$, when $p$ approaches $1$, $\hat\mu$ converges to the "natural" spline interpolant to the given data.[2] As $p$ approaches $0$, $\hat\mu$ converges to a straight line (the smoothest curve). Since finding a suitable value of $p$ is a task of trial and error, a redundant constant $S$ was introduced for convenience.[3] $S$ is used to numerically determine the value of $p$ so that the function $\hat\mu$ meets the following condition:

$$\sum_{i=1}^{n} \left( \frac{Y_i - \hat\mu(x_i)}{\delta_i} \right)^2 \le S.$$

The algorithm described by de Boor starts with $p = 0$ and increases $p$ until the condition is met.[2] If $\delta_i$ is an estimate of the standard deviation of $Y_i$, the constant $S$ is recommended to be chosen in the interval $[\, n - \sqrt{2n},\; n + \sqrt{2n} \,]$. Having $S = 0$ means the solution is the "natural" spline interpolant.[3] Increasing $S$ gives a smoother curve at the cost of moving farther from the given data.
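The search for the smooth factor can be illustrated for the cubic case as below. This is a rough sketch under several assumptions, not de Boor's Fortran routine: it uses NumPy, it replaces de Boor's incremental search with plain bisection (the weighted residual decreases monotonically as the smooth factor grows), and it solves $(pD + (1-p)A)\,\hat m = pDY$ with $D = \mathrm{diag}(\delta_i^{-2})$, which follows from setting the gradient of the weighted criterion to zero. All function names and the lower bound `p_min` are hypothetical.

```python
import numpy as np

def penalty_matrix(x):
    """A = Q R^{-1} Q^T, so m^T A m is the integrated squared second derivative."""
    h = np.diff(np.asarray(x, dtype=float))
    n = len(x)
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(n - 2):
        Q[j:j + 3, j] = [1 / h[j], -1 / h[j] - 1 / h[j + 1], 1 / h[j + 1]]
        R[j, j] = (h[j] + h[j + 1]) / 3
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6
    return Q @ np.linalg.solve(R, Q.T)

def fitted_values(A, y, delta, p):
    """Values at the data sites minimizing the weighted criterion, for 0 < p <= 1."""
    y = np.asarray(y, dtype=float)
    D = np.diag(1.0 / np.asarray(delta, dtype=float) ** 2)
    return np.linalg.solve(p * D + (1 - p) * A, p * D @ y)

def weighted_residual(y, m, delta):
    """Left-hand side of the condition: sum of ((Y_i - m_i) / delta_i)^2."""
    return float(np.sum(((np.asarray(y) - m) / np.asarray(delta)) ** 2))

def choose_p(x, y, delta, S, p_min=1e-8, iterations=60):
    """Bisection for the smallest p whose weighted residual is at most S."""
    A = penalty_matrix(x)

    def resid(p):
        return weighted_residual(y, fitted_values(A, y, delta, p), delta)

    if resid(p_min) <= S:               # a near-straight line is already close enough
        return p_min
    lo, hi = p_min, 1.0                 # invariant: resid(lo) > S >= resid(hi)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if resid(mid) <= S:
            hi = mid
        else:
            lo = mid
    return hi
```

The search starts slightly above zero because the linear system is singular at exactly zero, where the solution degenerates to the weighted least-squares line.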

Creating a multidimensional spline

The constraint $x_1 < x_2 < \dots < x_n$ in the definition means that the algorithm does not work for arbitrary sets of data. To apply it to points in a multidimensional space, the input must first be brought into a form that satisfies this constraint. One way to do this is to introduce a parameter so that the input data are represented as single-valued functions of that parameter; the smoothing is then performed for each function separately. In a two-dimensional space, $x$ and $y$ can be parametrized so that they become $x(t)$ and $y(t)$, where $t_1 < t_2 < \dots < t_n$. A convenient choice for $t$ is the cumulative distance $t_{i+1} = t_i + \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2}$, where $t_1 = 0$.[4][5]
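The cumulative-distance parametrization is simple to compute. The sketch below uses only the Python standard library; the function name `chord_length_parameters` is hypothetical. Note that consecutive points must be distinct for the resulting parameter values to be strictly increasing.

```python
from math import hypot

def chord_length_parameters(points):
    """Return t_1 = 0 and t_{i+1} = t_i + distance between points i and i+1."""
    t = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        t.append(t[-1] + hypot(x1 - x0, y1 - y0))
    return t

# A curve that doubles back in x, so x alone cannot serve as the predictor.
points = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0), (0.0, 14.0)]
t = chord_length_parameters(points)     # [0.0, 5.0, 11.0, 16.0]
xs = [p[0] for p in points]             # smooth (t, xs) and (t, ys) separately
ys = [p[1] for p in points]
```

Each coordinate sequence can then be smoothed against `t` as an ordinary one-dimensional problem.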

A more detailed analysis of parametrization is given by E.T.Y. Lee.[6]

Related methods

See also: Curve fitting

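Smoothing splines are related to, but distinct from:

  1. Regression splines. The data are fitted to a set of spline basis functions with a reduced set of knots, typically by least squares. No roughness penalty is used.
  2. Penalized splines. These combine the reduced knots of regression splines with the roughness penalty of smoothing splines.[7]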

Source code

Source code for spline smoothing can be found in the examples from Carl de Boor's book A Practical Guide to Splines. The examples are written in the Fortran programming language. The updated sources are also available on Carl de Boor's official site.

References

  1. Hastie, T. J.; Tibshirani, R. J. (1990). Generalized Additive Models. Chapman and Hall. ISBN 0-412-34390-8.
  2. De Boor, C. (2001). A Practical Guide to Splines (Revised Edition). Springer. pp. 207–214. ISBN 0-387-90356-9.
  3. Reinsch, Christian H. "Smoothing by Spline Functions". Retrieved 18 June 2016.
  4. Smith, Robert E. Jr.; Price, Joseph M.; Howser, Lona M. "A Smoothing Algorithm Using Cubic Spline Functions" (PDF). Retrieved 31 May 2011.
  5. Graham, N. Y. "Smoothing With Periodic Cubic Splines" (PDF). Retrieved 31 May 2011.
  6. Lee, E. T. Y. "Choosing nodes in parametric curve interpolation" (PDF). Retrieved 28 June 2011.
  7. Ruppert, David; Wand, M. P.; Carroll, R. J. (2003). Semiparametric Regression. Cambridge University Press. ISBN 0-521-78050-0.

This article is issued from Wikipedia - version of the 11/26/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.