Projection matrix

In statistics, the projection matrix $\mathbf {P}$ ,^[1] sometimes also called the influence matrix^[2] or hat matrix $\mathbf{H}$ , maps the vector of response values (dependent variable values) to the vector of fitted values (or predicted values). It describes the influence each response value has on each fitted value.^[3]^[4] The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation.

Overview

If the vector of response values is denoted by $\mathbf {y}$ and the vector of fitted values by $\mathbf {\hat {y}}$ ,

\mathbf {\hat {y}} =\mathbf {P} \mathbf {y} .

As $\mathbf {\hat {y}}$ is usually pronounced "y-hat", the projection matrix is also called the hat matrix, since it "puts a hat on $\mathbf {y}$ ". The vector of residuals $\mathbf {u}$ can also be expressed compactly using the projection matrix:

\mathbf {u} =\mathbf {y} -\mathbf {\hat {y}} =\mathbf {y} -\mathbf {P} \mathbf {y} =\left(\mathbf {I} -\mathbf {P} \right)\mathbf {y} ,

where $\mathbf {I}$ is the identity matrix. The matrix $\mathbf {M} \equiv \left(\mathbf {I} -\mathbf {P} \right)$ is sometimes referred to as the residual maker matrix or the annihilator matrix. Moreover, when the errors are uncorrelated and share a common variance, the element in the i^th row and j^th column of $\mathbf {P}$ is equal to the covariance between the j^th response value and the i^th fitted value, divided by the variance of the former:

{\begin{aligned}p_{ij}=\operatorname {Cov} \left[{\hat {y}}_{i},y_{j}\right]/\operatorname {Var} \left[y_{j}\right]\end{aligned}}
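The identity for $p_{ij}$ can be checked by simulation. The following is a minimal NumPy sketch, assuming i.i.d. normal errors; the design matrix, coefficients, noise level and number of simulated data sets are arbitrary illustrative choices, not taken from the text. It estimates $\operatorname{Cov}[\hat y_i, y_j]/\operatorname{Var}[y_j]$ from repeated draws of $\mathbf y$ and compares the result with $\mathbf P$.

```python
import numpy as np

# Small illustrative design: intercept plus one predictor (n = 5 observations).
# X, beta and sigma are arbitrary choices for this sketch.
X = np.column_stack([np.ones(5), np.array([0.0, 1.0, 2.0, 3.0, 5.0])])
beta = np.array([1.0, 2.0])
sigma = 1.5

# Hat matrix of an ordinary least-squares fit.
P = X @ np.linalg.inv(X.T @ X) @ X.T

# Draw many response vectors y = X beta + eps with i.i.d. N(0, sigma^2) errors.
rng = np.random.default_rng(0)
n_sims = 200_000
Y = X @ beta + sigma * rng.standard_normal((n_sims, 5))  # one simulated y per row
Y_hat = Y @ P.T                                           # row-wise fitted values P y

# Empirical Cov[y_hat_i, y_j] / Var[y_j] for every pair (i, j).
Yc = Y - Y.mean(axis=0)
Yhc = Y_hat - Y_hat.mean(axis=0)
cov = Yhc.T @ Yc / (n_sims - 1)       # cov[i, j] = Cov(y_hat_i, y_j)
ratio = cov / Y.var(axis=0, ddof=1)   # divides column j by Var(y_j)

print(np.max(np.abs(ratio - P)))      # Monte Carlo error, shrinks as n_sims grows
```

With correlated or unequal-variance errors the ratio no longer reduces to $p_{ij}$, which is why the sketch fixes $\mathbf\Sigma = \sigma^2\mathbf I$.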

Because the residuals are $\mathbf {u} =\left(\mathbf {I} -\mathbf {P} \right)\mathbf {y}$ , error propagation gives their covariance matrix as $\left(\mathbf {I} -\mathbf {P} \right)\mathbf {\Sigma } \left(\mathbf {I} -\mathbf {P} \right)^{\mathsf {T}}$ , where $\mathbf{\Sigma}$ is the covariance matrix of the error vector (and by extension, the response vector as well). For linear models with independent and identically distributed errors, in which $\mathbf {\Sigma } =\sigma ^{2}\mathbf {I}$ , this reduces to $\sigma ^{2}\left(\mathbf {I} -\mathbf {P} \right)$ , since $\mathbf {I} -\mathbf {P}$ is symmetric and idempotent.^[3]
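A short NumPy check of the residual covariance, as a sketch under arbitrary assumptions (the design matrix, the randomly generated positive-definite $\mathbf{\Sigma}$ and the value of $\sigma^2$ are illustrative only): the general propagated covariance is formed explicitly, and the i.i.d. case is compared with $\sigma^2(\mathbf I - \mathbf P)$.

```python
import numpy as np

# Arbitrary full-rank design (n = 6, intercept plus one predictor).
X = np.column_stack([np.ones(6), np.arange(6.0)])
P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(6) - P                      # residual maker / annihilator

# General case: any symmetric positive-definite error covariance Sigma.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
Sigma = A @ A.T + 6 * np.eye(6)
cov_u_general = M @ Sigma @ M.T        # covariance of u = M y

# i.i.d. case: Sigma = sigma^2 I collapses to sigma^2 (I - P),
# because M is symmetric and idempotent.
sigma2 = 0.7
cov_u_iid = M @ (sigma2 * np.eye(6)) @ M.T
print(np.allclose(cov_u_iid, sigma2 * M))   # True
```

The diagonal of $\sigma^2(\mathbf I - \mathbf P)$ shows that residual variances are $\sigma^2(1 - p_{ii})$, so points with high leverage have smaller residual variance.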

Linear model

Suppose that we wish to estimate a linear model using linear least squares. The model can be written as

\mathbf {y} =\mathbf {X} {\boldsymbol {\beta }}+{\boldsymbol {\varepsilon }},

where $\mathbf {X}$ is a matrix of explanatory variables (the design matrix), $\boldsymbol {\beta }$ is a vector of unknown parameters to be estimated, and $\boldsymbol {\varepsilon }$ is the error vector.

Many types of models and techniques can be expressed in this form. A few examples are linear least squares, smoothing splines, regression splines, local regression, kernel regression, and linear filtering.

Solution with unit weights and uncorrelated errors

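When the weights for each observation are identical, the errors are uncorrelated, and $\mathbf {X}$ has full column rank (so that $\mathbf {X} ^{\mathsf {T}}\mathbf {X}$ is invertible), the estimated parameters are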

{\hat {\boldsymbol {\beta }}}=\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\mathbf {y} ,

so the fitted values are

{\hat {\mathbf {y} }}=\mathbf {X} {\hat {\boldsymbol {\beta }}}=\mathbf {X} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\mathbf {y} .

Therefore, the projection (hat) matrix is given by

\mathbf {P} \equiv \mathbf {X} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}.
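The formulas above translate directly into NumPy. The sketch below uses simulated data (the sample size, coefficients and noise level are arbitrary assumptions) and also forms $\mathbf{P}$ from a thin QR factorization $\mathbf{X} = \mathbf{QR}$, for which $\mathbf{P} = \mathbf{QQ}^{\mathsf T}$; this route avoids explicitly inverting $\mathbf{X}^{\mathsf T}\mathbf{X}$ and is usually better conditioned.

```python
import numpy as np

# Illustrative data from y = X beta + eps with intercept and one predictor.
rng = np.random.default_rng(2)
n = 8
x = np.linspace(0.0, 7.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + 0.3 * rng.standard_normal(n)

# Textbook formulas: beta_hat = (X^T X)^{-1} X^T y and P = X (X^T X)^{-1} X^T.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
P = X @ XtX_inv @ X.T
y_hat = P @ y                       # equals X @ beta_hat

# Thin QR factorization X = Q R (NumPy's default mode='reduced': Q is n x p),
# giving P = Q Q^T without inverting X^T X.
Q, _ = np.linalg.qr(X)
P_qr = Q @ Q.T

# Cross-check beta_hat against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

For large $n$ the full $n \times n$ matrix is rarely needed; its diagonal (the leverages) can be obtained as the row sums of `Q**2`.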

Non-identical weights and/or correlated errors

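The above may be generalized to the cases where the weights are not identical and/or the errors are correlated. Suppose that the covariance matrix of the errors is $\mathbf {\Sigma }$ . Then, since the estimated parameters are

{\hat {\boldsymbol {\beta }}}=\left(\mathbf {X} ^{\mathsf {T}}\mathbf {\Sigma } ^{-1}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\mathbf {\Sigma } ^{-1}\mathbf {y} ,

the hat matrix is

\mathbf {H} =\mathbf {X} \left(\mathbf {X} ^{\mathsf {T}}\mathbf {\Sigma } ^{-1}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}\mathbf {\Sigma } ^{-1},

and again it may be seen that $\mathbf {H} ^{2}=\mathbf {H}$ , though $\mathbf {H}$ is now in general no longer symmetric.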
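A small numerical illustration, assuming a hypothetical diagonal (heteroscedastic) $\mathbf{\Sigma}$ and an arbitrary two-column design: the generalized hat matrix stays idempotent and still reproduces the columns of $\mathbf{X}$, but it is not symmetric.

```python
import numpy as np

# Illustrative design and a hypothetical heteroscedastic error covariance.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones(5), x])
Sigma = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
W = np.linalg.inv(Sigma)                     # Sigma^{-1}

# Generalized least-squares hat matrix H = X (X^T W X)^{-1} X^T W.
H = X @ np.linalg.inv(X.T @ W @ X) @ X.T @ W

print(np.allclose(H @ H, H))   # True: still idempotent
print(np.allclose(H, H.T))     # False: no longer symmetric
print(np.allclose(H @ X, X))   # True: columns of X are reproduced
```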

Properties

The projection matrix has a number of useful algebraic properties.^[5]^[6] In the language of linear algebra, the projection matrix is the orthogonal projection onto the column space of the design matrix $\mathbf {X}$ .^[4] (Note that, when $\mathbf {X}$ has full column rank, $\left(\mathbf {X} ^{\mathsf {T}}\mathbf {X} \right)^{-1}\mathbf {X} ^{\mathsf {T}}$ is the pseudoinverse of $\mathbf {X}$ .) Some properties of the projection matrix in this setting are summarized as follows:^[4]

$\mathbf {u} =(\mathbf {I} -\mathbf {P} )\mathbf {y} ,$ and $\mathbf {u} =\mathbf {y} -\mathbf {P} \mathbf {y} \perp \mathbf {X} .$
$\mathbf {P}$ is symmetric, and so is $\mathbf {M} \equiv \left(\mathbf {I} -\mathbf {P} \right)$ .
$\mathbf {P}$ is idempotent: $\mathbf {P} ^{2}=\mathbf {P}$ , and so is $\mathbf {M}$ .
If $\mathbf {X}$ is an (n × r) matrix with $\operatorname {rank} (\mathbf {X} )=r$ , then $\operatorname {rank} (\mathbf {P} )=r$ .
The eigenvalues of $\mathbf {P}$ consist of r ones and n−r zeros, while the eigenvalues of $\mathbf {M}$ consist of n−r ones and r zeros.^[7]
$\mathbf {X}$ is invariant under $\mathbf {P}$ : $\mathbf {PX} =\mathbf {X} ,$ hence $\left(\mathbf {I} -\mathbf {P} \right)\mathbf {X} =\mathbf {0}$ .
$\left(\mathbf {I} -\mathbf {P} \right)\mathbf {P} =\mathbf {P} \left(\mathbf {I} -\mathbf {P} \right)=\mathbf {0} .$
$\mathbf {P}$ is unique for a given subspace: any design matrix with the same column space yields the same projection matrix.
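Each of the listed properties can be checked numerically. The following NumPy sketch uses a random full-rank design matrix (an illustrative assumption) and an arbitrary invertible matrix `T` to test uniqueness: $\mathbf{XT}$ spans the same column space as $\mathbf{X}$, so it should yield the same $\mathbf{P}$. It also checks the pseudoinverse remark against `np.linalg.pinv`.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 10, 3
X = rng.standard_normal((n, r))          # full column rank with probability 1
y = rng.standard_normal(n)

P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - P
u = M @ y                                # residual vector

checks = {
    "pseudoinverse": np.allclose(np.linalg.inv(X.T @ X) @ X.T, np.linalg.pinv(X)),
    "residuals orthogonal to X": np.allclose(X.T @ u, 0.0),
    "P and M symmetric": np.allclose(P, P.T) and np.allclose(M, M.T),
    "P and M idempotent": np.allclose(P @ P, P) and np.allclose(M @ M, M),
    "rank(P) = r": np.linalg.matrix_rank(P) == r,
    "PX = X": np.allclose(P @ X, X),
    "(I - P) P = P (I - P) = 0": np.allclose(M @ P, 0.0) and np.allclose(P @ M, 0.0),
}

# Eigenvalues of the symmetric matrix P: n - r zeros and r ones (up to rounding).
eig = np.sort(np.linalg.eigvalsh(P))
checks["eigenvalues"] = np.allclose(eig, [0.0] * (n - r) + [1.0] * r)

# Uniqueness: another basis of the same column space gives the same P.
T = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 3.0]])  # det(T) = 6
X2 = X @ T
P2 = X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
checks["unique for the subspace"] = np.allclose(P, P2)

for name, ok in checks.items():
    print(f"{name}: {ok}")
```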

The projection matrix corresponding to a linear model is symmetric, $\mathbf {P} ^{\mathsf {T}}=\mathbf {P}$ , and idempotent, $\mathbf {P} ^{2}=\mathbf {P}$ . However, this is not always the case for other linear smoothers; in locally weighted scatterplot smoothing (LOESS), for example, the hat matrix is in general neither symmetric nor idempotent.
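LOESS itself takes more code, so the sketch below substitutes a simpler linear smoother, a Nadaraya–Watson estimator with a Gaussian kernel, which likewise produces fitted values of the form $\hat{\mathbf y} = \mathbf S\mathbf y$. The grid and bandwidth are arbitrary choices. Because each row of $\mathbf S$ is normalized separately, $\mathbf S$ is neither symmetric nor idempotent.

```python
import numpy as np

# Stand-in for LOESS: a Nadaraya-Watson smoother with a Gaussian kernel.
# Its fitted values are linear in y: y_hat = S y.
x = np.linspace(0.0, 1.0, 20)
bandwidth = 0.1                                   # arbitrary choice for the sketch

K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
S = K / K.sum(axis=1, keepdims=True)              # each row sums to 1

print(np.allclose(S, S.T))      # False: rows are normalized separately
print(np.allclose(S @ S, S))    # False: smoothing twice differs from smoothing once
```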

For linear models, the trace of the projection matrix is equal to the rank of $\mathbf {X}$ , which is the number of independent parameters of the linear model. For other models such as LOESS that are still linear in the observations $\mathbf {y}$ , the projection matrix can be used to define the effective degrees of freedom of the model.
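A sketch of both uses of the trace, with simulated predictor values and an arbitrary quadratic design: for the linear model $\operatorname{tr}(\mathbf P)$ matches $\operatorname{rank}(\mathbf X)$, and for a linear smoother (a Gaussian-kernel smoother standing in for LOESS, built by a hypothetical helper `kernel_smoother`) $\operatorname{tr}(\mathbf S)$ is reported as the effective degrees of freedom for several bandwidths.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
x = np.sort(rng.uniform(0.0, 1.0, n))

# Linear model with intercept, x and x^2: trace(P) should equal rank(X) = 3.
X = np.column_stack([np.ones(n), x, x**2])
P = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.trace(P), np.linalg.matrix_rank(X))   # trace is 3 up to rounding error

def kernel_smoother(x, bandwidth):
    """Hypothetical helper: Nadaraya-Watson hat matrix S with a Gaussian kernel."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return K / K.sum(axis=1, keepdims=True)

# Effective degrees of freedom of the smoother, tr(S), for a few bandwidths.
dfs = {h: np.trace(kernel_smoother(x, h)) for h in (0.05, 0.2, 1.0)}
print(dfs)
```

Shrinking the bandwidth makes the smoother more flexible, and its effective degrees of freedom increase accordingly.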

Practical applications of the projection matrix in regression analysis include leverage and Cook's distance, which are concerned with identifying influential observations, i.e. observations which have a large effect on the results of a regression.
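The leverages are the diagonal entries $h_{ii}$ of the hat matrix, and Cook's distance can be computed from them with the standard shortcut $D_i = \frac{e_i^2}{p\,s^2}\frac{h_{ii}}{(1-h_{ii})^2}$, where $e_i$ is the i^th residual, $p$ the number of parameters and $s^2$ the residual mean square. The sketch below, on simulated data with one deliberately unusual point (an illustrative assumption), checks that shortcut against the definition based on refitting without each observation.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 12
x = np.append(rng.uniform(0.0, 1.0, n - 1), 4.0)   # last point far out in x
X = np.column_stack([np.ones(n), x])
y = 2.0 + 1.0 * x + 0.2 * rng.standard_normal(n)
y[-1] += 2.0                                         # and also off the trend

p = X.shape[1]
P = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(P)                                # h_ii
e = y - P @ y                                        # residuals
s2 = e @ e / (n - p)                                 # residual mean square

# Cook's distance via the hat-matrix shortcut.
cooks = e**2 / (p * s2) * leverage / (1.0 - leverage) ** 2

# Definition: squared change in all fitted values when observation i is dropped.
cooks_loo = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    cooks_loo[i] = np.sum((P @ y - X @ b_i) ** 2) / (p * s2)

print(np.argmax(leverage))   # 11: the x = 4 observation has the highest leverage
```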

Blockwise formula

Suppose the design matrix $X$ can be decomposed by columns as $X=[A,B]$ . Define the hat or projection operator as $P\{X\}=X\left(X^{\top }X\right)^{-1}X^{\top }$ . Similarly, define the residual operator as $M\{X\}=I-P\{X\}$ . Then the projection matrix can be decomposed as follows:^[8]

P\{X\}=P\{A\}+P\{M\{A\}B\},

where, e.g., $P\{A\}=A\left(A^{\top }A\right)^{-1}A^{\top }$ and $M\{A\}=I-P\{A\}$ . Such a partitioning has a number of applications. The classical application takes $A$ to be a column of all ones, which allows one to analyze the effect of adding an intercept term to a regression. Another use is in the fixed effects model, where $A$ is a large sparse matrix of the dummy variables for the fixed effect terms. One can use this partition to compute the hat matrix of $X$ without explicitly forming the matrix $X$ , which might be too large to fit into computer memory.
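The blockwise identity can be verified numerically. In this sketch, `proj` and `resid` are hypothetical helper names for the operators $P\{\cdot\}$ and $M\{\cdot\}$, $A$ is the intercept column and $B$ holds two random regressors; for simplicity the example forms $X$ explicitly, which the partition lets one avoid in large problems.

```python
import numpy as np

def proj(Z):
    """Hypothetical helper: P{Z} = Z (Z^T Z)^{-1} Z^T, Z assumed full column rank."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)

def resid(Z):
    """Hypothetical helper: residual operator M{Z} = I - P{Z}."""
    return np.eye(Z.shape[0]) - proj(Z)

rng = np.random.default_rng(6)
n = 9
A = np.ones((n, 1))                     # classical case: A is a column of ones
B = rng.standard_normal((n, 2))         # remaining regressors
X = np.hstack([A, B])

lhs = proj(X)
rhs = proj(A) + proj(resid(A) @ B)      # P{A} + P{M{A} B}
print(np.allclose(lhs, rhs))            # True

# With A a column of ones, M{A} B simply centers each column of B.
print(np.allclose(resid(A) @ B, B - B.mean(axis=0)))   # True
```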

References

[1] Basilevsky, Alexander (2005). Applied Matrix Algebra in the Statistical Sciences. Dover. pp. 160–176. ISBN 0-486-44538-0.
[2] "Data Assimilation: Observation influence diagnostic of a data assimilation system" (PDF).
[3] Hoaglin, David C.; Welsch, Roy E. (February 1978). "The Hat Matrix in Regression and ANOVA". The American Statistician. 32 (1): 17–22. doi:10.2307/2683469. JSTOR 2683469.
[4] Freedman, David A. (2009). Statistical Models: Theory and Practice. Cambridge University Press.
[5] Gans, P. (1992). Data Fitting in the Chemical Sciences. Wiley. ISBN 0-471-93412-7.
[6] Draper, N. R.; Smith, H. (1998). Applied Regression Analysis. Wiley. ISBN 0-471-17082-8.
[7] Amemiya, Takeshi (1985). Advanced Econometrics. Cambridge: Harvard University Press. pp. 460–461. ISBN 0-674-00560-0.
[8] Rao, C. Radhakrishna; Toutenburg, Helge; Shalabh; Heumann, Christian (2008). Linear Models and Generalizations (3rd ed.). Berlin: Springer. p. 323. ISBN 978-3-540-74226-5.


This article is issued from Wikipedia - version of 11/16/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.