Mixed model

Not to be confused with mixture model.

A mixed model is a statistical model containing both fixed effects and random effects. These models are useful in a wide variety of disciplines in the physical, biological and social sciences. They are particularly useful in settings where repeated measurements are made on the same statistical units (as in a longitudinal study), or where measurements are made on clusters of related statistical units. Because they handle missing values more gracefully, mixed-effects models are often preferred over more traditional approaches such as repeated measures ANOVA.

History and current status

Ronald Fisher introduced random effects models to study the correlations of trait values between relatives.^[1] In the 1950s, Charles Roy Henderson provided best linear unbiased estimates (BLUE) of fixed effects and best linear unbiased predictions (BLUP) of random effects.^[2]^[3]^[4]^[5] Mixed modeling has since become a major area of statistical research, including work on the computation of maximum likelihood estimates, nonlinear mixed-effects models, missing data in mixed-effects models, and Bayesian estimation of mixed-effects models. Mixed models are applied in many disciplines where multiple correlated measurements are made on each unit of interest. They are prominently used in research involving human and animal subjects, in fields ranging from genetics to marketing, and have also been used in baseball^[6] and industrial statistics.^[7]

Definition

In matrix notation a mixed model can be represented as

\boldsymbol{y} = X \boldsymbol{\beta} + Z \boldsymbol{u} + \boldsymbol{\epsilon}

where

$\boldsymbol{y}$ is a known vector of observations, with mean $E(\boldsymbol{y}) = X \boldsymbol{\beta}$ ;
${\boldsymbol {\beta }}$ is an unknown vector of fixed effects;
$\boldsymbol{u}$ is an unknown vector of random effects, with mean $E(\boldsymbol{u})=\boldsymbol{0}$ and variance-covariance matrix $\operatorname{var}(\boldsymbol{u})=G$ ;
$\boldsymbol{\epsilon}$ is an unknown vector of random errors, with mean $E(\boldsymbol{\epsilon})=\boldsymbol{0}$ and variance-covariance matrix $\operatorname{var}(\boldsymbol{\epsilon})=R$ ;
$X$ and $Z$ are known design matrices relating the observations $\boldsymbol{y}$ to ${\boldsymbol {\beta }}$ and $\boldsymbol{u}$ , respectively.
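
To make the notation concrete, here is a minimal sketch in Python with NumPy of a hypothetical random-intercept layout: 5 groups of 4 observations, an intercept and one covariate as fixed effects, and one random intercept per group, so that $Z$ is a matrix of group indicators. The values of $\boldsymbol{\beta}$, $G = \sigma_u^2 I$ and $R = \sigma_e^2 I$ are illustrative assumptions, and the helper `random_intercept_design` is hypothetical. Under the additional assumption that $\boldsymbol{u}$ and $\boldsymbol{\epsilon}$ are uncorrelated, the marginal variance of the observations is $\operatorname{var}(\boldsymbol{y}) = ZGZ' + R$.

```python
import numpy as np


def random_intercept_design(n_groups, n_per_group, rng):
    """Hypothetical helper: build X (intercept + one covariate) and
    Z (group indicators) for a balanced random-intercept layout."""
    n = n_groups * n_per_group
    groups = np.repeat(np.arange(n_groups), n_per_group)
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])   # fixed-effects design matrix
    Z = np.zeros((n, n_groups))
    Z[np.arange(n), groups] = 1.0          # row i selects its group's u
    return X, Z


rng = np.random.default_rng(0)
X, Z = random_intercept_design(n_groups=5, n_per_group=4, rng=rng)

beta = np.array([1.0, 2.0])                # assumed fixed effects
sigma_u2, sigma_e2 = 0.5, 1.0              # assumed variance components
G = sigma_u2 * np.eye(Z.shape[1])          # var(u)
R = sigma_e2 * np.eye(X.shape[0])          # var(epsilon)

# Draw u and epsilon, then form y = X beta + Z u + epsilon
u = rng.multivariate_normal(np.zeros(Z.shape[1]), G)
eps = rng.multivariate_normal(np.zeros(X.shape[0]), R)
y = X @ beta + Z @ u + eps

# Marginal covariance of y, assuming u and epsilon are uncorrelated
V = Z @ G @ Z.T + R
```

In this layout, two observations from the same group have covariance $\sigma_u^2$, while observations from different groups are uncorrelated; this is how the random effects induce within-cluster correlation.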

Estimation

The joint density of $\boldsymbol{y}$ and $\boldsymbol{u}$ can be written as $f(\boldsymbol{y},\boldsymbol{u}) = f(\boldsymbol{y} | \boldsymbol{u}) \, f(\boldsymbol{u})$ . Assuming normality, $\boldsymbol{u} \sim \mathcal{N}(\boldsymbol{0},G)$ , $\boldsymbol{\epsilon} \sim \mathcal{N}(\boldsymbol{0},R)$ and $\operatorname{Cov}(\boldsymbol{u},\boldsymbol{\epsilon})=\boldsymbol{0}$ , and maximizing the joint density with respect to ${\boldsymbol {\beta }}$ and $\boldsymbol{u}$ gives Henderson's "mixed model equations" (MME):^[2]^[4]^[8]

\begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{pmatrix} \begin{pmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\boldsymbol{u}} \end{pmatrix} = \begin{pmatrix} X'R^{-1}\boldsymbol{y} \\ Z'R^{-1}\boldsymbol{y} \end{pmatrix}
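
As a worked step, sketched under the assumption that $\boldsymbol{u}$ and $\boldsymbol{\epsilon}$ are jointly normal (so that $\boldsymbol{y} | \boldsymbol{u} \sim \mathcal{N}(X\boldsymbol{\beta} + Z\boldsymbol{u}, R)$ ), maximizing the joint density is equivalent to minimizing the following expression, where the constant collects the log-determinants of $G$ and $R$ and the $2\pi$ terms, none of which depend on ${\boldsymbol {\beta }}$ or $\boldsymbol{u}$ :

-2 \log f(\boldsymbol{y},\boldsymbol{u}) = (\boldsymbol{y} - X\boldsymbol{\beta} - Z\boldsymbol{u})' R^{-1} (\boldsymbol{y} - X\boldsymbol{\beta} - Z\boldsymbol{u}) + \boldsymbol{u}' G^{-1} \boldsymbol{u} + \text{const}

Setting its gradients with respect to ${\boldsymbol {\beta }}$ and $\boldsymbol{u}$ to zero gives

X'R^{-1}X \hat{\boldsymbol{\beta}} + X'R^{-1}Z \hat{\boldsymbol{u}} = X'R^{-1}\boldsymbol{y}

Z'R^{-1}X \hat{\boldsymbol{\beta}} + (Z'R^{-1}Z + G^{-1}) \hat{\boldsymbol{u}} = Z'R^{-1}\boldsymbol{y}

which are the two block rows of the mixed model equations.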

The solutions to the MME, $\textstyle\hat{\boldsymbol{\beta}}$ and $\textstyle\hat{\boldsymbol{u}}$ , are best linear unbiased estimates (BLUE) and predictors (BLUP) for ${\boldsymbol {\beta }}$ and $\boldsymbol{u}$ , respectively. This is a consequence of the Gauss–Markov theorem applied when the conditional variance of the outcome is not proportional to the identity matrix. When the conditional variance is known, the inverse-variance weighted least squares estimate is BLUE. In practice, however, the conditional variance is rarely, if ever, known, so it is desirable to estimate the variance jointly with the weighted parameter estimates when solving the MME.
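
A minimal numerical check of these statements, assuming $G$ and $R$ are known (here $G = 0.8 I$ and $R = I$, chosen only for illustration). The hypothetical helper `solve_mme` solves the MME directly; the hypothetical helper `gls_blup` works from the marginal model $\boldsymbol{y} \sim \mathcal{N}(X\boldsymbol{\beta}, V)$ with $V = ZGZ' + R$, computing the inverse-variance weighted (generalized least squares) estimate of ${\boldsymbol {\beta }}$ and then $\hat{\boldsymbol{u}} = GZ'V^{-1}(\boldsymbol{y} - X\hat{\boldsymbol{\beta}})$ . The two routes should agree up to rounding error.

```python
import numpy as np


def solve_mme(X, Z, y, G, R):
    """Solve Henderson's mixed model equations for known G and R."""
    Ri = np.linalg.inv(R)
    Gi = np.linalg.inv(G)
    p = X.shape[1]
    lhs = np.block([
        [X.T @ Ri @ X, X.T @ Ri @ Z],
        [Z.T @ Ri @ X, Z.T @ Ri @ Z + Gi],
    ])
    rhs = np.concatenate([X.T @ Ri @ y, Z.T @ Ri @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:p], sol[p:]        # beta_hat (BLUE), u_hat (BLUP)


def gls_blup(X, Z, y, G, R):
    """Same quantities via the marginal model y ~ N(X beta, V)."""
    V = Z @ G @ Z.T + R
    Vi = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)  # weighted (GLS) estimate
    u = G @ Z.T @ Vi @ (y - X @ beta)                   # BLUP of u
    return beta, u


# Hypothetical example: 4 groups of 3 observations, random intercepts
rng = np.random.default_rng(1)
groups = np.repeat(np.arange(4), 3)
n, q = groups.size, 4
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.zeros((n, q))
Z[np.arange(n), groups] = 1.0
G = 0.8 * np.eye(q)    # assumed known var(u)
R = np.eye(n)          # assumed known var(epsilon)
u_true = rng.normal(scale=np.sqrt(0.8), size=q)
y = X @ np.array([2.0, -1.0]) + Z @ u_true + rng.normal(size=n)

beta_mme, u_mme = solve_mme(X, Z, y, G, R)
beta_gls, u_gls = gls_blup(X, Z, y, G, R)
```

Explicit inverses keep the sketch close to the formulas; for large problems one would instead exploit the sparsity of $Z$ and avoid forming inverses.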

One method used to fit such mixed models is the EM algorithm,^[9] in which the variance components are treated as unobserved nuisance parameters in the joint likelihood. This approach is implemented in major statistical software packages, including R (the lme function in the nlme package), statsmodels and SAS (PROC MIXED). The solution to the mixed model equations is a maximum likelihood estimate when the distribution of the errors is normal.^[10]^[11]
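
A minimal sketch of EM for maximum likelihood (not REML), assuming the simplest variance structure $G = \sigma_u^2 I$ and $R = \sigma_e^2 I$. In this sketch the random effects $\boldsymbol{u}$ play the role of missing data: the E-step computes their conditional mean and variance given $\boldsymbol{y}$ at the current parameters, and the M-step updates ${\boldsymbol {\beta }}$, $\sigma_u^2$ and $\sigma_e^2$ in closed form. The function names, starting values, fixed iteration count and simulated data are illustrative assumptions; this is not the code used by any of the packages named above.

```python
import numpy as np


def marginal_loglik(X, Z, y, beta, s2u, s2e):
    """Gaussian log-likelihood of y ~ N(X beta, s2u * Z Z' + s2e * I)."""
    n = y.size
    V = s2u * Z @ Z.T + s2e * np.eye(n)
    r = y - X @ beta
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(V, r))


def em_variance_components(X, Z, y, s2u=1.0, s2e=1.0, n_iter=200):
    """ML EM for y = X beta + Z u + e with G = s2u*I and R = s2e*I."""
    n, q = Z.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting value
    history = [marginal_loglik(X, Z, y, beta, s2u, s2e)]
    for _ in range(n_iter):
        # E-step: u | y ~ N(u_hat, C) at the current parameters
        C = np.linalg.inv(Z.T @ Z / s2e + np.eye(q) / s2u)
        u_hat = C @ Z.T @ (y - X @ beta) / s2e
        # M-step: maximize the expected complete-data log-likelihood
        s2u = (u_hat @ u_hat + np.trace(C)) / q
        beta = np.linalg.lstsq(X, y - Z @ u_hat, rcond=None)[0]
        resid = y - X @ beta - Z @ u_hat
        s2e = (resid @ resid + np.trace(Z @ C @ Z.T)) / n
        history.append(marginal_loglik(X, Z, y, beta, s2u, s2e))
    return beta, s2u, s2e, np.array(history)


# Hypothetical simulated data: 30 groups of 5 observations
rng = np.random.default_rng(2)
q, m = 30, 5
n = q * m
groups = np.repeat(np.arange(q), m)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.zeros((n, q))
Z[np.arange(n), groups] = 1.0
y = (X @ np.array([1.0, 0.5])
     + Z @ rng.normal(scale=1.0, size=q)
     + rng.normal(scale=0.7, size=n))

beta_hat, s2u_hat, s2e_hat, ll = em_variance_components(X, Z, y)
```

Because each iteration is an exact EM step, the recorded log-likelihood should never decrease, which makes a useful sanity check. EM can converge slowly near the optimum; Lindstrom and Bates^[9] also discuss Newton–Raphson for these models.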

References

1. Fisher, R. A. (1918). "The correlation between relatives on the supposition of Mendelian inheritance". Transactions of the Royal Society of Edinburgh. 52 (2): 399–433. doi:10.1017/S0080456800012163.
2. Robinson, G. K. (1991). "That BLUP is a Good Thing: The Estimation of Random Effects". Statistical Science. 6 (1): 15–32. doi:10.1214/ss/1177011926. JSTOR 2245695.
3. Henderson, C. R.; Kempthorne, Oscar; Searle, S. R.; von Krosigk, C. M. (1959). "The Estimation of Environmental and Genetic Trends from Records Subject to Culling". Biometrics. International Biometric Society. 15 (2): 192–218. doi:10.2307/2527669. JSTOR 2527669.
4. Van Vleck, L. Dale. "Charles Roy Henderson, April 1, 1911 – March 14, 1989" (PDF). United States National Academy of Sciences.
5. McLean, Robert A.; Sanders, William L.; Stroup, Walter W. (1991). "A Unified Approach to Mixed Linear Models". The American Statistician. American Statistical Association. 45 (1): 54–64. doi:10.2307/2685241. JSTOR 2685241.
6. Analytics guru and mixed model.
7. Mixed models in industry.
8. Henderson, C. R. (1973). "Sire evaluation and genetic trends" (PDF). Journal of Animal Science. American Society of Animal Science. 1973: 10–41. Retrieved 17 August 2014.
9. Lindstrom, M. J.; Bates, D. M. (1988). "Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data". Journal of the American Statistical Association. 83 (404): 1014–1021. doi:10.1080/01621459.1988.10478693.
10. Laird, Nan M.; Ware, James H. (1982). "Random-Effects Models for Longitudinal Data". Biometrics. International Biometric Society. 38 (4): 963–974. doi:10.2307/2529876. JSTOR 2529876. PMID 7168798.
11. Fitzmaurice, Garrett M.; Laird, Nan M.; Ware, James H. (2004). Applied Longitudinal Analysis. John Wiley & Sons, Inc., pp. 326–328.

Commercial

NCSS (statistical software) includes longitudinal mixed models analysis.
Stata statistical software includes multilevel mixed-effects models analysis.
GenStat statistical software includes flexible multilevel mixed-effects models analysis.
ASReml-R is an R statistical package that allows users to model complex data.

This article is issued from Wikipedia - version of the 12/2/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.