Contrast (statistics)

In statistics, particularly in analysis of variance and linear regression, a contrast is a linear combination of variables (parameters or statistics) whose coefficients add up to zero, allowing comparison of different treatments.^[1]^[2]

Definitions

Let $\theta _{1},\ldots ,\theta _{t}$ be a set of variables, either parameters or statistics, and $a_{1},\ldots ,a_{t}$ be known constants. The quantity $\sum _{i=1}^{t}a_{i}\theta _{i}$ is a linear combination. It is called a contrast if $\sum _{i=1}^{t}a_{i}=0$.^[3]^[4] Furthermore, two contrasts, $\sum _{i=1}^{t}a_{i}\theta _{i}$ and $\sum _{i=1}^{t}b_{i}\theta _{i}$, are orthogonal if $\sum _{i=1}^{t}a_{i}b_{i}=0$.^[5]

Examples

Let us imagine that we are comparing four means, $\mu _{1},\mu _{2},\mu _{3},\mu _{4}$. The following table describes three possible contrasts:

$\mu _{1}$	$\mu _{2}$	$\mu _{3}$	$\mu _{4}$
1	-1	0	0
0	0	1	-1
1	1	-1	-1

The first contrast compares the first mean with the second, the second contrast compares the third mean with the fourth, and the third contrast compares the average of the first two means with the average of the last two.^[4]

In a balanced one-way analysis of variance, using orthogonal contrasts has the advantage of completely partitioning the treatment sum of squares into non-overlapping additive components that represent the variation due to each contrast.^[6] Consider the numbers above: each row sums to zero, so each row is a contrast. Multiplying the corresponding elements of the first and second rows and adding the products again gives zero, so the first and second contrasts are orthogonal; the same check succeeds for the other two pairs of rows.
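The two checks described above can be sketched in a few lines of Python. This is only an illustration: the helper names `is_contrast` and `are_orthogonal` are made up for this example and do not come from any statistics library.

```python
from itertools import combinations

# Coefficient rows from the table above (columns correspond to mu_1 .. mu_4).
contrasts = [
    [1, -1, 0, 0],
    [0, 0, 1, -1],
    [1, 1, -1, -1],
]


def is_contrast(coeffs, tol=1e-12):
    """Return True when the coefficients sum to zero (hypothetical helper)."""
    return abs(sum(coeffs)) <= tol


def are_orthogonal(a, b, tol=1e-12):
    """Return True when the sum of coefficient cross-products is zero."""
    return abs(sum(x * y for x, y in zip(a, b))) <= tol


all_contrasts = all(is_contrast(c) for c in contrasts)
all_orthogonal = all(are_orthogonal(a, b) for a, b in combinations(contrasts, 2))
print(all_contrasts, all_orthogonal)  # True True
```

A pair such as (1, -1, 0, 0) and (1, 0, -1, 0) would fail the second check, since the sum of cross-products is 1 rather than 0.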

Sets of contrasts

Orthogonal contrasts are a set of contrasts in which, for any distinct pair, the sum of the cross-products of the coefficients is zero (assuming equal sample sizes).^[7] Although there are infinitely many possible sets of orthogonal contrasts, any given set contains at most k – 1 orthogonal contrasts (where k is the number of group means available).^[8]
Polynomial contrasts are a special set of orthogonal contrasts that test polynomial patterns in data with more than two means (e.g., linear, quadratic, cubic, quartic, etc.).^[9]
Orthonormal contrasts are orthogonal contrasts that satisfy the additional condition that, for each contrast, the squares of the coefficients sum to one.^[7]
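As an illustration of the three kinds of sets listed above, the sketch below takes the commonly tabulated polynomial contrast coefficients for four equally spaced groups (k = 4, so there are k – 1 = 3 of them) and rescales them into orthonormal contrasts. The function name `normalize` is an assumption made for this example.

```python
import math

# Commonly tabulated polynomial contrasts for four equally spaced groups.
polynomial = {
    "linear": [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic": [-1, 3, -3, 1],
}


def normalize(coeffs):
    """Rescale a contrast so its squared coefficients sum to one."""
    norm = math.sqrt(sum(c * c for c in coeffs))
    return [c / norm for c in coeffs]


# Rescaling changes neither the zero sum nor the orthogonality of the set.
orthonormal = {name: normalize(c) for name, c in polynomial.items()}

for name, coeffs in orthonormal.items():
    print(name, [round(c, 4) for c in coeffs])
```

Rescaling does not change which hypothesis a contrast tests; it only fixes the scale of the coefficients.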

Background

A contrast is defined as the sum of each group mean multiplied by a coefficient for each group (i.e., a signed number, $c_{j}$).^[10] In equation form, $L=c_{1}{\bar {X}}_{1}+c_{2}{\bar {X}}_{2}+\cdots +c_{k}{\bar {X}}_{k}=\sum c_{j}{\bar {X}}_{j}$, where L is the weighted sum of group means, the $c_{j}$ coefficients represent the assigned weights of the means (these must sum to 0 for L to be a contrast), and ${\bar {X}}_{j}$ represents the group means.^[8] Coefficients can be positive or negative, and fractions or whole numbers, depending on the comparison of interest. Linear contrasts are very useful and can be used to test complex hypotheses when used in conjunction with ANOVA or multiple regression. In essence, each contrast defines and tests for a particular pattern of differences among the means.^[10]

Contrasts should be constructed "to answer specific research questions", and do not necessarily have to be orthogonal.^[11]

A simple (non-orthogonal) contrast is the difference between two means. A more complex contrast can test the difference between several means (e.g., if you have four means, assign coefficients of –3, –1, +1, and +3), or test the difference between a single mean and the combined mean of several groups (e.g., if you have four means, assign coefficients of –3, +1, +1, and +1), or test the difference between the combined mean of several groups and the combined mean of several other groups (e.g., if you have four means, assign coefficients of –1, –1, +1, and +1).^[8] The coefficients for the means to be combined (or averaged) must be the same in magnitude and direction; in other words, they are weighted equally. When means are assigned different coefficients (either in magnitude or direction, or both), the contrast is testing for a difference between those means. The term contrast may refer to any of the following: the set of coefficients used to specify a comparison; the specific value of the linear combination obtained for a given study or experiment; or the random quantity defined by applying the linear combination to treatment effects when these are themselves considered as random variables. In the last of these senses, the term contrast variable is sometimes used.

Contrasts are sometimes used to compare mixed effects. A common example is the difference between two test scores, one at the beginning of the semester and one at its end. Note that we are not interested in either score by itself, but only in the contrast (in this case, the difference). If the two scores are treated as independent variables, the variance of this linear combination is the sum of the variances weighted by the squared coefficients; in this case both squared coefficients equal one, so the variance of the difference is the sum of the two variances. This "blending" of two variables into one can be useful in many settings, such as ANOVA, regression, or even as a descriptive statistic in its own right.

An example of a complex contrast would be comparing 5 standard treatments to a new treatment, giving each standard treatment mean a weight of 1/5 and the new sixth treatment mean a weight of −1 (using the equation above). If this linear combination has mean zero, the standard treatments do not differ from the new treatment on average. If the value of the linear combination is positive, the combined mean of the 5 standard treatments is higher than the new treatment mean. If the value is negative, the combined mean of the 5 standard treatments is lower than the new treatment mean.^[10] However, the value of the linear combination is not itself a significance test; see Testing significance (below) to learn how to determine whether a contrast is significant.
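The arithmetic of this example can be sketched as follows. The six treatment means are invented purely for illustration, and `contrast_value` is a hypothetical helper name; exact fractions are used so that the weights sum to exactly zero.

```python
from fractions import Fraction


def contrast_value(coeffs, means):
    """Return L = sum of c_j * mean_j after checking the coefficients."""
    if len(coeffs) != len(means):
        raise ValueError("need one coefficient per group mean")
    if sum(coeffs) != 0:
        raise ValueError("coefficients of a contrast must sum to zero")
    return sum(c * m for c, m in zip(coeffs, means))


# Weight 1/5 for each of the five standard treatments, -1 for the new one.
weights = [Fraction(1, 5)] * 5 + [Fraction(-1)]

# Hypothetical group means: five standard treatments, then the new treatment.
means = [10, 12, 11, 9, 13, 14]

L = contrast_value(weights, means)
print(L)  # -3
```

Here the combined mean of the standard treatments is 11 and the new treatment mean is 14, so the contrast is negative; whether that difference is significant is a separate question.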

The usual results for linear combinations of independent random variables mean that the variance of a contrast is equal to the sum of the variances weighted by the squared coefficients.^[12] If two contrasts are orthogonal, estimates created by using such contrasts will be uncorrelated. This helps to minimize the Type I error rate, the rate of falsely rejecting a true null hypothesis. Because orthogonal contrasts test different aspects of the data, they are independent: the results of one contrast have no effect on the results of the other contrasts. When contrasts are not orthogonal, they do not test completely different aspects of the data, so the results of one contrast can influence the results of the others. This can increase the chance of falsely rejecting a true null hypothesis.^[8]
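As a worked illustration of the variance and correlation statements above, assume (an assumption made here for simplicity, not a condition stated in the text) that the group means ${\bar {X}}_{j}$ are independent, each based on n observations with a common variance $\sigma ^{2}$. For two contrasts with coefficients $a_{j}$ and $b_{j}$:

$\operatorname {Var} \left(\sum _{j}a_{j}{\bar {X}}_{j}\right)=\sum _{j}a_{j}^{2}\operatorname {Var} ({\bar {X}}_{j})={\frac {\sigma ^{2}}{n}}\sum _{j}a_{j}^{2}$

$\operatorname {Cov} \left(\sum _{j}a_{j}{\bar {X}}_{j},\sum _{j}b_{j}{\bar {X}}_{j}\right)=\sum _{j}a_{j}b_{j}\operatorname {Var} ({\bar {X}}_{j})={\frac {\sigma ^{2}}{n}}\sum _{j}a_{j}b_{j}$

Under these assumptions the covariance is zero exactly when $\sum _{j}a_{j}b_{j}=0$, which is the orthogonality condition.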

If orthogonal contrasts are available, it is possible to summarize the results of a statistical analysis in the form of a simple analysis of variance table that contains the results for different test statistics relating to different contrasts, each of which is statistically independent. Linear contrasts can be easily converted into sums of squares. SS_contrast = ${\tfrac {n(\sum c_{j}{\bar {X}}_{j})^{2}}{\sum c_{j}^{2}}}$, with 1 degree of freedom, where n represents the number of observations per group. If a complete set of k – 1 orthogonal contrasts is used, the sum of the SS_contrast values equals SS_treatment. Testing the significance of a contrast requires the computation of SS_contrast.^[8] A recent development in statistical analysis is the standardized mean of a contrast variable. This compares the size of the differences between groups, as measured by a contrast, with the accuracy with which that contrast can be measured by a given study or experiment.^[13]
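The sum-of-squares formula and the partitioning property can be checked numerically. The sketch below assumes a balanced design with four groups and reuses the three orthogonal contrasts from the Examples section; the group means and the per-group sample size are hypothetical values, and the function names are made up for this example.

```python
def ss_contrast(coeffs, means, n):
    """SS_contrast = n * (sum c_j * mean_j)^2 / sum c_j^2, balanced design."""
    value = sum(c * m for c, m in zip(coeffs, means))
    return n * value ** 2 / sum(c * c for c in coeffs)


def ss_treatment(means, n):
    """Treatment sum of squares for a balanced one-way layout."""
    grand_mean = sum(means) / len(means)
    return n * sum((m - grand_mean) ** 2 for m in means)


means = [5.0, 7.0, 6.0, 10.0]  # hypothetical group means
n = 3                          # hypothetical observations per group

contrasts = [
    [1, -1, 0, 0],
    [0, 0, 1, -1],
    [1, 1, -1, -1],
]

parts = [ss_contrast(c, means, n) for c in contrasts]
total = ss_treatment(means, n)
print(parts, total)  # [6.0, 24.0, 12.0] 42.0
```

The three components add up to the treatment sum of squares because the set contains k – 1 = 3 mutually orthogonal contrasts; with fewer contrasts, or non-orthogonal ones, the sum would generally not match.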

Testing significance

SS_contrast is also a mean square, because every contrast has 1 degree of freedom. Dividing MS_contrast by MS_error produces an F-statistic with one and df_error degrees of freedom. The statistical significance of F_contrast can be determined by comparing the obtained F statistic with a critical value of F with the same degrees of freedom.^[8]
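A minimal sketch of this test is given below. All numbers are hypothetical: the sum of squares, the error mean square and the error degrees of freedom are invented for illustration, and the critical value is the commonly tabulated figure of about 5.32 for F with 1 and 8 degrees of freedom at the 0.05 level. The function name `contrast_f_test` is likewise an assumption.

```python
def contrast_f_test(ss_contrast, ms_error, f_critical):
    """Return (F, significant) for a single-degree-of-freedom contrast.

    Because a contrast has 1 degree of freedom, MS_contrast equals
    SS_contrast. The critical value must be looked up for (1, df_error)
    degrees of freedom at the chosen significance level.
    """
    ms_contrast = ss_contrast / 1
    f_stat = ms_contrast / ms_error
    return f_stat, f_stat > f_critical


# Hypothetical inputs: 4 groups of 3 observations give df_error = 12 - 4 = 8.
ss = 24.0              # hypothetical SS_contrast
ms_error = 2.0         # hypothetical MS_error
f_crit_1_8_05 = 5.32   # tabulated F(1, 8) critical value at alpha = 0.05

f_stat, significant = contrast_f_test(ss, ms_error, f_crit_1_8_05)
print(f_stat, significant)  # 12.0 True
```

With the same error term, a contrast with a sum of squares of 6.0 would give F = 3.0 and would not exceed the critical value.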

References

Casella, George; Berger, Roger L (2001). Statistical inference. Cengage Learning. ISBN 9780534243128.
Casella, George (2008). Statistical design. Springer. ISBN 978-0-387-75965-4.
Everitt, B S; Skrondal, A (2010). Cambridge dictionary of statistics (4th ed.). Cambridge University Press. ISBN 9780521766999.
Dean, Angela M; Voss, Daniel (1999). Design and analysis of experiments. Springer. ISBN 9780387985619.

Notes

[1] Casella, George; Berger, Roger L (2001). Statistical inference. Cengage Learning. ISBN 9780534243128.
[2] Casella, George (2008). Statistical design. Springer. ISBN 978-0-387-75965-4.
[3] Casella & Berger 2001, p. 526.
[4] Casella 2008, p. 11.
[5] Casella 2008, p. 12.
[6] Casella 2008, p. 13.
[7] Everitt, B.S. (2002). The Cambridge Dictionary of Statistics. CUP. ISBN 0-521-81099-X (entry for "Orthogonal contrasts").
[8] Howell, David C. (2010). Statistical methods for psychology (7th ed.). Belmont, CA: Thomson Wadsworth. ISBN 978-0-495-59784-1.
[9] Kim, Jong Sung. "Orthogonal Polynomial Contrasts" (PDF). Retrieved 27 April 2012.
[10] Clark, James M. (2007). Intermediate Data Analysis: Multiple Regression and Analysis of Variance. University of Winnipeg.
[11] Kuehl, Robert O. (2000). Design of experiments: statistical principles of research design and analysis (2nd ed.). Pacific Grove, CA: Duxbury/Thomson Learning. ISBN 0534368344.
[12] NIST/SEMATECH e-Handbook of Statistical Methods.
[13] Zhang XHD (2011). Optimal High-Throughput Screening: Practical Experimental Design and Data Analysis for Genome-scale RNAi Research. Cambridge University Press. ISBN 978-0-521-73444-8.

This article is issued from Wikipedia - version of the 6/2/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.