Freedman–Diaconis rule

In statistics, the Freedman–Diaconis rule, named after David A. Freedman and Persi Diaconis, can be used to select the size of the bins to be used in a histogram.[1] For a set of empirical measurements sampled from some probability distribution, the Freedman–Diaconis rule is designed to roughly minimize the integral of the squared difference between the histogram (i.e., the relative frequency density) and the density of the theoretical probability distribution.

The general equation for the rule is:

    \text{Bin width} = 2\,\frac{\operatorname{IQR}(x)}{n^{1/3}}

where IQR(x) is the interquartile range of the data and n is the number of observations in the sample x.
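As an illustration, the following is a minimal Python sketch of the rule; the function name and the use of NumPy are choices made here for the example, not part of the original sources (NumPy's histogram functions also accept bins='fd' to apply this estimator directly):

    import numpy as np

    def freedman_diaconis_bin_width(x):
        """Bin width suggested by the Freedman–Diaconis rule (illustrative helper)."""
        x = np.asarray(x, dtype=float)
        q75, q25 = np.percentile(x, [75, 25])  # third and first quartiles
        iqr = q75 - q25                        # interquartile range IQR(x)
        n = x.size                             # number of observations
        return 2 * iqr / n ** (1.0 / 3.0)

    # Example: choose the number of bins for a simulated sample.
    rng = np.random.default_rng(0)
    sample = rng.normal(size=1000)
    width = freedman_diaconis_bin_width(sample)
    num_bins = int(np.ceil((sample.max() - sample.min()) / width))
    counts, edges = np.histogram(sample, bins=num_bins)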

Other approaches

Another approach is to use Sturges' rule: use a bin width so large that there are about 1 + log₂(n) non-empty bins (Scott, 2009).[2] This works well for n under 200, but was found to be inaccurate for large n.[3] For a discussion and an alternative approach, see Birgé and Rozenholc.[4]
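A minimal sketch of Sturges' rule, under the same illustrative assumptions as the code above (NumPy's histogram functions likewise accept bins='sturges'):

    import math

    def sturges_num_bins(n):
        """Approximate bin count from Sturges' rule: about 1 + log2(n) (illustrative helper)."""
        return int(math.ceil(1 + math.log2(n)))

    # For n = 1000 observations this suggests 11 bins.
    print(sturges_num_bins(1000))  # -> 11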

References

  1. Freedman, David; Diaconis, Persi (December 1981). "On the histogram as a density estimator: L2 theory" (PDF). Probability Theory and Related Fields. Heidelberg: Springer Berlin. 57 (4): 453–476. ISSN 0178-8051. Retrieved 2009-01-06.
  2. Scott, D.W. (2009). "Sturges' rule". WIREs Computational Statistics. 1: 303–306. doi:10.1002/wics.35.
  3. Hyndman, R.J. (1995). "The problem with Sturges' rule for constructing histograms" (PDF).
  4. Birgé, L.; Rozenholc, Y. (2006). "How many bins should be put in a regular histogram". ESAIM: Probability and Statistics. 10: 24–45. doi:10.1051/ps:2006001.