Newman–Keuls method: Difference between revisions

From Wikipedia, the free encyclopedia
m History: clean up using AWB
m change U+00B5 to U+03BC (μ) per Unicode standard and MOS:NUM#Specific units - see Unicode compatibility characters (via WP:JWB)
 
(40 intermediate revisions by 22 users not shown)
Line 1: Line 1:
The '''Newman–Keuls''' or '''Student–Newman–Keuls (SNK)''' method is a stepwise [[multiple comparisons]] procedure used to identify [[Sample (statistics)|sample]] [[Arithmetic mean|means]] that are [[Statistical significance|significantly]] different from each other.<ref name=Muth>{{cite book |last1 = Muth |first1 = James E. De |title = Basic Statistics and Pharmaceutical Statistical Applications |edition=2nd |publisher = Chapman and Hall/CRC |location = Boca Raton, FL |year = 2006 |isbn = 0-849-33799-2 |pages=229–259}}</ref> It was named after [[William Sealy Gosset|Student]] (1927),<ref name=Student>{{cite journal|author=Student|title=Errors of routine analysis|journal=Biometrika|volume=19|issue=1/2|year=1927|pages=151–164|doi=10.2307/2332181|url=http://biomet.oxfordjournals.org/content/19/1-2/151.full.pdf+html}}</ref> D. Newman,<ref name=Newman>{{cite journal|author=Newman D|title=The distribution of range in samples from a normal population, expressed in terms of an independent estimate of standard deviation|journal=Biometrika|volume=31|issue=1|year=1939|pages=20–30|doi=10.1093/biomet/31.1-2.20|url=http://biomet.oxfordjournals.org/content/31/1-2/20.full.pdf+html}}</ref> and M. Keuls.<ref name=Keuls>{{cite journal|author=Keuls M|title=The use of the "studentized range" in connection with an analysis of variance|journal=Euphytica|volume=1|year=1952|pages=112–122|doi=10.1007/bf01908269|url=http://www.wias-berlin.de/people/dickhaus/downloads/MultipleTests-SoSe-2010/keuls1952.pdf}}</ref> This procedure is often used as a [[post-hoc analysis|post-hoc test]] whenever a significant difference between three or more sample means has been revealed by an [[analysis of variance|analysis of variance (ANOVA)]].<ref name=Muth /> The Newman–Keuls method is similar to [[Tukey's range test]] as both procedures use [[Studentized range|Studentized range statistics]].<ref name=Broota>{{cite book |last1 = Broota |first1 = K.D. 
|title = Experimental Design in Behavioural Research |edition=1st |publisher = New Age International (P) Ltd. |location = New Delhi, India |year = 1989 |isbn = 8-122-40215-1 |pages=81–96}}</ref><ref name=Sheskin>{{cite book |last1 = Sheskin |first1 = David J. |title = Handbook of Parametric and Nonparametric Statistical Procedures |edition=3rd |publisher = CRC Press |location = Boca Raton, FL |year = 1989 |isbn = 1-584-88440-1 |pages=665–756}}</ref> Unlike Tukey's range test, the Newman–Keuls method uses different [[Critical values#Statistics|critical values]] for different pairs of mean comparisons. Thus, the procedure is more likely to reveal significant differences between group means and to commit [[type I errors]] by incorrectly rejecting a null hypothesis when it is true. In other words, the Neuman-Keuls procedure is more [[Statistical power|powerful]] but less conservative than Tukey's range test.<ref name="Sheskin"/><ref name="Roberts and Russo">{{cite book |last1 = Roberts |first1 = Maxwell | last2 = Russo | first2 = Riccardo | chapter = Following up a one-factor between-subjects ANOVA | title = A Student's Guide to Analysis of Variance | year = 1999 | edition= |publisher = J&L Composition Ltd. |location = Filey, United Kingdom | isbn = 0-415-16564-4 |pages=82–109}}</ref>
The '''Newman–Keuls''' or '''Student–Newman–Keuls (SNK)''' '''method''' is a stepwise [[multiple comparisons]] procedure used to identify [[Sample (statistics)|sample]] [[Arithmetic mean|means]] that are [[Statistical significance|significantly]] different from each other.<ref name=Muth>{{cite book |last1 = De Muth |first1 = James E. |title = Basic Statistics and Pharmaceutical Statistical Applications |edition=2nd |publisher = Chapman and Hall/CRC |location = Boca Raton, FL |year = 2006 |isbn = 978-0-8493-3799-4 |pages=229–259}}</ref> It was named after [[William Sealy Gosset|Student]] (1927),<ref name=Student>{{cite journal|author=Student|title=Errors of routine analysis|journal=Biometrika|volume=19|issue=1/2|year=1927|pages=151–164|doi=10.2307/2332181|jstor=2332181}}</ref> D. Newman,<ref name=Newman>{{cite journal |last=Newman |first=D. |title=The distribution of range in samples from a normal population, expressed in terms of an independent estimate of standard deviation|journal=Biometrika|volume=31|issue=1|year=1939|pages=20–30|doi=10.1093/biomet/31.1-2.20}}</ref> and M. Keuls.<ref name=Keuls>{{cite journal|last=Keuls |first=M. 
|title=The use of the "studentized range" in connection with an analysis of variance|journal=Euphytica|volume=1|issue=2 |year=1952|pages=112–122|doi=10.1007/bf01908269|s2cid=19365087 |url=http://www.wias-berlin.de/people/dickhaus/downloads/MultipleTests-SoSe-2010/keuls1952.pdf|url-status=dead|archive-url=https://web.archive.org/web/20141104043438/http://www.wias-berlin.de/people/dickhaus/downloads/MultipleTests-SoSe-2010/keuls1952.pdf|archive-date=2014-11-04}}</ref> This procedure is often used as a [[post-hoc analysis|post-hoc test]] whenever a significant difference between three or more sample means has been revealed by an [[analysis of variance|analysis of variance (ANOVA)]].<ref name=Muth /> The Newman–Keuls method is similar to [[Tukey's range test]] as both procedures use [[studentized range|studentized range statistics]].<ref name=Broota>{{cite book |last1 = Broota |first1 = K. D. |title = Experimental Design in Behavioural Research |edition=1st |publisher = New Age International (P) Ltd. |location = New Delhi, India |year = 1989 |isbn = 978-81-224-0215-5 |pages=81–96}}</ref><ref name=Sheskin>{{cite book |last1 = Sheskin |first1 = David J. |title = Handbook of Parametric and Nonparametric Statistical Procedures |edition=3rd |publisher = CRC Press |location = Boca Raton, FL |year = 1989 |isbn = 978-1-58488-440-8 |pages=665–756}}</ref> Unlike Tukey's range test, the Newman–Keuls method uses different [[Critical value (statistics)|critical value]]s for different pairs of mean comparisons. Thus, the procedure is more likely to reveal significant differences between group means and to commit [[type I errors]] by incorrectly rejecting a null hypothesis when it is true. 
In other words, the Neuman-Keuls procedure is more [[Statistical power|powerful]] but less conservative than Tukey's range test.<ref name="Sheskin"/><ref name="Roberts and Russo">{{cite book |last1 = Roberts |first1 = Maxwell | last2 = Russo | first2 = Riccardo | chapter = Following up a one-factor between-subjects ANOVA | title = A Student's Guide to Analysis of Variance | year = 1999 |publisher = J&L Composition Ltd. |location = Filey, United Kingdom | isbn = 978-0-415-16564-8 |pages=82–109}}</ref>


==History==
==History and type I error rate control==


The Newman–Keuls method was introduced by Newman in 1939 and developed further by Keuls in 1952. This before [[John Tukey|Tukey]] presented the concept of different types of multiple error rates (1952a,<ref name=1952a>{{cite journal|author=Tukey, J.W |title=Reminder sheets for Allowances for various types of error rates. Unpublished manuscript |journal=Brown, 1984|year=1952a}}</ref> 1952b,<ref name=1952b>{{cite journal|author=Tukey, J.W |title=Reminder sheets for Multiple comparisons. Unpublished manuscript |journal=Brown, 1984|year=1952b}}</ref> 1953<ref name=1953a>{{cite journal|author=Tukey, J.W |title=The problem of multiple comparisons. Unpublished manuscript |journal=Brown, 1984|year=1953}}</ref>).
The Newman–Keuls method was introduced by Newman in 1939 and developed further by Keuls in 1952. This was before [[John Tukey|Tukey]] presented various definitions of error rates (1952a,<ref name=1952a>{{cite journal|author=Tukey, J. W. |title=Reminder sheets for Allowances for various types of error rates. Unpublished manuscript |journal=Brown, 1984|year=1952a}}</ref> 1952b,<ref name=1952b>{{cite journal|author=Tukey, J. W. |title=Reminder sheets for Multiple comparisons. Unpublished manuscript |journal=Brown, 1984|year=1952b}}</ref> 1953<ref name=1953a>{{cite journal|author=Tukey, J. W. |title=The problem of multiple comparisons. Unpublished manuscript |journal=Brown, 1984|year=1953}}</ref>).
The Newman–Keuls method controls the [[Family-wise error rate|Family-Wise Error Rate]] (FWER) in the weak sense but not the strong sense:<ref name=":0">{{Cite journal |last1=Proschan |first1=Michael A. |last2=Brittain |first2=Erica H.|author2-link=Erica Brittain |date=2020-04-30 |title=A primer on strong vs weak control of familywise error rate |url=http://dx.doi.org/10.1002/sim.8463 |journal=Statistics in Medicine |volume=39 |issue=9 |pages=1407–1413 |doi=10.1002/sim.8463 |pmid=32106332 |s2cid=211556180 |issn=0277-6715}}</ref><ref name=":1">{{Cite journal |last1=Keselman |first1=H. J. |last2=Keselman |first2=Joanne C. |last3=Games |first3=Paul A. |date=1991 |title=Maximum familywise Type I error rate: The least significant difference, Newman-Keuls, and other multiple comparison procedures. |url=http://doi.apa.org/getdoi.cfm?doi=10.1037/0033-2909.110.1.155 |journal=Psychological Bulletin |language=en |volume=110 |issue=1 |pages=155–161 |doi=10.1037/0033-2909.110.1.155 |issn=0033-2909}}</ref> the Newman–Keuls procedure controls the risk of rejecting the null hypothesis if all means are equal (global null hypothesis) but does not control the risk of rejecting partial null hypotheses. For instance, when four means are compared, under the partial null hypothesis that μ1=μ2 and μ3=μ4=μ+delta with a non-zero delta, the Newman–Keuls procedure has a probability greater than alpha of rejecting μ1=μ2 or μ3=μ4 or both. 
In that example, if delta is very large, the Newman–Keuls procedure is almost equivalent to two Student t tests testing μ1=μ2 and μ3=μ4 at nominal type I error rate alpha, without multiple testing procedure; therefore the FWER is almost doubled.<ref name=":0" /> In the worst case, the FWER of Newman–Keuls procedure is 1-(1-alpha)^int(J/2) where int(J/2) represents the [[integer part]] of the total number of groups divided by 2.<ref name=":1" /> Therefore, with two or three groups, the Newman–Keuls procedure has strong control over the FWER but not for four groups or more.
The Newman–Keuls method was popular during 1950s and 1960s{{citation needed|date=October 2014}}. But when the control of [[familywise error rate]] (FWER) became an accepted criterion in multiple comparison testing, the procedure became less popular{{citation needed|date=October 2014}} as it does not control FWER (except for the special case of exactly three groups<ref name=groups>{{cite journal|author1=MA Seaman, JR Levin |author2=RC Serlin M |lastauthoramp=yes |title=New Developments in pairwise multiple comparisons: Some powerful and practicable procedures |journal=Psychological Bulletin |year=1991|pages=577–586|url=http://psycnet.apa.org/journals/bul/110/3/577.pdf|doi=10.1037/0033-2909.110.3.577 }}</ref>).
In 1995 Benjamini and Hochberg presented a new, more liberal and more powerful criterion for those types of problems: [[False discovery rate]] (FDR) control.<ref name=control>{{cite journal|author=Benjamini, Y., Hochberg, Y|title=Controlling the false discovery rate: a new and powerful approach to multiple testing |url=http://engr.case.edu/ray_soumya/mlrg/controlling_fdr_benjamini95.pdf|journal=JRSS, series B,methodological 57 |year=1995|pages=289–300}}</ref> In 2006, Shaffer showed (by extensive simulation) that the Newman–Keuls method controls the FDR with some constrains.<ref name=constrains>{{cite journal|author=Shaffer, Juliet P|title=Controlling the false discovery rate with constraints: The Newman–Keuls test revisited |journal=Biometrical Journal |volume=47|year=2007|pages=136–143|pmid=17342955}}</ref>
In 1995 Benjamini and Hochberg presented a new, more liberal, and more powerful criterion for those types of problems: [[False discovery rate]] (FDR) control.<ref name=control>{{cite journal|author=Benjamini, Y., Hochberg, Y. |title=Controlling the false discovery rate: a new and powerful approach to multiple testing |url=http://engr.case.edu/ray_soumya/mlrg/controlling_fdr_benjamini95.pdf|journal= Journal of the Royal Statistical Society. Series B (Methodological)|volume=57 |issue=1 |year=1995|pages=289–300 |jstor=2346101|doi=10.1111/j.2517-6161.1995.tb02031.x }}</ref> In 2006, Shaffer showed (by extensive simulation) that the Newman–Keuls method controls the FDR with some constraints.<ref name=constrains>{{cite journal|author=Shaffer, Juliet P|title=Controlling the false discovery rate with constraints: The Newman–Keuls test revisited |journal=Biometrical Journal |volume=49|issue=1 |year=2007|pages=136–143|pmid=17342955|doi=10.1002/bimj.200610297|s2cid=32625652 }}</ref>


==Required assumptions==
==Required assumptions==


The assumptions of the Newman–Keuls test are essentially the same as for an independent groups [[t-test]]: normality, homogeneity of variance, and independent observations. The test is quite robust to violations of normality. Violating homogeneity of variance can be more problematic than in the two-sample case since the MSE is based on data from all groups. The assumption of independence of observations is important and should not be violated.
The assumptions of the Newman–Keuls test are essentially the same as for an independent groups [[t-test]]: [[Normal distribution|normality]], [[Homoscedasticity|homogeneity of variance]], and [[Independent and identically distributed random variables|independent observations]]. The test is quite robust to violations of normality. Violating homogeneity of variance can be more problematic than in the two-sample case since the MSE is based on data from all groups. The assumption of independence of observations is important and should not be violated.


==Procedures==
==Procedures==


The Newman–Keuls method employs a stepwise approach when comparing sample means.<ref name=Toothaker>{{cite book |last1 = Toothaker |first1 = Larry E. |title = Multiple Comparison Procedures (Quantitative Applications in the Social Sciences) |edition=2nd |publisher = Chapman and Hall/CRC |location = Newburry Park, CA |year = 1993 |isbn = 0-803-94177-3 |pages=27–45}}</ref> Prior to any mean comparison, all sample means are rank-ordered in ascending or descending order, thereby producing an ordered range (''p'') of sample means.<ref name=Muth /><ref name=Toothaker /> A comparison is then made between the largest and smallest sample means within the largest range.<ref name=Toothaker /> Assuming that the largest range is four means (or ''p'' = 4), a significant difference between the largest and smallest means as revealed by the Newman–Keuls method would result in a rejection of the [[null hypothesis]] for that specific range of means. The next largest comparison of two sample means would then be made within a smaller range of three means (or ''p'' = 3). Unless there is no significant differences between two sample means within any given range, this stepwise comparison of sample means will continue until a final comparison is made with the smallest range of just two means. If there is no significant difference between the two sample means, then all the null hypotheses within that range would be retained and no further comparisons within smaller ranges are necessary.
The Newman–Keuls method employs a stepwise approach when comparing sample means.<ref name=Toothaker>{{cite book |last1 = Toothaker |first1 = Larry E. |title = Multiple Comparison Procedures (Quantitative Applications in the Social Sciences) |edition=2nd |publisher = Chapman and Hall/CRC |location = Newburry Park, CA |year = 1993 |isbn = 978-0-8039-4177-9 |pages=27–45}}</ref> Prior to any mean comparison, all sample means are rank-ordered in ascending or descending order, thereby producing an ordered range (''p'') of sample means.<ref name=Muth /><ref name=Toothaker /> A comparison is then made between the largest and smallest sample means within the largest range.<ref name=Toothaker /> Assuming that the largest range is four means (or ''p'' = 4), a significant difference between the largest and smallest means as revealed by the Newman–Keuls method would result in a rejection of the [[null hypothesis]] for that specific range of means. The next largest comparison of two sample means would then be made within a smaller range of three means (or ''p'' = 3). Unless there is no significant differences between two sample means within any given range, this stepwise comparison of sample means will continue until a final comparison is made with the smallest range of just two means. If there is no significant difference between the two sample means, then all the null hypotheses within that range would be retained and no further comparisons within smaller ranges are necessary.


{| class="wikitable"
{| class="wikitable"
Line 24: Line 24:
! <math>\bar{X}_4</math>
! <math>\bar{X}_4</math>
|-
|-
| Mean values || 2 || 4 || 6|| 8
! Mean values || 2 || 4 || 6 || 8
|-
|-
!<math>\bar{X}_1 =</math> 2
| 2
|
|
| 2
| 2
Line 32: Line 32:
| 6
| 6
|-
|-
!<math>\bar{X}_2 =</math> 4
| 4
|
|
|
|
Line 38: Line 38:
| 4
| 4
|-
|-
!<math>\bar{X}_3 =</math> 6
| 6
|
|
|
|
Line 49: Line 49:
:<math> q = \frac{\bar{X}_A - \bar{X}_B}\sqrt{\frac{MSE}{n}}, </math>
:<math> q = \frac{\bar{X}_A - \bar{X}_B}\sqrt{\frac{MSE}{n}}, </math>


where <math>q</math> represents the [[Studentized range]] value, <math>\bar{X}_A</math> and <math>\bar{X}_B</math> are the largest and smallest sample means within a range, <math>MSE</math> is the error variance taken from the ANOVA table, and <math>n</math> is the sample size (number of observations within a sample). If comparisons are made with means of unequal sample sizes (<math>{n_A}\neq{n_B}</math>), then the Newman–Keuls formula would be adjusted as follows:
where <math>q</math> represents the [[studentized range]] value, <math>\bar{X}_A</math> and <math>\bar{X}_B</math> are the largest and smallest sample means within a range, <math>MSE</math> is the error variance taken from the ANOVA table, and <math>n</math> is the sample size (number of observations within a sample). If comparisons are made with means of unequal sample sizes (<math>{n_A}\neq{n_B}</math>), then the Newman–Keuls formula would be adjusted as follows:


:<math> q = \frac{\bar{X}_A - \bar{X}_B}\sqrt{\frac{MSE}{2}(\frac{1}{n_A} + \frac{1}{n_B})}, </math>
:<math> q = \frac{\bar{X}_A - \bar{X}_B}\sqrt{\frac{MSE}{2}(\frac{1}{n_A} + \frac{1}{n_B})}, </math>


where <math>n_A</math> and <math>n_B</math> represent the sample sizes of the two sample means. On both cases, [[Mean squared error|MSE]] (Mean squared error) is taken from the ANOVA conducted in the first stage of the analysis.
where <math>n_A</math> and <math>n_B</math> represent the sample sizes of the two sample means. On both cases, [[Mean squared error|MSE]] (mean squared error) is taken from the ANOVA conducted in the first stage of the analysis.


Once calculated, the computed ''q'' value can be compared to a ''q'' critical value (or <math>q_\alpha\,_\nu\,_p</math>), which can be found in a ''q'' distribution table based on the [[Statistical significance|significance level]] (<math>\alpha</math>), the error [[Degrees of freedom (statistics)|degrees of freedom]] (<math>\nu</math>) from the ANOVA table, and the range (<math>p</math>) of sample means to be tested.<ref name=Zar /> If the computed ''q'' value is equal to or greater than the ''q'' critical value, then the null hypothesis (''H''<sub>0</sub>: ''μ''<sub>A</sub> = ''μ''<sub>B</sub>) for that specific range of means can be rejected.<ref name=Zar>{{cite book |last1 = Zar |first1 = Jerrold H. |title = Biostatistical Analysis |edition=4th |publisher = Prentice Hall |location = Newburry Park, CA |year = 1999 |isbn = 0-130-81542-X |pages=208–230}}</ref> Because the number of means within a range changes with each successive pairwise comparison, the critical value of the ''q'' statistic also changes with each comparison, which makes the Neuman-Keuls method more lenient and hence more powerful than Tukey's range test. Thus, if a pairwise comparison was found to be significantly different using the Newman–Keuls method, it may not necessarily be significantly different when analyzed with Tukey's range test.<ref name="Roberts and Russo" /><ref name=Zar /> Conversely, if the pairwise comparison was found not to be significantly different using the Newman–Keuls method, it cannot in any way be significantly different when tested with Tukey's range test.<ref name="Roberts and Russo" />
Once calculated, the computed ''q'' value can be compared to a ''q'' critical value (or <math>q_\alpha\,_\nu\,_p</math>), which can be found in a ''q'' distribution table based on the [[Statistical significance|significance level]] (<math>\alpha</math>), the error [[Degrees of freedom (statistics)|degrees of freedom]] (<math>\nu</math>) from the ANOVA table, and the range (<math>p</math>) of sample means to be tested.<ref name=Zar /> If the computed ''q'' value is equal to or greater than the ''q'' critical value, then the null hypothesis (''H''<sub>0</sub>: ''μ''<sub>A</sub> = ''μ''<sub>B</sub>) for that specific range of means can be rejected.<ref name=Zar>{{cite book |last1 = Zar |first1 = Jerrold H. |title = Biostatistical Analysis |edition=4th |publisher = Prentice Hall |location = Newburry Park, CA |year = 1999 |isbn = 978-0-13-081542-2 |pages=208–230}}</ref> Because the number of means within a range changes with each successive pairwise comparison, the critical value of the ''q'' statistic also changes with each comparison, which makes the Neuman-Keuls method more lenient and hence more powerful than Tukey's range test. Thus, if a pairwise comparison was found to be significantly different using the Newman–Keuls method, it may not necessarily be significantly different when analyzed with Tukey's range test.<ref name="Roberts and Russo" /><ref name=Zar /> Conversely, if the pairwise comparison was found not to be significantly different using the Newman–Keuls method, it cannot be significantly different with Tukey's range test either.<ref name="Roberts and Russo" />


==Limitations==
==Limitations==


The Newman–Keuls procedure cannot produce an α% confidence interval for each mean difference, or for multiplicity adjusted exact p-values due to its sequential nature.{{citation needed|date=October 2014}} Results are somewhat difficult to interpret since it is difficult to articulate what are the null hypothesis that were tested.{{citation needed|date=October 2014}}
The Newman–Keuls procedure cannot produce a confidence interval for each mean difference, or for multiplicity adjusted exact p-values due to its sequential nature.{{citation needed|date=October 2014}} Results are somewhat difficult to interpret since it is difficult to articulate what are the null hypotheses that were tested.{{citation needed|date=October 2014}}


==See also==
==See also==
Line 71: Line 71:
{{DEFAULTSORT:Newman-Keuls method}}
{{DEFAULTSORT:Newman-Keuls method}}
[[Category:Multiple comparisons]]
[[Category:Multiple comparisons]]
[[Category:Analysis of variance]]

Latest revision as of 23:03, 16 May 2024

The Newman–Keuls or Student–Newman–Keuls (SNK) method is a stepwise multiple comparisons procedure used to identify sample means that are significantly different from each other.[1] It was named after Student (1927),[2] D. Newman,[3] and M. Keuls.[4] This procedure is often used as a post-hoc test whenever a significant difference between three or more sample means has been revealed by an analysis of variance (ANOVA).[1] The Newman–Keuls method is similar to Tukey's range test as both procedures use studentized range statistics.[5][6] Unlike Tukey's range test, the Newman–Keuls method uses different critical values for different pairs of mean comparisons. Thus, the procedure is more likely to reveal significant differences between group means, but also more likely to commit type I errors by incorrectly rejecting a true null hypothesis. In other words, the Newman–Keuls procedure is more powerful but less conservative than Tukey's range test.[6][7]

History and type I error rate control

The Newman–Keuls method was introduced by Newman in 1939 and developed further by Keuls in 1952. This was before Tukey presented various definitions of error rates (1952a,[8] 1952b,[9] 1953[10]). The Newman–Keuls method controls the family-wise error rate (FWER) in the weak sense but not in the strong sense:[11][12] it controls the risk of rejecting the null hypothesis when all means are equal (the global null hypothesis), but it does not control the risk of falsely rejecting partial null hypotheses. For instance, when four means are compared under the partial null hypothesis that μ1 = μ2 = μ and μ3 = μ4 = μ + δ with a non-zero δ, the Newman–Keuls procedure has a probability greater than α of rejecting μ1 = μ2 or μ3 = μ4 or both. In that example, if δ is very large, the Newman–Keuls procedure is almost equivalent to two Student t-tests of μ1 = μ2 and μ3 = μ4, each at nominal type I error rate α and with no adjustment for multiple testing; therefore the FWER is almost doubled.[11] In the worst case, the FWER of the Newman–Keuls procedure is 1 − (1 − α)^int(J/2), where J is the total number of groups and int(J/2) is the integer part of J/2.[12] Therefore, the Newman–Keuls procedure has strong control over the FWER with two or three groups, but not with four or more groups.

In 1995 Benjamini and Hochberg presented a new, more liberal, and more powerful criterion for those types of problems: false discovery rate (FDR) control.[13] In 2006, Shaffer showed (by extensive simulation) that the Newman–Keuls method controls the FDR with some constraints.[14]
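The worst-case expression 1 − (1 − α)^int(J/2) quoted in this section can be tabulated directly. The following is a minimal Python sketch; the function name `snk_worst_case_fwer` is hypothetical and is used only for illustration.

```python
def snk_worst_case_fwer(alpha: float, n_groups: int) -> float:
    """Worst-case FWER, 1 - (1 - alpha)^int(J/2), for J = n_groups groups."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if n_groups < 2:
        raise ValueError("at least two groups are needed")
    # n_groups // 2 is the integer part of J/2
    return 1.0 - (1.0 - alpha) ** (n_groups // 2)


if __name__ == "__main__":
    # Print the bound for J = 2..8 groups at a nominal alpha of 0.05
    for j in range(2, 9):
        print(j, round(snk_worst_case_fwer(0.05, j), 4))
```

With two or three groups the exponent is 1, so the bound equals α; from four groups onward the exponent is at least 2 and the bound exceeds α, which matches the statement about strong control.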

Required assumptions

The assumptions of the Newman–Keuls test are essentially the same as for an independent groups t-test: normality, homogeneity of variance, and independent observations. The test is quite robust to violations of normality. Violating homogeneity of variance can be more problematic than in the two-sample case since the MSE is based on data from all groups. The assumption of independence of observations is important and should not be violated.

Procedures

The Newman–Keuls method employs a stepwise approach when comparing sample means.[15] Prior to any mean comparison, all sample means are rank-ordered in ascending or descending order, thereby producing an ordered range (p) of sample means.[1][15] A comparison is then made between the largest and smallest sample means within the largest range.[15] Assuming that the largest range is four means (or p = 4), a significant difference between the largest and smallest means as revealed by the Newman–Keuls method would result in a rejection of the null hypothesis for that specific range of means. The next largest comparison of two sample means would then be made within a smaller range of three means (or p = 3). As long as a significant difference is found between the largest and smallest sample means within a given range, this stepwise comparison of sample means continues until a final comparison is made with the smallest range of just two means. If there is no significant difference between the largest and smallest sample means within a range, then all the null hypotheses within that range are retained and no further comparisons within smaller ranges are necessary.

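{| class="wikitable"
|+ Range of sample means
|-
!
! <math>\bar{X}_1</math>
! <math>\bar{X}_2</math>
! <math>\bar{X}_3</math>
! <math>\bar{X}_4</math>
|-
! Mean values
| 2 || 4 || 6 || 8
|-
! <math>\bar{X}_1 =</math> 2
| || 2 || 4 || 6
|-
! <math>\bar{X}_2 =</math> 4
| || || 2 || 4
|-
! <math>\bar{X}_3 =</math> 6
| || || || 2
|}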
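The table of differences for the four example means (2, 4, 6, 8) can be generated programmatically. The sketch below is illustrative only; the helper name `ordered_ranges` is hypothetical. It rank-orders the means and lists every pair with the range p it spans, widest range first, which is the order in which the stepwise procedure visits them.

```python
from itertools import combinations


def ordered_ranges(means):
    """Return (p, smaller_mean, larger_mean, difference) tuples, widest range first."""
    ranked = sorted(means)
    rows = []
    for i, j in combinations(range(len(ranked)), 2):
        p = j - i + 1  # number of rank-ordered means spanned by this pair
        rows.append((p, ranked[i], ranked[j], ranked[j] - ranked[i]))
    # Stable sort: widest ranges first, original pair order kept within each p
    rows.sort(key=lambda row: -row[0])
    return rows


if __name__ == "__main__":
    for p, low, high, diff in ordered_ranges([2, 4, 6, 8]):
        print(f"p = {p}: {high} - {low} = {diff}")
```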

To determine if there is a significant difference between two means with equal sample sizes, the Newman–Keuls method uses a formula that is identical to the one used in Tukey's range test, which calculates the q value by taking the difference between two sample means and dividing it by the standard error:

:<math> q = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\frac{MSE}{n}}}, </math>

where <math>q</math> represents the studentized range value, <math>\bar{X}_A</math> and <math>\bar{X}_B</math> are the largest and smallest sample means within a range, <math>MSE</math> is the error variance taken from the ANOVA table, and <math>n</math> is the sample size (number of observations within a sample). If comparisons are made with means of unequal sample sizes (<math>n_A \neq n_B</math>), then the Newman–Keuls formula would be adjusted as follows:

:<math> q = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\frac{MSE}{2}\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}, </math>

where <math>n_A</math> and <math>n_B</math> represent the sample sizes of the two sample means. In both cases, MSE (mean squared error) is taken from the ANOVA conducted in the first stage of the analysis.
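The two q formulas above can be written as small functions. This is a minimal sketch in Python: the function names are hypothetical, the numeric inputs in the usage lines (MSE = 4, n = 4) are invented for illustration, and the absolute value is an added convenience so that the argument order of the two means does not matter.

```python
import math


def q_equal_n(mean_a, mean_b, mse, n):
    """Studentized range statistic for two means with a common sample size n."""
    return abs(mean_a - mean_b) / math.sqrt(mse / n)


def q_unequal_n(mean_a, mean_b, mse, n_a, n_b):
    """Adjusted statistic for two means with sample sizes n_a and n_b."""
    return abs(mean_a - mean_b) / math.sqrt((mse / 2.0) * (1.0 / n_a + 1.0 / n_b))


if __name__ == "__main__":
    # Illustrative inputs only: largest and smallest example means, MSE = 4, n = 4
    print(q_equal_n(8, 2, mse=4.0, n=4))
    print(q_unequal_n(8, 2, mse=4.0, n_a=4, n_b=4))
```

When the two sample sizes are equal, the adjusted formula reduces to the equal-size formula, so both calls return the same value.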

Once calculated, the computed q value can be compared to a q critical value (or <math>q_{\alpha,\nu,p}</math>), which can be found in a q distribution table based on the significance level (<math>\alpha</math>), the error degrees of freedom (<math>\nu</math>) from the ANOVA table, and the range (<math>p</math>) of sample means to be tested.[16] If the computed q value is equal to or greater than the q critical value, then the null hypothesis (H0: μA = μB) for that specific range of means can be rejected.[16] Because the number of means within a range changes with each successive pairwise comparison, the critical value of the q statistic also changes with each comparison, which makes the Newman–Keuls method more lenient and hence more powerful than Tukey's range test. Thus, if a pairwise comparison was found to be significantly different using the Newman–Keuls method, it may not necessarily be significantly different when analyzed with Tukey's range test.[7][16] Conversely, if the pairwise comparison was found not to be significantly different using the Newman–Keuls method, it cannot be significantly different with Tukey's range test either.[7]
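The stepwise decision rule can be sketched as follows for the equal-sample-size case. This is an illustrative outline, not a reference implementation: the function name `newman_keuls` is hypothetical, and the critical values must be supplied by the caller from a q distribution table for the chosen significance level and error degrees of freedom. The values in the usage example are placeholders, not entries from a real table.

```python
import math


def newman_keuls(means, mse, n, q_crit):
    """Stepwise comparison of rank-ordered means with equal sample size n.

    q_crit maps each range p (2..k) to its critical value, supplied by the caller.
    Returns a dict mapping (i, j), indices into the ascending-sorted means,
    to True when the pair is declared significantly different.
    """
    ranked = sorted(means)
    k = len(ranked)
    std_err = math.sqrt(mse / n)
    retained = []  # index ranges whose null hypothesis was retained
    results = {}
    for p in range(k, 1, -1):  # widest range first
        for i in range(0, k - p + 1):
            j = i + p - 1
            # A pair inside a retained range is not tested again
            if any(a <= i and j <= b for a, b in retained):
                results[(i, j)] = False
                continue
            q = (ranked[j] - ranked[i]) / std_err
            significant = q >= q_crit[p]
            results[(i, j)] = significant
            if not significant:
                retained.append((i, j))
    return results


if __name__ == "__main__":
    # Placeholder critical values for illustration only (not from a real q table)
    placeholder_q_crit = {2: 3.0, 3: 3.6, 4: 4.0}
    outcome = newman_keuls([2, 4, 6, 8], mse=4.0, n=4, q_crit=placeholder_q_crit)
    for (i, j), significant in outcome.items():
        print(i, j, significant)
```

Because the critical value is looked up per range p, a narrower range is judged against a different threshold than the widest range, which is the feature that distinguishes the procedure from Tukey's range test.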

Limitations

Because of its sequential nature, the Newman–Keuls procedure cannot produce a confidence interval for each mean difference or multiplicity-adjusted exact p-values.[citation needed] Results are somewhat difficult to interpret because it is hard to state exactly which null hypotheses were tested.[citation needed]

See also

References

  1. ^ a b c De Muth, James E. (2006). Basic Statistics and Pharmaceutical Statistical Applications (2nd ed.). Boca Raton, FL: Chapman and Hall/CRC. pp. 229–259. ISBN 978-0-8493-3799-4.
  2. ^ Student (1927). "Errors of routine analysis". Biometrika. 19 (1/2): 151–164. doi:10.2307/2332181. JSTOR 2332181.
  3. ^ Newman, D. (1939). "The distribution of range in samples from a normal population, expressed in terms of an independent estimate of standard deviation". Biometrika. 31 (1): 20–30. doi:10.1093/biomet/31.1-2.20.
  4. ^ Keuls, M. (1952). "The use of the "studentized range" in connection with an analysis of variance" (PDF). Euphytica. 1 (2): 112–122. doi:10.1007/bf01908269. S2CID 19365087. Archived from the original (PDF) on 2014-11-04.
  5. ^ Broota, K. D. (1989). Experimental Design in Behavioural Research (1st ed.). New Delhi, India: New Age International (P) Ltd. pp. 81–96. ISBN 978-81-224-0215-5.
  6. ^ a b Sheskin, David J. (1989). Handbook of Parametric and Nonparametric Statistical Procedures (3rd ed.). Boca Raton, FL: CRC Press. pp. 665–756. ISBN 978-1-58488-440-8.
  7. ^ a b c Roberts, Maxwell; Russo, Riccardo (1999). "Following up a one-factor between-subjects ANOVA". A Student's Guide to Analysis of Variance. Filey, United Kingdom: J&L Composition Ltd. pp. 82–109. ISBN 978-0-415-16564-8.
  8. ^ Tukey, J. W. (1952a). "Reminder sheets for Allowances for various types of error rates. Unpublished manuscript". Brown, 1984.
  9. ^ Tukey, J. W. (1952b). "Reminder sheets for Multiple comparisons. Unpublished manuscript". Brown, 1984.
  10. ^ Tukey, J. W. (1953). "The problem of multiple comparisons. Unpublished manuscript". Brown, 1984.
  11. ^ a b Proschan, Michael A.; Brittain, Erica H. (2020-04-30). "A primer on strong vs weak control of familywise error rate". Statistics in Medicine. 39 (9): 1407–1413. doi:10.1002/sim.8463. ISSN 0277-6715. PMID 32106332. S2CID 211556180.
  12. ^ a b Keselman, H. J.; Keselman, Joanne C.; Games, Paul A. (1991). "Maximum familywise Type I error rate: The least significant difference, Newman-Keuls, and other multiple comparison procedures". Psychological Bulletin. 110 (1): 155–161. doi:10.1037/0033-2909.110.1.155. ISSN 0033-2909.
  13. ^ Benjamini, Y.; Hochberg, Y. (1995). "Controlling the false discovery rate: a new and powerful approach to multiple testing" (PDF). Journal of the Royal Statistical Society. Series B (Methodological). 57 (1): 289–300. doi:10.1111/j.2517-6161.1995.tb02031.x. JSTOR 2346101.
  14. ^ Shaffer, Juliet P (2007). "Controlling the false discovery rate with constraints: The Newman–Keuls test revisited". Biometrical Journal. 49 (1): 136–143. doi:10.1002/bimj.200610297. PMID 17342955. S2CID 32625652.
  15. ^ a b c Toothaker, Larry E. (1993). Multiple Comparison Procedures (Quantitative Applications in the Social Sciences) (2nd ed.). Newbury Park, CA: Chapman and Hall/CRC. pp. 27–45. ISBN 978-0-8039-4177-9.
  16. ^ a b c Zar, Jerrold H. (1999). Biostatistical Analysis (4th ed.). Upper Saddle River, NJ: Prentice Hall. pp. 208–230. ISBN 978-0-13-081542-2.