Talk:Quartile
This article is rated C-class on Wikipedia's content assessment scale.
Concern
The article says:
- Example 2:
- Ordered Data Set: 7, 15, 36, 39, 40, 41
- Q1 = (36+15)/2 = 25.5
- Q2 = (39+36)/2 = 37.5
- Q3 = (40+39)/2 = 39.5
Now unless I've misunderstood, shouldn't Q1 be 15 and Q3 be 40? The median cuts the 6 member data set into two 3 member data sets. The median of a 3 member data set is the item in the middle. 80.177.129.251 11:33, 11 September 2006 (UTC)
Example 1
This is flat out wrong. Q1=15, the median is 40, and Q3=43. How could they see the correct median (Q2), but screw up the other quartiles? The number of values is either odd or even: if it's odd, there is no need to do any averaging to find the quartiles. If it's even, you need the averages of three sets of two numbers. I could see this error happening if we were dealing with, say, 117 values, but 11? Sheesh.
Examples, discrete case
As the article stands, the above posts have been taken into account. However, the two examples in the article - and also the 2nd post above - seem to indicate there are two cases, n odd and n even. I believe there are four cases (depending on n modulo 4):
- 2, 6, 10, 14, ... observations:
- Example with 10 observations: 11,13,16,17,19 ; 22,23,27,28,30: Q1=16, M=(19+22)/2, Q3=27 (mean involved in median only)
- 3, 7, 11, 15, ... observations:
- Example with 7 observations: 11,13,16,17,19,22,23: Q1=13, M=17, Q3=22 (no means involved)
- 4, 8, 12, 16, ... observations:
- Example with 8 observations: 11,13 ; 16,17 ; 19,22 ; 23,27: Q1=(13+16)/2, M=(17+19)/2, Q3=(22+23)/2 (all means!)
- 5, 9, 13, 17, ... observations:
- Example with 9 observations: 11,13 ; 16,17,19,22,23 ; 27,28: Q1=(13+16)/2, M=19, Q3=(23+27)/2 (means involved in quartiles only)
In other words:
- With an odd number of observations, the median is the middle observation, and the quartiles are the medians of the lower and upper halves of the observations, respectively, omitting the middle one.
- With an even number of observations, the median is the mean of the two middle observations, and the quartiles are the medians of the lower and upper halves of the observations, respectively.
Finding the median of half the observations, one may again have to consider either an odd or an even number of observations; hence the four cases above.
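The rule described above can be checked mechanically. The following is a minimal Python sketch, assuming the "omit the middle observation when n is odd" convention stated in this post; the function name `quartiles_split_halves` is made up for illustration and is not taken from the article.

```python
from statistics import median

def quartiles_split_halves(data):
    """Return (Q1, median, Q3), omitting the middle value from both halves when n is odd."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]               # lower half, without the middle value
    upper = xs[half + (n % 2):]     # upper half, skipping the middle value if n is odd
    return median(lower), median(xs), median(upper)

# The four examples from the list above, one per value of n modulo 4.
ten = [11, 13, 16, 17, 19, 22, 23, 27, 28, 30]
seven = [11, 13, 16, 17, 19, 22, 23]
eight = [11, 13, 16, 17, 19, 22, 23, 27]
nine = [11, 13, 16, 17, 19, 22, 23, 27, 28]

for sample in (ten, seven, eight, nine):
    print(len(sample), quartiles_split_halves(sample))
```

Whether a mean is needed depends only on whether each half, and the full list, has an even length, which is why the four cases line up with n modulo 4.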
Grouped data
Honestly I do not understand everything in the article; the answer to the following question may be hidden in there. But how do you find quartiles for grouped data? (I know the answer, I think, involving a cumulative frequency graph, but I have discovered that some advocate strange variants of "my" methods, and I don't understand why.)--Niels Ø 22:44, 27 November 2006 (UTC)
Erroneous statement
I have removed an erroneous statement saying that in applied work the quartiles are the intervals between the quartile points. This is a widespread error and I'd like to keep it out of Wikipedia.
Explicit rule
I added an example of an explicit rule for computing quartile values (there is no uniform agreement on this). In this way the reader can work out the examples for himself. I also added an example where the quartile values are not data points. 84.196.107.235 07:13, 10 March 2007 (UTC)
Invalid Example That doesn't follow the rule
I think the title is enough to explain everything, not to mention all the discussions above. The first example is clearly inconsistent with what is mentioned to be the rule of finding quartiles (lower and upper). Please resolve this. --Freiddie 12:56, 1 April 2007 (UTC)
Article should cover all methods
The article should cover all the competing definitions of the sample quartile, and compare and contrast them. The article notes that there is no universal agreement on how to choose the values, but then goes on to give a formula for how to choose them, without reconciling the contradiction. From quantile, there are nine distinct ways of calculating sample quantiles. Unless some of them are redundant for quartiles, the article should explain all of them.--Srleffler (talk) 04:53, 20 March 2010 (UTC)
Quarter
Does the link for Quarter belong? --Zzo38 (talk) 06:20, 19 February 2011 (UTC)
- No, it's a disambiguation page. +mt 21:13, 19 February 2011 (UTC)
- Which is what I thought. I noticed that, too. But I would like to know what purpose whoever put it there had in mind, please. --Zzo38 (talk) 00:16, 7 March 2011 (UTC)
- Ah, I see what you're after now. I removed the link since I don't see how it is helpful. Thanks for pointing that out. +mt 01:35, 7 March 2011 (UTC)
"In epidemiology, the four ranges defined by the three values are discussed here." - huh?
The sentence "In epidemiology, the four ranges defined by the three values are discussed here." has been in the article lead for a very long time, but it doesn't make sense. Quartiles in epidemiology are discussed neither in Quartile, nor in epidemiology.
-- Dandv(talk|contribs) 08:54, 8 September 2011 (UTC)
- That version had been incorrectly changed from a more correct earlier version. I have replaced it by a slightly more expanded version of what it was trying to say. Unfortunately the rest of the article doesn't cover both of the possible meanings of "quartile". Melcombe (talk) 09:13, 9 September 2011 (UTC)
Applying the percentile formula
When one applies the percentile formula, the number obtained should be rounded UP: L=1.2 becomes 2, not 1 as the section says. The percentile formula should be listed as one of the methods, not standing separately. All three methods mentioned sometimes give slightly different estimates for the quartiles, depending on N mod 4, as one user indicated before. — Preceding unsigned comment added by 205.178.108.254 (talk) 17:54, 28 January 2012 (UTC)
- There is a reference given for the formulation stated. Does the formulation stated agree with the reference? If you want to state a different formulation, provide a reference for it. Melcombe (talk) 21:00, 28 January 2012 (UTC)
- The reference is the quantile article on wiki: http://en.wikipedia.org/wiki/Quantile. It's in the description and also in the table at the end describing all used estimates for quantiles. The method to round DOWN the L number corresponds to the first formula in the table with [h-1/2]; the other formulas predominantly use [h+1/2], which is rounding UP. Rounding L UP is much more common because it is more natural when the discrete distribution is modeled with a step-wise continuous one. — Preceding unsigned comment added by 205.178.104.178 (talk) 15:10, 30 January 2012 (UTC)
- Other Wikipedia articles are not considered reliable sources: see Wikipedia:Reliable sources. If that article does contain a reliable source it could be copied over. Otherwise find a reliable source that can be referenced. Melcombe (talk) 18:05, 30 January 2012 (UTC)
- The reliable source is right there in the quantile article, citation [1].
- I see that one is already cited in this article. In that case you'll have no trouble constructing an acceptable addition to the article. Melcombe (talk) 22:21, 1 February 2012 (UTC)
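To make the rounding question in this thread concrete, here is a small Python sketch. It assumes the position is computed as L = p·N for a sorted list of N values and then rounded to a whole rank; that formula is an assumption for illustration, since the thread does not quote the one used in the article. The helper name `nearest_rank` is likewise hypothetical.

```python
import math

def nearest_rank(sorted_data, p, round_up=True):
    """Pick the value at rank L = p * N, rounding L up or down (assumed formula)."""
    n = len(sorted_data)
    position = p * n
    rank = math.ceil(position) if round_up else math.floor(position)
    rank = min(max(rank, 1), n)      # keep the rank inside 1..N
    return sorted_data[rank - 1]     # ranks are 1-based, list indices 0-based

data = [7, 15, 36, 39, 40, 41]

# p = 0.25 gives L = 1.5: rounding up picks the 2nd value, rounding down the 1st.
print(nearest_rank(data, 0.25, round_up=True))
print(nearest_rank(data, 0.25, round_up=False))
```

The two conventions only disagree when L is not a whole number, which matches the remark that the outcome depends on N mod 4.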
"There is no universal agreement on choosing the quartile values"
For continuous probability distributions there is only one definition. For distributions with gaps between the allowed values, like discrete distributions, there is no agreement because when a quartile falls in a gap, you can position it anywhere in that gap.
The article should point out that the lack of universal agreement on quartiles is for distributions with gaps, like discrete distributions. — Preceding unsigned comment added by 205.178.104.178 (talk) 15:34, 30 January 2012 (UTC)
Examples
As another commenter pointed out, the different definitions of quartiles for discrete distributions differ depending on N mod 4 (the remainder when the number of data points N is divided by 4). The primordial examples, without being trivial, are for:
- data = {1,2,3,4} (N mod 4 = 0)
- data = {1,2,3,4,5} (N mod 4 = 1)
- data = {1,2,3,4,5,6} (N mod 4 = 2)
- data = {1,2,3,4,5,6,7} (N mod 4 = 3)
Comparing the quartiles for these data lists will reveal all the possible scenarios in which the quartile definitions in the article differ.
The way the quartiles are calculated according to the chosen quartile definition depends only on N. The actual values obtained depend on the actual data. Choosing numbers like 1,2,3 ... for the data emphasizes that point and hints directly to the reader how a quartile was calculated. For example a quartile of 1.5 would hint that this quartile is in the middle between the first and second number of the list. — Preceding unsigned comment added by 205.178.104.178 (talk) 15:48, 30 January 2012 (UTC)
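As an illustration of the comparison proposed here, the sketch below runs two median-of-halves rules over the four lists. It assumes the two definitions being compared are "exclude the median from both halves when N is odd" and "include the median in both halves when N is odd"; the thread does not spell the definitions out, so this pairing and the function name `quartiles` are assumptions.

```python
from statistics import median

def quartiles(data, include_median):
    """Median-of-halves quartiles; include_median only matters when N is odd."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    extra = 1 if (n % 2 == 1 and include_median) else 0
    lower = xs[:half + extra]        # lower half, optionally with the median
    upper = xs[n - half - extra:]    # upper half, optionally with the median
    return median(lower), median(xs), median(upper)

for n in (4, 5, 6, 7):
    data = list(range(1, n + 1))
    print(n % 4, quartiles(data, include_median=False), quartiles(data, include_median=True))
```

With these lists the two rules agree for N = 4 and N = 6 and differ for N = 5 and N = 7, so under this assumption the differences show up only for odd N.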
Example 2 Method 2
Is it just me or is that messed up? Basically, if the median is the average of the two middle numbers, then Methods 1 and 2 should be exactly the same, should they not? That is what I see from my reading of the rules. And, that would mean that the answer for Example 2 Method 2 would be 15, 37.5, 40, just as with Method 1. NumberTheorist (talk) 17:27, 9 May 2012 (UTC)
1-pass calculation method
The calculation methods mentioned in the article are 2-pass methods (first sort all data, then calculate the quartile). This approach is not feasible for huge amounts of data that will not fit into memory (e.g. the median age of all human beings). Apparently there are 1-pass methods as well. [1] talks about a 1-pass algorithm based on the piecewise-parabolic (P2) algorithm developed by Jain and Chlamtac (1985). I'd like to see an explanation of such a 1-pass algorithm in the article. --Sebastian.Dietrich (talk) 17:08, 18 January 2013 (UTC)
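For readers wondering what such a 1-pass method might look like, here is a rough Python sketch of a five-marker, piecewise-parabolic estimator in the spirit of the P2 algorithm. It is written from a general recollection of the published description, so the update rules should be treated as assumptions to check against the original paper. The class name `P2Quantile` is made up for illustration. The result is an approximation, not the exact sample quantile.

```python
import random

class P2Quantile:
    """Streaming estimate of the p-quantile using five markers (sketch, needs >= 5 values)."""

    def __init__(self, p):
        self.first = []                                     # buffer for the first five values
        self.q = []                                         # marker heights
        self.n = [1, 2, 3, 4, 5]                            # actual marker positions
        self.want = [1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5] # desired marker positions
        self.step = [0, p / 2, p, (1 + p) / 2, 1]           # growth of desired positions

    def add(self, x):
        if len(self.first) < 5:
            self.first.append(x)
            if len(self.first) == 5:
                self.q = sorted(self.first)
            return
        q, n = self.q, self.n
        # Locate the cell containing x, stretching the extreme markers if needed.
        if x < q[0]:
            q[0], k = x, 0
        elif x >= q[4]:
            q[4], k = x, 3
        else:
            k = max(i for i in range(4) if q[i] <= x)
        for i in range(k + 1, 5):
            n[i] += 1
        for i in range(5):
            self.want[i] += self.step[i]
        # Move each interior marker one step if it has drifted from its desired position.
        for i in (1, 2, 3):
            d = self.want[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                d = 1 if d > 0 else -1
                cand = self._parabolic(i, d)
                if not q[i - 1] < cand < q[i + 1]:
                    cand = self._linear(i, d)               # fall back to linear interpolation
                q[i] = cand
                n[i] += d

    def _parabolic(self, i, d):
        q, n = self.q, self.n
        return q[i] + d / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + d) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - d) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
        )

    def _linear(self, i, d):
        q, n = self.q, self.n
        return q[i] + d * (q[i + d] - q[i]) / (n[i + d] - n[i])

    def value(self):
        if len(self.first) < 5:
            raise ValueError("need at least five observations")
        return self.q[2]                                    # the middle marker tracks the quantile

# Usage: estimate the median of 1..10000 fed in a shuffled order, one value at a time.
values = list(range(1, 10001))
random.Random(0).shuffle(values)
est = P2Quantile(0.5)
for v in values:
    est.add(v)
print(est.value())  # approximate; the exact median of this data is 5000.5
```

The memory use is constant (five heights and five positions) regardless of how many values are seen, which is the point of a 1-pass method.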
Please write a version for Simple English Wikipedia
I would appreciate if someone competent (about both quartiles and writing) would compose a version of this article for Simple English Wikipedia so that I might grasp the quartile concept. Perhaps if I read this unclear existing version by sliding my finger slowly along the words and moving my lips, I might eventually come to understand it, but life is short, so – fuck it. — O'Dea (talk) 13:53, 27 May 2013 (UTC)
Outliers is a separate topic discussed elsewhere
There is no good reason for discussing outliers on this page. In particular,
- The page makes recommendations about what action to take on finding outliers; this is broadly inappropriate in a discussion about quartiles as well as being inconsistent with Wikipedia policy.
- There are few inline citations in the outliers section;
- Outliers are discussed in far more detail on the 'outliers' page;
- The outlier criteria given here are commonly used in box plots and need not be repeated here.
I suggest reducing the Outliers section to a comment to the effect that quartiles are used in some outlier screening checks with a reference to those sections. SLR Ellison (talk) 11:59, 1 June 2018 (UTC)
Method 3 description
On method 3: "This always gives the arithmetic mean of Methods 1 and 2"
Method 3 only gives the arithmetic mean of method 1 and method 2 in the 4n+3 case, not the 4n+1 case.
Why isn't there a description of the Excel quartile functions, such as QUARTILE.INC (quartile numbers from 0, the lowest, to 4, the highest) and QUARTILE.EXC (which returns a specific value, excluding quartiles 0 and 4)? — Preceding unsigned comment added by 174.4.26.61 (talk) 19:21, 7 November 2018 (UTC)
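As a point of comparison for the inclusive and exclusive variants asked about here, Python's standard library exposes both through `statistics.quantiles`, via `method="inclusive"` and `method="exclusive"`. Whether these reproduce the Excel functions exactly is an assumption that has not been verified in this thread; the sketch only shows that the two conventions give different quartiles on the same data.

```python
from statistics import quantiles

data = [7, 15, 36, 39, 40, 41]

# n=4 asks for the three cut points that split the data into four groups.
exclusive = quantiles(data, n=4, method="exclusive")
inclusive = quantiles(data, n=4, method="inclusive")

print("exclusive:", exclusive)
print("inclusive:", inclusive)
```

Both conventions interpolate between neighbouring data points, so neither needs to return a value that appears in the data, and they agree on the median here while differing on Q1 and Q3.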
unclear terminology
As of this writing, the article intro says "The first quartile (Q1) is defined as the middle number between the smallest number (minimum) and the median of the data set."
I think "middle number" is too ambiguous a term here for uninitiated readers. I'm sure that there are students reading this as meaning "the middle of the subrange" instead of "the median of the subrange".
I'm not sure of the best terminology here, but I propose replacing "middle number" with something else.
25th (empirical) quartile? Do you mean 1st quartile or 25th percentile? — Preceding unsigned comment added by Haruhiko Okumura (talk • contribs) 02:30, 1 January 2022 (UTC)
Method 3 makes no sense
Points #2 and #3 of Method 3 say, "If there are (4n+1) data points", and "If there are (4n+3) data points".
However, there are always exactly "n" data points by definition, so there is no way either applies.
Merudo77 (talk) 18:32, 30 January 2022 (UTC)
Does excel/google really use Method 3 and 4?
When I calculate it by hand, it looks like these tools use Method 1 and 2. 45.22.247.39 (talk) 16:56, 19 March 2024 (UTC)