2.6. Understanding Percentiles
Percentiles are a statistical measure that helps us understand how a particular value compares to the rest of the data in a dataset. They are particularly useful when we want to know the relative standing of a value within a group.
Imagine a classroom full of students lined up from shortest to tallest.
A percentile tells us where a particular student falls in this line-up compared to the others.
Let’s say there are 20 students.
The 50th percentile is the height at the middle of the line: half the class (10 students) is shorter and the other half (10 students) is taller.
A student at the 90th percentile is very tall: only 2 of the 20 students are taller.
A student at the 10th percentile is on the shorter side: 18 of the 20 students are taller.
2.6.1. Key Points about Percentiles
Dividing the Data: Percentiles split the data into 100 groups of equal size. Each group represents approximately 1% of the data points.
Not a Percentage Score: Being in the 90th percentile does not mean you scored 90 out of 100. It means you scored higher than 90% of the other values in the dataset.
Relative Standing: Percentiles tell us about the position of a value relative to others, not its absolute value.
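The difference between a percentage score and a percentile can be illustrated with a small sketch. The exam scores below are hypothetical (they do not come from the text), and the variable names are chosen only for this illustration.

```python
# Hypothetical exam scores out of 100 (illustrative data only).
scores = [31, 35, 38, 40, 42, 45, 47, 50, 52, 58]
max_score = 100
my_score = 58

# Percentage score: share of the maximum mark that was earned.
percentage_score = 100 * my_score / max_score

# Relative standing: share of the class that scored strictly below my_score.
below = sum(1 for s in scores if s < my_score)
relative_standing = 100 * below / len(scores)

print(f"Percentage score: {percentage_score:.0f}%")
print(f"Scored higher than {relative_standing:.0f}% of the class")
```

Here a modest mark of 58 out of 100 still sits above 90% of the class, which is why the two numbers should not be confused.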
2.6.2. Calculating Percentiles
To find the value below which a certain percentage of observations fall, you can calculate the percentile using the following steps [Hyndman and Fan, 1996]:
Organize the Data:
Begin by sorting all the data points in your dataset in ascending order.
Find the Index:
The index determines where the percentile value lies within your sorted data. To find the index for the \(q\)-th percentile (where \( q \) is the desired percentile), use the formula:
\[I = (N - 1) \times \dfrac{q}{100} + 1 \tag{2.14}\]
Here, \( N \) represents the total number of observations in your dataset.
Identify the Percentile Value:
If the index \( I \) is a whole number, the percentile value is the observation at that index in your sorted list.
If the index \( I \) is not a whole number, you’ll need to interpolate to find the percentile value. This means you’ll calculate the value that lies at the fractional part of the index between two observations. The formula for linear interpolation is:
\[ P = V_i + (V_{i+1} - V_i) \times (I - i), \quad i \geq 1 \tag{2.15}\]
In this formula, \( P \) is the percentile value you’re calculating, \( i \) is the integer part of the index \( I \), \( V_i \) is the \(i\)-th value in the sorted list, and \( V_{i+1} \) is the next value in the sorted list. The term \( (I - i) \) is the fractional part of the index.
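The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `percentile` and the small dataset are made up for the example, and the only subtlety is that the index \( I \) is 1-based while Python lists are 0-based.

```python
import math

def percentile(values, q):
    """Return the q-th percentile (0 <= q <= 100) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    data = sorted(values)                    # Step 1: sort in ascending order
    n = len(data)
    index = (n - 1) * q / 100 + 1            # Step 2: 1-based index I, Eq. (2.14)
    i = math.floor(index)                    # integer part of I
    fraction = index - i                     # fractional part (I - i)
    if fraction == 0 or i >= n:              # whole-number index: take V_i directly
        return data[i - 1]
    lower, upper = data[i - 1], data[i]      # V_i and V_{i+1} (shifted to 0-based)
    return lower + (upper - lower) * fraction  # Step 3: Eq. (2.15)

sample = [12, 15, 18, 20, 25]                # hypothetical data
print(percentile(sample, 50))                # I = 3, so the 3rd value: 18
print(round(percentile(sample, 40), 2))      # I = 2.6, so 15 + 0.6 * (18 - 15) = 16.8
```

Rounding the printed result hides the tiny floating-point error that can appear when the fractional part is computed.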
2.6.3. Quartiles
Quartiles divide a dataset into four equal-sized groups. Each group contains approximately 25% of the values from the data. These measures help us examine relative positions within the dataset. Here’s what you need to know:
Three Quartiles: Denoted as \(Q_1\), \(Q_2\), and \(Q_3\).
\(Q_1\): The value below which 25% of the data falls.
\(Q_2\) (Median): The value below which 50% of the data falls.
\(Q_3\): The value below which 75% of the data falls.
Note
When identifying percentiles such as \(P_{25}\), \(P_{50}\), or \(P_{75}\), we directly find the quartiles \(Q_1\), \(Q_2\), or \(Q_3\) instead.
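As a quick illustration, the quartiles can be obtained with Python's standard library. This sketch assumes Python 3.8 or later and uses `statistics.quantiles` with `method="inclusive"`, which interpolates on a position based on \( N - 1 \) and so agrees with the index formula (2.14) for \( q = 25, 50, 75 \). Other methods (including the function's default) can give different quartiles. The dataset is illustrative.

```python
import statistics

# Illustrative, already sorted data.
data = [150, 160, 165, 170, 175, 180, 185, 190, 195, 200]

# n=4 asks for the three cut points that split the data into four groups.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(q1, q2, q3)                      # 166.25 177.5 188.75
print(q2 == statistics.median(data))   # True: Q2 is the median
```

For example, \( Q_1 \) corresponds to \( I = (10 - 1) \times 25 / 100 + 1 = 3.25 \), giving \( 165 + 0.25 \times (170 - 165) = 166.25 \).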
You are given the following heights (in centimeters) of 10 individuals: 150, 160, 165, 170, 175, 180, 185, 190, 195, 200.
Find the 30th, 40th, 50th, 70th, and 65th percentiles.
Solution: We’ll use the formula for linear interpolation between data points to calculate the percentiles mathematically for the given data. The data points are already sorted, which is the first step. The data set is:
Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Number | 150 | 160 | 165 | 170 | 175 | 180 | 185 | 190 | 195 | 200 |
There are \( N = 10 \) data points. Let’s calculate the percentiles one by one.
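30th Percentile (\(P_{30}\)):
Calculate the index:
\[\begin{equation*} I = \dfrac{(10 - 1) \times 30}{100} + 1 = 3.7 \end{equation*}\]
Since the index is not an integer, we interpolate between the 3rd and 4th values in the sorted list (165 and 170).
Linear interpolation:
\[\begin{equation*} P_{30} = 165 + 0.7 \times (170 - 165) = 165 + 3.5 = 168.5 \end{equation*}\]
40th Percentile (\(P_{40}\)):
Calculate the index:
\[\begin{equation*} I = \dfrac{(10 - 1) \times 40}{100} + 1 = 4.6 \end{equation*}\]
Interpolate between the 4th and 5th values (170 and 175).
Linear interpolation:
\[\begin{equation*} P_{40} = 170 + 0.6 \times (175 - 170) = 170 + 3 = 173 \end{equation*}\]
50th Percentile (\(P_{50}\)) or Median:
Calculate the index:
\[\begin{equation*} I = \dfrac{(10 - 1) \times 50}{100} + 1 = 5.5 \end{equation*}\]
Interpolate between the 5th and 6th values (175 and 180).
Linear interpolation:
\[\begin{equation*} P_{50} = 175 + 0.5 \times (180 - 175) = 175 + 2.5 = 177.5 \end{equation*}\]
70th Percentile (\(P_{70}\)):
Calculate the index:
\[\begin{equation*} I = \dfrac{(10 - 1) \times 70}{100} + 1 = 7.3 \end{equation*}\]
Interpolate between the 7th and 8th values (185 and 190).
Linear interpolation:
\[\begin{equation*} P_{70} = 185 + 0.3 \times (190 - 185) = 185 + 1.5 = 186.5 \end{equation*}\]
65th Percentile (\(P_{65}\)):
Calculate the index:
\[\begin{equation*} I = \dfrac{(10 - 1) \times 65}{100} + 1 = 6.85 \end{equation*}\]
Interpolate between the 6th and 7th values (180 and 185).
Linear interpolation:
\[\begin{equation*} P_{65} = 180 + 0.85 \times (185 - 180) = 180 + 4.25 = 184.25 \end{equation*}\]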
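The five calculations in this example can be traced step by step with a short script. This is a sketch only: the helper `percentile_steps` is a name introduced here for illustration, and it assumes the data are already sorted, as they are in this example. It returns the index \( I \), its integer part \( i \), the fractional part, and the interpolated value.

```python
import math

def percentile_steps(sorted_values, q):
    """Return (I, i, fraction, P) for the q-th percentile of sorted data."""
    n = len(sorted_values)
    index = (n - 1) * q / 100 + 1            # Eq. (2.14), 1-based index I
    i = math.floor(index)                    # integer part of I
    fraction = index - i                     # fractional part (I - i)
    if i >= n:                               # q = 100: there is no next value
        return index, n, 0.0, sorted_values[-1]
    v_i, v_next = sorted_values[i - 1], sorted_values[i]
    return index, i, fraction, v_i + (v_next - v_i) * fraction  # Eq. (2.15)

heights = [150, 160, 165, 170, 175, 180, 185, 190, 195, 200]

for q in (30, 40, 50, 70, 65):
    index, i, fraction, p = percentile_steps(heights, q)
    print(f"P{q}: I = {index:.2f}, between positions {i} and {i + 1}, value = {p:.2f}")
```

Formatting to two decimals keeps small floating-point differences out of the printed values.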
Find the percentiles for the given ages of Academy Award-winning best actors:
a. 40th Percentile
b. 78th Percentile
Solution:
The ordered ages are:
Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Number | 18 | 21 | 22 | 25 | 26 | 27 | 29 | 30 | 31 | 33 |
Index | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
Number | 36 | 37 | 41 | 42 | 47 | 52 | 55 | 57 | 58 | 62 |
Index | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | |
Number | 64 | 67 | 69 | 71 | 72 | 73 | 74 | 76 | 77 | |
a. 40th Percentile:
Organize the Data:
The ordered ages are: 18, 21, 22, 25, 26, 27, 29, 30, 31, 33, 36, 37, 41, 42, 47, 52, 55, 57, 58, 62, 64, 67, 69, 71, 72, 73, 74, 76, 77.
Find the Index:
Using the formula:
\[\begin{equation*} I = \dfrac{(29 - 1) \times 40}{100} + 1 = 12.2 \end{equation*}\]
This gives us the index.
Identify the Percentile Value:
Since the index is not a whole number, we interpolate between the 12th and 13th values (37 and 41):
\[\begin{equation*} P_{40} = 37 + 0.2 \times (41 - 37) = 37 + 0.2 \times 4 = 37.8 \end{equation*}\]
Therefore, the 40th percentile is 37.8.
b. 78th Percentile:
Organize the Data:
The ordered ages are the same as before.
Find the Index:
Using the formula:
\[\begin{equation*} I = \dfrac{(29 - 1) \times 78}{100} + 1 = 22.84 \end{equation*}\]
This gives us the index.
Identify the Percentile Value:
Since the index is not a whole number, we interpolate between the 22nd and 23rd values (67 and 69):
\[\begin{equation*} P_{78} = 67 + 0.84 \times (69 - 67) = 67 + 0.84 \times 2 = 68.68 \end{equation*}\]
Therefore, the 78th percentile is 68.68.
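These two results can be cross-checked against Python's standard library. The sketch below assumes Python 3.8 or later and relies on `statistics.quantiles` with `n=100` and `method="inclusive"`. That call returns 99 cut points, so the \(q\)-th percentile sits at list position `q - 1`. The inclusive method uses an \( N - 1 \) based position, which is the same convention as the index formula used here.

```python
import statistics

# Ordered ages from this example (29 values).
ages = [18, 21, 22, 25, 26, 27, 29, 30, 31, 33,
        36, 37, 41, 42, 47, 52, 55, 57, 58, 62,
        64, 67, 69, 71, 72, 73, 74, 76, 77]

# 99 cut points: cuts[0] is P1, cuts[98] is P99.
cuts = statistics.quantiles(ages, n=100, method="inclusive")

p40 = cuts[40 - 1]
p78 = cuts[78 - 1]
print(round(p40, 2), round(p78, 2))   # 37.8 68.68
```

If a different method is chosen, or another tool with a different default is used, the values may not match the hand calculation.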
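Suppose you have the following test scores (ordered from lowest to highest): 43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99.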
Find the 20th, 40th, 60th, 80th, and 90th percentiles.
Solution:
Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Number | 43 | 54 | 56 | 61 | 62 | 66 | 68 | 69 | 69 | 70 | 71 | 72 | 77 |
Index | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | |
Number | 78 | 79 | 85 | 87 | 88 | 89 | 93 | 95 | 96 | 98 | 99 | 99 | |
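For the 20th percentile (\(P_{20}\)):
Calculate the index:
\[\begin{equation*}I = \dfrac{(25 - 1) \times 20}{100} + 1 = 5.8\end{equation*}\]
Interpolate between the 5th and 6th values (62 and 66):
\[\begin{equation*}P_{20} = 62 + 0.8 \times (66 - 62) = 62 + 3.2 = 65.2\end{equation*}\]
For the 40th percentile (\(P_{40}\)):
Calculate the index:
\[\begin{equation*}I = \dfrac{(25 - 1) \times 40}{100} + 1 = 10.6\end{equation*}\]
Interpolate between the 10th and 11th values (70 and 71):
\[\begin{equation*}P_{40} = 70 + 0.6 \times (71 - 70) = 70 + 0.6 = 70.6\end{equation*}\]
For the 60th percentile (\(P_{60}\)):
Calculate the index:
\[\begin{equation*}I = \dfrac{(25 - 1) \times 60}{100} + 1 = 15.4\end{equation*}\]
Interpolate between the 15th and 16th values (79 and 85):
\[\begin{equation*}P_{60} = 79 + 0.4 \times (85 - 79) = 79 + 2.4 = 81.4\end{equation*}\]
For the 80th percentile (\(P_{80}\)):
Calculate the index:
\[\begin{equation*}I = \dfrac{(25 - 1) \times 80}{100} + 1 = 20.2\end{equation*}\]
Interpolate between the 20th and 21st values (93 and 95):
\[\begin{equation*}P_{80} = 93 + 0.2 \times (95 - 93) = 93 + 0.4 = 93.4\end{equation*}\]
For the 90th percentile (\(P_{90}\)):
Calculate the index:
\[\begin{equation*}I = \dfrac{(25 - 1) \times 90}{100} + 1 = 22.6\end{equation*}\]
Interpolate between the 22nd and 23rd values (96 and 98):
\[\begin{equation*}P_{90} = 96 + 0.6 \times (98 - 96) = 96 + 1.2 = 97.2\end{equation*}\]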
Find the requested percentiles for the given dataset of ages:
a. \(P_{60}\) (60th Percentile)
b. \(P_{40}\) (40th Percentile)
c. \(P_{50}\) (50th Percentile, Median)
d. \(P_{25}\) (25th Percentile, \(Q_{1}\))
Solution:
Organize the Data:
Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Number | 36 | 37 | 40 | 40 | 41 | 43 | 43.5 | 46 | 46 | 47 | 48 | 48 | 49 | 50 |
Index | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 |
Number | 52 | 52.5 | 53 | 53 | 54 | 57.3 | 57.5 | 58 | 59 | 59 | 59 | 60 | 60.5 | 61 |
Index | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 |
Number | 61 | 61.5 | 62 | 63 | 63 | 63 | 63.5 | 64 | 64 | 64 | 65 | 65 | 66.5 | 67 |
Index | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | | |
Number | 67.5 | 68.5 | 70 | 70.5 | 72 | 72 | 72 | 72 | 73 | 73.5 | 75 | 76.5 | | |
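For \(P_{60}\):
Calculate the index:
\[\begin{equation*}I = \dfrac{(54 - 1) \times 60}{100} + 1 = 32.8\end{equation*}\]
Interpolate between the 32nd and 33rd values (63 and 63):
\[\begin{equation*}P_{60} = 63 + 0.8 \times (63 - 63) = 63\end{equation*}\]
For \(P_{40}\):
Calculate the index:
\[\begin{equation*}I = \dfrac{(54 - 1) \times 40}{100} + 1 = 22.2\end{equation*}\]
Interpolate between the 22nd and 23rd values (58 and 59):
\[\begin{equation*}P_{40} = 58 + 0.2 \times (59 - 58) = 58.2\end{equation*}\]
For \(P_{50}\) (Median):
Calculate the index:
\[\begin{equation*}I = \dfrac{(54 - 1) \times 50}{100} + 1 = 27.5\end{equation*}\]
Interpolate between the 27th and 28th values (60.5 and 61):
\[\begin{equation*}P_{50} = 60.5 + 0.5 \times (61 - 60.5) = 60.75\end{equation*}\]
For \(P_{25}\) (\(Q_1\)):
Calculate the index:
\[\begin{equation*}I = \dfrac{(54 - 1) \times 25}{100} + 1 = 14.25\end{equation*}\]
Interpolate between the 14th and 15th values (50 and 52):
\[\begin{equation*}P_{25} = 50 + 0.25 \times (52 - 50) = 50.5\end{equation*}\]
Therefore, the calculated percentiles are: \(P_{60} = 63\), \(P_{40} = 58.2\), \(P_{50} = 60.75\), and \(P_{25} = Q_1 = 50.5\).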
2.6.4. Finding the Percentile of a Data Value
Count the Values Less Than \(x\):
Determine how many values in your dataset are less than the given value \(x\). Let’s call this count \(n\).
Calculate the Percentile:
Divide \(n\) by the total number of values in the dataset. Let’s call the total number of values \(N\).
Multiply the result by 100.
The formula can be expressed as:
\[\begin{equation}\text{Percentile} = \dfrac{n}{N} \times 100 \tag{2.17}\end{equation}\]
Rounding:
Depending on the required precision, you may round the result to a certain number of decimal places or to the nearest whole number.
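The steps above translate directly into code. The following is a minimal sketch using the strictly-less-than count described in the first step. The function name `percentile_of_value`, the optional `ndigits` argument, and the sample data are assumptions made for this illustration.

```python
def percentile_of_value(values, x, ndigits=None):
    """Percentile of x: share of values strictly less than x, times 100."""
    if not values:
        raise ValueError("values must not be empty")
    n = sum(1 for v in values if v < x)      # count of values less than x
    total = len(values)                      # N, the total number of values
    result = 100 * n / total                 # Eq. (2.17)
    return result if ndigits is None else round(result, ndigits)

sample = [55, 60, 62, 70, 75, 80, 91]        # hypothetical data
print(percentile_of_value(sample, 70))             # 3 of 7 values are below 70
print(percentile_of_value(sample, 70, ndigits=2))  # 42.86
```

The data do not need to be sorted for this calculation, because only a count is required.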
Remark
Some textbooks and statistical software may use different definitions for calculating percentiles, specifically:
“Less than or equal to” (≤): Counts values that are less than or equal to the target value, so values tied with the target are included.
“Strictly less than” (<): Counts only those values that are less than the target value.
Impact on Results: These variations can lead to different percentile values, particularly in the following scenarios:
Small Datasets: The choice of method can significantly affect the results.
Repeated Values: The handling of duplicate values can also influence the calculated percentiles.
Importance of Context: Always check the specific definition used in your context, as it is essential for accurate interpretation and comparison of percentile results across different sources or statistical tools.
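To see how much the two definitions can differ, here is a small sketch on a dataset with repeated values. It assumes the data are already sorted and uses the standard `bisect` module: `bisect_left` gives the number of values strictly less than the target, and `bisect_right` gives the number less than or equal to it. The function name and the scores are hypothetical.

```python
from bisect import bisect_left, bisect_right

def percentile_both_conventions(sorted_values, x):
    """Return (strictly-less percentile, less-or-equal percentile) of x."""
    total = len(sorted_values)
    strictly_less = bisect_left(sorted_values, x)    # count of values < x
    less_or_equal = bisect_right(sorted_values, x)   # count of values <= x
    return 100 * strictly_less / total, 100 * less_or_equal / total

scores = [50, 60, 60, 60, 70]    # hypothetical: small dataset with ties
print(percentile_both_conventions(scores, 60))   # (20.0, 80.0)
```

With only five values and three of them tied, the same score lands at the 20th or the 80th percentile depending on the definition, which is exactly the situation the remark warns about.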
Find the requested percentiles for the given dataset of ages:
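Dataset (ordered from smallest to largest): 18, 18, 21, 22, 25, 26, 27, 29, 30, 31, 31, 33, 36, 37, 37, 41, 42, 47, 52, 55, 57, 58, 62, 64, 67, 69, 71, 72, 73, 74, 76, 77.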
a. Percentile of 37
b. Percentile of 72
Solution:
Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
---|---|---|---|---|---|---|---|---|---|---|---|
Number | 18 | 18 | 21 | 22 | 25 | 26 | 27 | 29 | 30 | 31 | 31 |
Index | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 |
Number | 33 | 36 | 37 | 37 | 41 | 42 | 47 | 52 | 55 | 57 | 58 |
Index | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | |
Number | 62 | 64 | 67 | 69 | 71 | 72 | 73 | 74 | 76 | 77 | |
a. Percentile of 37:
Count the number of values less than or equal to 37 (this solution uses the “less than or equal to” convention):
There are 15 values less than or equal to 37.
Divide this by the total number of values (\(N = 32\)): \(\dfrac{15}{32} \times 100 = 46.88\%\).
Therefore, the percentile of 37 is approximately 47%.
b. Percentile of 72:
Count the number of values less than or equal to 72:
There are 28 values less than or equal to 72.
Divide this by the total number of values (\(N = 32\)): \(\dfrac{28}{32} \times 100 = 87.50\%\).
Therefore, the percentile of 72 is approximately 88%.
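The two results in this example can be re-checked with a short script. This sketch counts values less than or equal to the target, which is the convention the solution uses. The helper name `percentile_at_or_below` is introduced here for illustration only.

```python
# Ordered ages from this example (32 values).
ages = [18, 18, 21, 22, 25, 26, 27, 29, 30, 31, 31,
        33, 36, 37, 37, 41, 42, 47, 52, 55, 57, 58,
        62, 64, 67, 69, 71, 72, 73, 74, 76, 77]

def percentile_at_or_below(values, x):
    """Share of values less than or equal to x, times 100."""
    count = sum(1 for v in values if v <= x)
    return 100 * count / len(values)

for x in (37, 72):
    print(x, percentile_at_or_below(ages, x))
# 37 46.875
# 72 87.5
```

Counting strictly smaller values instead would give 13 and 27 values, so the choice of convention matters for this dataset.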
Find the percentiles corresponding to the given lengths:
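Dataset (ordered from smallest to largest): 36, 37, 40, 40, 41, 43, 43.5, 46, 46, 47, 48, 48, 49, 50, 52, 52.5, 53, 53, 54, 57.3, 57.5, 58, 59, 59, 59, 60, 60.5, 61, 61, 61.5, 62, 63, 63, 63, 63.5, 64, 64, 64, 65, 65, 66.5, 67, 67.5, 68.5, 70, 70.5, 72, 72, 72, 72, 73, 73.5, 75, 76.5.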
a. Percentile of 61.0 in
b. Percentile of 47.0 in
c. Percentile of 70.0 in
d. Percentile of 58.0 in
Solution:
Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Number | 36 | 37 | 40 | 40 | 41 | 43 | 43.5 | 46 | 46 | 47 | 48 | 48 | 49 | 50 |
Index | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 |
Number | 52 | 52.5 | 53 | 53 | 54 | 57.3 | 57.5 | 58 | 59 | 59 | 59 | 60 | 60.5 | 61 |
Index | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 |
Number | 61 | 61.5 | 62 | 63 | 63 | 63 | 63.5 | 64 | 64 | 64 | 65 | 65 | 66.5 | 67 |
Index | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | | |
Number | 67.5 | 68.5 | 70 | 70.5 | 72 | 72 | 72 | 72 | 73 | 73.5 | 75 | 76.5 | | |
a. Percentile of 61.0 in:
Count the number of values less than or equal to 61.0 (this solution uses the “less than or equal to” convention):
There are 29 values less than or equal to 61.0.
Divide this by the total number of values (\(N = 54\)): \(\dfrac{29}{54} \times 100 = 53.70 \approx 54\%\).
Therefore, the percentile of 61.0 inches is approximately 54%.
b. Percentile of 47.0 in:
Count the number of values less than or equal to 47.0:
There are 10 values less than or equal to 47.0.
Divide this by the total number of values (\(N = 54\)): \(\dfrac{10}{54} \times 100 = 18.52 \approx 19\%\).
Therefore, the percentile of 47.0 inches is approximately 19%.
c. Percentile of 70.0 in:
Count the number of values less than or equal to 70.0:
There are 45 values less than or equal to 70.0.
Divide this by the total number of values (\(N = 54\)): \(\dfrac{45}{54} \times 100 = 83.33 \approx 83\%\).
Therefore, the percentile of 70.0 inches is approximately 83%.
d. Percentile of 58.0 in:
Count the number of values less than or equal to 58.0:
There are 22 values less than or equal to 58.0.
Divide this by the total number of values (\(N = 54\)): \(\dfrac{22}{54} \times 100 = 40.74 \approx 41\%\).
Therefore, the percentile of 58.0 inches is approximately 41%.