6.4. Confidence Intervals for a Single Population Mean When \(\sigma\) is Unknown#

6.4.1. Margin of Error (E) for Estimating \(\mu\)#

The margin of error is the maximum likely distance, at a given confidence level, between the sample statistic and the true population parameter. It is denoted by \(E\) and is calculated as follows:

(6.5)#\[\begin{equation} E = t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}} \end{equation}\]

Where:

  • \(t_{\alpha/2}\) is the critical t-value corresponding to the desired confidence level, with \(n - 1\) degrees of freedom.

  • \(s\) is the sample standard deviation, which estimates the unknown population standard deviation \(\sigma\).

  • \(n\) is the sample size.

When the sample comes from a normal distribution with an unknown variance, the interval estimate for \(\mu\) can be expressed as the range from \(\overline{x} - E\) to \(\overline{x} + E\), where \(\overline{x}\) is the sample mean.
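As a minimal sketch of Equation (6.5), the snippet below computes \(E\) and the resulting interval. The function names and the inputs (\(\overline{x} = 10.0\), \(s = 2.0\), \(n = 16\)) are hypothetical and chosen only for illustration; the critical value 2.131 is the standard t-table entry for a 95% confidence level with 15 degrees of freedom.

```python
import math

def margin_of_error(t_crit, s, n):
    """E = t_{alpha/2} * s / sqrt(n)."""
    return t_crit * s / math.sqrt(n)

def interval(xbar, t_crit, s, n):
    """Return the pair (xbar - E, xbar + E)."""
    e = margin_of_error(t_crit, s, n)
    return xbar - e, xbar + e

# Hypothetical inputs, not taken from the text.
xbar, s, n = 10.0, 2.0, 16
t_crit = 2.131  # t-table value for 95% confidence, df = n - 1 = 15

E = margin_of_error(t_crit, s, n)
low, high = interval(xbar, t_crit, s, n)
print(f"E = {E:.4f}, interval = ({low:.4f}, {high:.4f})")
```

Keeping the critical value as an explicit argument mirrors the table-lookup workflow used in this section: the value is read from a t-table first and then substituted into the formula.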

Fig. 6.24 visually represents the concept of the margin of error (E) in estimating the population mean (\(\mu\)). It shows a range centered around the sample mean (\(\overline{x}\)), extending from \(\overline{x} - E\) to \(\overline{x} + E\). This range indicates the interval within which the true population mean is likely to fall, considering the confidence level and sample size.

../_images/margin_of_error_figure2.png

Fig. 6.24 Visualizing the Margin of Error in Estimating Population Mean when \(\sigma\) is unknown.#

6.4.2. Procedure for Estimating a Population Mean with Unknown Standard Deviation#

When the population standard deviation (\(\sigma\)) is unknown, a confidence interval for the population mean (\(\mu\)) can be constructed from a sample using the t-distribution. The resulting interval is a range that captures the true population mean with a specified level of confidence. The following algorithm outlines the steps involved in this estimation process.

Algorithm 6.2 (Estimating a Population Mean with Unknown Standard Deviation)

Objective: To find a confidence interval for the population mean, \(\mu\).

Prerequisites:

  1. Simple random sample.

  2. The population follows a normal distribution or the sample size is large.

  3. The population standard deviation (\(\sigma\)) is unknown.

Method:

  1. Determine the Critical Value: For a confidence level of \(1 - \alpha\), use the t-distribution table to find \(t_{\alpha/2}\) with degrees of freedom (df) = \(n - 1\), where \(n\) is the sample size.

  2. Calculate the Interval: The confidence interval for \(\mu\) is given by:

(6.6)#\[\begin{equation} \overline{x} - t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}} \quad \text{to} \quad \overline{x} + t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}} \end{equation}\]

Where:

  • \(\overline{x}\) is the sample mean.

  • \(s\) is the sample standard deviation.

  • \(n\) is the sample size.

  • \(t_{\alpha/2}\) is the critical t-value obtained in Step 1.

This interval provides the range within which the true population mean is likely to be found with the given level of confidence.
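The two steps of Algorithm 6.2 can be sketched in code as follows. To stay within the Python standard library, this sketch replaces the t-table lookup with a numerical approximation: it integrates the t density with Simpson's rule and finds \(t_{\alpha/2}\) by bisection. That substitution, the function names, and the inputs in the usage lines are assumptions made for illustration, not part of the algorithm as stated.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule over [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df):
    """Step 1: find t_{alpha/2}, the point with upper-tail area alpha/2."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:   # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(50):             # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(xbar, s, n, confidence):
    """Step 2: xbar -/+ t_{alpha/2} * s / sqrt(n) with df = n - 1."""
    t_crit = t_critical(1 - confidence, n - 1)
    e = t_crit * s / math.sqrt(n)
    return xbar - e, xbar + e

# Hypothetical summary statistics, for illustration only.
low, high = t_interval(xbar=10.0, s=2.0, n=16, confidence=0.95)
print(f"t critical (df=15) = {t_critical(0.05, 15):.3f}")
print(f"95% CI = ({low:.3f}, {high:.3f})")
```

In practice a statistics library or a printed t-table would supply the critical value directly; the numerical routine here only makes explicit what that lookup returns.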

Example 6.20

According to a study, the average time spent per day on social media in 2015 was 1 hour and 30 minutes. For last year, a random sample of 25 Canadian adults spent the following number of hours per day on social media:

Sample Data:

0.5, 2.0, 3.0, 1.5, 2.5, 1.0, 3.5, 1.2, 4.0, 3.8, 0.8, 2.9, 3.7, 1.8, 1.3, 2.2, 2.6, 1.7, 1.9, 3.2

Find and interpret a 90% confidence interval for last year’s mean time spent per day on social media by Canadian adults. (Note: The sample mean \(\overline{x}\) = 2.24 hr and the sample standard deviation s = 1.05 hr.)

Solution: Given:

  • \(\overline{x} = 2.24\) hr

  • \(s = 1.05\) hr

  • \(n = 25\)

We can calculate the 90% confidence interval for the mean. The degrees of freedom (df) would be \(25 - 1 = 24\). For a 90% confidence interval and 24 degrees of freedom, the t-score (which can be found using a t-distribution table or calculator) is approximately 1.711.

Plugging the values into the formula:

\[\begin{align*} CI &= 2.24 \pm 1.711 \left(\dfrac{1.05}{\sqrt{25}}\right) \\ CI &= 2.24 \pm 1.711 \left(\dfrac{1.05}{5}\right) \\ CI &= 2.24 \pm 1.711 \times 0.21 \\ CI &\approx 2.24 \pm 0.359 \end{align*}\]

So the 90% confidence interval is:

\[\begin{align*} CI &= [2.24 - 0.359, 2.24 + 0.359] \\ CI &= [1.881, 2.599] \end{align*}\]

Interpretation: We are 90% confident that the true mean time spent per day on social media by Canadian adults last year is between about 1.88 hours and 2.60 hours. This means that if we were to take many samples and calculate the confidence interval for each, about 90% of those intervals would contain the true population mean.
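The repeated-sampling reading of the confidence level can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal, and its "true" mean and standard deviation are hypothetical values set equal to the sample statistics of Example 6.20 purely for convenience. The sample size and critical value match the example.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 2.24, 1.05  # hypothetical "true" population values for the simulation
N = 25                  # sample size, as in Example 6.20
T_CRIT = 1.711          # 90% critical value with df = 24, as in the example
TRIALS = 2000

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor)
    e = T_CRIT * s / math.sqrt(N)     # margin of error for this sample
    if xbar - e <= MU <= xbar + e:
        hits += 1

coverage = hits / TRIALS
print(f"Fraction of intervals containing mu: {coverage:.3f}")
```

Each simulated sample yields a different interval; the fraction that contains \(\mu\) should land close to 0.90, which is what the 90% confidence level describes.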

Fig. 6.25 provides a visual representation of the 90% confidence interval.

../_images/example_641.png

Fig. 6.25 90% Confidence Interval for Mean Time Spent on Social Media#

Example 6.21

According to a survey, the average time spent per day reading books in 2018 was 20 minutes. For last year, a random sample of 30 Canadian adults spent the following number of minutes per day reading books:

10, 35, 50, 25, 40, 15, 55, 20, 60, 45, 5, 30, 52, 22, 12, 32, 38, 18, 28, 47, 24, 41, 16, 33, 39

Find and interpret a 95% confidence interval for last year’s mean time spent per day reading books by Canadian adults. (Note: The sample mean \(\overline{x}\) = 31 min and the sample standard deviation s = 15 min.)

Solution: Given:

  • \(\overline{x} = 31\) min

  • \(s = 15\) min

  • \(n = 30\)

We can calculate the 95% confidence interval for the mean. The degrees of freedom (df) would be \(30 - 1 = 29\). For a 95% confidence interval and 29 degrees of freedom, the t-score (which can be found using a t-distribution table or calculator) is approximately 2.045.

Plugging the values into the formula:

\[\begin{align*} CI &= 31 \pm 2.045 \left(\dfrac{15}{\sqrt{30}}\right) \\ CI &= 31 \pm 2.045 \left(\dfrac{15}{5.477}\right) \\ CI &= 31 \pm 2.045 \times 2.739 \\ CI &\approx 31 \pm 5.601 \end{align*}\]

So the 95% confidence interval is:

\[\begin{align*} CI &= [31 - 5.601, 31 + 5.601] \\ CI &= [25.399, 36.601] \end{align*}\]

Interpretation: We are 95% confident that the true mean time spent per day reading books by Canadian adults last year is between 25.399 minutes and 36.601 minutes. This means that if we were to take many samples and calculate the confidence interval for each, about 95% of those intervals would contain the true population mean.

Fig. 6.26 provides a visual representation of the 95% confidence interval.

../_images/example_642.png

Fig. 6.26 95% Confidence Interval for Mean Time Spent Reading Books#

Example 6.22

In a recent study, researchers found that the average time spent per day on exercise in 2019 was 45 minutes. For last year, a random sample of 20 Canadian adults reported the following number of minutes per day they dedicated to exercise:

30, 60, 70, 40, 55, 35, 75, 25, 80, 65, 20, 50, 72, 38, 48, 45, 68, 33, 52, 58

Calculate and interpret a 95% confidence interval for last year’s mean time spent per day on exercise by Canadian adults. (Note: The sample mean \(\overline{x}\) = 52 min and the sample standard deviation s = 18 min.)

Solution: Given:

  • \(\overline{x} = 52\) min

  • \(s = 18\) min

  • \(n = 20\)

We can calculate the 95% confidence interval for the mean. The degrees of freedom (df) would be \(20 - 1 = 19\). For a 95% confidence interval and 19 degrees of freedom, the t-score (which can be found using a t-distribution table or calculator) is approximately 2.093.

Plugging the values into the formula:

\[\begin{align*} CI &= 52 \pm 2.093 \left(\dfrac{18}{\sqrt{20}}\right) \\ CI &= 52 \pm 2.093 \left(\dfrac{18}{4.472}\right) \\ CI &= 52 \pm 2.093 \times 4.025 \\ CI &\approx 52 \pm 8.424 \end{align*}\]

So the 95% confidence interval is:

\[\begin{align*} CI &= [52 - 8.424, 52 + 8.424] \\ CI &= [43.576, 60.424] \end{align*}\]

Interpretation: We are 95% confident that the true mean time spent per day on exercise by Canadian adults last year is between 43.576 minutes and 60.424 minutes. This indicates that if we were to take many samples and compute the confidence interval for each, about 95% of those intervals would contain the true population mean.
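The arithmetic in Examples 6.20 to 6.22 can be checked with a few lines of code. The sketch below uses only the summary statistics and critical values stated in each example (not the raw data lists); the variable names are illustrative.

```python
import math

# (label, xbar, s, n, t critical value) as stated in Examples 6.20 to 6.22
examples = [
    ("6.20", 2.24, 1.05, 25, 1.711),
    ("6.21", 31.0, 15.0, 30, 2.045),
    ("6.22", 52.0, 18.0, 20, 2.093),
]

results = {}
for label, xbar, s, n, t_crit in examples:
    e = t_crit * s / math.sqrt(n)            # margin of error
    results[label] = (xbar - e, xbar + e)    # confidence interval endpoints
    print(f"Example {label}: E = {e:.3f}, "
          f"CI = [{xbar - e:.3f}, {xbar + e:.3f}]")
```

The printed endpoints may differ from the worked solutions in the third decimal place, because the hand calculations round intermediate values such as \(s/\sqrt{n}\) before multiplying.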

Fig. 6.27 provides a visual representation of the 95% confidence interval.

../_images/example_643.png

Fig. 6.27 95% Confidence Interval for Mean Time Spent on Exercise#

6.4.3. Choosing Between Student t and z (Normal) Distributions#

When estimating a population mean, the choice between the Student t distribution and the z (normal) distribution depends on whether the population standard deviation (\(\sigma\)) is known, on the sample size (\(n\)), and on the distribution of the population. Table 6.2 summarizes the appropriate method for computing the margin of error in a confidence interval under each set of conditions.

Table 6.2 Methods for Estimating the Margin of Error Under Different Conditions#

| Conditions | Method |
|---|---|
| \(\sigma\) unknown and population is normally distributed | Use Student t distribution. \(E = t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}\) |
| \(\sigma\) unknown and \(n > 30\) | Use Student t distribution. \(E = t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}\) |
| \(\sigma\) known and population is normally distributed | Use normal (z) distribution. \(E = z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}\) |
| \(\sigma\) known and \(n > 30\) (though \(\sigma\) is rarely known) | Use normal (z) distribution. \(E = z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}\) |
| Population not normally distributed and \(n \leq 30\) | Use the bootstrapping method or a nonparametric method. |
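The conditions in Table 6.2 can be expressed as a small decision function. This is one possible encoding of the table; the function name, argument names, and return labels are hypothetical.

```python
def choose_method(sigma_known, population_normal, n):
    """Return the method suggested by Table 6.2 for the given conditions."""
    # Last row: small sample from a non-normal population.
    if not population_normal and n <= 30:
        return "bootstrap or nonparametric"
    # Rows 3 and 4: sigma known, with a normal population or n > 30.
    if sigma_known:
        return "z"
    # Rows 1 and 2: sigma unknown, with a normal population or n > 30.
    return "t"

print(choose_method(sigma_known=False, population_normal=True, n=12))
print(choose_method(sigma_known=True, population_normal=False, n=50))
print(choose_method(sigma_known=False, population_normal=False, n=15))
```

The non-normal, small-sample check comes first because neither the t nor the z formula is supported by the table in that case, whether or not \(\sigma\) is known.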

Explanation of Each Scenario:

  1. When \(\sigma\) is unknown and the population is normally distributed:

    • Use the Student t distribution, regardless of sample size.

    • Calculate the margin of error as \(E = t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}\), where \(s\) is the sample standard deviation.

  2. When \(\sigma\) is unknown and the sample size is large (\(n > 30\)):

    • Use the Student t distribution.

    • The margin of error formula remains \(E = t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}\).

  3. When \(\sigma\) is known and the population is normally distributed:

    • Use the normal (\(z\)) distribution.

    • The margin of error is calculated as \(E = z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}\).

  4. When \(\sigma\) is known and the sample size is large (\(n > 30\)):

    • Use the normal (\(z\)) distribution.

    • The margin of error formula remains \(E = z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}\), though it is noted that in practice, \(\sigma\) is rarely known.

  5. When the population is not normally distributed and the sample size is small (\(n \leq 30\)):

    • Alternative methods like bootstrapping or nonparametric techniques are recommended.
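For the fifth scenario, one common form of bootstrapping is the percentile bootstrap. The text does not specify a particular variant, so the following is only a sketch of that one approach; the data values are hypothetical, and the function name and number of resamples are illustrative choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical small, right-skewed sample (illustrative values only).
data = [1.2, 0.8, 1.5, 0.9, 7.4, 1.1, 2.3, 0.7, 1.9, 4.6, 1.0, 1.4]

def bootstrap_mean_ci(sample, confidence=0.95, resamples=5000):
    """Percentile bootstrap interval for the mean."""
    n = len(sample)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        statistics.fmean(random.choices(sample, k=n)) for _ in range(resamples)
    )
    alpha = 1 - confidence
    lo_idx = int((alpha / 2) * resamples)
    hi_idx = int((1 - alpha / 2) * resamples) - 1
    return means[lo_idx], means[hi_idx]

low, high = bootstrap_mean_ci(data)
print(f"Sample mean = {statistics.fmean(data):.3f}")
print(f"95% percentile bootstrap CI = ({low:.3f}, {high:.3f})")
```

The interval endpoints are read directly from the sorted resampled means, so no t or z critical value is involved.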

Key Points

  1. Student t distribution: Preferred when the population standard deviation is unknown, which is commonly the case in real-world studies.

  2. Large Sample Size (\(n > 30\)): The central limit theorem justifies treating the sampling distribution of \(\overline{x}\) as approximately normal for large samples; however, the t distribution is still recommended if \(\sigma\) is unknown.

  3. Normal (\(z\)) distribution: Used when \(\sigma\) is known, though this scenario is rare in practice.

  4. Small, Non-Normal Samples: When sample sizes are small and the population is non-normal, advanced methods like bootstrapping or nonparametric approaches should be used.

  5. Margin of Error Calculations: The formulas provided calculate the margin of error (\(E\)) for confidence intervals, where \(\alpha/2\) is the area in each tail of the distribution for a confidence level of \(1 - \alpha\).
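To make the difference between the two distributions concrete, the sketch below compares the 95% z critical value with the 95% t critical values quoted in Examples 6.21 and 6.22. It uses `statistics.NormalDist` from the Python standard library for the z value; the variable names are illustrative.

```python
from statistics import NormalDist

confidence = 0.95
alpha = 1 - confidence
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.960

# 95% t critical values quoted in Examples 6.22 (df = 19) and 6.21 (df = 29).
t_crit_by_df = {19: 2.093, 29: 2.045}

for df, t_crit in sorted(t_crit_by_df.items()):
    ratio = t_crit / z_crit
    print(f"df = {df}: t = {t_crit:.3f}, z = {z_crit:.3f}, t/z = {ratio:.3f}")
```

For the same standard deviation and \(n\), the margin of error scales with the critical value, so the t-based interval is wider than the z-based one by the printed ratio, and the gap narrows as the degrees of freedom increase.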