6.5. Confidence Intervals for a Single Population Proportion

Definition - Population Proportion and Sample Proportion

Suppose we have a population where each member either possesses or lacks a specific attribute. In this context, we use the following notation and terms:

  • Population proportion, p: This represents the proportion (percentage) of the entire population that possesses the specified attribute.

  • Sample proportion, \(\widehat{p}\): This refers to the proportion (percentage) of a sample taken from the population that exhibits the specified attribute.

6.5.1. Sample Proportion

To calculate the sample proportion, denoted as \(\widehat{p}\), use the formula:

\[\begin{equation} \widehat{p} = \dfrac{x}{n} \tag{6.7} \end{equation}\]

Here, \(x\) represents the number of individuals in the sample who possess the specified attribute, and \(n\) denotes the sample size.

6.5.2. The Sampling Distribution of the Sample Proportion

For samples of size \(n\):

  • The mean of \(\widehat{p}\) equals the population proportion: \(\mu_{\widehat{p}} = p\). This indicates that the sample proportion is an unbiased estimator of the population proportion.

  • The standard deviation of \(\widehat{p}\) is given by the square root of the product of the population proportion and one minus the population proportion, divided by the sample size:

\[\begin{equation} \sigma_{\widehat{p}} = \sqrt{\dfrac{p(1 - p)}{n}} \tag{6.8} \end{equation}\]

When \(n\) is large enough that \(np \geq 10\) and \(n(1-p) \geq 10\), the sample proportion \(\widehat{p}\) is approximately normally distributed.

This normal approximation is what allows us to construct confidence intervals for \(p\), built around the margin of error, and to perform hypothesis tests.
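The two properties above can be checked numerically. The following is a minimal sketch, assuming the number of successes in the sample follows a binomial distribution; the function name and the values \(n = 50\), \(p = 0.3\) are illustrative choices, not taken from the text. It sums over every possible count to obtain the exact mean and standard deviation of \(\widehat{p}\) and compares them with \(p\) and with (6.8).

```python
from math import comb, sqrt

def sampling_distribution_moments(n, p):
    """Return the exact mean and standard deviation of p_hat = X / n,
    where X ~ Binomial(n, p), by summing over every possible count."""
    mean = 0.0
    second_moment = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1 - p) ** (n - k)  # P(X = k)
        p_hat = k / n
        mean += p_hat * prob
        second_moment += p_hat**2 * prob
    variance = second_moment - mean**2
    return mean, sqrt(variance)

n, p = 50, 0.3  # illustrative values only (np = 15, n(1 - p) = 35)
mean, sd = sampling_distribution_moments(n, p)
print(f"mean of p_hat = {mean:.6f} (p = {p})")
print(f"sd of p_hat   = {sd:.6f} (formula: {sqrt(p * (1 - p) / n):.6f})")
```

Up to floating-point rounding, the two printed pairs should agree, which is what unbiasedness and formula (6.8) assert.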

6.5.3. Estimating the Standard Error

In practice, we often don’t know the true population proportion \(p\). When this is the case, we use the sample proportion \(\widehat{p}\) as an estimate. This leads to an estimate of the standard deviation of \(\widehat{p}\), which we call the standard error:

\[\begin{equation} SE_{\widehat{p}} = \sqrt{\dfrac{\widehat{p}(1 - \widehat{p})}{n}} \tag{6.9} \end{equation}\]

This estimated standard error is used in calculating the margin of error and constructing confidence intervals when the population proportion is unknown.

6.5.4. Margin of Error (E) for Estimating a Population Proportion

The margin of error is the maximum likely difference, at a given confidence level, between the sample proportion and the true population proportion. It is denoted by \(E\) and is calculated as follows:

\[\begin{equation} E = z_{\alpha/2} \cdot \sqrt{\dfrac{\widehat{p}(1 - \widehat{p})}{n}} \tag{6.10} \end{equation}\]

Where:

  • \(z_{\alpha/2}\) is the critical z-value corresponding to the desired confidence level. Here, \(\alpha\) represents the significance level, which is 1 minus the confidence level.

  • \(\widehat{p}\) is the sample proportion.

  • \(n\) is the sample size.

When constructing a confidence interval for a population proportion, the interval estimate can be expressed as the range from \(\widehat{p} - E\) to \(\widehat{p} + E\).

For example, if we have a sample proportion \(\widehat{p}\) from a sample size \(n\) and we desire a confidence level of 95%, we can find the critical value \(z_{\alpha/2}\) (which is approximately 1.96 for 95% confidence) and then compute the margin of error \(E\). The confidence interval will then provide an estimated range for the true population proportion.
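As a rough illustration of this calculation, the sketch below obtains \(z_{\alpha/2}\) from the standard normal distribution in Python's standard library rather than from a printed table, and then applies (6.10). The function names and the sample values (\(\widehat{p} = 0.5\), \(n = 100\)) are hypothetical and chosen only for demonstration.

```python
from math import sqrt
from statistics import NormalDist

def critical_z(confidence):
    """Two-sided critical value z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def margin_of_error(p_hat, n, confidence=0.95):
    """E = z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n), as in (6.10)."""
    return critical_z(confidence) * sqrt(p_hat * (1 - p_hat) / n)

# Critical values for three common confidence levels.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_z(level):.3f}")

# Hypothetical sample: p_hat = 0.5 from n = 100 observations.
print(f"E = {margin_of_error(0.5, 100):.4f}")
```

The computed critical values can differ from table values in the third or fourth decimal place because tables are rounded.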

Fig. 6.28 visually represents the concept of the margin of error (E) in estimating the population proportion (\(p\)). It shows a range centered around the sample proportion (\(\widehat{p}\)), extending from \(\widehat{p} - E\) to \(\widehat{p} + E\). This range indicates the interval within which the true population proportion is likely to fall, considering the confidence level and sample size.

../_images/margin_of_error_figure3.png

Fig. 6.28 Visualizing the Margin of Error in Estimating Population Proportion.

6.5.5. Procedure for Estimating a Population Proportion

When estimating the population proportion (\(p\)) using a sample, a confidence interval can be constructed to determine the range within which the true population proportion lies with a specified level of confidence. The following algorithm outlines the steps involved in this estimation process.

Algorithm 6.3 (Estimating a Population Proportion)

Objective: To find a confidence interval for the population proportion, \(p\).

Prerequisites:

  1. Simple random sample.

  2. The number of successes, \(x\), and the number of failures, \(n - x\), are both 5 or greater. This rule of thumb, which is less strict than the \(np \geq 10\) guideline in Section 6.5.2, checks that the sample is large enough for the normal approximation to be reasonable.

Method:

  1. Determine the Critical Value: For a confidence level of \(1 - \alpha\), use the z-distribution table to find \(z_{\alpha/2}\).

  2. Calculate the Interval: The confidence interval for \(p\) is given by:

\[\begin{equation} \widehat{p} - z_{\alpha/2} \cdot \sqrt{\dfrac{\widehat{p}(1 - \widehat{p})}{n}} \quad \text{to} \quad \widehat{p} + z_{\alpha/2} \cdot \sqrt{\dfrac{\widehat{p}(1 - \widehat{p})}{n}} \tag{6.11} \end{equation}\]

Where:

  • \(\widehat{p}\) is the sample proportion, calculated as \(\widehat{p} = \frac{x}{n}\), where \(x\) is the number of successes and \(n\) is the sample size.

  • The term \(\widehat{p}(1 - \widehat{p})\) estimates the variance of a single observation (success or failure); dividing it by \(n\) gives the estimated variance of \(\widehat{p}\).

  • \(n\) is the sample size.

  • \(z_{\alpha/2}\) is the critical z-value obtained in Step 1.

This interval provides the range within which the true population proportion is likely to be found with the given level of confidence.

Note

The number of successes (\(x\)) and the number of failures (\(n - x\)) relate directly to the sample proportion (\(\widehat{p}\)) and its complement (\(1 - \widehat{p}\)). The sample proportion is the ratio of successes to the sample size (\(\widehat{p} = \frac{x}{n}\)), and its complement, \(1 - \widehat{p}\), is the proportion of failures in the sample. Together they estimate the variability of the sample proportion, which is then scaled by the critical value from the z-distribution to give the margin of error. The condition that both \(x\) and \(n - x\) are at least 5 ensures that the normal approximation to the binomial distribution is appropriate, which justifies using the z-distribution to construct the confidence interval.
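The steps of Algorithm 6.3 can be sketched in Python as follows. This is one possible arrangement rather than a reference implementation: the function name is hypothetical, the critical value comes from the standard library's normal distribution instead of a table, and the data (45 successes in a sample of 120) are invented for illustration. The simple-random-sample prerequisite cannot be checked in code and is assumed.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(x, n, confidence=0.95):
    """Large-sample (z) confidence interval for a population proportion.

    x: number of successes, n: sample size, confidence: e.g. 0.95.
    Returns (lower, upper).
    """
    # Prerequisite 2: at least 5 successes and 5 failures.
    if x < 5 or n - x < 5:
        raise ValueError("need at least 5 successes and 5 failures")
    p_hat = x / n
    # Step 1: critical value z_{alpha/2}.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Step 2: interval from p_hat - E to p_hat + E.
    e = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - e, p_hat + e

# Hypothetical data: 45 successes in a sample of 120.
lower, upper = proportion_confidence_interval(45, 120)
print(f"95% CI: {lower:.4f} to {upper:.4f}")
```

Raising an error when the prerequisite fails is a design choice made here so that the normal approximation is not applied silently to very small counts.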

Example 6.23

A city council wants to know the proportion of residents who support the construction of a new park. They survey 500 randomly selected residents, and 320 of them express their support for the new park.

  • a. Determine the sample proportion (\(\widehat{p}\)) of residents who support the new park.

  • b. Choose a confidence level of 95%. Find the corresponding critical value (\(z_{\alpha/2}\)) from the z-distribution table.

  • c. Verify that the number of successes (\(x\)) and the number of failures (\(n - x\)) are both at least 5.

  • d. Calculate the standard error (SE) of the sample proportion, and then construct the 95% confidence interval for the population proportion (\(p\)).

  • e. Explain what the confidence interval means in the context of this problem.

Solution:

a. Calculate the Sample Proportion:

Given:

  • Number of successes (\(x\)) = 320

  • Sample size (\(n\)) = 500

\[\begin{equation*} \widehat{p} = \dfrac{x}{n} = \dfrac{320}{500} = 0.64 \end{equation*}\]

b. Determine the Critical Value:

For a 95% confidence level, \(\alpha = 0.05\) and \(\alpha/2 = 0.025\). The critical value (\(z_{\alpha/2}\)) from the z-distribution table is approximately 1.96.

c. Check Prerequisites:

  • Number of successes (\(x\)) = 320

  • Number of failures (\(n - x\)) = 500 - 320 = 180

Both values are greater than 5, so the normal approximation is valid.

d. Construct the Confidence Interval:

  • Standard Error (SE) of the sample proportion:

    \[\begin{equation*} SE = \sqrt{\dfrac{\widehat{p}(1 - \widehat{p})}{n}} = \sqrt{\dfrac{0.64 \cdot (1 - 0.64)}{500}} = \sqrt{\dfrac{0.64 \cdot 0.36}{500}} \approx 0.02147 \end{equation*}\]
  • Confidence Interval:

    \[\begin{equation*} \widehat{p} \pm z_{\alpha/2} \cdot SE = 0.64 \pm 1.96 \cdot 0.02147 \end{equation*}\]
    \[\begin{equation*} 0.64 \pm 0.04207 \quad \Rightarrow \quad [0.5979, 0.6821] \end{equation*}\]

e. Interpret the Results:

The 95% confidence interval for the proportion of residents who support the construction of the new park is approximately [0.5979, 0.6821]. This means we are 95% confident that the true population proportion of residents who support the park lies between 59.79% and 68.21%.

Fig. 6.29 provides a visual representation of the 95% confidence interval.

../_images/example_651.png

Fig. 6.29 95% Confidence Interval for Proportion of Residents Supporting the New Park
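The phrase "95% confident" in part (e) describes the procedure: if the survey were repeated many times, about 95% of the resulting intervals would contain the true proportion. The simulation below is a minimal sketch of that idea. It assumes, purely for illustration, that the true proportion equals 0.64 (the sample value from this example); the function name, the number of repetitions, and the seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage_rate(p, n, confidence=0.95, repetitions=1000, seed=0):
    """Fraction of simulated samples whose z-interval contains the true p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(repetitions):
        # Draw one simple random sample of size n and count the successes.
        x = sum(rng.random() < p for _ in range(n))
        p_hat = x / n
        e = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - e <= p <= p_hat + e:
            hits += 1
    return hits / repetitions

# Assumed "true" proportion of 0.64, used only to illustrate coverage.
rate = coverage_rate(p=0.64, n=500)
print(f"share of intervals containing p: {rate:.3f}")
```

The printed share should land near 0.95, with some run-to-run variation if the seed or the number of repetitions is changed.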

Example 6.24

A university wants to know the proportion of students who use the campus gym regularly. They conduct a survey of 400 randomly selected students, and 260 of them report using the gym regularly.

  • a. Determine the sample proportion (\(\widehat{p}\)) of students who use the campus gym regularly.

  • b. Choose a confidence level of 95%. Find the corresponding critical value (\(z_{\alpha/2}\)) from the z-distribution table.

  • c. Verify that the number of successes (\(x\)) and the number of failures (\(n - x\)) are both at least 5.

  • d. Calculate the standard error (SE) of the sample proportion, and then construct the 95% confidence interval for the population proportion (\(p\)).

  • e. Explain what the confidence interval means in the context of this problem.

Solution:

a. Calculate the Sample Proportion:

Given:

  • Number of successes (\(x\)) = 260

  • Sample size (\(n\)) = 400

\[\begin{equation*} \widehat{p} = \dfrac{x}{n} = \dfrac{260}{400} = 0.65 \end{equation*}\]

b. Determine the Critical Value:

For a 95% confidence level, \(\alpha = 0.05\) and \(\alpha/2 = 0.025\). The critical value (\(z_{\alpha/2}\)) from the z-distribution table is approximately 1.96.

c. Check Prerequisites:

  • Number of successes (\(x\)) = 260

  • Number of failures (\(n - x\)) = 400 - 260 = 140

Both values are greater than 5, so the normal approximation is valid.

d. Construct the Confidence Interval:

  • Standard Error (SE) of the sample proportion:

    \[\begin{equation*} SE = \sqrt{\dfrac{\widehat{p}(1 - \widehat{p})}{n}} = \sqrt{\dfrac{0.65 \cdot (1 - 0.65)}{400}} = \sqrt{\dfrac{0.65 \cdot 0.35}{400}} \approx 0.02385 \end{equation*}\]
  • Confidence Interval:

    \[\begin{equation*} \widehat{p} \pm z_{\alpha/2} \cdot SE = 0.65 \pm 1.96 \cdot 0.02385 \end{equation*}\]
    \[\begin{equation*} 0.65 \pm 0.04674 \quad \Rightarrow \quad [0.6033, 0.6967] \end{equation*}\]

e. Interpret the Results:

The 95% confidence interval for the proportion of students who use the campus gym regularly is approximately [0.6033, 0.6967]. This means we are 95% confident that the true population proportion of students who use the gym regularly lies between 60.33% and 69.67%.

Fig. 6.30 provides a visual representation of the 95% confidence interval.

../_images/example_652.png

Fig. 6.30 95% Confidence Interval for Proportion of Students Using the Campus Gym Regularly

Example 6.25

A school district wants to know the proportion of parents who are satisfied with the quality of education in the district. They survey 700 randomly selected parents, and 490 of them report being satisfied.

  • a. Determine the sample proportion (\(\widehat{p}\)) of parents who are satisfied with the quality of education.

  • b. Choose a confidence level of 95%. Find the corresponding critical value (\(z_{\alpha/2}\)) from the z-distribution table.

  • c. Verify that the number of successes (\(x\)) and the number of failures (\(n - x\)) are both at least 5.

  • d. Calculate the standard error (SE) of the sample proportion, and then construct the 95% confidence interval for the population proportion (\(p\)).

  • e. Explain what the confidence interval means in the context of this problem.

Solution:

a. Calculate the Sample Proportion:

Given:

  • Number of successes (\(x\)) = 490

  • Sample size (\(n\)) = 700

\[\begin{equation*} \widehat{p} = \dfrac{x}{n} = \dfrac{490}{700} = 0.7 \end{equation*}\]

b. Determine the Critical Value:

For a 95% confidence level, \(\alpha = 0.05\) and \(\alpha/2 = 0.025\). The critical value (\(z_{\alpha/2}\)) from the z-distribution table is approximately 1.96.

c. Check Prerequisites:

  • Number of successes (\(x\)) = 490

  • Number of failures (\(n - x\)) = 700 - 490 = 210

Both values are greater than 5, so the normal approximation is valid.

d. Construct the Confidence Interval:

  • Standard Error (SE) of the sample proportion:

    \[\begin{equation*} SE = \sqrt{\dfrac{\widehat{p}(1 - \widehat{p})}{n}} = \sqrt{\dfrac{0.7 \cdot (1 - 0.7)}{700}} = \sqrt{\dfrac{0.7 \cdot 0.3}{700}} \approx 0.01732 \end{equation*}\]
  • Confidence Interval:

    \[\begin{equation*} \widehat{p} \pm z_{\alpha/2} \cdot SE = 0.7 \pm 1.96 \cdot 0.01732 \end{equation*}\]
    \[\begin{equation*} 0.7 \pm 0.03395 \quad \Rightarrow \quad [0.6661, 0.7339] \end{equation*}\]

e. Interpret the Results:

The 95% confidence interval for the proportion of parents who are satisfied with the quality of education in the district is approximately [0.6661, 0.7339]. This means we are 95% confident that the true population proportion of satisfied parents lies between 66.61% and 73.39%.

Fig. 6.31 provides a visual representation of the 95% confidence interval.

../_images/example_653.png

Fig. 6.31 95% Confidence Interval for Proportion of Parents Satisfied with Education Quality

Example 6.26

A retail store wants to know the proportion of customers who are satisfied with their service. They survey 350 randomly selected customers, and 245 of them report being satisfied.

  • a. Determine the sample proportion (\(\widehat{p}\)) of customers who are satisfied with the service.

  • b. Choose a confidence level of 95%. Find the corresponding critical value (\(z_{\alpha/2}\)) from the z-distribution table.

  • c. Verify that the number of successes (\(x\)) and the number of failures (\(n - x\)) are both at least 5.

  • d. Calculate the standard error (SE) of the sample proportion, and then construct the 95% confidence interval for the population proportion (\(p\)).

  • e. Explain what the confidence interval means in the context of this problem.

Solution:

a. Calculate the Sample Proportion:

Given:

  • Number of successes (\(x\)) = 245

  • Sample size (\(n\)) = 350

\[\begin{equation*} \widehat{p} = \dfrac{x}{n} = \dfrac{245}{350} = 0.7 \end{equation*}\]

b. Determine the Critical Value:

For a 95% confidence level, \(\alpha = 0.05\) and \(\alpha/2 = 0.025\). The critical value (\(z_{\alpha/2}\)) from the z-distribution table is approximately 1.96.

c. Check Prerequisites:

  • Number of successes (\(x\)) = 245

  • Number of failures (\(n - x\)) = 350 - 245 = 105

Both values are greater than 5, so the normal approximation is valid.

d. Construct the Confidence Interval:

  • Standard Error (SE) of the sample proportion:

    \[\begin{equation*} SE = \sqrt{\dfrac{\widehat{p}(1 - \widehat{p})}{n}} = \sqrt{\dfrac{0.7 \cdot (1 - 0.7)}{350}} = \sqrt{\dfrac{0.7 \cdot 0.3}{350}} \approx 0.024495 \end{equation*}\]
  • Confidence Interval:

    \[\begin{equation*} \widehat{p} \pm z_{\alpha/2} \cdot SE = 0.7 \pm 1.96 \cdot 0.024495 \end{equation*}\]
    \[\begin{equation*} 0.7 \pm 0.04801 \quad \Rightarrow \quad [0.652, 0.748] \end{equation*}\]

e. Interpret the Results:

The 95% confidence interval for the proportion of customers who are satisfied with the service is approximately [0.652, 0.748]. This means we are 95% confident that the true population proportion of satisfied customers lies between 65.2% and 74.8%.

Fig. 6.32 provides a visual representation of the 95% confidence interval.

../_images/example_654.png

Fig. 6.32 95% Confidence Interval for Proportion of Satisfied Customers

Example 6.27

A university wants to know the proportion of students who are satisfied with the online learning system. They survey 400 randomly selected students, and 280 of them report being satisfied.

  • a. Determine the sample proportion (\(\widehat{p}\)) of students who are satisfied with the online learning system.

  • b. Choose a confidence level of 95%. Find the corresponding critical value (\(z_{\alpha/2}\)) from the z-distribution table.

  • c. Verify that the number of successes (\(x\)) and the number of failures (\(n - x\)) are both at least 5.

  • d. Calculate the standard error (SE) of the sample proportion, and then construct the 95% confidence interval for the population proportion (\(p\)).

  • e. Explain what the confidence interval means in the context of this problem.

Solution:

a. Calculate the Sample Proportion:

Given:

  • Number of successes (\(x\)) = 280

  • Sample size (\(n\)) = 400

\[\begin{equation*} \widehat{p} = \dfrac{x}{n} = \dfrac{280}{400} = 0.7 \end{equation*}\]

b. Determine the Critical Value:

For a 95% confidence level, \(\alpha = 0.05\) and \(\alpha/2 = 0.025\). The critical value (\(z_{\alpha/2}\)) from the z-distribution table is approximately 1.96.

c. Check Prerequisites:

  • Number of successes (\(x\)) = 280

  • Number of failures (\(n - x\)) = 400 - 280 = 120

Both values are greater than 5, so the normal approximation is valid.

d. Construct the Confidence Interval:

  • Standard Error (SE) of the sample proportion:

    \[\begin{equation*} SE = \sqrt{\dfrac{\widehat{p}(1 - \widehat{p})}{n}} = \sqrt{\dfrac{0.7 \cdot (1 - 0.7)}{400}} = \sqrt{\dfrac{0.7 \cdot 0.3}{400}} \approx 0.02291 \end{equation*}\]
  • Confidence Interval:

    \[\begin{equation*} \widehat{p} \pm z_{\alpha/2} \cdot SE = 0.7 \pm 1.96 \cdot 0.02291 \end{equation*}\]
    \[\begin{equation*} 0.7 \pm 0.04491 \quad \Rightarrow \quad [0.6551, 0.7449] \end{equation*}\]

e. Interpret the Results:

The 95% confidence interval for the proportion of students who are satisfied with the online learning system is approximately [0.6551, 0.7449]. This means we are 95% confident that the true population proportion of satisfied students lies between 65.51% and 74.49%.

Fig. 6.33 provides a visual representation of the 95% confidence interval.

../_images/example_655.png

Fig. 6.33 95% Confidence Interval for Proportion of Students Satisfied with Online Learning System
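Examples 6.25 to 6.27 share the sample proportion \(\widehat{p} = 0.7\) and differ only in sample size (700, 350, and 400), which makes them a convenient way to see how the margin of error depends on \(n\). The short sketch below recomputes the three margins with the table value 1.96; the constant and function names are illustrative.

```python
from math import sqrt

Z_95 = 1.96  # table value used in the examples above
P_HAT = 0.7  # sample proportion shared by Examples 6.25 to 6.27

def margin_of_error(p_hat, n, z=Z_95):
    """E = z * sqrt(p_hat * (1 - p_hat) / n)."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

margins = {n: margin_of_error(P_HAT, n) for n in (350, 400, 700)}
for n, e in margins.items():
    print(f"n = {n}: E = {e:.5f}")

# Doubling n from 350 to 700 divides E by sqrt(2), not by 2.
ratio = margins[350] / margins[700]
print(f"E(350) / E(700) = {ratio:.4f}")
```

Because \(E\) is proportional to \(1/\sqrt{n}\), halving the margin of error requires roughly four times as many observations.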

6.5.6. Point Estimate of \(p\)

Because the confidence interval \(\widehat{p} \pm E\) is centered at the sample proportion, the point estimate \(\widehat{p}\) can be recovered from a reported interval as the average of its upper and lower limits:

\[\begin{equation} \widehat{p} = \dfrac{\left( \text{upper confidence interval limit} \right) + \left( \text{lower confidence interval limit} \right)}{2} \tag{6.12} \end{equation}\]

Margin of Error:

The margin of error \(E\) is half the difference between the upper and lower limits of the confidence interval. It is calculated as:

\[\begin{equation} E = \dfrac{\left( \text{upper confidence interval limit} \right) - \left( \text{lower confidence interval limit} \right)}{2} \tag{6.13} \end{equation}\]

Example 6.28

Express the Confidence Interval in the form \(\widehat{p} \pm E\)

  • a. Brown Eyes: Express \(0.375 < p < 0.425\) in the form \(\widehat{p} \pm E\).

  • b. Blue Eyes: Express \(0.275 < p < 0.425\) in the form \(\widehat{p} \pm E\).

Solution:

The confidence intervals are based on proportions of eye colors.

a. Brown Eyes: Express \(0.375 < p < 0.425\) in the form \(\widehat{p} \pm E\).

  • Point Estimate \(\widehat{p}\):

\[\begin{equation*} \widehat{p} = \dfrac{0.425 + 0.375}{2} = \dfrac{0.8}{2} = 0.4 \end{equation*}\]
  • Margin of Error \(E\):

\[\begin{equation*} E = \dfrac{0.425 - 0.375}{2} = \dfrac{0.05}{2} = 0.025 \end{equation*}\]

Therefore, \(0.375 < p < 0.425\) can be expressed as \(\widehat{p} \pm E = 0.4 \pm 0.025\).

b. Blue Eyes: Express \(0.275 < p < 0.425\) in the form \(\widehat{p} \pm E\).

  • Point Estimate \(\widehat{p}\):

\[\begin{equation*} \widehat{p} = \dfrac{0.425 + 0.275}{2} = \dfrac{0.7}{2} = 0.35 \end{equation*}\]
  • Margin of Error \(E\):

\[\begin{equation*} E = \dfrac{0.425 - 0.275}{2} = \dfrac{0.15}{2} = 0.075 \end{equation*}\]

Therefore, \(0.275 < p < 0.425\) can be expressed as \(\widehat{p} \pm E = 0.35 \pm 0.075\).
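The conversion used in this example is mechanical enough to express as a small helper. The sketch below applies (6.12) and (6.13) to both intervals; the function name is a hypothetical choice.

```python
def to_point_and_margin(lower, upper):
    """Convert interval limits into (p_hat, E) using (6.12) and (6.13)."""
    p_hat = (upper + lower) / 2
    e = (upper - lower) / 2
    return p_hat, e

# The two intervals from this example.
for label, lower, upper in [("brown", 0.375, 0.425), ("blue", 0.275, 0.425)]:
    p_hat, e = to_point_and_margin(lower, upper)
    print(f"{label}: {p_hat:.3f} +/- {e:.3f}")
```

Results are formatted to three decimal places because floating-point arithmetic may leave tiny rounding residue in the raw values.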

6.5.7. Estimating Sample Size for Population Proportion

When conducting a survey or study, it’s crucial to determine the right sample size to ensure that the results are representative of the population. This section outlines the statistical methods used to calculate the sample size needed to estimate a population proportion with a given level of confidence and precision.

Estimating Sample Size for Population Proportion

Objective: To calculate the sample size \(n\) required for estimating the population proportion \(p\) with a desired level of accuracy.

Key Assumption: The data is collected from a simple random sample where each sample unit is selected independently.

Formula When \(\widehat{p}\) is Known: If we have a preliminary estimate of the population proportion \(\widehat{p}\), we use the following formula:

\[\begin{equation} n = \dfrac{\left( z_{\alpha/2} \right)^{2} \cdot \widehat{p} \cdot \widehat{q}}{E^{2}} \tag{6.14} \end{equation}\]

Here, \(\widehat{q}\) is the complement of \(\widehat{p}\) (i.e., \(\widehat{q} = 1 - \widehat{p}\)). The term \(z_{\alpha/2}\) represents the z-score corresponding to the desired confidence level, and \(E\) is the margin of error we are willing to accept.

Formula When \(\widehat{p}\) is Unknown: In cases where we do not have an estimate for \(\widehat{p}\), it’s conservative to assume that \(\widehat{p}\) is 0.5, as this maximizes the variance and thus the required sample size:

\[\begin{equation} n = \dfrac{\left( z_{\alpha/2} \right)^{2} \cdot 0.25}{E^{2}} \tag{6.15} \end{equation}\]
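One way to see why 0.5 is the conservative choice is to complete the square in the product \(\widehat{p}\,\widehat{q}\). This short supporting derivation uses only the definition \(\widehat{q} = 1 - \widehat{p}\):

\[\begin{equation*} \widehat{p}\,\widehat{q} = \widehat{p}(1 - \widehat{p}) = \dfrac{1}{4} - \left( \widehat{p} - \dfrac{1}{2} \right)^{2} \leq \dfrac{1}{4} \end{equation*}\]

Equality holds only when \(\widehat{p} = 0.5\), so 0.25 is the largest value the product can take, and the sample size computed with 0.25 is never smaller than the one obtained from any other value of \(\widehat{p}\).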

Rounding Up: The calculated sample size \(n\) should be rounded up to the nearest whole number to ensure the sample is large enough to estimate \(p\) accurately.

Interpretation: The larger the desired confidence level or the smaller the margin of error, the larger the sample size needed. A sample of this size ensures that, with the specified level of confidence, the sample proportion \(\widehat{p}\) falls within the margin of error \(E\) of the true population proportion \(p\).

Example 6.29

Find the sample size needed to estimate the percentage of California residents who are left-handed. Use a margin of error of three percentage points and use a confidence level of 99%.

  • a. Assume that \(\widehat{p}\) and \(\widehat{q}\) are unknown.

  • b. Assume that based on prior studies, about 10% of Californians are left-handed.

  • c. Does the additional survey information from part (b) have much of an effect on the sample size that is required?

Solution:

a. When \(\widehat{p}\) and \(\widehat{q}\) are unknown:

  • Margin of error \(E = 0.03\)

  • Confidence level = 99%, so \(\alpha = 0.01\) and \(z_{\alpha/2} \approx 2.576\)

\[\begin{equation*}n = \dfrac{\left( 2.576 \right)^{2} \cdot 0.25}{0.03^{2}} = \dfrac{6.635776 \cdot 0.25}{0.0009} = \dfrac{1.658944}{0.0009} \approx 1843.27\end{equation*}\]

Rounding up to the nearest whole number, \(n = 1844\).

b. When \(\widehat{p} = 0.10\) and \(\widehat{q} = 0.90\):

\[\begin{equation*} n = \dfrac{\left( 2.576 \right)^{2} \cdot 0.10 \cdot 0.90}{0.03^{2}} = \dfrac{6.635776 \cdot 0.09}{0.0009} = \dfrac{0.59721984}{0.0009} \approx 663.58 \end{equation*}\]

Rounding up to the nearest whole number, \(n = 664\).

c. Effect of Additional Survey Information:

Yes, the additional information significantly reduces the required sample size. Without an estimate, the required sample size is 1844; with the prior estimate of 10%, it drops to 664, a reduction of about 64%.
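The sample-size calculations in this example can be sketched in Python. The function name and its parameters are illustrative assumptions; the critical value is computed from the standard library's normal distribution rather than taken from a table, so it differs slightly from 2.576, although the rounded-up sample sizes should match parts (a) and (b).

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(margin, confidence, p_hat=None):
    """Sample size from (6.14); falls back to (6.15) when p_hat is None."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    return ceil(z**2 * pq / margin**2)  # always round up

n_unknown = required_sample_size(0.03, 0.99)            # part (a)
n_prior = required_sample_size(0.03, 0.99, p_hat=0.10)  # part (b)
print(f"no prior estimate: n = {n_unknown}")
print(f"prior estimate of 10%: n = {n_prior}")
```

Using `ceil` rather than ordinary rounding reflects the rounding-up rule: rounding down would leave the margin of error slightly larger than requested.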