Welcome back to Professor Baker's Math Class! As we progress through the summer semester, we are shifting gears from probability theory to Inferential Statistics. In Chapter 10, we focus on Estimation. Specifically, we look at how to take a single sample of data and use it to make an educated estimate about an entire population.
1. Point Estimators vs. Interval Estimators
There are two main ways to estimate a population parameter:
- Point Estimate: A single value used to approximate a population parameter. For example, we use the sample mean ($\bar{x}$) to estimate the population mean ($\mu$), and the sample proportion ($\hat{p}$) to estimate the population proportion ($p$).
- Interval Estimate (Confidence Interval): A range of values used to estimate the parameter with a specific level of confidence (e.g., 95% confidence). The range is built as the point estimate plus or minus a margin of error.
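As a small illustration of the two point estimates named above, the sketch below computes $\bar{x}$ and $\hat{p}$ in Python. The data and the variable names (`commute_minutes`, `survey_answers`) are invented for this example and are not class data.

```python
from statistics import mean

# Hypothetical sample of 8 commute times in minutes (made-up data).
commute_minutes = [22, 35, 18, 41, 27, 30, 25, 34]

# Hypothetical yes/no survey responses (made-up data).
survey_answers = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]

# Point estimate of the population mean: the sample mean x-bar.
x_bar = mean(commute_minutes)

# Point estimate of the population proportion: the sample proportion p-hat.
p_hat = survey_answers.count("yes") / len(survey_answers)

print(f"x-bar = {x_bar}")
print(f"p-hat = {p_hat}")
```

Each result is a single number with no indication of how far off it might be, which is the gap an interval estimate fills.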
2. Estimating the Population Mean ($\mu$)
When constructing a confidence interval for the mean, the formula changes based on what we know about the population's standard deviation ($\sigma$).
Scenario A: $\sigma$ is Known (The Z-Interval)
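If we know the population standard deviation, we use the standard normal distribution (Z-distribution). This is appropriate when the population is normally distributed or the sample is large enough ($n > 30$). Some courses also allow the Z-interval with $s$ in place of $\sigma$ when $n > 30$, because the t- and Z-values are close for large samples. The formula is:
$$ \bar{x} \pm z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) $$
Common critical values ($z_{\alpha/2}$) from our class notes include: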
- 90% Confidence: $z = 1.645$
- 95% Confidence: $z = 1.96$
- 99% Confidence: $z = 2.575$
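The Z-interval formula and the critical values above can be checked with a short Python sketch. It uses `statistics.NormalDist` from the standard library; the function names `z_critical` and `z_interval`, and the sample numbers, are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Recompute the critical values for the three common confidence levels.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")

# Hypothetical sample: x-bar = 100, sigma = 15, n = 36 (made-up numbers).
low, high = z_interval(100, 15, 36)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

The computed 99% value comes out near 2.576. The 2.575 in the list is a common table rounding, and either value gives almost the same interval.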
Scenario B: $\sigma$ is Unknown (The T-Interval)
In real-world statistics, we rarely know the true population standard deviation. Instead, we use the sample standard deviation ($s$). Because this introduces more uncertainty, we use the Student's t-distribution rather than the Z-distribution. Its curve is flatter with heavier tails than the standard normal, so its critical values are larger and the resulting intervals are wider.
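To see how much the t-distribution widens an interval, compare a few 95% two-sided critical values, as read from a standard t-table, with the Z value. The second subscript is the degrees of freedom (introduced next), and the three values shown are picked only for illustration:

$$ t_{0.025,\,5} = 2.571, \quad t_{0.025,\,15} = 2.131, \quad t_{0.025,\,30} = 2.042, \quad z_{0.025} = 1.960 $$

The t-values are always larger than the Z-value, and they shrink toward it as the degrees of freedom increase.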
The formula requires us to calculate Degrees of Freedom ($df$), where $df = n - 1$.
$$ \bar{x} \pm t_{\alpha/2, n-1} \left( \frac{s}{\sqrt{n}} \right) $$
3. Estimating Population Proportions ($p$)
Not all data is numerical; sometimes we are looking at categorical data (like "Yes/No" survey responses). In this case, we estimate the proportion of the population that fits a certain category. For large samples (in practice, check that $n\hat{p} \geq 10$ and $n(1-\hat{p}) \geq 10$, since the true $p$ is unknown), we use the Z-distribution:
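$$ \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} $$
Here, $\hat{p}$ is your sample success rate: the number of successes divided by $n$ (e.g., 5 defective items out of 800 means $\hat{p} = 5/800 = 0.00625$). Note that with only 5 defective items, that particular sample falls short of the large-sample check, so it shows how to compute $\hat{p}$ rather than a good case for the Z-interval.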
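The proportion interval can be sketched in Python as follows. The function name `proportion_interval` and the survey counts are made up for illustration, and the large-sample check uses $\hat{p}$ in place of the unknown $p$, which is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Large-sample Z-interval for a population proportion."""
    p_hat = successes / n
    # Large-sample check, using p-hat in place of the unknown p.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("sample too small for the Z-interval approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical survey: 120 "yes" answers out of 400 (made-up numbers).
low, high = proportion_interval(120, 400)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

With these inputs the interval runs from roughly 0.255 to 0.345. Calling `proportion_interval(5, 800)` raises a `ValueError`, because 5 successes is below the threshold of 10.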
4. Determining Sample Size
Before starting a study, you often need to know: "How many people do I need to survey to be accurate within a certain margin of error ($E$)?" We can rearrange our interval formulas to solve for $n$.
- For the Mean: $$ n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^2 $$
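- For Proportions: $$ n = \frac{(z_{\alpha/2})^2 \cdot \hat{p}(1-\hat{p})}{E^2} $$ If you have no prior estimate of the proportion, use $\hat{p} = 0.5$, which gives the largest (safest) sample size.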
Remember, when calculating sample size, if you get a decimal, always round up to the next whole number (e.g., 384.16 becomes 385). Rounding down would leave the margin of error slightly larger than $E$ at the chosen confidence level.
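Both sample-size formulas, including the round-up rule, fit in a few lines of Python. The function names and the inputs below are assumptions for illustration. Using $\hat{p} = 0.5$ with $z = 1.96$ and $E = 0.05$ is one set of inputs that produces the 384.16 figure mentioned above.

```python
from math import ceil


def sample_size_mean(z, sigma, margin):
    """Smallest n so the mean's margin of error is at most `margin`."""
    return ceil((z * sigma / margin) ** 2)


def sample_size_proportion(z, p_hat, margin):
    """Smallest n so the proportion's margin of error is at most `margin`."""
    return ceil(z ** 2 * p_hat * (1 - p_hat) / margin ** 2)


# Proportion within 5 percentage points at 95% confidence,
# with p-hat = 0.5 as the planning guess (an assumption).
print(sample_size_proportion(1.96, 0.5, 0.05))  # 384.16 rounds up to 385

# Mean within 2 units at 95% confidence, assuming sigma = 15 (made-up numbers).
print(sample_size_mean(1.96, 15, 2))  # 216.09 rounds up to 217
```

`ceil` always rounds up, which matches the rule above; ordinary rounding would turn 384.16 into 384 and miss the target margin.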
Keep practicing those calculator inputs, and pay close attention to whether you are dealing with a mean (numbers) or a proportion (percentages)!