Inference About a Population Mean


Denote by m the mean for some population. In applications a random sample is selected from that population. Our objectives are among the following:

  1. To examine the data and note properties of the observed distribution, taking particular care to identify and justify outlying values.

  2. To estimate the population mean with a confidence interval.

  3. To test hypotheses regarding the population mean.


Examining the data.

If the data set is rather small, it is sufficient to simply note the values. Of course, with small data sets, we won't get a feel for much of the inherent variability of the data, so identifying true outliers is tough.

A histogram is the preferred graphical device. With smallish data sets, histograms will tend to be sparse and uninformative. Be careful in drawing specific conclusions from histograms based on small data sets: the particular features are quite likely to be due to the particular sample that's observed as well as a computer's choice of classes (bins).
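The sensitivity to the choice of classes can be seen in a minimal sketch. The sample values and the helper name bin_counts below are hypothetical, made up purely for illustration, and equal-width classes spanning the data range are an assumption about how the bins are chosen.

```python
# Hypothetical sample of n = 12 measurements (made up for illustration).
sample = [4.0, 4.7, 5.2, 5.3, 5.8, 6.1, 6.4, 6.6, 7.2, 7.9, 8.3, 10.0]


def bin_counts(data, n_bins):
    """Count observations in n_bins equal-width classes spanning the data range."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # The largest value goes into the last class instead of opening a new one.
        k = min(int((x - lo) / width), n_bins - 1)
        counts[k] += 1
    return counts


# Same data, three different choices of classes.
for n_bins in (3, 4, 6):
    print(n_bins, "classes:", bin_counts(sample, n_bins))
```

The same twelve values give a differently shaped picture for each number of classes, which is why specific features of a small-sample histogram deserve caution.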


Statistical inference about m

When is inference appropriate?

Small sample confidence intervals and P-values are only valid when the sampled population is approximately normally distributed. (The key word there is population; note that whatever information we have is generally nothing more than a very small subset of observations drawn from the population, i.e., the sample.) The best way to assess whether or not this assumption is reasonable is through the normal probability plot. Use software to obtain such a plot. If the points in the plot do not fall close to a straight line (that is, if a line is not the simplest description of the pattern), then the data are inconsistent with what would be expected when randomly sampling from a normal distribution.
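For readers curious what the software is plotting, here is a rough sketch of how the points of a normal probability plot can be computed. The sample is hypothetical, the function names are illustrative, and the plotting positions (i - 0.5)/n are an assumption: they are one common convention, and software packages may use a slightly different one. The correlation at the end is only a numerical summary of straightness added for this sketch; it does not replace looking at the plot.

```python
from statistics import NormalDist, mean

# Hypothetical sample of n = 12 measurements (made up for illustration).
sample = [4.0, 4.7, 5.2, 5.3, 5.8, 6.1, 6.4, 6.6, 7.2, 7.9, 8.3, 10.0]


def normal_plot_points(data):
    """Return (standard normal quantile, ordered observation) pairs."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


def correlation(points):
    """Pearson correlation of the plotted points; near 1 suggests a straight line."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


points = normal_plot_points(sample)
for q, x in points:
    print(f"{q:6.2f}  {x:5.1f}")
print("correlation of plotted points:", round(correlation(points), 3))
```

Plotting the second column against the first gives the normal probability plot; marked curvature or a point far off the line is the kind of departure described above.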

In all cases, the results are "valid" only if the sample is, or is equivalent to, a random sample.

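For large samples, it is not necessary that the population be normally distributed in order to correctly interpret CIs and P-values: by the central limit theorem, the sampling distribution of the sample mean is approximately normal when n is large.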

Fundamental quantities

Parameter    Estimate of parameter    Standard Error (SE) of estimate
m            \bar{x} (sample mean)    s / \sqrt{n}  (s = sample standard deviation, n = sample size)

Confidence interval

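A (1 - α) CI for m is given by

\[ \bar{x} \pm t \cdot \frac{s}{\sqrt{n}}, \]

where \bar{x} is the sample mean, s is the sample standard deviation, and n is the sample size.

t refers to the appropriate tabled value for a t-distribution with (n - 1) degrees of freedom: the value that leaves area α/2 in the upper tail.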
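As an illustration, the interval can be computed in a few lines. This is a sketch under stated assumptions: the sample is hypothetical, the function name t_interval is illustrative, and the tabled value 2.201 (95% confidence, 11 degrees of freedom) is read from a standard t table and passed in by hand.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of n = 12 measurements (made up for illustration).
sample = [4.0, 4.7, 5.2, 5.3, 5.8, 6.1, 6.4, 6.6, 7.2, 7.9, 8.3, 10.0]


def t_interval(data, t_value):
    """Return (lower, upper) limits: sample mean +/- t_value * SE."""
    n = len(data)
    xbar = mean(data)
    se = stdev(data) / sqrt(n)  # stdev uses the (n - 1) divisor
    margin = t_value * se
    return xbar - margin, xbar + margin


# Tabled t value for 95% confidence with n - 1 = 11 degrees of freedom.
T_95_DF11 = 2.201

lower, upper = t_interval(sample, T_95_DF11)
print(f"95% CI for the population mean: ({lower:.2f}, {upper:.2f})")
```

Passing the tabled value as an argument keeps the sketch free of any statistics package; with a different sample size or confidence level, a different tabled value is needed.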

Testing

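To test H0: m = m0, the test statistic is

\[ t = \frac{\bar{x} - m_0}{s / \sqrt{n}}, \]

that is, the estimate minus the hypothesized value, divided by the standard error of the estimate.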

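Find the P-value in the usual fashion, using the t distribution with (n - 1) degrees of freedom: it is the area beyond the observed value of t, in one tail for a one-sided alternative and in both tails for a two-sided alternative.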
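The test can be sketched in the same way. The sample, the hypothesized value 5.0, and all function names here are hypothetical. Because the Python standard library has no t distribution, the tail area is approximated by integrating the t density with Simpson's rule; that is a stand-in for the table or software one would normally use, not a recommended method.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

# Hypothetical sample of n = 12 measurements (made up for illustration).
sample = [4.0, 4.7, 5.2, 5.3, 5.8, 6.1, 6.4, 6.6, 7.2, 7.9, 8.3, 10.0]


def t_density(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_area(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    if t == 0:
        return 0.5
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    # By symmetry, the area to the right of 0 is 0.5; subtract the area on [0, t].
    return max(0.0, 0.5 - total * h / 3)


def one_sample_t(data, m0):
    """Return the t statistic and the two-sided P-value for H0: mean = m0."""
    n = len(data)
    se = stdev(data) / sqrt(n)
    t = (mean(data) - m0) / se
    p_two_sided = 2 * upper_tail_area(abs(t), n - 1)
    return t, p_two_sided


t, p = one_sample_t(sample, m0=5.0)
print(f"t = {t:.3f}, two-sided P-value = {p:.4f}")
```

For a one-sided alternative, use the single tail area in the direction of the alternative instead of doubling.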

Large sample vs. small sample

There is no difference in the procedure. Textbooks make it seem as though there is: NOT TRUE. The only distinction is the one addressed above: neither the CI nor the P-value is correct if the sample is small and has been drawn from a distribution that is not approximately normal.