Detecting Normality in Small Samples


Consider the histogram of 1000 observations shown below. You may view the data.

As you can see, this data is almost perfectly normally distributed. That's because I made it up that way! The data is 1000 observations generated from an ideal normal distribution. A couple of descriptive values:

Mean μ = 100
Standard Deviation σ = 10

The challenge

Sixteen observations are randomly sampled from this population. Which of the four histograms below is the histogram of the random sample?

Sample A

Sample Mean 101.79
Sample Standard Deviation 11.26

Sample B

Sample Mean 97.77
Sample Standard Deviation 8.07

Sample C

Sample Mean 98.74
Sample Standard Deviation 7.82

Sample D

Sample Mean 99.83
Sample Standard Deviation 12.42

The Answer

They all are! That's right; each of these represents a random sample of 16 observations from the population given in the histogram at top. In fact, if you go back to the data, you can match the random samples by color: Sample A (blue), Sample B (red), Sample C (pink), Sample D (green).
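If you'd like to repeat the experiment yourself, here is a minimal Python sketch. It is not the procedure used to build the data set above: the seed, the name draw_sample, and the simulated population are assumptions made purely for illustration, so the means and standard deviations it prints will differ from the ones listed for Samples A through D.

```python
import random
import statistics

random.seed(0)  # arbitrary seed (an assumption), only so reruns match

# Hypothetical stand-in for the article's population:
# 1000 draws from a normal distribution with mean 100 and sd 10.
population = [random.gauss(100, 10) for _ in range(1000)]

def draw_sample(pop, n=16):
    """Return n observations sampled without replacement from pop."""
    return random.sample(pop, n)

# Four independent random samples of 16, labeled like the ones above.
samples = {label: draw_sample(population) for label in "ABCD"}

for label, obs in samples.items():
    print(f"Sample {label}: mean = {statistics.mean(obs):.2f}, "
          f"sd = {statistics.stdev(obs):.2f}")
```

Run it a few times with different seeds and you should see sample means and standard deviations that wander around 100 and 10, much like the four samples above.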

The Point

It's tough to judge (randomly sampled) books by their covers. That is: just because a population has a normal distribution doesn't mean that a random sample drawn from it will "look" normal in a histogram. Of course, the larger the sample, the more closely its histogram tends to resemble the population's.
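To see the effect of sample size without any plotting software, the following sketch prints crude text histograms for a sample of 16 and a sample of 1000. The function text_histogram, the bin width of 5, and the seed are all illustrative assumptions, not anything used to make the figures in this article.

```python
import random
from collections import Counter

def text_histogram(data, width=5, scale=60):
    """Return text-histogram lines; each bar's length is proportional
    to the share of the data that falls in that bin."""
    n = len(data)
    # Label each bin by its lower edge, e.g. 103.2 falls in the bin 100.
    counts = Counter(int(x // width) * width for x in data)
    lines = []
    for lo in range(min(counts), max(counts) + width, width):
        bar = "#" * round(scale * counts[lo] / n)
        lines.append(f"{lo:4d} to {lo + width:4d} | {bar}")
    return lines

random.seed(1)  # arbitrary seed (an assumption)
for n in (16, 1000):
    sample = [random.gauss(100, 10) for _ in range(n)]
    print(f"n = {n}")
    print("\n".join(text_histogram(sample)))
```

The bars for n = 16 will typically look ragged, while those for n = 1000 should trace out something much closer to a bell.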


The Question

How, then, can we tell whether a sample is drawn from a normal population? If the sample is large, just examining the histogram gives a good picture of things. But still, since the human eye is not particularly good at judging "bell-shaped," it would be nice to have another tactic.

The Solution

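Use the normal probability plot, which plots the ordered observations against the values you would expect from a normal sample of the same size. A linear pattern means the data is consistent with having been sampled from a normal distribution. The reason this plot is effective is that it makes use of what the human eye is good at detecting: linearity.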
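For the curious, here is a rough sketch of what goes into such a plot: each sorted observation is paired with a theoretical standard normal quantile, and the correlation of those pairs gives a number to go with the eyeball judgment. The function names and the plotting positions (i - 0.5)/n are assumptions chosen for illustration; statistical packages use several slightly different conventions, and this is not necessarily how the plots in this article were produced.

```python
from statistics import NormalDist, mean

def normal_quantiles(n):
    """Theoretical standard normal quantiles for a sample of size n."""
    # (i - 0.5) / n is one common choice of plotting position.
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def normal_plot_points(data):
    """Pair each sorted observation with its theoretical quantile."""
    ordered = sorted(data)
    return list(zip(normal_quantiles(len(ordered)), ordered))

def straightness(points):
    """Pearson correlation of the plotted points; near 1 means nearly linear."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Two made-up data sets of 16 values: one exactly on the normal line,
# one strongly right-skewed (powers of 2).
ideal = [100 + 10 * z for z in normal_quantiles(16)]
skewed = [2.0 ** k for k in range(16)]

print(round(straightness(normal_plot_points(ideal)), 3))
print(round(straightness(normal_plot_points(skewed)), 3))
```

The first value should come out essentially equal to 1 and the second noticeably lower. Real samples from a normal population will land somewhere short of a perfect 1, which is why the plot is still judged by eye.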

Here are the normal probability plots for the 4 samples shown above.

They're all essentially linear! This is consistent with normality.


The Main Point

To determine whether a random sample of data is consistent with what is expected when sampling from a normal distribution, use a normal probability plot. If the plot is roughly linear, then YES, the data is consistent with what is expected when sampling from a normal distribution.

This is particularly useful for small samples, where a histogram has too few observations in each bin to show a clear shape.