Consider the histogram of 1000 observations shown below. You may view the data.
As you can see, this data is almost perfectly normally distributed. That's because I made it up that way! The data is 1000 observations from an ideal normal distribution. A couple of descriptive values:
|Mean|m = 100|
|Standard Deviation|s = 10|
16 observations are randomly sampled from this population. Which of the four histograms is the histogram of the random sample?
They all are! That's right; each of these represents a random sample of 16 observations from the population given in the histogram at top. In fact, if you go back to the data, you can match the random samples by color: Sample A (blue), Sample B (red), Sample C (pink), Sample D (green).
It's tough to judge (randomly sampled) books by their covers. That is: just because a population has a normal distribution doesn't mean that a random sample drawn from the population will "look" normal in a histogram. Of course, the larger the sample, the more closely its histogram tends to resemble the population's.
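If you'd like to see this for yourself, here is a minimal sketch that draws four samples of 16 and prints a rough text histogram of each. It assumes the samples come straight from an ideal normal distribution with mean 100 and standard deviation 10, rather than from the 1000 observations above. The function name `text_histogram`, the bin edges, and the seed are my own choices for illustration.

```python
import random
from collections import Counter

def text_histogram(sample, low=70, high=130, width=10):
    """Return one text line per bin, with one '#' per observation."""
    counts = Counter()
    for x in sample:
        # Clamp extreme values into the first or last bin so every value is counted.
        clamped = min(max(x, low), high - 1e-9)
        counts[int((clamped - low) // width)] += 1
    lines = []
    for i in range((high - low) // width):
        start = low + i * width
        lines.append(f"{start:>3}-{start + width:<3} | {'#' * counts[i]}")
    return lines

rng = random.Random(1)  # arbitrary fixed seed so the run is repeatable

# Four samples of 16 from a normal distribution with mean 100, sd 10 (assumed).
samples = {name: [rng.gauss(100, 10) for _ in range(16)] for name in "ABCD"}

for name, sample in samples.items():
    print(f"Sample {name}")
    print("\n".join(text_histogram(sample)))
```

Change the seed and run it a few times. With only 16 observations, the shapes typically jump around quite a bit from sample to sample.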
How then can we tell whether a sample is drawn from a normal population? If the sample is large, just examining the histogram gives a good picture of things. But still, since the human eye has not evolved to detect "bell-shaped," it would be nice to have another tactic.
Use the normal probability plot. If the plot is linear, the sample is consistent with having been drawn from a normal distribution. The reason this plot is effective is that it makes use of what the human eye is good at detecting: linearity.
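For the curious, here is a rough sketch of how the points on such a plot can be computed. The idea: sort the sample, then pair the i-th smallest value with the standard normal quantile at probability (i - 0.5)/n. That plotting position is one common convention, and statistical software may use a slightly different one. The helper names `normal_plot_points` and `pearson_r` are made up for this example, and the sample is simulated from an assumed normal distribution with mean 100 and standard deviation 10.

```python
import random
from statistics import NormalDist, mean

def normal_plot_points(sample):
    """Pair each sorted observation with its expected standard normal quantile."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting position (i - 0.5) / n is one convention among several.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

def pearson_r(points):
    """Correlation of the plotted points; values near 1 mean nearly linear."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(2)  # arbitrary fixed seed so the run is repeatable
sample = [rng.gauss(100, 10) for _ in range(16)]

points = normal_plot_points(sample)
for z, x in points:
    print(f"{z:6.2f}  {x:7.2f}")
print("r =", round(pearson_r(points), 3))
```

Plot the second column against the first and you have a normal probability plot. The correlation is only a numerical stand-in for "how straight does it look"; it doesn't replace looking at the plot.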
Here are the normal probability plots for the 4 samples shown above.
They're all essentially linear! Each sample is consistent with having been drawn from a normal population.
To determine whether a random sample of data is consistent with what is expected when sampling from a normal distribution, use a normal probability plot. If the plot is linear, then YES, the data is consistent with what is expected when sampling from a normal distribution.
This is particularly useful for small samples, where a histogram has too few observations to show a clear shape.