
Normal Probability Plots
Detecting normality from a histogram is difficult
when the data set is small. Here we view four large
data sets and their corresponding "normal probability
plots." If we see how the two displays relate when the
histogram is easy to describe (because there is plenty of
data), we can infer how they relate when the histogram is
harder to read (because there is little data).
Normal Probability Plots
The basic premise is that the plot compares the data
with what would be expected of data that is perfectly
normally distributed. Two quantities are compared:
the data and idealized normally distributed data. If the
two generally agree, the data is consistent with what
would be expected from a normal distribution, and the
normal probability plot is roughly linear. Otherwise, the
plot will not be linear. Of course, no plot will be
exactly linear, because data is subject to randomness in
its collection. We look for a general pattern of
linearity.
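The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the procedure Minitab uses: the function name normal_probability_points, the five sample values, and the plotting position (i - 0.5) / n are all assumptions chosen for the example. Each sorted observation is paired with the standard normal quantile expected at its rank; plotting the pairs gives a normal probability plot.

```python
from statistics import NormalDist

def normal_probability_points(data):
    """Pair each sorted observation with the standard normal
    quantile expected at its rank (hypothetical helper)."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting position (i - 0.5) / n; other conventions exist.
    expected = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, ordered))

# Made-up observations, for illustration only.
points = normal_probability_points([4.1, 5.0, 3.2, 6.3, 4.7])
for z, x in points:
    print(f"expected z = {z:7.3f}   observed = {x:5.1f}")
```

If the observations came from a normal distribution, the printed pairs would fall close to a straight line when plotted against each other.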
Data Sampled From a Normal Distribution
Here's a histogram of 100 observations that were
randomly sampled from a normal distribution. Below the
histogram you see the normal probability plot of the data
(generated using the Normality Test in
the Stat menu in Minitab for Windows).


Notice that the normal probability plot (NPP) is
basically straight. That's the idea: Normal data
= straight NPP. So, when the NPP is straight you have
evidence that the data is sampled from a normal
distribution.
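One way to put a number on "basically straight" is the correlation between the sorted data and the normal quantiles expected at each rank. The sketch below assumes that measure, an arbitrary seed, and an arbitrary mean of 50 and standard deviation of 10; none of these come from the Minitab output described above. The helper npp_correlation is a hypothetical name.

```python
import random
from statistics import NormalDist

def npp_correlation(data):
    """Pearson correlation between the sorted data and the
    expected standard normal quantiles (hypothetical helper)."""
    ordered = sorted(data)
    n = len(ordered)
    # Assumed plotting position (i - 0.5) / n.
    expected = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mean_x = sum(expected) / n
    mean_y = sum(ordered) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, ordered))
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in ordered)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)  # arbitrary seed, so the run is repeatable
sample = [rng.gauss(50, 10) for _ in range(100)]  # 100 normal draws
r = npp_correlation(sample)
print(f"NPP correlation for 100 normal draws: {r:.4f}")
```

A value close to 1 corresponds to a nearly straight plot. How close is "close enough" depends on the sample size, so the number is best read alongside the plot rather than in place of it.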
Data Sampled From a Left Skewed Distribution


For left skewed data, the normal probability plot is
generally not straight. In general this sort of curvature
in the NPP evinces left skew.
Data Sampled From a Right Skewed Distribution


For right skewed data, the normal probability plot is
generally not straight. In general this sort of curvature
in the NPP evinces right skew.
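The direction of the bend can also be checked numerically. The sketch below is an illustration under stated assumptions: it places the data on the vertical axis and the expected normal quantiles on the horizontal axis (if the axes are swapped, the bend is mirrored), and it uses idealized exponential quantiles as a stand-in for right-skewed data. The helper half_slopes is a hypothetical name.

```python
import math
from statistics import NormalDist

def half_slopes(data):
    """Slopes of the lower and upper halves of the NPP, with data
    on the vertical axis (hypothetical helper)."""
    y = sorted(data)
    n = len(y)
    # Assumed plotting position (i - 0.5) / n.
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mid = n // 2
    lower = (y[mid] - y[0]) / (z[mid] - z[0])
    upper = (y[-1] - y[mid]) / (z[-1] - z[mid])
    return lower, upper

# Idealized right-skewed sample: exponential quantiles at evenly
# spaced probabilities. Its mirror image is left skewed.
n = 100
right_skewed = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]
left_skewed = [-v for v in right_skewed]

right_lower, right_upper = half_slopes(right_skewed)
left_lower, left_upper = half_slopes(left_skewed)
print(f"right skew: lower {right_lower:.2f}, upper {right_upper:.2f}")
print(f"left skew:  lower {left_lower:.2f}, upper {left_upper:.2f}")
```

With this orientation, the right-skewed sample is steeper in its upper half (the long right tail stretches the largest values), and the left-skewed sample is steeper in its lower half. For normal data the two slopes would be roughly equal.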
Data Sampled From a Bimodal Distribution


For bimodal data (two distinct peaks), the normal
probability plot is generally not straight. In general
this sort of curvature evinces bimodality.
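A bimodal sample tends to show up as a steep jump in the middle of the plot, where the data crosses the gap between the two peaks. The sketch below illustrates this under assumptions of its own: data on the vertical axis, an idealized sample built from two well-separated normal clusters centered at -3 and 3, and a hypothetical helper named third_slopes.

```python
from statistics import NormalDist

def third_slopes(data):
    """Slopes of the lower, middle, and upper thirds of the NPP,
    with data on the vertical axis (hypothetical helper)."""
    y = sorted(data)
    n = len(y)
    # Assumed plotting position (i - 0.5) / n.
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    a = n // 3        # end of the lower third
    b = n - a - 1     # start of the upper third
    lower = (y[a] - y[0]) / (z[a] - z[0])
    middle = (y[b] - y[a]) / (z[b] - z[a])
    upper = (y[-1] - y[b]) / (z[-1] - z[b])
    return lower, middle, upper

# Idealized bimodal sample: 50 evenly spaced quantiles from each of
# two normal clusters (centers and spread are made up).
half = 50
probs = [(j - 0.5) / half for j in range(1, half + 1)]
low_mode = [NormalDist(-3, 1).inv_cdf(p) for p in probs]
high_mode = [NormalDist(3, 1).inv_cdf(p) for p in probs]
bimodal = low_mode + high_mode

lower, middle, upper = third_slopes(bimodal)
print(f"lower {lower:.2f}, middle {middle:.2f}, upper {upper:.2f}")
```

The middle slope comes out well above the two outer slopes, which matches an S-shaped plot with a sharp rise in the center. Clusters that overlap more heavily would give a weaker version of the same pattern.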
Want to see how this works in making conclusions about the
normality of small samples?