Weights of Adult Males
A discussion of the "empirical
rule," also known as the "68-95-99.7 rule"
Before proceeding you may wish to read
the document describing the empirical rule (otherwise known as the
68-95-99.7 rule).
Seventy-five males were surveyed, and each reported his weight.
The data set is linked from this page.
Here's a histogram.
Some basic summary statistics are provided here.
Two rules of thumb.
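1) Range/5 is in the ballpark of the standard
deviation.
Range/5 = 162.80/5 = 32.56. That is close to the standard
deviation of 30.73 (half the width of the one-standard-deviation
range, 132.99 to 194.45, quoted below).
2) 0.75*IQR is in the ballpark of the standard
deviation.
0.75*IQR = 0.75*38.70 = 29.03. Also close.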
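The two rules of thumb can be checked with a few lines of Python. This is only a sketch using the summary figures quoted in the text. The standard deviation is not stated directly in this section, so the code assumes it equals half the width of the one-standard-deviation range (132.99 to 194.45) reported further down. The variable names are illustrative.

```python
# Summary figures quoted in the text.
data_range = 162.80   # maximum weight minus minimum weight
iqr = 38.70           # interquartile range (Q3 - Q1)

# Assumption: the interval 132.99 to 194.45 is mean +/- 1 standard deviation,
# so half its width is the standard deviation.
lower, upper = 132.99, 194.45
sd = (upper - lower) / 2

range_estimate = data_range / 5   # rule of thumb 1
iqr_estimate = 0.75 * iqr         # rule of thumb 2

print(f"standard deviation: {sd:.2f}")
print(f"range / 5:          {range_estimate:.2f}")
print(f"0.75 * IQR:         {iqr_estimate:.2f}")
```

Both estimates land within about two pounds of the standard deviation, which is all a rule of thumb promises.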
If you're familiar with the empirical rule then you
know that when the histogram of the data has the appropriate
shape, called a normal shape (or mound shaped, or
bell shaped), then
Approximately 68% of the data should lie within 1
standard deviation of the mean.
Notice that here, 51 of the 75 observations
lie in such a range (from 132.99 to 194.45). You
can see which ones by viewing the sorted
data set: those that lie within this range
are in black, those outside the range are red
(they will be lighter if you have a printed
document in black & white). That's exactly
68%!
Approximately 95% of the data should lie within 2
standard deviations of the mean.
Here it's 70 of 75, which is 93.3%.
Almost all (technically 99.7%) of the data should
lie within 3 standard deviations of the mean.
Here it's 74 of 75, which is 98.7%.
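The counting behind these three checks can be sketched in Python. The function below is a hypothetical helper, not part of the original analysis, and the sample weights are made up for illustration because the surveyed data set is not reproduced here. It also assumes the sample standard deviation (n - 1 denominator); the source does not say which version was used.

```python
from statistics import mean, stdev

def fraction_within(values, k):
    """Return the fraction of values within k standard deviations of the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator), an assumption
    inside = [v for v in values if abs(v - m) <= k * s]
    return len(inside) / len(values)

# Hypothetical weights for illustration only; NOT the surveyed data set.
sample = [128, 141, 150, 155, 158, 162, 165, 169, 173, 180, 191, 214]

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(sample, k):.1%}")
```

Running the same function on the real 75 weights should reproduce the counts given above, provided the same standard deviation formula is used.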
Other statements are possible. For instance, about
16% of the data should lie more than 1 standard
deviation above the mean.
Here it's 14 of 75, which is 18.7%.
Half of the data should lie above the mean. That's
equivalent to saying that the mean and median should
roughly coincide. That's because this particular
shape of histogram must be symmetric.
Here it's 37 of 75, which is as close as one can
get to half: 49.3%.
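To see where the percentages in the rule come from, the sketch below computes them from the standard normal curve and sets them beside the observed counts reported above. It uses only Python's standard library. The helper name normal_cdf is illustrative.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n = 75  # number of men surveyed

# (description, observed count from the text, probability under a normal curve)
checks = [
    ("within 1 SD", 51, normal_cdf(1) - normal_cdf(-1)),
    ("within 2 SD", 70, normal_cdf(2) - normal_cdf(-2)),
    ("within 3 SD", 74, normal_cdf(3) - normal_cdf(-3)),
    ("more than 1 SD above", 14, 1 - normal_cdf(1)),
    ("above the mean", 37, 1 - normal_cdf(0)),
]

for label, count, expected in checks:
    print(f"{label:22s} observed {count / n:6.1%}   normal {expected:6.1%}")
```

The normal-curve values are about 68.3%, 95.4%, 99.7%, 15.9%, and 50%, which is why the rule is usually quoted with the rounded figures 68, 95, and 99.7.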
Most skewed distributions violate this rule in some
way. Even symmetric distributions can violate the rule,
because symmetric doesn't necessarily mean
"normal/bell/mound" shaped. For an example,
consider the data on starting
salaries. Here you will find a larger concentration
about the mean than is expected for a normal
distribution. That's because the salary data, while
symmetric, does not have exactly the normal/bell/mound
shape.
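A symmetric distribution with extra concentration about the mean can be illustrated numerically. The sketch below uses the Laplace (double exponential) distribution purely as a stand-in; it is an assumption chosen for illustration and is not claimed to describe the salary data. For a Laplace distribution the fraction within k standard deviations of the mean is 1 - exp(-k*sqrt(2)), while for a normal distribution it is erf(k/sqrt(2)).

```python
from math import erf, exp, sqrt

def normal_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))

def laplace_within(k):
    """Fraction of a Laplace distribution within k standard deviations of the mean.

    The Laplace standard deviation is sqrt(2) times its scale parameter,
    which gives the closed form below.
    """
    return 1 - exp(-k * sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: normal {normal_within(k):.1%}, Laplace {laplace_within(k):.1%}")
```

Within one standard deviation the Laplace curve holds roughly 76% of its area rather than 68%, even though it is perfectly symmetric.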
Still, you would be surprised to know how many
variables have a normal-shaped histogram. There's a
technical reason why. In any case, there are other
reasons why this shape is so important.
