### A discussion of the "empirical rule" also known as the "68-95-99.7 rule"

Before proceeding, you may wish to read the document describing the empirical rule (otherwise known as the 68-95-99.7 rule).

Seventy-five males were surveyed, and each reported his weight. You can link to the data set.

Here's a histogram.

Some basic summary statistics are provided here.

| Statistic | Value |
|---|---|
| Mean | 163.72 |
| Standard Deviation | 30.73 |
| Median | 163 |
| Lower Quartile | 145.3 |
| Upper Quartile | 184 |
| IQR | 38.7 |
| Minimum | 64.5 |
| Maximum | 227.3 |
| Range | 162.8 |

Two rules of thumb.

1) Range/5 is in the ballpark of the standard deviation.

range/5 = 162.80/5 = 32.56. That is close to the standard deviation of 30.73.

2) .75*IQR is in the ballpark of the standard deviation.

.75*IQR = .75*38.70 = 29.03. That is also close to 30.73.
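The two rules of thumb can be checked with a few lines of arithmetic. The sketch below uses only the summary statistics reported above; the function names (`range_rule`, `iqr_rule`, `relative_error`) are made up for illustration and are not part of the original material.

```python
# Summary statistics copied from the text above.
STD_DEV = 30.73
DATA_RANGE = 162.8
IQR = 38.7


def range_rule(data_range):
    """Rule of thumb 1: the standard deviation is roughly range / 5."""
    return data_range / 5


def iqr_rule(iqr):
    """Rule of thumb 2: the standard deviation is roughly 0.75 * IQR."""
    return 0.75 * iqr


def relative_error(estimate, actual):
    """Size of the miss, as a fraction of the actual value."""
    return abs(estimate - actual) / actual


for name, estimate in [("range/5", range_rule(DATA_RANGE)),
                       ("0.75*IQR", iqr_rule(IQR))]:
    error = relative_error(estimate, STD_DEV)
    print(f"{name}: {estimate:.2f} (off by {error:.1%} from {STD_DEV})")
```

For this data set both estimates land within about 6% of the actual standard deviation, which is what "in the ballpark" means here.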

If you're familiar with the empirical rule, then you know that when the histogram of the data has the appropriate shape (called normal, mound shaped, or bell shaped), the following statements hold.

Approximately 68% of the data should lie within 1 standard deviation of the mean.

Notice that here, 51 of the 75 observations lie in such a range (from 132.99 to 194.45). That's exactly 68%! You can see which ones by viewing the sorted data set: those that lie within this range are in black, and those outside the range are in red (they will be lighter if you have a printed document in black & white).

Approximately 95% of the data should lie within 2 standard deviations of the mean.

Here it's 70 of 75, which is 93.3%.

Almost all (technically 99.7%) of the data should lie within 3 standard deviations of the mean.

Here it's 74 of 75, which is 98.7%.
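The three comparisons above can be reproduced from the reported mean, standard deviation, and counts. This is only a sketch: the raw data set is not reproduced here, so the counts are taken from the text rather than recomputed, and the names `interval` and `observed_fraction` are illustrative.

```python
MEAN = 163.72
STD_DEV = 30.73
N = 75

# Observations within k standard deviations of the mean, as reported in the text.
COUNT_WITHIN = {1: 51, 2: 70, 3: 74}
# Fractions stated by the empirical rule.
RULE = {1: 0.68, 2: 0.95, 3: 0.997}


def interval(k, mean=MEAN, sd=STD_DEV):
    """Endpoints of the range lying within k standard deviations of the mean."""
    return mean - k * sd, mean + k * sd


def observed_fraction(k):
    """Share of the 75 observations that fall inside the k-SD interval."""
    return COUNT_WITHIN[k] / N


for k in (1, 2, 3):
    low, high = interval(k)
    print(f"within {k} SD: [{low:.2f}, {high:.2f}]  "
          f"observed {observed_fraction(k):.1%}, rule says {RULE[k]:.1%}")
```

The 3-SD interval works out to roughly 71.53 to 255.91, so the reported minimum of 64.5 is the single observation that falls outside it.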

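Other statements are possible. For instance, about 16% of the data should lie more than 1 standard deviation above the mean: roughly 32% of the data lies outside 1 standard deviation, and by symmetry half of that is in the upper tail.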

Here it's 14 of 75, which is 18.7%.
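The 16% figure can be checked against the normal curve itself. The sketch below uses the standard identity that the fraction of a normal distribution within k standard deviations of the mean is erf(k/√2); the helper names are illustrative, and the observed count is the one reported above.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def fraction_above(k):
    """Fraction more than k SDs above the mean: by symmetry, half of what lies outside."""
    return (1 - fraction_within(k)) / 2


observed_above_1sd = 14 / 75  # count reported in the text

print(f"normal curve, more than 1 SD above the mean: {fraction_above(1):.1%}")
print(f"weight data, more than 1 SD above the mean:  {observed_above_1sd:.1%}")
```

The same two functions give the 68, 95, and 99.7 figures for k = 1, 2, 3, which is where the rule's name comes from.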

Half of the data should lie above the mean, because a histogram with this particular shape must be symmetric. That's equivalent to saying that the mean and median should roughly coincide (here they are 163.72 and 163).

Here it's 37 of 75, which is as close as one can get to half: 49.3%.

Most skewed distributions violate this rule in some way. Even symmetric distributions can violate the rule, because symmetric doesn't necessarily mean "normal/bell/mound" shaped. For an example, consider the data on starting salaries. Here you will find a larger concentration about the mean than is expected for a normal distribution. That's because the salary data, while symmetric, does not have exactly the normal/bell/mound shape.
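To see how a symmetric distribution can still break the rule, it helps to compare the normal curve with two other symmetric shapes. The sketch below uses theoretical distributions chosen purely for illustration (a uniform and a Laplace, or double exponential, distribution); they are stand-ins and are not the salary data mentioned above.

```python
import math


def normal_within(k):
    """Normal: fraction within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def uniform_within(k):
    """Uniform on [-a, a]: SD is a / sqrt(3), so the fraction is k / sqrt(3), capped at 1."""
    return min(1.0, k / math.sqrt(3))


def laplace_within(k):
    """Laplace with scale b: SD is b * sqrt(2), so the fraction is 1 - exp(-k * sqrt(2))."""
    return 1 - math.exp(-k * math.sqrt(2))


print("k   normal  uniform  laplace")
for k in (1, 2, 3):
    print(f"{k}   {normal_within(k):.1%}   {uniform_within(k):.1%}   {laplace_within(k):.1%}")
```

All three shapes are symmetric, yet only the normal gives 68-95-99.7. The Laplace shape puts noticeably more than 68% within 1 standard deviation (a larger concentration about the mean), while the uniform shape puts less.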

Still, you might be surprised by how many variables have a normal-shaped histogram. There's a technical reason for this, and there are other reasons why this shape is so important.