Longevity of Syracusans

  You may take a look at and/or download the Longevity data set

I collected this data myself. Between Jan. 6 and Jan. 15 of 1996 I read the obituary pages of the Syracuse Post Standard each day. For every obituary I recorded the gender (Male/Female) and age at death. The cases (individuals) are the 238 recently departed people whose obituaries I recorded. Two variables are measured per case: Gender (categorical) and Age at Death (quantitative). Gender is the explanatory variable; age at death is the response variable.


Let's compare these two histograms. The Female distribution in particular is left skewed: its tail stretches toward the younger ages. The Male distribution has what might be considered an outlier; a check on the original source data reveals this to be an accurate measurement. Outliers that "belong" are part of the distribution. Remember, this is only 238 people. I'm willing to bet that more data will "fill in" the 30 - 35 slot for the men. Often in statistics, prior information and subjective judgments affect the nature of the analysis.

Still, the distribution for females appears to have a center to the right of (above) that of the males. Summary statistics bear this out.

Variable     N   N*    Mean   Median   Tr Mean   StDev   SE Mean
Males      104    6   73.45    75.00     73.97   12.86      1.26

Variable     Min      Max      Q1      Q3
Males      27.00    97.00   66.00   83.00

Variable     N   N*    Mean   Median   Tr Mean   StDev   SE Mean
Females    122    6   77.05    80.00     78.16   15.44      1.40

Variable     Min      Max      Q1      Q3
Females    23.00   100.00   69.00   88.00

This summary was generated in the statistical software package Minitab. N is the number of observations for which values are recorded; N* is the number of missing data entries (a total of 12 obituaries supplied no age information). The Mean, Median, StDev (Standard Deviation), Min (Minimum), Max (Maximum), Q1 (Lower Quartile) and Q3 (Upper Quartile) should be self-explanatory. A brief primer on the trimmed mean (Tr Mean) is available at this site!
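For readers who want to see the trimmed mean as a calculation, here is a minimal Python sketch. It assumes the usual convention of dropping the smallest 5% and largest 5% of the values (rounded to the nearest whole number of observations) before averaging; check the primer for the exact rule Minitab applies. The function name trimmed_mean and the ages in toy_ages are made up for illustration and are not taken from the Longevity data set.

```python
def trimmed_mean(values, proportion=0.05):
    """Mean of the values left after dropping a share from each end.

    `proportion` is the fraction removed from EACH tail; 0.05 is an
    assumed default, not a setting read from the Minitab output.
    """
    ordered = sorted(values)
    # Number of observations to drop per tail, rounded to the nearest integer.
    k = int(proportion * len(ordered) + 0.5)
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trimming removed every observation")
    return sum(kept) / len(kept)


# Hypothetical ages, invented for this example only.
toy_ages = [27, 58, 61, 64, 66, 68, 70, 71, 73, 74,
            75, 76, 78, 79, 81, 83, 85, 88, 90, 97]

print("ordinary mean:", sum(toy_ages) / len(toy_ages))
print("trimmed mean: ", trimmed_mean(toy_ages))
```

With 20 values, one observation is dropped from each end, so a single unusually low age has no effect on the trimmed mean while it still pulls the ordinary mean down.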

The standard error of the mean, SE Mean, will be discussed when we cover sampling distributions. It is a measure of the variability in the mean when the mean is thought of as varying from sample to sample. Clearly, taking another sample (getting more information from the newspaper) would not give results identical to these. How much the mean varies from sample to sample is of extreme importance in inferential statistics.
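As a preview, the SE Mean column can be reproduced from the other columns with the standard formula, standard deviation divided by the square root of N. The short Python sketch below does that arithmetic; the values are copied from the Minitab summary, and the names summary and se_mean are just labels chosen for this example.

```python
import math

# N and StDev copied from the Minitab summary in this case study.
summary = {
    "Males": {"n": 104, "stdev": 12.86},
    "Females": {"n": 122, "stdev": 15.44},
}


def se_mean(stdev, n):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return stdev / math.sqrt(n)


# Round to two decimals to match the precision Minitab prints.
se = {group: round(se_mean(s["stdev"], s["n"]), 2) for group, s in summary.items()}
print(se)
```

Rounded to two decimals, the results agree with the 1.26 and 1.40 that Minitab reports.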

Finally, to cover the displays completely, here are side-by-side boxplots of these two distributions.

The values of the median, quartiles, min and max are readily obtained from this plot alone. From these one can get the IQR.
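The IQR is simply Q3 minus Q1, so it can be worked out from the quartiles in the Minitab summary. The Python sketch below does this for both groups. It also computes a lower "fence" at Q1 minus 1.5 times the IQR, a common rule of thumb for flagging possible outliers; that rule is an assumption added here for illustration and may not be the one used to draw the boxplots on this page.

```python
# Quartiles and minimums copied from the Minitab summary in this case study.
quartiles = {"Males": (66.0, 83.0), "Females": (69.0, 88.0)}
minimums = {"Males": 27.0, "Females": 23.0}


def iqr(q1, q3):
    """Interquartile range: the spread of the middle half of the data."""
    return q3 - q1


def lower_fence(q1, q3, k=1.5):
    """Lower cutoff under the 1.5 * IQR rule of thumb (an assumed convention)."""
    return q1 - k * iqr(q1, q3)


for group, (q1, q3) in quartiles.items():
    fence = lower_fence(q1, q3)
    flagged = minimums[group] < fence
    print(f"{group}: IQR = {iqr(q1, q3)}, lower fence = {fence}, "
          f"minimum below fence: {flagged}")
```

The IQRs come out to 17 years for the males and 19 years for the females. Under the assumed 1.5 * IQR rule, the lower fence is 40.5 for both groups, so the youngest age in each group falls below it.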

Boxplots are often constructed with width proportional to sample size. Since there are more females than males in this data set, the box for the females is a bit wider.