Lawyers' Salaries

Does this law firm discriminate on the basis of race?

You may view/copy the data.


The data are salaries of lawyers at a large firm. Three variables are recorded for each lawyer: salary in US $ (quantitative), tenure, or length of service, in months (quantitative), and race (categorical). (Should you review the quantitative/categorical distinction?)

Given two quantitative (or numerical; the terms mean the same thing) variables per case, the appropriate plotting technique is the scatterplot. When a third, categorical variable is also present, adjust the scatterplot by using a different plotting symbol for each level of the categorical variable.
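As a rough illustration of plotting with one symbol per level, here is a minimal sketch that draws a text-only scatterplot. The function name `text_scatter`, the levels "A" and "B", and every data point are hypothetical and are not taken from the firm's data; a statistics package would normally draw the plot for you.

```python
def text_scatter(points, symbols, width=40, height=10):
    """Render (x, y, level) points on a character grid, one symbol per level."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Guard against a zero range when all x (or all y) values are equal.
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y, level in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # Row 0 of the grid is the top of the plot, so flip the row index.
        # Points that land in the same cell overwrite one another.
        grid[height - 1 - row][col] = symbols[level]
    return "\n".join("".join(line).rstrip() for line in grid)


# Hypothetical cases: (tenure in months, salary in US $, level of the categorical variable).
points = [
    (12, 70_000, "A"), (36, 85_000, "A"), (48, 88_000, "A"),
    (60, 90_000, "B"), (120, 130_000, "B"), (240, 210_000, "B"),
]
symbols = {"A": "o", "B": "x"}

plot = text_scatter(points, symbols)
print(plot)
```

Because each level has its own symbol, a reader can see at a glance whether the two groups occupy different regions of the tenure axis.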

Notice that if the salary data are aggregated over all the different values of tenure (that is, if we simply ignore length of service), we get an extraordinarily misleading picture. The firm does pay blacks less than whites on average, but this is because the blacks it employs have generally been with the firm for less time than the whites have. Averaging over the various tenures produces an association between race and salary that is not present once tenure is accounted for.
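The effect of aggregating over tenure can be sketched with a small made-up example. The groups "A" and "B", the tenure bands, and all salaries below are hypothetical values chosen to make the arithmetic easy; they are not the firm's data. The helper `mean_by` is likewise just an illustrative name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (group, tenure band, salary in US $). NOT the firm's data.
records = [
    ("A", "short", 80_000), ("A", "short", 82_000), ("A", "short", 78_000),
    ("A", "long", 200_000),
    ("B", "short", 80_000),
    ("B", "long", 198_000), ("B", "long", 202_000), ("B", "long", 200_000),
]


def mean_by(records, key):
    """Average salary within each category produced by key(record)."""
    salaries = defaultdict(list)
    for record in records:
        salaries[key(record)].append(record[2])
    return {category: mean(values) for category, values in salaries.items()}


# Ignoring tenure: group A appears to earn far less than group B.
overall = mean_by(records, key=lambda r: r[0])

# Accounting for tenure: the two groups earn the same within each band.
within = mean_by(records, key=lambda r: (r[0], r[1]))

print(overall)
print(within)
```

In this sketch the overall gap arises only because group A is concentrated in the short-tenure band, which mirrors the argument made in the paragraph above.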


Some questions and answers. Each answer follows its question.

  1. How would you determine (from the graphs above only) the number of cases observed in this study?

    Count the number of points plotted in the scatterplot. (There's no way to tell from a boxplot.)
  2. How many variables are measured on each case? Give a name to each variable, then identify each as numerical or categorical.

    Three variables per case: Race (Categorical), Salary (Quantitative) & Tenure (Quantitative).
  3. How would you expect the mean to compare to the median for each race?

    Because of the extreme, or outlying, values to the right (high salaries), for whites we would expect the mean to far exceed the median. The boxplot for blacks indicates relative symmetry, and therefore we'd expect the mean and median to be roughly the same. That this is the case is seen below (salaries in US $):

                 Mean    Median
    Blacks      98049     97434
    Whites     186584    130371

     

  4. Do the two distributions (salary for each race) have equal standard deviations? If not, which race has the larger standard deviation? (Equal spread among groups is an assumption that many statistical inference techniques require the data to satisfy.)

    The standard deviation is much larger for whites. The boxplots indicate much greater spread in the distribution for whites. (In fact, the standard deviations are 31923 for blacks and 157967 for whites.)
  5. Does it appear that race plays any role in determining salary?

    No. There is no evidence of this. However, clearly one or both of the following factors do contribute to discrepancies between the races:

    1. In the past the company did not hire blacks.

    2. In the past social discrimination led to few blacks becoming lawyers.

  6. Is there equal variability in salaries at differing experience levels?

    No. The spread in salaries gets larger as tenure increases.
  7. The company often hires experienced lawyers from other firms. These new hires are paid on the basis of their tenure with the previous firm (among other things). Predict the salary of a new hire who has 5 years of experience.

    About $80,000.
  8. Predict the salary of a new hire who has 20 years of experience. According to your predictions, how much more will this person earn than the new hire with 5 years of experience? (Note that there's a difference of 15 years.)

    About $200,000. This person will earn about $120,000 more.
  9. Predict the salary of a new hire who has 35 years of experience. According to your predictions, how much more will this person earn than the new hire with 20 years of experience? (Note that there's again a difference of 15 years.)

    About $350,000. This person will earn about $150,000 more.

If you examine the predicted salary differences from the previous two parts, you will notice that the salary gained over each 15-year span grows as tenure increases; that is, the rate of change of salary is itself increasing. This is a feature of certain types of curves called exponential curves. These curves have the shape implied in the scatterplot above. (The value, over time, of money invested in a bank and the size, over time, of a population of organisms are two other phenomena that generally fit exponential curves.)
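The arithmetic behind this observation can be checked with a short sketch. The three predictions are the eyeballed answers given above; the exponential curve that follows uses hypothetical parameters (`start` and `rate` are illustrative values, not estimates fitted to the firm's data) and is there only to show how such a curve behaves over equal time spans.

```python
import math

# Eyeballed predictions from the answers above: years of experience -> salary in US $.
predictions = {5: 80_000, 20: 200_000, 35: 350_000}

years = sorted(predictions)
# Salary gained over each successive 15-year span.
differences = [predictions[b] - predictions[a] for a, b in zip(years, years[1:])]
print(differences)  # [120000, 150000]


def exponential_salary(t, start=60_000.0, rate=0.05):
    """Hypothetical exponential curve: start * exp(rate * t). Parameters are assumptions."""
    return start * math.exp(rate * t)


curve = [exponential_salary(t) for t in years]
# On an exponential curve, equal time spans give growing differences...
curve_differences = [b - a for a, b in zip(curve, curve[1:])]
# ...but a constant ratio between successive values.
curve_ratios = [b / a for a, b in zip(curve, curve[1:])]

print(curve_differences)
print(curve_ratios)
```

Growing differences alone do not prove that a curve is exponential; the constant ratio over equal spans is the sharper check, and rough predictions read off a scatterplot will satisfy it only approximately.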