Simpson's Paradox Worksheet

 

At many large universities, an independent student organization rates the faculty and publishes the ratings in a book that any student can purchase. Last year four professors taught Intro Stats at State U: Drs. Arnold, Murphy, Ryan, and Shafer. Each was rated on a scale of GOOD, FAIR, or POOR. The organization that does the ratings knows full well that many students have trouble in such a course because they dislike anything remotely resembling mathematics. Just for kicks (and in the hope of drawing some interesting conclusions), the rating form also asks each student: Are you a good math student? The possible answers are YES and NO. Here are the results.

All Students

          QUALITY OF INSTRUCTION
Professor   GOOD  FAIR  POOR  Totals
Ryan          41    21    20      82
Arnold        48    18    15      81
Murphy        43    17    21      81
Shafer        43    17    18      78
Totals       175    73    74     322

Students Good at Math

          QUALITY OF INSTRUCTION
Professor   GOOD  FAIR  POOR  Totals
Ryan          25    19    18      62
Arnold         6     8     7      21
Murphy        23     8    10      41
Shafer         7    15    15      37
Totals        61    50    50     161

Students Not Good at Math

          QUALITY OF INSTRUCTION
Professor   GOOD  FAIR  POOR  Totals
Ryan          16     2     2      20
Arnold        42    10     8      60
Murphy        20     9    11      40
Shafer        36     2     3      41
Totals       114    23    24     161

General questions

  1. How many cases are there?
  2. How many variables are measured per case? What are they? Which are quantitative variables? Which are categorical?
  3. Which of the variables are response variables? Which of them are explanatory variables? (Think of a student who buys the publication. What is the intended predictive use?)
  4. What percentage of students are good at math?

Before proceeding, convince yourself that the "All Students" table is formed by summing the counts in the other two tables.
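If you would rather check the sums by machine than by hand, here is a minimal sketch in Python. The counts are copied from the three tables above; the variable names and the helper `add_tables` are made up for this illustration and are not part of the worksheet.

```python
# Counts copied from the worksheet tables, stored as (GOOD, FAIR, POOR).
good_at_math = {
    "Ryan": (25, 19, 18),
    "Arnold": (6, 8, 7),
    "Murphy": (23, 8, 10),
    "Shafer": (7, 15, 15),
}
not_good_at_math = {
    "Ryan": (16, 2, 2),
    "Arnold": (42, 10, 8),
    "Murphy": (20, 9, 11),
    "Shafer": (36, 2, 3),
}
all_students = {
    "Ryan": (41, 21, 20),
    "Arnold": (48, 18, 15),
    "Murphy": (43, 17, 21),
    "Shafer": (43, 17, 18),
}


def add_tables(a, b):
    """Add two tables cell by cell (hypothetical helper for this check)."""
    return {prof: tuple(x + y for x, y in zip(a[prof], b[prof])) for prof in a}


combined = add_tables(good_at_math, not_good_at_math)

# Compare the cell-by-cell sum with the "All Students" table.
tables_agree = combined == all_students
print("Tables agree:", tables_agree)

# Row totals are recomputed from the cells rather than typed in.
for prof, row in combined.items():
    print(prof, row, "total:", sum(row))
```

Only the cell counts are entered; the totals are recomputed, so the printed row totals can also be compared with the "Totals" column of each table.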

Now analyze each table separately. If you are working with others, divide the labor.

Specific questions for the analysis of each of the individual tables

  1. Find the marginal distribution of the variable "quality of instruction." Draw a bar graph. What does it tell you?
  2. For each professor, find the conditional distribution of the variable "quality of instruction." Construct a segmented bar chart to display these conditional distributions side by side.
  3. Draw some conclusions!
    a)  What similarities are there?
    b)  Rank the four professors. If two professors have roughly similar conditional distributions, you may "tie" them.
    c)  In a choice between Murphy and Shafer, who is preferred?
    d)  In a choice between Ryan and Arnold, who is preferred?
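
The arithmetic behind questions 1 and 2 can be sketched in Python as follows. This is only one possible way to organize the computation: the function names are invented for this illustration, and the "All Students" counts are used as the example input. To analyze one of the other tables, substitute its counts.

```python
RATINGS = ("GOOD", "FAIR", "POOR")

# "All Students" counts from the worksheet, as (GOOD, FAIR, POOR).
all_students = {
    "Ryan": (41, 21, 20),
    "Arnold": (48, 18, 15),
    "Murphy": (43, 17, 21),
    "Shafer": (43, 17, 18),
}


def marginal_distribution(table):
    """Share of ALL ratings in the table that fall in each category."""
    column_totals = [
        sum(row[i] for row in table.values()) for i in range(len(RATINGS))
    ]
    grand_total = sum(column_totals)
    return {r: t / grand_total for r, t in zip(RATINGS, column_totals)}


def conditional_distributions(table):
    """For each professor, the share of that professor's ratings per category."""
    return {
        prof: {r: count / sum(row) for r, count in zip(RATINGS, row)}
        for prof, row in table.items()
    }


marginal = marginal_distribution(all_students)
conditional = conditional_distributions(all_students)

print("Marginal distribution of quality of instruction:")
for rating, share in marginal.items():
    print(f"  {rating}: {share:.1%}")

print("Conditional distributions, one row per professor:")
for prof, dist in conditional.items():
    shares = "  ".join(f"{r} {s:.1%}" for r, s in dist.items())
    print(f"  {prof:<8}{shares}")
```

The marginal distribution divides each column total by the grand total; each conditional distribution divides a professor's counts by that professor's own row total, so every row of percentages sums to 100%. Those rows are the segments of the segmented bar chart.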

Compare results.

  1. Examine your answers to 3c for the Good and Not Good math students. Is there a conclusion to be drawn here? What is the answer to 3c for all students? Explain how the combined results arise from the tables that are separated by math ability.
  2. Examine your answers to 3d for the Good and Not Good math students. Is there a conclusion to be drawn here? What is the answer to 3d for all students? Can you explain this?
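
To line up the numbers for these comparisons, the following Python sketch may help. It makes one simplifying assumption that the worksheet itself does not prescribe: each professor is summarized by a single number, the share of ratings that are GOOD, instead of the full conditional distribution. The helper names are invented for this illustration.

```python
# Counts copied from the worksheet tables, as (GOOD, FAIR, POOR).
good_at_math = {
    "Ryan": (25, 19, 18),
    "Arnold": (6, 8, 7),
    "Murphy": (23, 8, 10),
    "Shafer": (7, 15, 15),
}
not_good_at_math = {
    "Ryan": (16, 2, 2),
    "Arnold": (42, 10, 8),
    "Murphy": (20, 9, 11),
    "Shafer": (36, 2, 3),
}
# The combined table is the cell-by-cell sum of the two groups.
all_students = {
    p: tuple(x + y for x, y in zip(good_at_math[p], not_good_at_math[p]))
    for p in good_at_math
}


def good_share(table, professor):
    """Fraction of a professor's ratings that are GOOD (simplifying summary)."""
    row = table[professor]
    return row[0] / sum(row)


def math_weight(professor):
    """Fraction of a professor's raters who say they are good at math."""
    return sum(good_at_math[professor]) / sum(all_students[professor])


def overall_from_groups(professor):
    """Overall GOOD share rebuilt as a weighted average of the two groups."""
    w = math_weight(professor)
    return (w * good_share(good_at_math, professor)
            + (1 - w) * good_share(not_good_at_math, professor))


groups = [
    ("good at math", good_at_math),
    ("not good at math", not_good_at_math),
    ("all students", all_students),
]

for pair in (("Murphy", "Shafer"), ("Ryan", "Arnold")):
    for label, table in groups:
        shares = ", ".join(f"{p} {good_share(table, p):.1%}" for p in pair)
        print(f"{label:>22}: {shares}")
    weights = ", ".join(f"{p} {math_weight(p):.1%}" for p in pair)
    print(f"{'raters good at math':>22}: {weights}")
    print()
```

Each professor's overall share is a weighted average of the two group shares, with weights given by how that professor's raters split between the two groups (`overall_from_groups` reproduces the "all students" figure exactly). Comparing the weights within each pair is a good starting point for the explanations asked for above.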