Predicting GPA

You may access the data set.

Data were collected on 80 graduates of the area high school who now attend college at one of three area schools: State U, City College, and P Tech. Three variables were recorded per case:

  1. College attended (categorical).
  2. High school (combined, verbal + math) SAT score (quantitative).
  3. College grade point average GPA (quantitative).

SAT might be used as a predictor of GPA, so we will treat SAT as the explanatory variable and GPA as the response variable.

Here's a scatterplot.

There appears to be little or no association between these variables. The dotted line represents a least squares best fit to the data. This line indicates a slight negative association (not statistically significant). Perhaps you can identify two clusters?

We ignored information on one of the variables: college attended. Perhaps there's something important to be learned from that variable? College, too, could be considered an explanatory variable, since different colleges have different standards for grading.

When there are two explanatory variables, one quantitative and one categorical, and a quantitative response variable, the proper plotting technique involves the use of distinct plotting symbols for the various levels of the categorical variable. Click the "Change" button and see what happens!

I've included best fit lines for each college. Of course, this view changes things dramatically. Within each college we see a fairly strong, positive, linear association.

The slopes of the three lines are quite similar (any differences are not statistically significant). Each has slope approximately 0.0015. In applications, the slope of the line actually measures something! Here it measures the expected change in GPA when SAT is increased by 1. So, at each college, a student with a 1 point higher SAT score is expected to have a 0.0015 points higher GPA. Or, put another way, a student with a 100 point higher SAT score is expected to have a 0.15 points higher GPA. This result applies at any of the three colleges (in technical terms, the variables "college attended" and SAT score do not interact).
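The arithmetic behind that interpretation can be sketched in a few lines of Python. The slope of 0.0015 comes from the text above; the function name `expected_gpa_change` is just an illustrative choice, not part of any course software.

```python
# Slope reported in the text: about 0.0015 GPA points per SAT point.
SLOPE = 0.0015


def expected_gpa_change(sat_difference, slope=SLOPE):
    """Expected GPA difference between two students at the SAME college
    whose SAT scores differ by sat_difference points."""
    return slope * sat_difference


# Compare a 1 point and a 100 point SAT difference.
for diff in (1, 100):
    print(f"SAT higher by {diff}: GPA expected higher by {expected_gpa_change(diff):.4f}")
```

Because the slopes do not differ meaningfully across colleges, the same function applies at State U, City College, and P Tech alike.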

However, the relative vertical locations of the lines do differ. (In mathematical terms this is to say that the lines have different intercepts.) Grades are generally tougher to come by at P Tech. State U appears to be a little tougher than is City College.
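One way to describe "same slope, different intercepts" is a parallel-lines model: GPA = intercept(college) + slope × SAT. Below is a minimal sketch of fitting such a model by least squares. The data are invented for illustration (they are NOT the 80 students from this page), and the SAT centers and intercepts passed to the hypothetical helper `make_group` are assumptions chosen only so that P Tech comes out toughest and City College easiest.

```python
# Hypothetical data, NOT the data set from this page.
def make_group(center_sat, intercept, slope=0.0015):
    """Build five made-up (SAT, GPA) points scattered around one line."""
    offsets = [-100, -50, 0, 50, 100]
    wiggle = [0.05, -0.05, 0.0, -0.05, 0.05]  # small symmetric scatter
    sats = [center_sat + d for d in offsets]
    gpas = [intercept + slope * s + w for s, w in zip(sats, wiggle)]
    return sats, gpas


data = {
    "City College": make_group(900, 1.9),   # assumed values
    "State U": make_group(1000, 1.6),       # assumed values
    "P Tech": make_group(1200, 0.9),        # assumed values
}


def mean(values):
    return sum(values) / len(values)


def fit_parallel_lines(groups):
    """Least squares fit of GPA = intercept[college] + slope * SAT.

    The common slope pools the within-college sums of cross-products
    and squares; each intercept then puts the line through that
    college's point of means.
    """
    sxy = 0.0
    sxx = 0.0
    for sats, gpas in groups.values():
        x_bar, y_bar = mean(sats), mean(gpas)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in zip(sats, gpas))
        sxx += sum((x - x_bar) ** 2 for x in sats)
    slope = sxy / sxx
    intercepts = {
        name: mean(gpas) - slope * mean(sats)
        for name, (sats, gpas) in groups.items()
    }
    return slope, intercepts


slope, intercepts = fit_parallel_lines(data)
print(f"common slope: {slope:.4f}")
for name, a in intercepts.items():
    print(f"{name}: intercept {a:.2f}")
```

A lower intercept means a lower expected GPA at any given SAT score, which is what "grades are tougher to come by" looks like in the fitted model.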

This example illustrates the proper plotting technique for a situation involving a quantitative response and both a quantitative and a categorical explanatory variable. However, contrasting the two plots above should lead you to a discovery. While not the most striking example, this data does illustrate Simpson's Paradox. Notice how ignoring information on one of the variables (college attended) leads to a mistaken conclusion regarding the association between the remaining variables. That's because of the ways in which the variables are associated. P Tech students have, on average, higher SAT scores. They also attend a college with much tougher grading standards.
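The reversal can be reproduced with a small made-up example. The sketch below uses invented records, not the actual data set: every college follows a line with slope 0.0015 (the value quoted earlier), while the SAT centers and intercepts are assumptions picked so that the college with the highest SAT scores also grades hardest. The helper `ls_slope` is an illustrative name.

```python
def ls_slope(xs, ys):
    """Slope of the least squares line of ys on xs."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Invented (college, SAT, GPA) records; centers and intercepts are assumptions.
colleges = [("City College", 900, 1.9), ("State U", 1000, 1.6), ("P Tech", 1200, 0.9)]
records = []
for college, center, intercept in colleges:
    for offset in (-100, -50, 0, 50, 100):
        sat = center + offset
        records.append((college, sat, intercept + 0.0015 * sat))

# Ignore college: one line through all the points.
pooled = ls_slope([r[1] for r in records], [r[2] for r in records])
print(f"ignoring college: slope {pooled:+.4f}")

# Use college: one line per group.
within = {}
for college, _, _ in colleges:
    rows = [r for r in records if r[0] == college]
    within[college] = ls_slope([r[1] for r in rows], [r[2] for r in rows])
    print(f"{college}: slope {within[college]:+.4f}")
```

In this toy data the pooled slope is negative while every within-college slope is positive, which is the pattern the two plots show: the sign of the association flips once the lurking categorical variable is taken into account.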