Salaries of 1995 Graduates of SUNY Oswego



Background

Source: "Follow up study of the senior class 1995," published by the SUNY-Oswego Career Services Office.

A 1995 SUNY Oswego publication details the field of study (major) and starting salary of that year's graduates. Note: only those graduates who chose to respond to the survey provided data. While we may gain some insight, we may not extrapolate that insight to all 1995 graduates. We can reasonably extrapolate only to those graduates who would consent to reply to a survey.

Cases (individuals) are the students who responded to the survey. Two variables are recorded per case: Major, a categorical variable ranging over all the possible majors, and Starting Salary, a quantitative variable. The data I examine here restrict the Major variable to just two of its many levels: Psychology and Computer Science. One might consider Major an explanatory variable and Starting Salary a response variable, since it would be useful to predict the response when given the explanatory variable. Also, the choice of major comes first in time and in many (but not all) respects determines a starting salary. This ordering also helps identify which variable is explanatory and which is the response.


Stem Plots

Starting salaries of Psychology majors

18000 23000 30000 18000 20300 20000 18000 23000 16000 17000 13000 18000 32500 17000 15000 18680 18000 25000 30000 18000 20000 18000 48000 14000 08820 16700 17600 20000 16900 12500 17880 23000 26000 24000 18000 12000 23400 16600 23000 18500 19100 16900 27000 22000 10800 18000 18000 25000 20900 26000 20000 20000 37000 16500

Minimum = 8820; Maximum = 48000; Range = 39180. There are n = 54 observations (cases).
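As a quick check on these figures, here is a minimal Python sketch that recomputes them from the salaries listed above. The variable name psych is my own choice, and the value printed as 08820 in the list is entered as 8820.

```python
# Psychology starting salaries, copied from the list above
# (the entry shown as 08820 is entered as the number 8820).
psych = [
    18000, 23000, 30000, 18000, 20300, 20000, 18000, 23000, 16000, 17000,
    13000, 18000, 32500, 17000, 15000, 18680, 18000, 25000, 30000, 18000,
    20000, 18000, 48000, 14000, 8820, 16700, 17600, 20000, 16900, 12500,
    17880, 23000, 26000, 24000, 18000, 12000, 23400, 16600, 23000, 18500,
    19100, 16900, 27000, 22000, 10800, 18000, 18000, 25000, 20900, 26000,
    20000, 20000, 37000, 16500,
]

n = len(psych)                    # number of cases
lowest, highest = min(psych), max(psych)
salary_range = highest - lowest   # range = maximum - minimum

print(f"n = {n}, minimum = {lowest}, maximum = {highest}, range = {salary_range}")
```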

STEM LEAF
0 8
1 02234566666677778888888888889
2 0000000233333455667
3 0027
4 8

Stem unit = 10000; Leaf unit = 1000.

Note the value 8820 (written as 08820 in the list above). Each stem must occupy the same decimal place in the value. Because we're using the 10,000s place for the stem, the stem for the value 8820 is 0 (8820 = 08820; there are zero 10,000s in 8820). Assigning a stem of 8 to this value would (incorrectly) place it well above the others in the display.
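The construction just described can be sketched in a few lines of Python. This is only an illustration, not the software used to make the display: the function name stem_plot and its parameters are my own, and I assume leaves are truncated (not rounded) to the leaf unit, which is what the display above appears to do (18680 gets leaf 8). Integer division by the stem unit gives 8820 the stem 0 automatically.

```python
from collections import defaultdict

psych = [
    18000, 23000, 30000, 18000, 20300, 20000, 18000, 23000, 16000, 17000,
    13000, 18000, 32500, 17000, 15000, 18680, 18000, 25000, 30000, 18000,
    20000, 18000, 48000, 14000, 8820, 16700, 17600, 20000, 16900, 12500,
    17880, 23000, 26000, 24000, 18000, 12000, 23400, 16600, 23000, 18500,
    19100, 16900, 27000, 22000, 10800, 18000, 18000, 25000, 20900, 26000,
    20000, 20000, 37000, 16500,
]

def stem_plot(values, stem_unit=10000, leaf_unit=1000):
    """Return one text row per stem; leaves are truncated to leaf_unit."""
    leaves = defaultdict(list)
    for v in sorted(values):                 # sorting keeps leaves in order
        stem = v // stem_unit                # 8820 // 10000 == 0
        leaf = (v % stem_unit) // leaf_unit  # 18680 -> leaf 8 (truncated)
        leaves[stem].append(str(leaf))
    # Include every stem between the smallest and largest, even if empty.
    return [f"{stem} {''.join(leaves.get(stem, []))}"
            for stem in range(min(leaves), max(leaves) + 1)]

for row in stem_plot(psych):
    print(row)
```

Run on the Psychology salaries, this should give the same five rows as the display, with 8820 on stem 0.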


You can increase the number of stems in a plot by splitting each stem into two: one with leaves 0, 1, 2, 3, 4, the other with leaves 5, 6, 7, 8, 9.

In our example, the stem "1" corresponds to $10000. Put leaves 0 - 4 on a "lower stem" and leaves 5 - 9 on an "upper stem."

STEM LEAF
0 8
1 02234
1 566666677778888888888889
2 00000002333334
2 55667
3 002
3 7
4  
4 8

Stem unit = 10000; Leaf unit = 1000.
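Splitting stems is easy to mimic in code. The sketch below is a hypothetical helper of my own (split_stem_plot, with a lines_per_stem parameter). It assumes each leaf is a single digit (stem unit = 10 times the leaf unit) and that lines_per_stem is 2 or 5, so that the ten possible leaf digits divide evenly among the lines.

```python
psych = [
    18000, 23000, 30000, 18000, 20300, 20000, 18000, 23000, 16000, 17000,
    13000, 18000, 32500, 17000, 15000, 18680, 18000, 25000, 30000, 18000,
    20000, 18000, 48000, 14000, 8820, 16700, 17600, 20000, 16900, 12500,
    17880, 23000, 26000, 24000, 18000, 12000, 23400, 16600, 23000, 18500,
    19100, 16900, 27000, 22000, 10800, 18000, 18000, 25000, 20900, 26000,
    20000, 20000, 37000, 16500,
]

def split_stem_plot(values, lines_per_stem=2, stem_unit=10000, leaf_unit=1000):
    """Stem plot with each stem split into lines_per_stem lines (2 or 5)."""
    digits_per_line = 10 // lines_per_stem   # 5 digits per line, or 2
    rows = {}
    for v in sorted(values):
        stem = v // stem_unit
        leaf = (v % stem_unit) // leaf_unit  # truncated, single digit
        rows.setdefault((stem, leaf // digits_per_line), []).append(str(leaf))
    first, last = min(rows), max(rows)       # first and last occupied lines
    out = []
    for stem in range(first[0], last[0] + 1):
        for part in range(lines_per_stem):
            if first <= (stem, part) <= last:  # keep empty lines in between
                out.append(f"{stem} {''.join(rows.get((stem, part), []))}")
    return out

for row in split_stem_plot(psych, lines_per_stem=2):
    print(row)
```

Calling split_stem_plot(psych, lines_per_stem=5) splits each stem into five lines instead (leaves 0-1, 2-3, 4-5, 6-7, 8-9). Lines with no leaves between the first and last occupied lines are kept, so gaps in the distribution stay visible.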

The distribution is fairly symmetric with one outlier (approximately 48,000). The center of the distribution is somewhere near 20,000.


With relatively large data sets it's a good idea to split each stem into five. Here each stem is broken into five lines, with leaves 0-1, 2-3, 4-5, 6-7 and 8-9.

STEM LEAF
0 8
1 0
1 223
1 45
1 6666667777
1 8888888888889
2 0000000
2 233333
2 455
2 667
2  
3 00
3 2
3  
3 7
3  
4  
4  
4  
4  
4 8

Stem unit = 10000; Leaf unit = 1000.


Starting salaries of the 1995 Computer Science majors

26200 27113 22000 30000 27000 35000 30000 39000 34000 27000 35000 25000 30000 45000

Using back-to-back stem plots we can compare the salaries of the Psych majors with those of the Computer Science majors.

   CS |      | Psych
 LEAF | STEM | LEAF
      |  0   | 8
      |  1   | 02234
      |  1   | 566666677778888888888889
    2 |  2   | 00000002333334
77765 |  2   | 55667
 4000 |  3   | 002
  955 |  3   | 7
      |  4   |
    5 |  4   | 8

Stem unit = 10000; leaf unit = 1000.
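A back-to-back display can be sketched the same way. The helper names below (leaves_by_line, back_to_back) are my own, and the layout with "|" separators is just one reasonable choice. Each stem is split in two, and leaves on the left-hand side are written in reverse so that they grow outward from the stem.

```python
psych = [
    18000, 23000, 30000, 18000, 20300, 20000, 18000, 23000, 16000, 17000,
    13000, 18000, 32500, 17000, 15000, 18680, 18000, 25000, 30000, 18000,
    20000, 18000, 48000, 14000, 8820, 16700, 17600, 20000, 16900, 12500,
    17880, 23000, 26000, 24000, 18000, 12000, 23400, 16600, 23000, 18500,
    19100, 16900, 27000, 22000, 10800, 18000, 18000, 25000, 20900, 26000,
    20000, 20000, 37000, 16500,
]
cs = [26200, 27113, 22000, 30000, 27000, 35000, 30000,
      39000, 34000, 27000, 35000, 25000, 30000, 45000]

def leaves_by_line(values, stem_unit=10000, leaf_unit=1000):
    """Map (stem, half) -> ascending leaf digits; half 0 is 0-4, half 1 is 5-9."""
    table = {}
    for v in sorted(values):
        leaf = (v % stem_unit) // leaf_unit  # truncated, single digit
        table.setdefault((v // stem_unit, leaf // 5), []).append(str(leaf))
    return table

def back_to_back(left, right):
    """Rows of 'left leaves | stem | right leaves' on shared, split stems."""
    l, r = leaves_by_line(left), leaves_by_line(right)
    keys = sorted(set(l) | set(r))
    first, last = keys[0], keys[-1]
    width = max(len(x) for x in l.values())  # pad left column to widest line
    lines = []
    for stem in range(first[0], last[0] + 1):
        for half in (0, 1):
            if first <= (stem, half) <= last:
                left_leaves = "".join(reversed(l.get((stem, half), [])))
                right_leaves = "".join(r.get((stem, half), []))
                lines.append(f"{left_leaves:>{width}} | {stem} | {right_leaves}")
    return lines

for line in back_to_back(cs, psych):
    print(line)
```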

The distribution of starting salaries for computer science majors (the "CS distribution") is centered to the right of (or above) the Psych distribution. The CS distribution has less spread than does the Psych distribution. There's one outlier in the CS distribution (about $45000); it lies closer to the rest of its distribution than the $48000 salary does to the rest of the Psych distribution.

  • Both distributions are roughly symmetric.
  • There's one (high) outlier in each distribution. Perhaps these cases represent unusual circumstances. The $48000/year Psych major might be working in a rich father's company. On the other hand, perhaps this is an exceptional student being paid an appropriate amount.
  • In general CS majors earn a better starting salary than do Psych majors. Note: this statement requires the use of a term such as generally. Not all CS majors earn more than all Psych majors; if that were the case the entire CS distribution would lie above the entire Psych distribution. Take "generally" to indicate a comparison of the centers of the two distributions.
  • There's less variability in starting salaries of CS majors than in starting salaries of Psych majors.

Histograms

A histogram of starting salaries for the Psychology majors is shown below.


Summary Statistics

The median, quartiles, IQR, mean and standard deviation

Major   n    Mean    St Dev   Min     Q1      Median   Q3      Max     IQR
Psych   54   20371   6506     8820    16975   18250    23000   48000   6025
CS      14   30880   6145     22000   26800   30000    35000   45000   8200


It may be wise to return to various displays and see where these figures fit in. In particular, mean and median are measures of a distribution's center. IQR and standard deviation measure spread.
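For readers who want to recompute these summaries, here is a minimal sketch using Python's standard statistics module (Python 3.8 or later). The function name summarize is my own. The quartile convention is an assumption on my part: method="exclusive" interpolates at position p(n+1) in the sorted data, which appears to reproduce the quartiles in the table for these data. Other software may use a different convention, and results computed from the salaries as listed in this document may differ slightly from the tabulated figures.

```python
import statistics

psych = [
    18000, 23000, 30000, 18000, 20300, 20000, 18000, 23000, 16000, 17000,
    13000, 18000, 32500, 17000, 15000, 18680, 18000, 25000, 30000, 18000,
    20000, 18000, 48000, 14000, 8820, 16700, 17600, 20000, 16900, 12500,
    17880, 23000, 26000, 24000, 18000, 12000, 23400, 16600, 23000, 18500,
    19100, 16900, 27000, 22000, 10800, 18000, 18000, 25000, 20900, 26000,
    20000, 20000, 37000, 16500,
]
cs = [26200, 27113, 22000, 30000, 27000, 35000, 30000,
      39000, 34000, 27000, 35000, 25000, 30000, 45000]

def summarize(values):
    """Summary statistics; quartiles interpolate at position p(n+1) (assumed)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "st_dev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,                      # interquartile range
    }

for major, data in (("Psych", psych), ("CS", cs)):
    print(major, summarize(data))
```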


Boxplots

Below you see boxplots for the two majors.

One thing we learn from the boxplot is that the difference in spread is attributable largely to the 3 relatively large salaries earned by psychology majors.

You should be able to approximate values of the median, two quartiles, maximum and minimum from a boxplot. Granted, it's easy enough to scroll up a bit and find the exact values; however, in comparing data sets, looking at the display gives a quicker first impression. Examining the computer science majors:

Minimum 22500
First Quartile 26500
Median 30000
Third Quartile 35000
Maximum 45000

Then IQR = 35000 - 26500 = 8500. (These are all approximations, derived solely from the display.)

Note the difference in the height of the two boxes. The two boxes are drawn with height proportional to the square root of the number of observations. The sizes of the two sets of values are 54 and 14; the square roots of these are 7.35 and 3.74, a ratio of 1.96 to 1 (1.96 = 7.35/3.74). So, the psychology boxplot is about 2 (more precisely 1.96) times as high as the computer science boxplot. The reliability of results derived through statistical inference scales with the square root of the sample size. (You may ignore this consideration.)
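The arithmetic behind the box heights is easy to verify. A small sketch (the variable names are mine):

```python
import math

n_psych, n_cs = 54, 14  # sample sizes of the two groups

# Box height is drawn proportional to the square root of the sample size.
ratio = math.sqrt(n_psych) / math.sqrt(n_cs)

print(f"{math.sqrt(n_psych):.2f} / {math.sqrt(n_cs):.2f} = {ratio:.2f}")
```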