Salaries of 1995 Graduates of SUNY Oswego
Source: "Follow up study of the senior class 1995," published by the SUNY-Oswego Career Services Office.
A 1995 SUNY Oswego publication details the field of study (major) and starting salary of that year's graduates. Note: Only those graduates who chose to respond to the survey provided data. While we may gain some insight, we cannot extrapolate that insight to all 1995 graduates. We can reasonably extrapolate only to those graduates who would consent to reply to a survey.
Cases/individuals are those students who responded to the survey. Two variables are collected per case: Major, a categorical variable ranging through all the possible majors, and Starting Salary, a quantitative variable. The data I examine here restrict the Major variable to just two of its many levels: Psychology and Computer Science. One might consider Major an explanatory variable and Starting Salary a response variable, since it would be useful to predict the response when given the explanatory variable. Also, the choice of major comes before the starting salary in time and, in many (but not all) respects, determines it. This ordering also helps identify which variable is explanatory and which is the response.
Starting salaries of Psychology majors
18000 23000 30000 18000 20300 20000 18000 23000 16000 17000 13000 18000 32500 17000 15000 18680 18000 25000 30000 18000 20000 18000 48000 14000 08820 16700 17600 20000 16900 12500 17880 23000 26000 24000 18000 12000 23400 16600 23000 18500 19100 16900 27000 22000 10800 18000 18000 25000 20900 26000 20000 20000 37000 16500
Minimum = 8820 Maximum = 48000 Range = 39180. There are n = 54 observations/cases.
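The figures above can be checked with a few lines of Python. This is only a sketch: the list name `psych` is my own, and the value written as 08820 in the listing is entered as 8820 because Python does not accept a leading zero on an integer.

```python
# Starting salaries of the Psychology majors, copied from the listing above.
psych = [
    18000, 23000, 30000, 18000, 20300, 20000, 18000, 23000, 16000,
    17000, 13000, 18000, 32500, 17000, 15000, 18680, 18000, 25000,
    30000, 18000, 20000, 18000, 48000, 14000, 8820, 16700, 17600,
    20000, 16900, 12500, 17880, 23000, 26000, 24000, 18000, 12000,
    23400, 16600, 23000, 18500, 19100, 16900, 27000, 22000, 10800,
    18000, 18000, 25000, 20900, 26000, 20000, 20000, 37000, 16500,
]

n = len(psych)               # number of observations/cases
low, high = min(psych), max(psych)
spread = high - low          # the range: maximum minus minimum

print(n, low, high, spread)  # prints: 54 8820 48000 39180
```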
Stem unit = 10000 Leaf unit = 1000.
Note the value 8820 (written 08820 in the listing above). Each stem must occupy the same decimal place in the value. Because we're using the 10,000s place for the stem, the stem for the value 8820 is 0 (8820 = 08820; there are zero 10,000s in 8820). Assigning a stem of 8 to this value would (incorrectly) place it well above the others in the display.
You can increase the number of stems in a plot by splitting each stem into two: one with leaves 0, 1, 2, 3, 4 and the other with leaves 5, 6, 7, 8, 9.
In our example, the stem "1" corresponds to 10000 ($). Put leaves 0 - 4 into a "lower stem" and leaves 5 - 9 into an "upper stem."
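The stem and leaf assignment can be sketched in Python as follows. The function name `stem_plot` and the small sample list are my own choices for illustration, and I assume leaves are truncated (not rounded) to the leaf unit, so 18680 gets leaf 8.

```python
from collections import defaultdict

def stem_plot(values, stem_unit=10000, leaf_unit=1000):
    """Return the rows of a stem plot as strings like '1 | 288'."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem = v // stem_unit                # 8820 // 10000 == 0, so the stem is 0
        leaf = (v % stem_unit) // leaf_unit  # truncate to the leaf unit
        rows[stem].append(leaf)
    lines = []
    # Walk every stem from smallest to largest so empty stems still get a row.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(d) for d in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

# A handful of the psychology salaries, including the smallest and largest.
sample = [8820, 12500, 18000, 18680, 20300, 23000, 32500, 48000]
for line in stem_plot(sample):
    print(line)
```

With this sample, 8820 lands on the row for stem 0, below all the other values, which is where it belongs.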
Stem unit = 10000 Leaf unit = 1000.
The distribution is fairly symmetric with one outlier (approximately 48,000). The center of the distribution is somewhere near 20,000.
With relatively large data sets it's a good idea to split each stem into five. Here each stem is broken into five rows, with leaves 0-1, 2-3, 4-5, 6-7 and 8-9.
Stem unit = 10000; Leaf unit = 1000.
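Splitting stems into two or five rows follows one pattern, sketched here in Python. The function name `split_stem_plot`, its `parts` parameter, and the sample list are hypothetical. The sketch assumes the stem unit is ten times the leaf unit and that leaves are truncated to the leaf unit.

```python
def split_stem_plot(values, parts=2, stem_unit=10000, leaf_unit=1000):
    """Stem plot with each stem split into `parts` rows (2 or 5)."""
    width = 10 // parts  # leaves per row: 5 when parts=2, 2 when parts=5
    rows = {}
    for v in sorted(values):
        stem = v // stem_unit
        leaf = (v % stem_unit) // leaf_unit
        key = stem * parts + leaf // width  # which row of which stem
        rows.setdefault(key, []).append(leaf)
    lines = []
    # Include empty rows so gaps in the distribution stay visible.
    for key in range(min(rows), max(rows) + 1):
        leaves = "".join(str(d) for d in rows.get(key, []))
        lines.append(f"{key // parts} | {leaves}")
    return lines

sample = [8820, 12500, 18000, 18680, 20300, 23000, 32500, 48000]
for line in split_stem_plot(sample, parts=2):
    print(line)
```

Calling `split_stem_plot(sample, parts=5)` gives the five-way split, with leaves 0-1, 2-3, 4-5, 6-7 and 8-9 on separate rows of each stem.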
Starting salaries of the 1995 Computer Science majors
26200 27113 22000 30000 27000 35000 30000 39000 34000 27000 35000 25000 30000 45000
Using back-to-back stem plots we can compare the salaries of the Psych majors with those of the Computer Science majors.
Stem unit = 10000; leaf unit = 1000.
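A back-to-back display can be sketched in Python with the two data sets listed earlier. The helper names `leaves_by_stem` and `back_to_back` are my own, leaves are assumed to be truncated to the leaf unit, and the leaves on the left are written in reverse so that the smallest sits next to the stem.

```python
psych = [
    18000, 23000, 30000, 18000, 20300, 20000, 18000, 23000, 16000,
    17000, 13000, 18000, 32500, 17000, 15000, 18680, 18000, 25000,
    30000, 18000, 20000, 18000, 48000, 14000, 8820, 16700, 17600,
    20000, 16900, 12500, 17880, 23000, 26000, 24000, 18000, 12000,
    23400, 16600, 23000, 18500, 19100, 16900, 27000, 22000, 10800,
    18000, 18000, 25000, 20900, 26000, 20000, 20000, 37000, 16500,
]
cs = [26200, 27113, 22000, 30000, 27000, 35000, 30000,
      39000, 34000, 27000, 35000, 25000, 30000, 45000]

def leaves_by_stem(values, stem_unit=10000, leaf_unit=1000):
    """Map each stem to its sorted list of (truncated) leaves."""
    rows = {}
    for v in sorted(values):
        rows.setdefault(v // stem_unit, []).append((v % stem_unit) // leaf_unit)
    return rows

def back_to_back(left, right):
    """Rows of the form 'left leaves | stem | right leaves'."""
    l, r = leaves_by_stem(left), leaves_by_stem(right)
    stems = range(min(min(l), min(r)), max(max(l), max(r)) + 1)
    width = max(len(v) for v in l.values())  # pad the left side to align stems
    lines = []
    for s in stems:
        left_leaves = "".join(str(d) for d in reversed(l.get(s, [])))
        right_leaves = "".join(str(d) for d in r.get(s, []))
        lines.append(f"{left_leaves:>{width}} | {s} | {right_leaves}")
    return lines

for line in back_to_back(psych, cs):  # Psych on the left, CS on the right
    print(line)
```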
The distribution of starting salaries for computer science majors (the "CS distribution") is centered to the right of (or above) the Psych distribution. The CS distribution has less spread than does the Psych distribution. There's one outlier in the CS distribution (about $45000); it stands out less from the rest of its distribution than the $48000 salary does from the Psych distribution.
A histogram of starting salaries for the Psychology majors is shown below.
The median, quartiles, IQR, mean and standard deviation
It may be wise to return to various displays and see where these figures fit in. In particular, mean and median are measures of a distribution's center. IQR and standard deviation measure spread.
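As a rough sketch, these quantities can be computed for the Computer Science salaries with Python's standard `statistics` module. The function name `summary` is my own. The quartile convention used here (the median of the lower and upper halves of the sorted data) is an assumption, since this page does not state which convention it uses; other conventions give slightly different quartiles. The standard deviation is the sample version, with divisor n - 1.

```python
import statistics

cs = [26200, 27113, 22000, 30000, 27000, 35000, 30000,
      39000, 34000, 27000, 35000, 25000, 30000, 45000]

def summary(values):
    """Median, quartiles, IQR, mean and sample standard deviation."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]
    # With an odd number of values, leave the median out of both halves.
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    q1, q3 = statistics.median(lower), statistics.median(upper)
    return {
        "median": statistics.median(xs),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,                  # spread of the middle half
        "mean": statistics.mean(xs),
        "sd": statistics.stdev(xs),      # sample standard deviation
    }

for name, value in summary(cs).items():
    print(name, value)
```

Under this convention the CS median is 30000 and the quartiles are 27000 and 35000, so the IQR is 8000. The same function can be applied to the psychology salaries.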
Below you see boxplots for the two majors.
One thing we learn from the boxplots is that the difference in spread is attributable largely to the 3 relatively large salaries earned by psychology majors.
You should be able to approximate values of the median, two quartiles, maximum and minimum from a boxplot. Granted, it's easy enough to scroll up a bit and find the exact values; however, when comparing data sets, looking at the display gives a better first impression. Examining the computer science majors:
Then IQR = 35000 - 26500 = 8500. (These are all approximations, derived solely from the display.)
Note the difference in the height of the two boxes. The two boxes are drawn with height proportional to the square root of the number of observations. The sizes of the two sets of values are 54 and 14; the square roots of these are 7.35 and 3.74, a ratio of 1.96 to 1 (1.96 = 7.35/3.74). So the psychology boxplot is about 2 (more precisely, 1.96) times as high as the computer science boxplot. The reliability of results derived through statistical inference scales with the square root of the sample size. (You may ignore this consideration.)
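The arithmetic behind the box heights is easy to verify. A minimal Python check, with variable names of my own choosing:

```python
import math

n_psych, n_cs = 54, 14  # number of observations in each data set

root_psych = math.sqrt(n_psych)
root_cs = math.sqrt(n_cs)
ratio = root_psych / root_cs  # relative height of the two boxes

print(round(root_psych, 2), round(root_cs, 2), round(ratio, 2))
# prints: 7.35 3.74 1.96
```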