# Constructing a Histogram

Here is the data on starting salaries of 1995 Psychology graduates. When constructing a histogram it is helpful to sort the observations.

08820 10800 12000 12500 13000 14000 15000 16000 16500 16600 16700 16900 16900 17000 17000 17600 17880 18000 18000 18000 18000 18000 18000 18000 18000 18000 18000 18500 18680 19100 20000 20000 20000 20000 20000 20300 20900 22000 23000 23000 23000 23000 23400 24000 25000 25000 26000 26000 27000 30000 30000 32500 37000 48000

Minimum = 8820, Maximum = 48000, Range = 48000 - 8820 = 39180.
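
As a quick check, the summary values above can be reproduced with a short Python sketch. The variable names (`salaries`, `data_range`) are illustrative choices and not part of the original exercise.

```python
# Starting salaries copied from the listing above (54 observations).
salaries = (
    [8820, 10800, 12000, 12500, 13000, 14000, 15000, 16000, 16500,
     16600, 16700, 16900, 16900, 17000, 17000, 17600, 17880]
    + [18000] * 10
    + [18500, 18680, 19100]
    + [20000] * 5
    + [20300, 20900, 22000]
    + [23000] * 4
    + [23400, 24000, 25000, 25000, 26000, 26000, 27000,
       30000, 30000, 32500, 37000, 48000]
)

# Sorting puts the minimum first and the maximum last.
salaries = sorted(salaries)
n = len(salaries)
minimum = salaries[0]
maximum = salaries[-1]
data_range = maximum - minimum

print(n, minimum, maximum, data_range)  # 54 8820 48000 39180
```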

• To begin, decide how many intervals you would like. A good rule of thumb is to use the square root of the number of observations, rounded to a whole number. Here there are 54 observations, and the square root of 54 is about 7.35; round up and use 8.
• The interval width should then be approximately equal to the range divided by the number of intervals. Range/# Intervals = 39180/8 = 4897.5; I'll round up to the conveniently even figure of 5000. (It is quite helpful to use a round number.)
• Start the first interval at a convenient value below the minimum. Here the minimum is 8820, so begin at 7500 (other choices are equally acceptable).
• The intervals then begin at 7500 and have a width of 5000. So, the first interval runs from 7500 to 12500, the second from 12500 to 17500, and so on. By convention we agree that an interval includes its lower boundary point but does not include its upper boundary point. So, for instance, a value of 7500 falls in the [7500, 12500) interval, but a value of 12500 does not. A value of 12500 falls instead in the [12500, 17500) interval.
• Construct a simple table including each interval, the count of observations in that interval and the relative frequency or percentage of observations in the interval.

| Interval | Count | Percentage |
|---|---|---|
| 7500-12499 | 3 | 5.56 |
| 12500-17499 | 12 | 22.22 |
| 17500-22499 | 23 | 42.59 |
| 22500-27499 | 11 | 20.37 |
| 27500-32499 | 2 | 3.70 |
| 32500-37499 | 2 | 3.70 |
| 37500-42499 | 0 | 0.00 |
| 42500-47499 | 0 | 0.00 |
| 47500-52499 | 1 | 1.85 |
| Total | 54 | 99.99 |

• Take for instance the interval from 12500 to 17499. In the listing of the data, the observations that fall in this interval run from 12500 through the two values of 17000. There are 12 such observations. The relative frequency of observations falling in this interval is then 12/54 = 0.2222, which is equivalent to 22.22%. The remainder of the table is constructed in this fashion.
• You might notice that the percentages do not add up to exactly 100%. This is due to accumulated round-off error. The exact percentage of observations in the 12500-17499 class is 22.22222... The slight differences between the exact values and the values rounded to the nearest 0.01 are to blame. Generally, if your total is within 0.1 of 100%, this artifact may be safely ignored.
• You might also notice that we have 9 classes rather than the desired 8. No big deal. (Sometimes you might get fewer intervals than you set out for.) This happened because of our choices of starting value and interval width, which were somewhat subjective. If you really must have 8 intervals, you could adjust the class width or the starting value; for example, keeping the start at 7500 and using a width of 5500 gives exactly 8 intervals.
• Now, draw a grid for your histogram. The vertical axis should be marked high enough to accommodate the interval with the highest percentage. The horizontal axis should stretch from the lower endpoint of the first interval (7500) to the upper endpoint of the final interval (52500). Effective displays tend to have a width-to-height ratio of about 4:3. Note that the tick marks are labeled once every two intervals. This is to avoid crowding the tick mark labels (including the 12500 would crowd the labels). You could include all interval endpoints if you wrote smaller or omitted the final two digits of each label: 12500 would become 125, and a legend would indicate that all figures are in 100s.
• Label your axes. Include the unit of measurement. The vertical axis measures the % of observations falling within an interval. The horizontal axis measures the variable Salary, measured in \$.
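
The table-building steps above (number of intervals, interval width, counts and percentages) can be sketched in Python as follows. This is only an illustration: the names `start`, `width` and `interval_index` are assumptions made for the example, and the round values 7500 and 5000 are entered by hand, just as they were chosen by hand in the text.

```python
import math

# Starting salaries copied from the listing above (54 observations).
salaries = (
    [8820, 10800, 12000, 12500, 13000, 14000, 15000, 16000, 16500,
     16600, 16700, 16900, 16900, 17000, 17000, 17600, 17880]
    + [18000] * 10
    + [18500, 18680, 19100]
    + [20000] * 5
    + [20300, 20900, 22000]
    + [23000] * 4
    + [23400, 24000, 25000, 25000, 26000, 26000, 27000,
       30000, 30000, 32500, 37000, 48000]
)

n = len(salaries)
data_range = max(salaries) - min(salaries)

# Rule of thumb: about sqrt(n) intervals, rounded up.
target_intervals = math.ceil(math.sqrt(n))   # sqrt(54) is about 7.35, so 8
raw_width = data_range / target_intervals    # 39180 / 8 = 4897.5

# Convenient round values chosen by hand, as in the text.
width = 5000
start = 7500

def interval_index(x):
    """Index of the left-closed, right-open interval [lower, upper) holding x."""
    return (x - start) // width

# The number of intervals actually needed to reach the maximum.
num_intervals = interval_index(max(salaries)) + 1
counts = [0] * num_intervals
for x in salaries:
    counts[interval_index(x)] += 1

# Percentages rounded to the nearest 0.01, as in the table.
percentages = [round(100 * c / n, 2) for c in counts]

for i, (c, p) in enumerate(zip(counts, percentages)):
    lower = start + i * width
    print(f"{lower}-{lower + width - 1}  {c:2d}  {p:5.2f}")
print("Total", sum(counts), round(sum(percentages), 2))
```

Because 12500 is mapped to index 1 rather than index 0, the integer division reproduces the convention that each interval includes its lower boundary but not its upper one.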

• 5.56% of the observations fall in the first interval (from 7500 to 12499). Draw a bar over that interval with height 5.56.

• 22.22% of the observations fall in the second interval (from 12500 to 17499). Draw a bar over that interval with height 22.22.

• Continue until all intervals have been exhausted. Here's the final product!
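
For readers who want to check their drawing, here is a rough text-only sketch of the same bars in Python. It is rotated relative to the hand-drawn version: the bars run horizontally, one per interval, and the scale of one `#` per percentage point is an arbitrary choice made for this example. The counts are taken from the frequency table above; `bar_line` is a hypothetical helper name.

```python
# Interval start, width and counts taken from the frequency table above.
start, width = 7500, 5000
counts = [3, 12, 23, 11, 2, 2, 0, 0, 1]
n = sum(counts)

def bar_line(i, count, chars_per_percent=1):
    """Return one text row: interval label, bar, and percentage."""
    lower = start + i * width
    pct = 100 * count / n
    # Bar length is proportional to the percentage (rounded to whole characters).
    bar = "#" * round(pct * chars_per_percent)
    return f"{lower:>5}-{lower + width - 1:<5} | {bar} {pct:.2f}%"

lines = [bar_line(i, c) for i, c in enumerate(counts)]
print("\n".join(lines))
```

The two empty intervals (37500-42499 and 42500-47499) show up as rows with no bar, which makes the gap before the single high salary of 48000 easy to see.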