Here is the data on
starting salaries (in dollars) of 1995 Psychology graduates. When
constructing a histogram it is helpful to sort the
observations, as has been done here.
8820 10800 12000 12500 13000
14000 15000 16000 16500 16600 16700 16900 16900 17000
17000 17600 17880 18000
18000 18000 18000 18000 18000 18000 18000 18000 18000
18500 18680 19100 20000 20000 20000 20000 20000 20300
20900 22000 23000 23000 23000 23000 23400 24000 25000
25000 26000 26000 27000 30000 30000 32500 37000 48000
Minimum = 8820, Maximum = 48000
Range = Maximum - Minimum = 48000 - 8820 = 39180.
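As a quick check on these figures, here is a minimal Python sketch. The variable names are illustrative choices, not part of the original example, and the list simply re-types the 54 sorted salaries above.

```python
# The 54 sorted starting salaries (in dollars) from the listing above.
salaries = [
    8820, 10800, 12000, 12500, 13000,
    14000, 15000, 16000, 16500, 16600, 16700, 16900, 16900, 17000,
    17000, 17600, 17880, 18000,
    18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000,
    18500, 18680, 19100, 20000, 20000, 20000, 20000, 20000, 20300,
    20900, 22000, 23000, 23000, 23000, 23000, 23400, 24000, 25000,
    25000, 26000, 26000, 27000, 30000, 30000, 32500, 37000, 48000,
]

n = len(salaries)               # number of observations
minimum = min(salaries)
maximum = max(salaries)
data_range = maximum - minimum  # range = maximum - minimum

print(n, minimum, maximum, data_range)  # 54 8820 48000 39180
```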
- To begin, decide how many
intervals you would like. A good rule of thumb is
to use the square root of the number of
observations, rounded up to a whole number. Here, that is the
square root of 54 ≈ 7.35; round up and use 8.
- The interval width should
then be approximately equal to the range divided
by the number of intervals. Range/# Intervals =
39180/8 = 4897.5; I'll round up to the
conveniently even figure of 5000. (It is quite
helpful to use a round number.)
- Start the first interval at
a convenient value below the minimum. Here the
minimum is 8820, so begin at 7500 (other choices
are equally acceptable).
- The intervals then begin at
7500 and have a width of 5000. So, the first
interval runs from 7500 to 12500, the second from
12500 to 17500 and so on. By convention we agree
that an interval includes the lower boundary
point but does not include the upper boundary
point (written [lower, upper): the square bracket marks
an included endpoint, the parenthesis an excluded one).
So, for instance, a value of 7500 falls in
the [7500, 12500) interval, but a value of 12500
does not. A value of 12500 falls instead in the
[12500, 17500) interval.
- Construct a simple table
including each interval, the count of
observations in that interval and the relative
frequency or percentage of observations in the
interval.

| Interval ($) | Count | Percentage (%) |
|---|---|---|
| 7500-12499 | 3 | 5.56 |
| 12500-17499 | 12 | 22.22 |
| 17500-22499 | 23 | 42.59 |
| 22500-27499 | 11 | 20.37 |
| 27500-32499 | 2 | 3.70 |
| 32500-37499 | 2 | 3.70 |
| 37500-42499 | 0 | 0.00 |
| 42500-47499 | 0 | 0.00 |
| 47500-52499 | 1 | 1.85 |
| Total | 54 | 99.99 |

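The steps above, from choosing the number of intervals to filling in the table, can be sketched in Python as follows. This is a minimal illustration under a few assumptions: the helper names `number_of_intervals`, `intervals_needed` and `frequency_table` are hypothetical, salaries are whole dollars (so an interval's printed upper end is its upper boundary minus 1), and the starting value is at or below the minimum.

```python
import math

# The 54 sorted starting salaries (in dollars).
salaries = [
    8820, 10800, 12000, 12500, 13000,
    14000, 15000, 16000, 16500, 16600, 16700, 16900, 16900, 17000,
    17000, 17600, 17880, 18000,
    18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000,
    18500, 18680, 19100, 20000, 20000, 20000, 20000, 20000, 20300,
    20900, 22000, 23000, 23000, 23000, 23000, 23400, 24000, 25000,
    25000, 26000, 26000, 27000, 30000, 30000, 32500, 37000, 48000,
]

def number_of_intervals(n):
    """Square-root rule of thumb: sqrt(n), rounded up to a whole number."""
    return math.ceil(math.sqrt(n))

def intervals_needed(start, width, maximum):
    """How many intervals of this width, beginning at `start`,
    it takes to cover the largest observation."""
    return (maximum - start) // width + 1

def frequency_table(data, start, width):
    """Return (lower, upper, count, percentage) rows.

    Each interval includes its lower boundary but not its upper one,
    so integer division sends a boundary value such as 12500 to the
    interval it opens. Assumes start <= min(data) and whole-dollar data.
    """
    k = intervals_needed(start, width, max(data))
    counts = [0] * k
    for x in data:
        counts[(x - start) // width] += 1  # index of the interval holding x
    n = len(data)
    rows = []
    for i, c in enumerate(counts):
        lower = start + i * width
        rows.append((lower, lower + width - 1, c, round(100 * c / n, 2)))
    return rows

k = number_of_intervals(len(salaries))            # rule of thumb: 8
raw_width = (max(salaries) - min(salaries)) / k   # 4897.5, rounded up to 5000
table = frequency_table(salaries, start=7500, width=5000)
for lower, upper, count, pct in table:
    print(f"{lower}-{upper}  {count:3d}  {pct:6.2f}")
print("Total", sum(r[2] for r in table), round(sum(r[3] for r in table), 2))
# Total 54 99.99
```

With a start of 7500 and a width of 5000, `intervals_needed` returns 9, which is why the table has nine rows even though the square-root rule suggested 8.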
- Take for instance the interval from 12500 to
17499. Looking back at the sorted data, the observations
that fall in this interval are 12500, 13000, 14000,
15000, 16000, 16500, 16600, 16700, 16900, 16900,
17000 and 17000. There are 12 such observations.
The relative frequency of observations falling in
this interval is then 12/54 = 0.2222, which is
equivalent to 22.22%. The remainder of the table
is constructed in the same fashion.
- You might notice that the percentages do not add
up to exactly 100%. This is due to accumulated
round-off error. The exact percentage of
observations in the 12500-17499 class is
22.2222..., but the table shows 22.22. Each
percentage is rounded to the nearest 0.01, and these
slight differences add up. Generally, if your total
is within 0.1 of 100%, this artifact may be safely ignored.
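- You might also notice that we have 9 classes
rather than the desired 8. No big deal.
(Sometimes you might get fewer intervals than you
set out for.) This happened because of our
choices of starting value and interval width,
which were somewhat subjective: the maximum of 48000
lies 40500 above the starting value of 7500, more than
the 8 × 5000 = 40000 that eight intervals cover, so a
ninth interval is needed. If you really have to have
8 intervals you might change the class width to 6000
and start at 6000, so that the intervals run from 6000
to 54000. (Keeping the start at 7500, a width of 6000
would give only 7 intervals.)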
- Now, draw a grid for your histogram. The vertical
axis should be marked high enough to accommodate
the highest percentage interval (42.59%). The horizontal
axis should stretch from the lower endpoint of
the first interval (7500) to the upper endpoint
of the final interval (52500). Effective displays
tend to have a width-to-height ratio of about
4:3. Label the tick marks only once every two
intervals to avoid crowding the tick mark labels
(adding 12500 and the other in-between endpoints
would crowd them). You could include all interval
endpoints if you wrote smaller or omitted the
final two digits of each label: 12500 would
become 125, and a note on the axis would indicate
that all figures are in 100s.
- Label your axes, including the unit of measurement.
The vertical axis measures the percentage of observations
falling within an interval. The horizontal axis
measures the variable Salary, in dollars ($).

- 5.56% of the observations
fall in the first interval (from 7500 to 12499).
Draw a bar over that interval with height 5.56.

- 22.22% of the observations
fall in the second interval (from 12500 to
17499). Draw a bar over that interval with height
22.22.

- Continue until every
interval has its bar. The result is the finished
histogram.
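A plotting library such as matplotlib would draw the vertical bars directly; as a dependency-free stand-in, the following Python sketch prints the same histogram sideways, one text bar per interval. The function name `text_histogram` and the scale of one `#` per percentage point (rounded) are illustrative assumptions. Because every interval has the same width, making bar length proportional to percentage preserves the shape of the histogram.

```python
# The 54 sorted starting salaries (in dollars).
salaries = [
    8820, 10800, 12000, 12500, 13000,
    14000, 15000, 16000, 16500, 16600, 16700, 16900, 16900, 17000,
    17000, 17600, 17880, 18000,
    18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000, 18000,
    18500, 18680, 19100, 20000, 20000, 20000, 20000, 20000, 20300,
    20900, 22000, 23000, 23000, 23000, 23000, 23400, 24000, 25000,
    25000, 26000, 26000, 27000, 30000, 30000, 32500, 37000, 48000,
]

START, WIDTH = 7500, 5000  # first lower boundary and interval width

def text_histogram(data, start, width):
    """Return one text line per interval. Each bar has one '#' per
    percentage point (rounded), so bar length tracks bar height."""
    k = (max(data) - start) // width + 1       # intervals needed to reach the max
    counts = [0] * k
    for x in data:
        counts[(x - start) // width] += 1      # lower boundary included, upper excluded
    lines = []
    for i, c in enumerate(counts):
        pct = 100 * c / len(data)
        lower = start + i * width
        label = f"{lower:>5}-{lower + width - 1:<5}"
        lines.append(f"{label} | {'#' * round(pct)} {pct:.2f}")
    return lines

for line in text_histogram(salaries, START, WIDTH):
    print(line)
```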
