## Prediction Intervals

Consider a class of people, all of different ages (when measured to the nearest day). If I randomly sample 19 people, how do I predict the next (20th) person's age?

- Consider first all 20 people. The 20th person is equally likely to be the youngest, second youngest, third youngest, and so on, up to third oldest, second oldest, or oldest. That is, in the ordered list of all 20 people, the 20th person selected is equally likely to occupy any of the positions 1, 2, 3, ..., 20.

Examine the picture below.
In this picture the first 19 people have been isolated from the remaining 20th person. If the 20th person is the youngest, then the 20th person "fits in" the gap to the left of the smallest value. If the 20th person is second youngest, then the 20th person fits in the 2nd gap, and so on. Rephrasing the point made above: the 20th person is equally likely to fall into each of the 20 gaps formed by the first 19 people.

- Since there are 20 gaps, each gap carries a probability of 1/20 or 0.05 (5% if you like).
- The chance is 0.05 + 0.05 = 0.10 (or 10%) that person 20 falls outside the entire range of the first 19 people: one gap lies below the smallest value and one lies above the largest. The chance is 0.90 (90%) that selection 20 falls inside the range. As a result, the range from the smallest to the largest of the first 19 people is a 90% prediction interval (PI) for the next (subsequent) observation. That's it! That's what a prediction interval is; specifically, a 90% PI. Note that the probabilities are obtained from the number of gaps, which is 1 greater than the number of observations. Each gap has probability 1/(n + 1), where n is the sample size.
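The counting argument above can be written compactly. As a sketch, assume all ages are distinct, write $X_{(1)} < \dots < X_{(n)}$ for the ordered sample and $X_{n+1}$ for the next observation (this notation is introduced here for illustration and is not part of the original text):

$$
P(\text{next observation falls in gap } j) = \frac{1}{n+1}, \qquad j = 1, \dots, n+1,
$$

$$
P\left(X_{(1)} < X_{n+1} < X_{(n)}\right) = 1 - \frac{2}{n+1} = \frac{n-1}{n+1}, \qquad \text{so for } n = 19:\ \frac{18}{20} = 0.90 .
$$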
- Here's data for a random sample of 19 people drawn from the class (ages in days).
6516 7565 6684 7974 7067 6648 6657 7214 7597 7088 7300 7898 7246 8546 7783 7704 6752 8064 7266
Begin by sorting the data.
6516 6648 6657 6684 6752 7067 7088 7214 7246 7266 7300 7565 7597 7704 7783 7898 7974 8064 8546
The 90% PI is then (6516, 8546). We write intervals like this with the small value first. Read it: "Between 6516 and 8546." The values that define this interval are called the *bounds* of the interval: 6516 is the lower bound and 8546 is the upper bound. (Some people use the term *endpoint* in place of *bound*.) The percentage (here 90%) is called the *confidence level* or *procedural reliability* of the procedure.

- Of course, maybe you don't need to be 90% confident in your result. If we move in one observation from each end, giving up two more gaps at 5% each, we obtain an 80% PI. For the data above this 80% PI is (6648, 8064). Below you see a dotplot that marks off a number of prediction intervals. Make sure you grasp the relationship between the confidence (or reliability) level of the procedure (the %) and the width of the interval.
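The trade-off between level and width can be checked numerically. The following is a minimal Python sketch, not part of the original material: the function name `prediction_interval` and its parameter `k` (how many observations to count in from each end) are hypothetical, and the code assumes all sample values are distinct, as in the class example.

```python
def prediction_interval(sample, k=1):
    """Return (lower, upper, level) for the interval running from the
    k-th smallest to the k-th largest value of the sample.

    Hypothetical helper for illustration; assumes distinct values.
    """
    ordered = sorted(sample)
    n = len(ordered)
    if not 1 <= k <= n // 2:
        raise ValueError("k must be between 1 and n // 2")
    lower, upper = ordered[k - 1], ordered[n - k]
    # k gaps are excluded on each side, out of n + 1 equally likely gaps.
    level = (n + 1 - 2 * k) / (n + 1)
    return lower, upper, level


# Ages in days for the 19 sampled people, copied from the data above.
ages = [6516, 7565, 6684, 7974, 7067, 6648, 6657, 7214, 7597, 7088,
        7300, 7898, 7246, 8546, 7783, 7704, 6752, 8064, 7266]

for k in (1, 2, 3):
    lower, upper, level = prediction_interval(ages, k)
    print(f"{level:.0%} PI: ({lower}, {upper}), width {upper - lower} days")
```

With `k=1` this reproduces the 90% PI (6516, 8546), and with `k=2` the 80% PI (6648, 8064); each further step inward lowers the level by 10 percentage points and narrows the interval.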
## Interpretation

Like almost all statistical intervals, this one can be a little tricky to interpret; it requires some thought. For example, the 90% PI of (6516, 8546) given above is intended to predict the age of the next randomly selected person in the class. It turns out that, of the remaining students in the class, 55 of 56 (that's 0.9821 or 98.21%) have an age between 6516 and 8546.

This is the conundrum: the reasoning used to develop this only works when talking about "random data." Once a sample is selected it becomes non-random. This may be easier to see by looking at the graph above. Before seeing any data, each gap had a probability of 5%. Now that we have data, compare the gap between the second and third smallest values (6648 and 6657, only 9 days apart) to that between the fifth and sixth smallest values (6752 and 7067, 315 days apart). Common sense tells us that it's far more likely that the twentieth observation will fall in the larger gap.

The 90% refers to the average predictive success of the entire procedure. That is, if I repeated the following

- sample 19 people at random,
- form the 90% PI,
- sample a twentieth person at random
then 90% of the time the twentieth person would fall within the bounds of the interval. Another way of thinking about it is that, if the procedure is repeated over and over, on average 90% of the intervals will contain the next observation.
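The repeated-sampling reading of the 90% can be illustrated by simulation. The sketch below is an assumption-laden illustration rather than the original author's procedure: the class roster is not available here, so `population` is a made-up class of 75 distinct ages in days (19 sampled plus the 56 remaining students mentioned above), and the function name `coverage` is hypothetical.

```python
import random


def coverage(population, n=19, trials=20000, seed=0):
    """Estimate how often the (n+1)-th draw lands strictly between the
    smallest and largest of the first n draws (hypothetical helper)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Sample n + 1 distinct people without replacement, in random order.
        draw = rng.sample(population, n + 1)
        sample, next_person = draw[:n], draw[n]
        if min(sample) < next_person < max(sample):
            hits += 1
    return hits / trials


# Hypothetical class: 75 distinct ages in days (NOT the real class data).
population = random.Random(1).sample(range(6500, 8600), 75)

print(round(coverage(population), 3))
```

The printed proportion should come out close to 0.90, even though any single interval, once computed, may capture a larger or smaller share of the remaining class.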