Sampling Distributions for Means

Worksheet 1


Obtain some pictures of normal curves. Print them out and use them to help solve problems.

Each of these problems requires you to be familiar with the "Central Limit Theorem." This theorem -- which involves averages computed from random samples of data -- is described below. The basic setting is as follows:

  • A population; each unit in the population has a quantitative value (the variable) associated with it.
  • Parameters: The mean μ and standard deviation σ of the population values are parameters.
  • A simple random sample: n units are randomly selected from the population in such a way that all possible samples of size n are equally likely to be the selected sample.
  • Statistics: The mean x-bar and standard deviation s of the sample are statistics. They are used as estimates of the parameters μ and σ. Statistics are variables: their values change from sample to sample.

The sample mean x-bar is the focus here. It is a variable (different random samples generally give different sample means); as such it has a distribution, called the sampling distribution of x-bar.

  • The mean of this distribution is μ.
  • The standard deviation of this distribution is σ/sqrt(n) (where sqrt(n) means "square root of n").
  • The central limit theorem describes the shape of this distribution: x-bar is approximately normally distributed. The quality of the approximation depends on two factors:
    1. How close to normal the population distribution is. The closer, the better.
    2. How large the sample size is. The larger, the better.
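These facts can be checked by simulation. The Python sketch below draws many independent random samples from a population, records each sample mean, and compares the mean and standard deviation of those sample means with the population mean and with the population standard deviation divided by sqrt(n). The exponential population (mean 1, standard deviation 1), the sample size n = 30, the 10,000 repetitions, and the function name `simulate_sample_means` are illustrative assumptions, not part of the worksheet.

```python
import math
import random
import statistics


def simulate_sample_means(n, reps, seed=1):
    """Draw `reps` random samples of size n and return their sample means.

    Hypothetical population: exponential with rate 1, which is strongly
    right-skewed and has mean 1 and standard deviation 1.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


mu, sigma, n = 1.0, 1.0, 30  # population mean, population sd, sample size
means = simulate_sample_means(n, reps=10_000)

print("mean of x-bar: ", round(statistics.fmean(means), 4), "theory:", mu)
print("stdev of x-bar:", round(statistics.stdev(means), 4),
      "theory:", round(sigma / math.sqrt(n), 4))
```

Changing `n` shows the square-root effect: quadrupling the sample size should roughly halve the standard deviation of the sample means.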

Avoid using this result when the data are clearly nonnormal and the sample size is small at the same time.
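One way to see why this caveat matters: in a hypothetical exponential population (mean 1, standard deviation 1) every value is nonnegative, so a sample mean can never be negative. The normal approximation N(mean, sd/sqrt(n)) still assigns some probability to negative sample means, and how much depends heavily on n. This Python sketch uses the standard-library `statistics.NormalDist` (Python 3.8+); the population and the sample sizes 2 and 50 are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

mu, sigma = 1.0, 1.0  # hypothetical exponential population: mean 1, sd 1


def approx_prob_negative_mean(n):
    """P(x-bar < 0) under the normal approximation N(mu, sigma/sqrt(n)).

    For an exponential population the true probability is exactly 0,
    because every value (and hence every average) is nonnegative.
    """
    return NormalDist(mu, sigma / math.sqrt(n)).cdf(0.0)


for n in (2, 50):
    print(f"n = {n:2d}: normal approximation gives {approx_prob_negative_mean(n):.2e}")
```

With n = 2 the approximation puts roughly 8% of its probability on impossible negative averages; with n = 50 that amount is negligible.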

1. A bottling company uses a filling machine to fill plastic bottles with a popular cola. The bottles are supposed to contain 300 milliliters (ml). In fact, the contents vary according to a normal distribution with mean μ = 303 ml and standard deviation σ = 3 ml.

  a) What is the probability that an individual bottle contains less than 300 ml?
  b) Now take a random sample of 10 bottles. What are the mean and standard deviation of the sample mean contents x-bar of these 10 bottles?
  c) What is the probability that the sample mean contents of the 10 bottles is less than 300 ml?

2. For 1998 as a whole, the returns of all common stocks listed on the New York Stock Exchange (NYSE) had mean μ = 16% and standard deviation σ = 26%. Assume that the distribution of returns is roughly normal.

  a) What percentage of stocks lost money?
  b) Suppose we create a portfolio of 8 stocks by randomly selecting stocks from the NYSE and investing equal amounts of money in each stock. What are the mean and standard deviation of the sample mean return x-bar for these 8 stocks?
  c) What is the probability that the portfolio loses money? Explain the difference between this result and that of part (a).
  d) The probability is 0.05 that a portfolio constructed this way has a return of more than ________? (This would be the 95th percentile of portfolio returns; however, remember that these portfolios form a hypothetical population, since no one actually owns such a portfolio.)

3. The length of human pregnancies from conception to birth varies according to a distribution that is approximately normal with mean 264 days and standard deviation 16 days. (See a previous worksheet for some questions involving this distribution.) Consider 15 pregnant women from a rural area. Assume they are equivalent to a random sample from all women.

  a) What are the mean and standard deviation of the sample mean length of pregnancy x-bar for these 15 pregnancies?
  b) If we want to predict, with 90% probability, the sample mean length of pregnancy for 15 randomly selected women, what values do we use? (That is, find values L and U, placed symmetrically about the mean, such that there is a 90% probability that the sample mean x-bar lies between L and U.)
  c) What is the probability that the sample mean length of pregnancy is less than 250 days? (Contrast this with the probability that a single pregnancy lasts less than 250 days, which is 0.1908.)
  d) Toxic waste is believed to have affected the health of residents of this area. Suppose the observed sample mean length of pregnancy is indeed 250 days; use the result of part (c) to argue that the waste has an effect on length of pregnancy.


Answers

(Note: sqrt(#) stands for "square root of #".)

1. a) 0.1587, b) mean: 303, stdev: 3/sqrt(10) = 0.94868, c) 0.0008 (1 in 1250 -- very unlikely).
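The arithmetic in these answers can be reproduced with Python's standard-library `statistics.NormalDist` (Python 3.8+) in place of the printed normal tables. This is a sketch for checking the numbers; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 303, 3, 10  # fill amounts in ml (Problem 1)

single = NormalDist(mu, sigma)                     # one bottle
mean_of_10 = NormalDist(mu, sigma / math.sqrt(n))  # sampling distribution of x-bar

p_single = single.cdf(300)    # a) P(one bottle < 300 ml)
sd_xbar = mean_of_10.stdev    # b) sigma / sqrt(n)
p_mean = mean_of_10.cdf(300)  # c) P(x-bar < 300 ml)

print(round(p_single, 4), round(sd_xbar, 5), round(p_mean, 4))  # prints 0.1587 0.94868 0.0008
```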

2. a) 0.2692, b) mean: 16, stdev: 26/sqrt(8) = 9.1924, c) 0.0409 (about 1 in 24); this is the probability that the average return of 8 randomly selected stocks is negative, whereas 0.2692 is the probability that a single stock loses money. Averaging reduces the standard deviation from 26% to 9.19%, so a loss is much less likely for the portfolio than for a single stock. d) We need z = 1.645. Then 16 + 1.645(9.1924) = 31.12%. That is, 5% of these portfolios will return more than 31.12%.
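The same standard-library approach covers Problem 2, including the percentile in part (d) via `NormalDist.inv_cdf`. As before, this is a checking sketch with illustrative variable names.

```python
import math
from statistics import NormalDist

mu, sigma, n = 16, 26, 8  # 1998 NYSE returns in percent (Problem 2)

one_stock = NormalDist(mu, sigma)
portfolio = NormalDist(mu, sigma / math.sqrt(n))  # mean return of 8 stocks

p_stock_loses = one_stock.cdf(0)        # a) P(single return < 0)
sd_portfolio = portfolio.stdev          # b) sigma / sqrt(n)
p_portfolio_loses = portfolio.cdf(0)    # c) P(x-bar < 0)
top_5_percent = portfolio.inv_cdf(0.95) # d) 95th percentile of x-bar

print(round(p_stock_loses, 3), round(sd_portfolio, 4),
      round(p_portfolio_loses, 4), round(top_5_percent, 2))
```

`inv_cdf(0.95)` uses the unrounded z (about 1.6449) rather than 1.645; the difference shows up only beyond the second decimal place of the answer.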

3. a) mean: 264, stdev: 16/sqrt(15) = 4.1312, b) We need z = -1.645 and z = 1.645; go 1.645 standard deviations from the mean in either direction: 264 + 1.645(4.1312) = 270.8 and 264 - 1.645(4.1312) = 257.2. So, with probability 0.90, x-bar lies between 257.2 and 270.8. c) 0.0003 (about 1 in 3333). d) Assume the toxin has no effect on length of pregnancy, so that the average length of pregnancy for all women (including women exposed to the toxin) is 264 days. The chance of a sample average at least as low as the observed 250 days is then very remote: it would occur only about once in 3333 samples on average. This suggests that the result is not due to chance alone and that the assumption of a 264-day average is in question. (This result is "beyond a reasonable doubt.")
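A similar sketch checks Problem 3. Part (b) asks for the middle 90% of the sampling distribution, so the limits are its 5th and 95th percentiles. Variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 264, 16, 15  # pregnancy length in days (Problem 3)

one_woman = NormalDist(mu, sigma)
mean_of_15 = NormalDist(mu, sigma / math.sqrt(n))

sd_xbar = mean_of_15.stdev              # a) sigma / sqrt(n)
lower = mean_of_15.inv_cdf(0.05)        # b) 5% of x-bar values fall below L
upper = mean_of_15.inv_cdf(0.95)        #    and 5% fall above U
p_mean_below_250 = mean_of_15.cdf(250)  # c) P(x-bar < 250)
p_single_below_250 = one_woman.cdf(250) # comparison value for one pregnancy

print(round(sd_xbar, 4), round(lower, 1), round(upper, 1))
print(f"{p_mean_below_250:.5f} {p_single_below_250:.4f}")
```

Without table rounding, part (c) comes out near 0.00035 (z is about -3.39, and a four-decimal table reports 0.0003 there); either way the conclusion in part (d) is unchanged.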