Steps for Testing Hypotheses

  1. A research question is phrased, and a suggested answer (the research hypothesis) is postulated. At this point, all we have is a theory about something and a view of “what we think the result is.” Identify appropriate quantities describing the population; these are called parameters.
  2. State the proposed theory in terms of the parameter(s) of interest. The resulting statement is known as the research hypothesis (more generally, the alternative hypothesis). The “anti-theory” is a statement of no change, no difference, or no effect. This statement is known as the null hypothesis (a better name would be the “hypothesis of no difference”). These two hypotheses are usually written side by side, with the null preceding the alternative. The symbol for the null hypothesis is H0. The symbol for the alternative hypothesis is HA (or often H1). The colon (:) that follows each symbol stands for “states.” (The terminology is unfortunate. “Null” usually indicates “worthless,” and “alternative” then sounds like a second-best to worthless. Actually, “null” comes from nullify: the hope is to nullify this hypothesis. “Alternative” is also a poor choice of term; “research hypothesis” is much better.)
  3. A study is designed. A random sample (or even a number of them) must be obtained. Often this is by far the hardest part of conducting such a study. Sample sizes must also be decided. A general rule says “Take the largest sample sizes you can.” But sampling often costs money, and the answer to “What is the best sample size?” is not that easy to arrive at.
  4. Data is collected and recorded. Your first step in any real-world situation is the same as it would be for any data you might collect: Investigate. In particular, obtain appropriate plots, looking for irregularities, surprises and outliers. Clean up the data if outliers are the result of poor data entry or if some of the data is determined to have come from an undesired source.
  5. The data is summarized and a test statistic (test stat, or TS) is computed. The form of the TS depends on H0. When the research claim is about a single mean, or a comparison between two means, the TS is often a t-statistic (or z-statistic); it measures how many standard errors the estimate lies from the value specified by H0. The test stat measures compatibility between H0 and the data.
  6. The observed significance level (OSL; often called a P-value, short for probability value) is computed. The P-value is the probability, computed assuming that H0 is true, that the test statistic will take a value at least as extreme as that actually observed. Extreme means “far from what we would expect if H0 were true.” The direction or directions that count as “far from what we would expect” are determined by HA. COMMIT THE FOLLOWING TO MEMORY: Small P-values indicate strong evidence against H0.
  7. What do we conclude? We can compare the P-value with a fixed value that we regard as decisive. The decisive value is called the significance level and is denoted by the Greek letter α. If the P-value is as small as or smaller than α, we say that the data are statistically significant at level α. This is equivalent to saying we reject H0 and conclude HA. If the data are not statistically significant, we do not reject H0.
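
Steps 5 through 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed method: the function name `z_test` and all the numbers are made up for the example, the claim is about a single mean, and the sample is assumed large enough that a standard normal reference distribution (a z-statistic) is reasonable. With a small sample, a t reference distribution would normally be used instead.

```python
import math
from statistics import NormalDist


def z_test(estimate, null_value, sd, n, alternative="two-sided", alpha=0.05):
    """Hypothetical helper: one-sample z-test of H0: mu = null_value.

    `alternative` encodes HA and decides which direction(s) count as extreme.
    """
    # Step 5: test statistic = number of standard errors between
    # the estimate and the value stated by H0.
    se = sd / math.sqrt(n)
    z = (estimate - null_value) / se

    # Step 6: P-value = probability, assuming H0 is true, of a test
    # statistic at least as extreme as the one observed.
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":      # HA: mu > null_value
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # HA: mu < null_value
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":  # HA: mu != null_value
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

    # Step 7: reject H0 when the P-value is as small as or smaller than alpha.
    reject_h0 = p_value <= alpha
    return z, p_value, reject_h0


# Made-up summary numbers: sample mean 52, H0 says mu = 50, SD 6, n = 36.
z, p, reject = z_test(estimate=52.0, null_value=50.0, sd=6.0, n=36)
print(f"z = {z:.2f}, P-value = {p:.4f}, reject H0: {reject}")
```

With these made-up numbers the standard error is 1, so the estimate sits 2 standard errors above the null value and the two-sided P-value is about 0.0455. At a significance level of 0.05 the data are statistically significant; at 0.01 they would not be, which shows why the level should be chosen before looking at the data.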

Textbooks often spend little time on the important issues described in steps 1, 3 and 4.