2-way tables are also known as
For each unit/subject data is collected on each of two categorical variables.
Cocaine addicts need the drug to feel pleasure. Perhaps giving them a medication that fights depression will help them stay off cocaine. A three-year study compared an antidepressant called desipramine with lithium (a standard treatment for cocaine) and a placebo. The subjects were 72 chronic users of cocaine who wanted to break their drug habit. Twenty-four of the subjects were randomly assigned to each treatment. After treatment we measure whether or not the subject relapses into cocaine use.
In this study the subjects are the 72 cocaine addicts. Two variables each categorical are measured on each subject: the treatment and whether or not the subject relapsed. We have a three groups of 24 people, each classified in by whether or not there is a relapse. The number of people in each treatment group is fixed in advance; the number of relapses is only known after we have the data.
A study of the relationship between men's marital status and the level of their jobs used data on all 8235 male managers and professionals employed by a large manufacturing firm. Each mans job has a grade set by the company that reflects the value of that particular job to the company. Grade 1 contains jobs in the lowest quarter of job grades, and grade 4 contains those in the highest quarter.
In this study the subjects are the 8235 males at the firm. Two variables each categorical are measured on each subject: his marital status and his job grade. We have a single group of 8235 men, each classified in two ways, by marital status and job grade. Note that the number of men in each marital status is not fixed in advanced; nor is the number of men in each job grade these are only known after we have the data.
The data is summarized in a "two-way (cross-classification) table."
We see that one-third (33.3%) of the subjects were allocated into each treatment (this was by design); this is the marginal distribution for the treatment variable. Notice that two-thirds (66.7%) of the subjects relapsed and one-third (33.3%) did not; this is the marginal distribution for the "whether relapsed or not" variable. This distribution is not fixed in advance.
Here the marginal distributions are best tabulated. First, by marital status:
This, of course, is not surprising, nor is it informative.
Second, by Job Grade:
The marginal distributions should not be displayed in a graph -- unless, of course, your boss asked for a pie chart or bar chart (read why they stink!). A simple table is sufficient. To display both variables at the same time use a segmented or stacked bar chart. Obtain this chart using a spreadsheet program. Here are instructions for Excel.
Your graph should tell the story without any additional information. Assume only that you are addressing an intelligent person who understands how to read the chart (if not, you can easily teach this) and that your reader has some understanding of the data that's been collected (in a written report, be particularly careful to clearly define your variables). Be particularly careful with colors and the legend. If you print to black and white -- color is no good. Contrasting "patterns" (often "cross-hatched" in different ways) can also be dangerously difficult to tell apart. Make certain, then, that when you print the graph is still readable. If you print to color, or just project on a computer, then usually Excel does OK. Make the data dominate the picture. If there's a story, the graph should tell it.
This type of data is 2 dimensional. Avoid 3-D graphs.
Read more...on association and tests for association.