Variable Types

Variables are often specified according to their type and intended use.

Be careful when referring to a variable; don't confuse the variable with values it may take on. In the examples below I'll attempt to describe this common error.


Different types of variables

Quantitative Variable
A quantitative variable is naturally measured as a number for which meaningful arithmetic operations make sense. Examples: Height, age, crop yield, GPA, salary, temperature, area, air pollution index (measured in parts per million), etc.
Categorical Variable
Any variable that is not quantitative is categorical. Categorical variables take a value that is one of several possible categories. As naturally measured, categorical variables have no numerical meaning. Examples: Hair color, gender, field of study, college attended, political affiliation, status of disease infection.
 
In a study asking respondents to identify themselves as Republican, Democrat or Independent or Other, each respondent will answer with exactly one of these. These are the values the variable takes. Republican is not a variable (it does not vary from person to person) but party affiliation is (it does vary from person to person).
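To make the variable-versus-value distinction concrete, here is a minimal sketch in Python. The records and the field name `party_affiliation` are made up for illustration; they are not from any real survey.

```python
# Hypothetical survey records: one per respondent (illustrative only).
respondents = [
    {"id": 1, "party_affiliation": "Republican"},
    {"id": 2, "party_affiliation": "Democrat"},
    {"id": 3, "party_affiliation": "Independent"},
    {"id": 4, "party_affiliation": "Democrat"},
]

# The variable is the thing measured on every respondent.
variable = "party_affiliation"

# The values are what that variable takes on from person to person.
observed_values = sorted({r[variable] for r in respondents})

print("Variable:", variable)
print("Values observed:", observed_values)
```

"Republican" shows up only inside `observed_values`; it is one of the values, while `party_affiliation` is the variable.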
 
Often categorical variables are disguised as quantitative variables. For example, one might record gender information coded as 0 = Male, 1 = Female. (Data is generally easier to manipulate in an analysis spreadsheet when it's coded quantitatively.) Still, the variable is categorical; it is not naturally measured as a number. In some cases it's tougher to make the distinction. A psychologist may collect survey data of the following nature:
 
How do you feel about the information on this page? (Circle one.)

1 = Awful     2 = Poor     3 = OK     4 = Good     5 = Great

It's a toss-up. Technically the numbers are artificial. But the psychologist will work with these numbers as though they had meaning. For instance, two people might respond "Awful" and "OK." The psychologist would record 1 and 3 and, perhaps, compute an average of 2.0. From my point of view it's meaningless to have the average of Awful and OK being Poor! Nevertheless, this sort of scale (called a "Likert scale") is often used in social science research. I'd classify this variable as categorical; it would not be entirely incorrect to classify it as quantitative.
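The arithmetic in that example can be sketched in a few lines of Python. The dictionary names are my own; the codes 1 through 5 and their labels are the ones from the survey question above.

```python
# Codes and labels from the survey question above.
labels = {1: "Awful", 2: "Poor", 3: "OK", 4: "Good", 5: "Great"}
codes = {label: code for code, label in labels.items()}

# Two respondents answer "Awful" and "OK".
responses = ["Awful", "OK"]
recorded = [codes[r] for r in responses]

# Treating the codes as numbers, the psychologist averages them.
average = sum(recorded) / len(recorded)

# Mapping the average back to a label shows the odd interpretation.
print("Recorded codes:", recorded)
print("Average code:", average)
print("Label at that code:", labels[round(average)])
```

The computation runs without complaint, which is exactly the concern: the software cannot tell whether the numbers carry numerical meaning.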

You can see how any categorical variable may be coded to look like a quantitative variable---simply by arbitrarily assigning numbers to categories.
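As a rough illustration of why such coding is arbitrary, the sketch below codes the same hair-color data two different ways. The data and both coding schemes are invented for this example. The "average" changes with the coding, while the category counts do not.

```python
from collections import Counter

# Made-up hair-color observations (illustrative only).
hair = ["brown", "black", "blond", "brown", "red"]

# Two equally arbitrary ways to assign numbers to the categories.
coding_a = {"black": 1, "blond": 2, "brown": 3, "red": 4}
coding_b = {"red": 1, "brown": 2, "black": 3, "blond": 4}


def mean_code(values, coding):
    """Average the numeric codes assigned to categorical values."""
    coded = [coding[v] for v in values]
    return sum(coded) / len(coded)


mean_a = mean_code(hair, coding_a)
mean_b = mean_code(hair, coding_b)

# Counts per category are the same no matter how we code.
counts = Counter(hair)

print("Mean under coding A:", mean_a)
print("Mean under coding B:", mean_b)
print("Counts:", dict(counts))
```

A summary that depends on which numbers we happened to pick is not telling us anything about hair color. Counts (or proportions) are the safer summary for a categorical variable.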

Ordinal Variable
An ordinal variable is a special type of categorical variable for which the levels can be naturally ordered. The example above provides a good illustration of an ordinal variable. Even if we ignore the numbers, we still may order the responses. Awful is "worse than" Poor; Poor is worse than OK; OK is worse than Good; Good is worse than Great. A natural ordering exists for these categories. Contrast this with a categorical variable such as hair color. There is no natural ordering for the various colors of hair.
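Here is a small Python sketch of what a natural ordering buys us. The list `order` encodes the worse-to-better ranking of the survey labels; the sample responses are made up. Only the order is used, never any arithmetic on the labels.

```python
# Levels listed from worst to best; only their order matters.
order = ["Awful", "Poor", "OK", "Good", "Great"]
rank = {label: position for position, label in enumerate(order)}


def is_worse(a, b):
    """Return True if response a is worse than response b."""
    return rank[a] < rank[b]


# Made-up responses (illustrative only).
responses = ["Good", "Awful", "Great", "OK", "Poor", "OK"]

# Sorting is meaningful because the levels have a natural order.
sorted_responses = sorted(responses, key=lambda r: rank[r])

print(sorted_responses)
print("Awful worse than Poor?", is_worse("Awful", "Poor"))
```

Nothing comparable can be written for hair color without inventing an ordering, which is the difference between an ordinal variable and a plain categorical one.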

Math 158 students do not worry much about ordinal variables. They treat them as if they were categorical variables. However, in advanced statistics ordinal variables are treated differently to make use of the added structure they give to a variable.


Different uses of variables

In many studies more than one variable is recorded per case or individual. It is often the purpose of the study to determine if and/or how one or more variables affect another. This is a basic paradigm in statistical analysis; the distinctions that are made here are integral to the way a problem is stated and analyzed.

Response Variable
The outcome of a study. A variable you would be interested in predicting or forecasting. Often called a dependent variable or predicted variable.
Explanatory Variable
Any variable that may explain or predict the response variable. Often called an independent variable or predictor variable.

In general, identifying these amounts to deciding which variable(s) would be used to predict another. A few examples follow.

Example

Consider a study performed by a medical center to determine which of two heart surgeries is more effective: angioplasty (running rubber tubes through the arteries) or bypass (rerouting arteries). The purpose of either procedure is to prolong the life of the patient. The study will certainly record the survival time of each patient (measured from the time of the surgery). This really is the outcome of the study; survival time is the response variable.

Now, each patient will get one of the two types of operations; this is a second variable. Let's call it the "procedure" variable; it takes one of two possible values, Angioplasty and Bypass. The entire purpose of the study is to determine how, if at all, the procedure affects survival time. Procedure is an explanatory variable. We would use procedure (explanatory variable or predictor) to predict survival time (response or predicted variable). Survival time may well depend on procedure; survival time is the dependent variable and procedure is the independent variable.

Note that the response is measured after the explanatory variable. This is often, but not always, the case. The response variable is quantitative; the explanatory variable is categorical.

In a true clinical trial many more explanatory variables would be recorded: gender, age at the time of surgery, state of health pre-surgery (how would this be measured?), numerous physiological indicators and so forth. There would be but one response variable, survival time! (Actually, there would be others. Quality of life after the operation is important, as is an analysis of the side effects attributable to the two procedures.)
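The roles in this example can be sketched in Python. The patient records below are entirely made up to show the mechanics; they say nothing about how the two procedures actually compare. The field names `procedure` and `survival_years` are my own labels for the explanatory and response variables.

```python
from collections import defaultdict

# Made-up records for illustration only; NOT real clinical data.
patients = [
    {"procedure": "Angioplasty", "survival_years": 6.0},
    {"procedure": "Bypass", "survival_years": 8.0},
    {"procedure": "Angioplasty", "survival_years": 4.0},
    {"procedure": "Bypass", "survival_years": 10.0},
]

explanatory = "procedure"     # categorical: Angioplasty or Bypass
response = "survival_years"   # quantitative: arithmetic makes sense

# Group the response values by the level of the explanatory variable.
groups = defaultdict(list)
for patient in patients:
    groups[patient[explanatory]].append(patient[response])

# Averaging is meaningful here because the response is quantitative.
group_means = {level: sum(times) / len(times) for level, times in groups.items()}

for level, mean_time in group_means.items():
    print(level, "mean survival (years):", mean_time)
```

Notice which variable gets grouped on and which gets averaged: we split the cases by the explanatory variable and summarize the response within each group. Swapping the roles (averaging "procedure") would make no sense, since procedure is categorical.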