Random Samples

This guide shows a simple way to randomly select observations from your data. The final section shows how to randomly order (permute) your data.

To randomly select a sample of size n from data in a column

Here the data is executive compensation: www.oswego.edu/~srp/stats/comp1f04.txt.

Column 1 is the sector the CEO works in; column 2 is his/her total compensation.

Begin by opening the “Sample from columns” command (in Minitab this is under Calc > Random Data).

The columns c3 and c4 have names already set up. In this operation we’ll choose 10 of the rows in c1 and c2 at random, and place those rows in c3 and c4. c3 and c4 will then constitute a random sample of 10.

Fill in the dialog as follows: sample 10 rows from columns c1 and c2, and store the samples in c3 and c4. DO NOT check the “Sample with replacement” box UNLESS you want to allow the same row to be sampled more than once. In almost all applications sampling is done without replacement. Click OK.
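The difference between the two settings can be illustrated outside the dialog. The following is a minimal sketch in Python's standard library, not the statistics package itself; the 20 row numbers are a made-up stand-in for a worksheet.

```python
import random

# Hypothetical worksheet with 20 rows, identified by row number 1..20.
rows = list(range(1, 21))

# Without replacement: each row can be chosen at most once.
without = random.sample(rows, 10)

# With replacement: the same row may be chosen more than once.
with_repl = random.choices(rows, k=10)

print("without replacement:", sorted(without))
print("with replacement:   ", sorted(with_repl))

# Without replacement there are always 10 distinct rows;
# with replacement there may be fewer than 10 distinct rows.
print(len(set(without)), len(set(with_repl)))
```

Running this a few times usually shows repeated row numbers in the second list and never in the first.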

When you’ve finished, columns c3 and c4 will each contain 10 rows.

You should try this on your own. You will almost certainly get a different result in columns c3 and c4, because the sample is chosen randomly.
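For readers who want to see the same operation as code, here is a minimal sketch in Python. It assumes a small made-up data set in place of the real compensation file, and the helper name sample_rows is hypothetical. The key point is that one set of row positions is drawn and applied to every column, so each sector stays paired with its compensation.

```python
import random

def sample_rows(columns, n, rng=random):
    """Choose n rows at random, without replacement, from parallel columns."""
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same number of rows")
    # Draw the row positions once, then take those rows from every column.
    picked = rng.sample(range(n_rows), n)
    return [[col[i] for i in picked] for col in columns]

# Made-up stand-in data: 40 rows of (sector, total compensation).
sectors = ["Energy", "Finance", "Health", "Retail"]
c1 = [sectors[i % 4] for i in range(40)]
c2 = [250_000 + 1_000 * i for i in range(40)]

# Sample 10 rows from c1 and c2 and store them in c3 and c4.
c3, c4 = sample_rows([c1, c2], 10)

for sector, pay in zip(c3, c4):
    print(sector, pay)
```

Running the script twice will normally print two different samples, which mirrors what happens when the dialog is run again.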

Afterwards, if you like, you can check which rows were chosen. This can be somewhat tedious, although if you select c2 and right-click to choose “Find”, you can locate values quickly enough:

We can see here that row 1279 was the first row in the random selection.
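The lookup that “Find” performs can be sketched in Python as follows. The values are made up for illustration and the helper name find_rows is hypothetical. Note that if a value occurs more than once in the column, the value alone cannot tell you which of those rows was sampled.

```python
def find_rows(column, value):
    """Return the 1-based row numbers at which value occurs in column."""
    return [row for row, v in enumerate(column, start=1) if v == value]

# Made-up compensation column; the value 125 appears twice.
c2 = [310, 125, 480, 125, 96]

print(find_rows(c2, 480))  # [3]
print(find_rows(c2, 125))  # [2, 4]
```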

Note:

It is not necessary to specify two columns for input (and output). In this example, information from both columns c1 and c2 was copied into columns c3 and c4. If you are not interested in the Sector information (c1), you could sample from column c2 only, storing the sample in column c3 only.

It is necessary to specify the same number of columns in both boxes in the dialog.

To permute (order) your data randomly

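To randomly order your data, simply sample the same number of rows as there are rows in the data, leaving the “Sample with replacement” box unchecked. Because the sampling is done without replacement, every row is chosen exactly once, so the result is the original data in a random order. For example, with the compensation data, for which there are 1422 rows, you would “Sample 1422 rows…”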
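The same idea can be sketched in Python: sampling all of the rows without replacement yields a random permutation. The data and the helper name permute_rows are made up for illustration.

```python
import random

def permute_rows(columns, rng=random):
    """Return the columns with their rows in a random order.

    Sampling every row position without replacement uses each row
    exactly once, which is the same as shuffling the rows.
    """
    n_rows = len(columns[0])
    order = rng.sample(range(n_rows), n_rows)
    return [[col[i] for i in order] for col in columns]

# Made-up stand-in data with 5 rows.
c1 = ["Energy", "Finance", "Health", "Retail", "Tech"]
c2 = [410, 275, 388, 190, 502]

p1, p2 = permute_rows([c1, c2])

for sector, pay in zip(p1, p2):
    print(sector, pay)
```

Each sector still sits next to its own compensation value; only the order of the rows changes.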