Statistical Consulting Program



This web page provides useful information on sampling strategies.  Here you will find information on how to design samples and on the kinds of sampling designs available to statisticians.

When we decide to study a population, very often we are unable to look at all the individuals in the population.  Real-life issues such as lack of time, limited money, and inconvenience to the individuals under study prevent us from including the entire population in our study.  Instead we choose a sample from the population that reflects its structure and nature.  We want our results to be reliable and dependable, and for those reasons our sample must represent the entire population.  Choosing the right sample is therefore a crucial matter.

In statistics, a sample design is a definite plan, determined before any data are actually collected.



What should we be aware of when choosing a sample?


VOLUNTARY RESPONSE SAMPLES

People who voluntarily respond to surveys or opinion polls often do not represent the general population well.  Such samples typically consist of people who have strong opinions about the issues included in the study.  Most individuals volunteer because they have something to say, more often negative than positive.  The responses are then biased because the subjects select themselves rather than being chosen by the analyst.  The analyst does not have adequate control over the makeup of the sample.

CONVENIENCE SAMPLING

Convenience sampling is another sampling method that often results in biased samples.  To save both time and money, people sometimes choose individuals who are easy for them to reach.  For instance, suppose we are interested in learning how much people in our town enjoy reading books.  Going to a local library will certainly enable us to find a lot of citizens whom we might ask this question.  However, on average, these individuals probably enjoy reading more than somebody met on the street.  Conclusions drawn from such a sample may not reflect the views of all citizens in our town.

BIAS

Both sampling strategies listed above might yield results that are not accurate for our population.  Whenever a sampling method favors certain groups of individuals in a population over others, the resulting conclusions are usually inaccurate and are said to be biased.  Personal choices often produce bias.  To avoid biased samples, we should not favor any group over others or allow individuals to select themselves into a study.



Below you will find different sampling methods that statisticians use to collect data.  These sampling methods are designed to reduce bias.  The methods differ in that each suits different populations and different purposes.  You must choose a method that applies to your particular study or experiment.


SIMPLE RANDOM SAMPLE

This method of designing samples resembles the well-known idea of placing names in a hat and drawing a handful.  The hat symbolizes the population and the handful the sample.  A simple random sample (SRS) is a sample chosen in such a way that each individual in the population has an equal chance of being selected.  The design also gives every possible sample of the same size an equal chance of being chosen.  Choosing a sample by chance removes the bias that personal choice introduces.

How do we select such a sample?  As mentioned above, the underlying idea behind this design is to select names randomly "from a hat."  One way to choose individuals randomly is to let a computer select names from a list.  Texas Instruments graphing calculators offer a random number feature called rand.  We can also use a table of random digits, which can be found in most statistics textbooks.  Each individual in the population must be assigned a number, and then we use the table to select a sample at random.  The order in which we assign numbers does not make a difference.  All labels, however, must have the same number of digits.
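The selection described above can be sketched in Python.  This is only an illustration under stated assumptions: the population of 500 three-digit labels, the sample size of 10, and the function name simple_random_sample are all hypothetical, and the standard library's random.sample stands in for the hat or the table of random digits.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    Every individual, and every possible sample of this size,
    has the same chance of being chosen.
    """
    rng = random.Random(seed)  # seed is optional; it only makes the run repeatable
    return rng.sample(population, sample_size)


# Hypothetical population: 500 individuals labeled 000..499.
# All labels have the same number of digits, as a random digit table requires.
population = [f"{i:03d}" for i in range(500)]

sample = simple_random_sample(population, 10, seed=1)
print(sample)
```

Because the draw is made without replacement, no individual can appear in the sample twice.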

SYSTEMATIC SAMPLING

This method does not produce a simple random sample in the strict sense of the definition, but the samples it produces are often treated as random samples.  Here is how these samples are formed.  Instead of choosing a random name from a "hat," as was done with SRS, we select every nth name on our list of individuals in the population.  Whether we choose every 20th, 100th, 1000th or nth individual depends on the size of the population, how accurate we want our results to be (bigger samples yield more reliable results), and the time and money available.  The element of randomness is usually introduced by picking at random the position at which we start our selection.

The danger in systematic sampling lies in possible hidden periodicities in the list.  If for some reason the individuals are listed in an order where, for example, every 10th individual belongs to one subgroup of the population, choosing every 10th individual from the list will clearly produce a biased sample.
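Both the procedure and the periodicity danger can be illustrated with a short Python sketch.  The population of 1000 numbered individuals, the step of 20, and the list of roles in which every 10th person is a supervisor are hypothetical examples, not data from any real study.

```python
import random


def systematic_sample(population, step, seed=None):
    """Take every `step`-th individual, beginning at a random start position."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random start among the first `step` positions
    return population[start::step]


# Hypothetical list of 1000 individuals; choose every 20th.
population = list(range(1, 1001))
sample = systematic_sample(population, 20, seed=3)
print(len(sample), sample[:5])

# Hidden periodicity: every 10th person on this hypothetical list is a supervisor.
roles = ["supervisor" if i % 10 == 0 else "worker" for i in range(1000)]
periodic = systematic_sample(roles, 10, seed=3)
share = periodic.count("supervisor") / len(periodic)
print(share)  # either 0.0 or 1.0, although supervisors are 10% of the list
```

With a step of 10 on the periodic list, the sample contains either only supervisors or none at all, depending on the random start.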

STRATIFIED RANDOM SAMPLING

When we know how a population is composed, and that composition is relevant to our study, we may stratify our population.  We stratify by dividing the population into a number of non-overlapping sub-groups, or strata.  Then we choose a separate SRS in each stratum and combine these SRSs to form a sample.  For example, a population of school districts in the state of New Jersey can be divided into urban and suburban districts.  The goal is to have as much homogeneity in each stratum as possible.

The advantage of this sampling method is that every subgroup is represented in the sample, so the sample resembles the entire population.
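A minimal Python sketch of stratified random sampling follows.  The district labels, the stratum sizes (120 urban, 380 suburban), and the 10% sampling fraction are hypothetical.  Taking the same fraction from every stratum (proportional allocation) is an assumption made here for illustration; the text above does not prescribe how many individuals to draw from each stratum.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Draw a separate SRS from each stratum and combine the results.

    `strata` maps a stratum name to the list of its members.
    The same fraction is taken from every stratum (an assumption).
    """
    rng = random.Random(seed)
    combined = []
    for name, members in strata.items():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        combined.extend((name, member) for member in rng.sample(members, k))
    return combined


# Hypothetical school districts, divided into two non-overlapping strata.
strata = {
    "urban": [f"U{i:03d}" for i in range(120)],
    "suburban": [f"S{i:03d}" for i in range(380)],
}

sample = stratified_sample(strata, 0.10, seed=7)
urban_count = sum(1 for name, _ in sample if name == "urban")
suburban_count = sum(1 for name, _ in sample if name == "suburban")
print(urban_count, suburban_count)
```

Both strata are guaranteed to appear in the combined sample, which a single SRS of the whole population cannot promise.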

CLUSTER SAMPLING

This method is similar to the stratified random sampling method.  We first divide a population into a number of smaller, non-overlapping subdivisions.  Next, instead of choosing an SRS from every subdivision, as we did with stratified random sampling, we randomly select a number of subdivisions, called clusters, to be included in the overall sample.  All individuals from the selected clusters are included in that sample.
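The contrast with stratified sampling can be seen in a short Python sketch.  The 20 schools of 30 students each, and the choice of 4 clusters, are hypothetical values used only for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and include every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # pick cluster names at random
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members


# Hypothetical population: 20 schools (clusters) with 30 students each.
clusters = {
    f"school_{c:02d}": [f"school_{c:02d}_student_{i:02d}" for i in range(30)]
    for c in range(20)
}

chosen, sample = cluster_sample(clusters, 4, seed=5)
print(chosen, len(sample))
```

Here the random draw is made among clusters, not among individuals, so unselected clusters contribute nobody to the sample.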

Practically speaking, several sampling methods are sometimes necessary to study the same population, especially when the population consists of many individuals.  The population of the United States of America is one such case, where a combination of different sampling strategies is needed to produce a reliable sample.



When choosing a sample there are certain cautions that we must be aware of, especially if our population consists of people.  As mentioned above, choosing a random sample is a very important matter.  When we choose individuals randomly, we minimize bias.  Samples are often biased when individuals volunteer to be included in studies or experiments.  In addition, favoring certain individuals over others, which in effect means choosing a pre-selected group of individuals, can also yield results that do not reflect the characteristics of the overall population.

Under-coverage, non-response, response bias and the wording of questions are also important, and we should be aware of their presence and of their consequences when analyzing results.  Let us explain each of them in turn.

Under-coverage results from failing to include all members of a population in the list from which the sample is chosen.  Very often a complete list of the individuals in a population is not available.  There can be several reasons.  Most of the time, there is not enough time, money or resources to include everyone in a study or experiment.  In such cases the results may differ from the truth.

Next, there is non-response.  Even when we select individuals carefully, i.e. randomly, and eliminate under-coverage, the individuals may refuse to cooperate with us.  People, especially, can be difficult to reach by phone.  Some individuals cannot be persuaded to give their time and effort to our study.

Response bias is one of the most obvious obstacles facing a statistician.  The behavior of the respondents can seriously affect our results.  As we all know, there are people who lie when asked uncomfortable questions.  Discrimination is often one of the factors that affect opinions.  There are issues that people feel uncomfortable talking about.  In addition, interviewers can unintentionally lead respondents to give a more desirable answer.  The body language of both the interviewer and the interviewee can then cause bias in responses.

Finally, there is the wording of questions.  There are at least two kinds of badly phrased questions: questions that lead to a certain response and questions that confuse people.  Questions should be phrased in a way that allows individuals to express their views honestly.  There should be no hidden messages.  Here are two examples of badly phrased questions: 1. "Cigarettes cause lung cancer.  Would you like your child to smoke cigarettes?" and 2. "Is it not possible that education does not imply success?"  Question #1 leads the respondent toward rejecting cigarettes by opening with the danger of lung cancer.  Question #2 is simply confusing.