During the early phases of a survey project, a common question is, “How many responses do I need to make sure my results represent my target population?”
Basically, this question boils down to, “What’s a good sample size?”
It’s a simple question, but the answer isn’t quite so simple.
Sample Size vs. Representative Samples
Sample size and representativeness are two related but different issues. The sheer size of a sample does not guarantee its ability to accurately represent a target population.
Large unrepresentative samples can perform as badly as small unrepresentative samples.
A survey sample’s ability to represent a population is much more closely related to the sampling frame (the list from which the sample is selected) than it is to the sample size.
When some parts of the target population are not included in the sampled population, we are faced with selection bias, which prevents us from claiming that the sample is representative of the target population.
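To make the point concrete, here is a minimal simulation sketch. Every number in it is an assumption chosen for illustration, not data from a real study: a population of 100,000 people, 70% of whom are covered by the sampling frame, with "yes" rates of 60% inside the frame and 30% outside it. The function name simulate_coverage_bias is also hypothetical.

```python
import random


def simulate_coverage_bias(seed=42):
    """Compare a large sample from an incomplete frame with a small random sample."""
    rng = random.Random(seed)

    # Hypothetical population: each person is (in_frame, answers_yes).
    population = []
    for _ in range(100_000):
        in_frame = rng.random() < 0.70          # assumed frame coverage
        p_yes = 0.60 if in_frame else 0.30      # assumed "yes" rates by group
        population.append((in_frame, rng.random() < p_yes))

    everyone = [answer for _, answer in population]
    frame_only = [answer for in_frame, answer in population if in_frame]

    true_rate = sum(everyone) / len(everyone)

    # Large sample, but drawn only from people the frame can reach.
    large_biased = rng.sample(frame_only, 20_000)
    # Small sample, but drawn at random from the whole population.
    small_random = rng.sample(everyone, 400)

    return {
        "true_rate": true_rate,
        "large_biased_estimate": sum(large_biased) / len(large_biased),
        "small_random_estimate": sum(small_random) / len(small_random),
    }


if __name__ == "__main__":
    for name, value in simulate_coverage_bias().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the 20,000-response sample settles near the in-frame rate rather than the population rate, and collecting more responses from the same frame does not move it closer. The 400-response random sample is noisier, but it is centered on the true value.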
Avoiding Selection Bias in Your Survey Responses
When not every member of your target population has an equal chance of being chosen to take your survey, you’re at risk of polluting your data with selection bias.
There are several common ways that this occurs:
Bias Through Convenience Samples
Convenience samples are just what they sound like: choosing respondents that we can conveniently reach without regard to their demographic data.
These samples include respondents who are easier to select or who are most likely to respond; they will not be representative of harder-to-select individuals.
Samples from online panels are a good example of convenience samples.
These panels are composed of individuals who have expressed interest in participating in surveys, leaving out individuals who may be part of the target population but are not available for interviewing through the panel.
Selection Bias Via Undercoverage
Undercoverage happens when we fail to include all of the target population in the sampling frame.
Many online panels work hard at avoiding undercoverage bias, but the fact remains that certain demographics are underrepresented in panels.
For example, it is difficult to field online studies targeted at the total Hispanic population in the US without using a hybrid data collection approach that allows us to reach unacculturated Hispanics, who are usually underrepresented in most online panels.
Coverage bias is also common in phone surveys.
Many of these surveys use telephone list sampling frames that exclude households without landline access.
As more households substitute cell phones for their landlines, obtaining representative samples of certain demographic groups is almost impossible without including cell phone lists in the sampling frame.
Nonresponse and Selection Bias
Selection bias also happens when we fail to obtain responses from everyone in the selected sample.
Nonrespondents tend to differ from respondents, so their absence in the final sample makes it difficult to generalize the results to the overall target population. This is why the design of a survey is far more important than the absolute sample size to get a representative sample of the target population.
Final Three Sources of Sample Bias
Three other common ways that sample bias can creep into a survey are:
- Judgment Sample: This is a sample selected using “representative” criteria that are chosen from prior knowledge of the topic or target population. An example would be a study that needs a sample of teenagers and tries to intercept them at an intersection near a high school.
- Misspecification: This happens when we intentionally or unintentionally use screening criteria that leave out important subgroups of the population.
- Poor Data Collection: An example is allowing whoever is available in the household to take the survey instead of the intended member identified by the screening criteria.
Giving Preference to Sample Source, Not Size
So, when it comes to getting a representative sample, sample source is more important than sample size.
If you want a representative sample of a particular population, you need to ensure that:
- The sample source covers the entire target population
- The selected data collection method (online, phone, paper, in person) can reach individuals with characteristics typical of the population of interest
- The screening criteria truly reflect the target population
- You can minimize nonresponse bias with good survey design, incentives and the appropriate contact method
- There are quality controls in place during the data collection process to guarantee that designated members of the sample are reached
For help on sample size calculation, check out Survey Sample Size – What Should It Be?
To calculate sample size and margins of error, try the Sample Size and Margin of Error Calculators from Relevant Insights.
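For a rough sense of what such calculations involve, here is a minimal sketch of the textbook formulas for a proportion. It assumes a simple random sample from a large population and a normal approximation; it is not necessarily the method used by the calculators mentioned above, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_score(confidence=0.95):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def margin_of_error(n, p=0.5, confidence=0.95):
    """Margin of error for a proportion p estimated from n responses."""
    return z_score(confidence) * math.sqrt(p * (1 - p) / n)


def required_sample_size(margin, p=0.5, confidence=0.95):
    """Smallest n whose margin of error does not exceed `margin`."""
    z = z_score(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


if __name__ == "__main__":
    print(required_sample_size(0.05))        # responses needed for +/- 5 points
    print(round(margin_of_error(1000), 4))   # margin of error with 1,000 responses
```

Using p = 0.5 is the conservative choice, because it produces the widest margin. These formulas describe sampling error only. They say nothing about selection bias, so a small margin of error on a sample drawn from a poor source is still an estimate of the wrong population.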