Representative Samples: Does Sample Size Really Matter?
During the early phases of a survey project, a common question is, “How do I get responses from every person I survey?”
The answer is that you usually don’t need to. Unless you’re working with a very small group, you only need responses from a portion of the population. What matters is getting a good, representative sample of your population.
Instead of asking, “How do I reach everyone?” ask, “What’s a good sample size?”
In this post, we’ll explain sample size and representative samples, and give you a handy calculator so that you can easily identify the right sample size for your survey project.
Sample Size vs. Representative Samples
Your target sample size is how many people you need to reach to derive accurate insights from your study. In theory, a larger sample size should lead to more accurate or representative results, but when it comes to surveying large populations, bigger isn’t always better.
In fact, trying to collect results from a larger sample size can add costs – without significantly improving your results.
That’s why we spend so much time thinking about sample size.
This calculator will give you an idea of how many people you need to survey based on the size of the total population.
Enter a confidence interval (for example, 4 means ±4%) to calculate your sample size.
* The calculation assumes a response distribution of 50%.
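For readers who want to see the arithmetic behind a calculator like this, here is a minimal Python sketch. It assumes the standard formula for estimating a proportion, with a finite population correction. The calculator's exact method is not stated in this post, so treat this as an illustration rather than its actual implementation. The function name `sample_size` and the 95% confidence level default are assumptions. The 4% interval and 50% distribution come from the calculator's own labels.

```python
import math
from statistics import NormalDist


def sample_size(margin, population=None, confidence=0.95, proportion=0.5):
    """Estimate how many completed responses are needed.

    margin      -- confidence interval as a fraction (0.04 means +/-4%)
    population  -- total population size, or None for a very large population
    confidence  -- confidence level (assumed default: 95%)
    proportion  -- expected response distribution (0.5 is the most conservative)
    """
    # z-score for a two-sided interval at the chosen confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    # Required sample size for an effectively infinite population
    n0 = (z ** 2) * proportion * (1 - proportion) / (margin ** 2)
    if population is None:
        return math.ceil(n0)

    # Finite population correction: smaller populations need fewer responses
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(sample_size(0.04))                        # 601
print(sample_size(0.04, population=10_000))     # 567
print(sample_size(0.04, population=1_000_000))  # 600
```

Note how little the answer changes between a population of 10,000 and one of 1,000,000. Past a certain point, a bigger population does not call for a much bigger sample.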
Before you get too far into sample size, take a moment to consider representative samples, too. These are two related but different issues. The sheer size of a sample does not guarantee its ability to accurately represent a target population.
Large unrepresentative samples can perform as badly as small unrepresentative samples.
A survey sample’s ability to represent a population is much more closely related to the sampling frame (the list from which the sample is selected) than it is to the sample size.
When some parts of the target population are not included in the sampled population, we are faced with selection bias, which prevents us from claiming that the sample is representative of the target population.
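To make this concrete, here is a small simulation sketch in Python. Every number in it is hypothetical: we assume a population of 100,000 in which only 30% can be reached through the sampling frame, and we assume that the reachable group answers "yes" to a question far more often than everyone else. The sketch then compares a large sample drawn only from the frame with a much smaller simple random sample of the whole population.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: the first 30,000 people are covered by the
# sampling frame; the remaining 70,000 are not. The two groups are assumed
# to answer "yes" at different rates (70% vs. 40%).
N = 100_000
population = []
for i in range(N):
    in_frame = i < 30_000
    p_yes = 0.70 if in_frame else 0.40
    population.append((in_frame, random.random() < p_yes))


def yes_rate(people):
    """Share of people in the list who answered yes."""
    return sum(yes for _, yes in people) / len(people)


true_rate = yes_rate(population)

# Large sample, but drawn only from the covered part of the population
frame = [person for person in population if person[0]]
large_biased = random.sample(frame, 20_000)

# Much smaller sample, but every member of the population can be selected
small_random = random.sample(population, 1_000)

print(f"True yes rate:            {true_rate:.3f}")
print(f"Large biased sample:      {yes_rate(large_biased):.3f}")
print(f"Small random sample:      {yes_rate(small_random):.3f}")
```

Under these assumptions, the 20,000-person sample lands far from the true rate because it never hears from the uncovered 70%, while the 1,000-person random sample lands close to it. Adding more respondents from the same frame would not fix the gap.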
Avoiding Selection Bias in Your Survey Responses
When not every member of your target population has an equal chance of being chosen to take your survey, you’re at risk of polluting your data with selection bias.
This commonly occurs in the following ways:
Bias Through Convenience Samples
Convenience samples are just what they sound like: choosing respondents that we can conveniently reach without regard to their demographic data.
These samples include respondents who are easier to select or who are most likely to respond; they will not be representative of harder-to-select individuals.
Samples from online panels are a good example of convenience samples.
These panels are composed of individuals who have expressed interest in participating in surveys, leaving out individuals who may be part of the target population but are not available for interviewing through the panel.
Selection Bias Via Undercoverage
Undercoverage happens when we fail to include all of the target population in the sampling frame.
Many online panels work hard at avoiding undercoverage bias, but the fact remains that certain demographics are underrepresented in panels.
For example, it is difficult to field online studies targeted at the total Hispanic population in the US without using a hybrid data collection approach that allows us to reach unacculturated Hispanics, who are usually underrepresented in most online panels.
Coverage bias is also common in phone surveys.
Many of these surveys use telephone list sampling frames that exclude households without landline access.
As more households substitute cell phones for their landlines, obtaining representative samples of certain demographic groups is almost impossible without including cell phone lists in the sampling frame.
Nonresponse and Selection Bias
Selection bias also happens when we fail to obtain responses from everyone in the selected sample.
Nonrespondents tend to differ from respondents, so their absence from the final sample makes it difficult to generalize the results to the overall target population. This is why, when the goal is a representative sample of the target population, the design of a survey matters far more than the absolute sample size.
Final Three Sources of Sample Bias
Three other common ways that sample bias can creep into a survey are:
- Judgment Sample: This is a sample selected using “representative” criteria that are chosen based on prior knowledge of the topic or target population. An example would be a study that needs a sample of teenagers and tries to intercept them at an intersection near a high school.
- Misspecification: This happens when we intentionally or unintentionally use screening criteria that leave out important subgroups of the population.
- Poor Data Collection: An example is allowing whoever is available in the household to take the survey instead of the member the screening criteria call for.
Giving Preference to Sample Source, Not Size
So, when it comes to getting a representative sample, sample source is more important than sample size.
If you want a representative sample of a particular population, you need to ensure that:
- The sample source covers the entire target population
- The selected data collection method (online, phone, paper, in person) can reach individuals that represent that target population
- The screening criteria truly reflect the target population and do not inadvertently exclude valuable subpopulations
- You can minimize nonresponse bias with good survey design, incentives, and the appropriate distribution method
- There are quality controls in place during the data collection process to guarantee that designated members of the sample are reached