What is Secondary Data Analysis?

Secondary data analysis involves a researcher using data that someone else has gathered to serve the researcher’s own purposes. Researchers use it to answer a new research question, or to examine the original question of a previous study from an alternative perspective.

In order to fully understand secondary data analysis, it’s essential to familiarize yourself with the difference between primary and secondary data.

Primary Data vs. Secondary Data

Primary data is original data that researchers collect for a specific purpose.

Secondary data, on the other hand, is data that was originally collected for a purpose other than the one for which it is now being used.

To add context to the definition of secondary data, let’s consider an example.

If an entrepreneur is considering opening a new business, he or she could leverage census data that has been collected by the government. 

Although the entrepreneur would not be collecting the data himself or herself, census data includes information that could greatly benefit the entrepreneur, such as the average age, household income, and education level in a particular geographical region.

By digging into this census data to decide whether or not to open the new business, the entrepreneur is performing secondary data analysis.
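To make the census example concrete, here is a minimal Python sketch of summarizing one region. The records, the region names, and the summarize_region helper are all hypothetical and made up for illustration; real census data would come from a government release with its own field names and documentation.

```python
from statistics import mean, median

# Hypothetical household records, invented for illustration only.
households = [
    {"region": "North", "age": 34, "income": 52000},
    {"region": "North", "age": 45, "income": 61000},
    {"region": "North", "age": 29, "income": 48000},
    {"region": "South", "age": 61, "income": 39000},
    {"region": "South", "age": 58, "income": 42000},
]


def summarize_region(records, region):
    """Return the household count, average age, and median income for one region."""
    rows = [r for r in records if r["region"] == region]
    if not rows:
        raise ValueError(f"no records found for region {region!r}")
    return {
        "households": len(rows),
        "average_age": mean(r["age"] for r in rows),
        "median_income": median(r["income"] for r in rows),
    }


north_summary = summarize_region(households, "North")
print(north_summary)
```

The entrepreneur never collects a single record here. All of the work goes into filtering and summarizing data that already exists, which is the defining feature of secondary data analysis.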

Factors to Consider Before Conducting Secondary Data Analysis

There are certain factors that a researcher must consider before deciding to move forward with secondary data analysis. 

Because the researcher did not collect the data that he or she will be working with, it’s imperative to become familiar with the data set first. This familiarization process entails:

  • Learning about how the data was collected

  • Learning who the population of the study was

  • Learning what the objective of the original study was 

  • Determining what the response categories were for each question displayed to survey respondents

  • Evaluating whether or not weights need to be applied during the analysis of the data

  • Deciding whether or not clusters or stratification need to be accounted for during the analysis of the data
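The point about weights can be illustrated with a minimal Python sketch. It assumes each respondent carries a survey weight stating how many people in the population that respondent represents. The income values, the weights, and the weighted_mean helper are hypothetical and not taken from any real data set.

```python
def weighted_mean(values, weights):
    """Return the mean of values, with each value counted in proportion to its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical respondents: reported income and the survey weight for each.
incomes = [30000, 50000, 90000]
weights = [100, 300, 100]

unweighted_estimate = sum(incomes) / len(incomes)
weighted_estimate = weighted_mean(incomes, weights)

print(f"Unweighted mean income: {unweighted_estimate:.2f}")
print(f"Weighted mean income:   {weighted_estimate:.2f}")
```

The two estimates differ because the middle respondent stands in for three times as many people as the others. This sketch covers only the point estimate. Clustering and stratification mainly affect how standard errors are calculated, which typically calls for dedicated survey analysis software and the documentation of the original study.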


The Advantages of Secondary Data Analysis

One of the most noticeable advantages of using secondary data analysis is its cost effectiveness.

Because someone else has already collected the data, the researcher does not need to invest any money, time, or effort into the data collection stages of his or her study.  

While a researcher must sometimes purchase secondary data, these costs are almost always lower than the expense of creating the same data set from scratch.

Also, the data in a secondary data set is typically already cleaned and stored in an electronic format, so the researcher can spend time analyzing the data instead of preparing it for analysis.

Another benefit of analyzing secondary data instead of collecting and analyzing primary data is the sheer volume and breadth of data that is publicly available today. 

For instance, studies that the government has conducted give researchers access to a volume of data that would have been impossible for them to amass themselves.

Longitudinal data at this scale is extremely powerful. A government agency may have been collecting data on a single population over an extended period of time.

By using the government’s publicly available data to perform secondary data analysis, the researcher avoids years of intensive labor.

The Disadvantages of Secondary Data Analysis

The biggest disadvantage of performing secondary data analysis is that the secondary data set might not answer the researcher’s specific research question to the degree that the researcher would have hoped.

If a researcher sets out to perform a study with a very particular question in mind, a secondary data set might not contain the specific information needed to answer that question.

Similarly, when a researcher has a specific question or goal in mind, it can sometimes be difficult to identify secondary data that is valid for use, as the data might not have been collected during the timeframe the researcher was hoping for, or in the correct geographical region.

Another disadvantage is that no matter how carefully a researcher vets a secondary data set, he or she will never be able to know exactly how the data was collected, or how well that process was executed.

Without being the one who actually develops the surveys and distributes them to the appropriate populations, it’s impossible to know the lengths to which the original researchers went to ensure validity and quality, or whether they experienced issues such as low response rates or respondents misunderstanding what a question was truly asking.

Simply put, since the researcher conducting the study did not collect the data he or she will be using, he or she ultimately has no control over what the secondary data set contains.

Conclusion

Secondary data analysis is a convenient and powerful tool for researchers looking to ask broad questions at a large scale. 

While it has its benefits, such as its cost effectiveness and the breadth and depth of data that it provides access to, secondary data analysis can also force researchers to alter their original question, or work with a data set that otherwise is not ideal for their goals.

The next time you’re looking to perform a large-scale research study, consider secondary data analysis.


Have you used secondary data analysis to inform a research study you were conducting? How did things go for you?

We’d love to hear about your experience. Drop us a line in the comments below!