Defining Data Cleansing
Once you’ve administered a survey and are ready to analyze and take informed action on the response data, it’s essential to make sure that you’re working with the highest quality data possible. That’s where data cleansing, also known as data scrubbing, can help tremendously.
By definition, data cleansing refers to the discovery of errors in a data record, and the removal or correction of the mistakes that have been found.
While it’s always considered best practice to take the time to clean your data, data cleansing is particularly important if you’ve used a panel or incentives to collect your responses.
While panels and incentives are both valuable ways to ensure that you receive a significant number of responses, respondents recruited this way are much more likely to speed through a survey in order to receive the reward that they’ve been promised.
Respondents who speed through your survey are unlikely to provide sound, accurate, or complete data. Including their responses in your analysis can significantly skew your results and reduce their accuracy.
In order to analyze and act on survey data with confidence and efficiency, you need to eliminate these unqualified responses from your final data set.
While data cleansing is a worthwhile step in any analytical process, it can also be time-consuming due to its subjective nature.
Fortunately, SurveyGizmo offers a timesaving advanced data cleaning feature that allows you to remove any invalid or biased responses from your survey results. By leveraging this feature, you can identify and remove unqualified responses to ensure that you’re making decisions based on the soundest data possible.
Three Reasons Why You Need a Data Cleansing Tool
The ultimate purpose of data cleansing should be to remove any overt bias or invalid responses from your final data set.
However, while cleaning data, it’s important to remember not to introduce your own bias. If you let your bias impact the results, you might start incorrectly interpreting them, which leads to muddy and inaccurate insights.
It’s human nature to bring your own assumptions to the data, so it’s important to keep this top-of-mind. Resist the temptation to tweak the question or answer options after the fact to skew the results one way or another.
Aside from overcoming this natural tendency to let biases affect survey analysis, a data cleansing tool will allow you to accomplish the following three tasks.
#1: Identify ‘Speeding’ Responses
A data cleansing tool will allow you to easily identify respondents who have sped through your survey.
When people rush through your surveys with the goal of completing them as quickly as possible, the quality of your data will suffer tremendously.
SurveyGizmo’s Data Cleansing Tool displays the distribution of average response times per question in an easily digestible chart. The average is calculated for each response by dividing the respondent’s total response time by the number of questions that the respondent was presented.
The chart will show a red line, used to eliminate speeders, at the fastest 1 percent of responses. It will also show a blue line, used to normalize slow outliers, at the slowest 10 percent of responses.
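If you’d like to see the arithmetic behind the chart, here is a minimal Python sketch. It assumes a nearest-rank percentile and uses made-up sample data; the function names are hypothetical, and SurveyGizmo’s tool may calculate its default lines differently.

```python
import math


def average_time_per_question(total_seconds, questions_shown):
    """Average seconds per question for a single response."""
    return total_seconds / questions_shown


def percentile_cutoff(values, percent):
    """Nearest-rank percentile (an assumption, not the tool's documented method)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical sample: (total response time in seconds, questions presented)
responses = [(20, 10), (40, 10), (50, 10), (25, 5), (60, 10),
             (30, 5), (70, 10), (80, 10), (90, 10), (300, 10)]

avg_times = [average_time_per_question(t, q) for t, q in responses]

speeder_cutoff = percentile_cutoff(avg_times, 1)   # starting point for the red line
slow_cutoff = percentile_cutoff(avg_times, 90)     # starting point for the blue line

print(speeder_cutoff, slow_cutoff)  # prints: 2.0 9.0
```

With only ten sample responses, the fastest 1 percent rounds up to a single response, which is why the red line lands on the quickest respondent.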
To start the data cleansing process, click and drag the red line to quarantine the fastest responses. Where you place the red line will depend on the curve of the data, but remember to pay attention to the percentage and the number of responses you are quarantining. See the screen grab below.
Next, click and drag the blue line to the left if you wish to normalize the average per question response time. All responses in the blue area will be given the max average per question response time indicated by the blue line.
Make sure to keep an eye on the percentage of responses you are affecting, and the change in the average time per question as you make these adjustments.
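To picture what normalizing slow outliers does to your numbers, here is a small Python sketch. It assumes that every average above the blue line is simply replaced with the blue line’s value; the function name and sample times are hypothetical.

```python
def cap_slow_times(avg_times, max_seconds):
    """Replace any average above max_seconds with max_seconds."""
    return [min(t, max_seconds) for t in avg_times]


# Hypothetical average seconds per question for five responses
avg_times = [4.0, 5.0, 6.0, 9.0, 30.0]
blue_line = 9.0  # assumed position of the blue line

capped = cap_slow_times(avg_times, blue_line)

# How many responses were changed, and how the overall average moved
affected = sum(1 for t in avg_times if t > blue_line)
percent_affected = 100 * affected / len(avg_times)
mean_before = sum(avg_times) / len(avg_times)
mean_after = sum(capped) / len(capped)

print(capped, percent_affected, mean_before, mean_after)
```

In this sample, one slow response out of five is capped, and the average time per question drops from 10.8 seconds to 6.6 seconds.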
Once you’re satisfied with your adjustments to average time per question, and the responses that are quarantined as a result, the next step is to evaluate and ensure answer quality.
#2: Ensure Answer Quality
By using the data cleansing tool to ensure answer quality, the remaining responses that were not quarantined for speeding can be further vetted to make sure that they provide actionable data.
There are eight possible indicators of poor data, and each is represented by a flag in SurveyGizmo’s Data Cleansing Tool.
Each poor data quality flag has a default weight. To make a flag more or less important, increase or decrease its weight.
Listed below are the poor data quality indicators that SurveyGizmo’s Data Cleansing Tool uses to identify and eliminate unactionable, uninformative data.
- Straightlining/Patterned Responses
- Fake Answers
- Gibberish Answers
- Trap/Red Herring Questions
- Consistency Check
- All Checkboxes
- Single Checkbox
- One Word Answers
If you’d like to learn more about each of these poor data quality indicators, be sure to read our documentation on data cleansing here.
Once you’re finished choosing the flags that you wish to use and customizing their weights, it’s time to choose your quarantine threshold. If you’ve made changes to the data flags, the chart will need to be refreshed.
A Dirty Data Score is calculated for each response using the flags that have been enabled.
Each flag is multiplied by the weight that has been selected for its flag type. All Dirty Data Scores are then normalized on a curve so that the response with the highest score receives a 100, and the response with the lowest score receives a zero.
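Here is one way to picture the scoring, as a rough Python sketch. It makes several assumptions: each flag is a count of instances, the weighted flags are added together, and “normalized on a curve” means simple min-max scaling to a 0 to 100 range. The flag names and weights below are hypothetical, and the tool’s actual curve may differ.

```python
def raw_score(flag_counts, weights):
    """Sum of each flag count multiplied by its weight (assumed to be additive)."""
    return sum(count * weights.get(flag, 0) for flag, count in flag_counts.items())


def normalize_scores(raw_scores):
    """Min-max scaling: lowest raw score becomes 0, highest becomes 100."""
    low, high = min(raw_scores), max(raw_scores)
    if high == low:
        # Every response scored the same, so there is nothing to spread out
        return [0.0 for _ in raw_scores]
    return [100 * (s - low) / (high - low) for s in raw_scores]


# Hypothetical weights and flag counts for four responses
weights = {"fake_answers": 3, "straightlining": 2, "one_word_answers": 1}
flagged = [
    {},
    {"one_word_answers": 2},
    {"fake_answers": 1, "straightlining": 1},
    {"fake_answers": 2, "straightlining": 2},
]

raw = [raw_score(f, weights) for f in flagged]
dirty_data_scores = normalize_scores(raw)

print(raw)                # prints: [0, 2, 5, 10]
print(dirty_data_scores)  # prints: [0.0, 20.0, 50.0, 100.0]
```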
The Dirty Data Scores for your data set will be charted on the x-axis with the number of responses charted on the y-axis.
Click and drag the red line to quarantine responses with high Dirty Data Scores. Responses in the red area will be quarantined and removed from your results.
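Conceptually, dragging the red line sets a score threshold. The Python sketch below assumes that responses scoring at or above the threshold are quarantined; the response IDs, scores, and function name are hypothetical.

```python
def split_by_threshold(scores_by_id, threshold):
    """Separate responses into kept and quarantined groups by Dirty Data Score."""
    kept, quarantined = {}, {}
    for response_id, score in scores_by_id.items():
        if score >= threshold:
            quarantined[response_id] = score
        else:
            kept[response_id] = score
    return kept, quarantined


# Hypothetical normalized Dirty Data Scores
scores_by_id = {"r1": 0.0, "r2": 20.0, "r3": 50.0, "r4": 100.0}

kept, quarantined = split_by_threshold(scores_by_id, threshold=50)
percent_quarantined = 100 * len(quarantined) / len(scores_by_id)

print(sorted(quarantined), percent_quarantined)  # prints: ['r3', 'r4'] 50.0
```

Checking the percentage before you commit is a quick way to confirm that you are not quarantining more of your data set than you intended.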
When you are satisfied with the responses to be quarantined, click the Quarantine button.
#3: Take Informed Action
SurveyGizmo’s Data Cleansing Tool will show how many responses have been quarantined.
From here, you can choose to view your quarantined list, export your quarantined responses, or restore all of your quarantined responses.
The View Quarantined List link in the image above will take you to the Individual Responses page, where the quarantined responses will be flagged as Quarantined.
The Export Quarantined Responses link will allow you to download quarantined responses and view them in an Excel spreadsheet.
You’ll see that each data cleansing category is displayed as a column in the export with a count of instances for each response.
For example, in the screenshot above, the response in row one has one instance of Fake answers, whereas the response in row two has three instances of Gibberish answers.
Note that if you wish to add and remove responses from quarantine, you can do so on the Data Quality tab of each response under Results > Individual Responses.
If you’d like to adjust your data cleansing results, including how many responses are quarantined and the flags that determine dirty data, you can click the option to Start Over.
Once you’ve cleaned your data, responses flagged as quarantined will be automatically filtered out of your reports and exports. If you wish to include these, go to the Filter tab of your report or export and uncheck the option to Remove Responses Flagged as Quarantined.
The process of data cleansing cannot be ignored if you want to ensure that you’re making business decisions based on the highest quality data possible.
A data cleansing tool streamlines the process and allows you to manipulate your data in the most convenient ways possible, saving you time and headaches along the way.
Want to learn more about SurveyGizmo’s Data Cleansing Tool? Try it out for yourself by starting a free trial of SurveyGizmo today!