Why and How to Clean Your Online Survey Data

After your data collection is complete, you’re ready to start analyzing your data set. Or are you? Before you start working with your data, you need to make sure that it is clean.

Why You Should Clean Your Online Survey Data

Why clean data? It is always good practice to clean your data, particularly if you use panel companies or have offered an incentive to increase your response rate. Whatever your collection method, we recommend cleaning your data before deriving actionable results. Your purpose in data cleaning should be to remove any overtly biased or invalid responses.

If you are not cleaning your data, or you do not know how, you are not alone. But it is important to start taking this step to ensure data quality. According to the SurveyGizmo Benchmark Guide Survey, 70 percent of online surveyors clean their data before analyzing it.¹

This process can be time-consuming, but it is well worth the effort to weed out invalid and biased responses.

Removing Straight-Line and Christmas Tree Responses

When cleaning your data, look for these types of responses and then remove them from your data set to ensure data quality:

Straight-Lining

When a respondent selects the same option for every item without reading the questions, we call it Straight-Lining. It looks exactly like it sounds: a straight line of responses down a set of questions. Before removing a response from your data set, review several of the respondent’s answers to make sure the pattern reflects inattention and is not a set of valid responses.
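As a rough illustration, the Straight-Lining check could be automated along these lines. This is a minimal sketch, not a definitive rule: the function name `is_straight_lined`, the `min_items` threshold, and the sample data are all assumptions made for the example, and anything it flags still deserves the manual review described above.

```python
def is_straight_lined(answers, min_items=5):
    """Return True if every answered item in a rating grid has the same value.

    `min_items` is an assumed threshold: very short grids are ignored because
    identical answers to two or three items are often perfectly valid.
    """
    answered = [a for a in answers if a is not None]  # skip unanswered items
    return len(answered) >= min_items and len(set(answered)) == 1


# Hypothetical respondents: each list is one person's answers to a 1-5 grid.
responses = {
    "r1": [3, 3, 3, 3, 3, 3],
    "r2": [4, 2, 5, 3, 4, 1],
    "r3": [5, 5, None, 5, 5, 5],
}

# Collect the respondents worth a closer look; do not delete them blindly.
flagged = [rid for rid, answers in responses.items() if is_straight_lined(answers)]
print(flagged)  # ['r1', 'r3']
```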

Christmas-Tree Behavior

In Christmas-Tree behavior, the respondent’s selections step up and down across the answer options in a Christmas-Tree pattern, chosen without reading the questions. You will have to turn your head to the left to see the Christmas tree design in this example, but any designs or ‘artwork’ that you see in the respondent data should be viewed with caution.

Both Straight-Lining and Christmas-Tree behavior are methods used by respondents to complete a survey as quickly as possible. This behavior can be motivated by incentives.

Straight-Lining is fairly easy to pick out in Excel. Christmas-Tree responses are a little more difficult to find. However, if you’ve used numeric reporting values, you should be able to spot ascending or descending patterns in Excel and remove these responses.
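If you would rather not scan for these patterns by eye, one possible heuristic is to measure the longest run of answers that move up or down by exactly one step. The sketch below assumes numeric reporting values; the function name `longest_stepwise_run` and the cutoff of 6 are illustrative choices, not an established standard.

```python
def longest_stepwise_run(answers):
    """Length of the longest run in which each answer differs from the
    previous one by exactly 1 (ascending or descending)."""
    best = current = 1 if answers else 0
    for prev, cur in zip(answers, answers[1:]):
        # Extend the run on a one-step move, otherwise start a new run.
        current = current + 1 if abs(cur - prev) == 1 else 1
        best = max(best, current)
    return best


# Hypothetical numeric answers to a nine-item grid on a 1-5 scale.
tree = [1, 2, 3, 4, 5, 4, 3, 2, 1]
normal = [4, 2, 5, 3, 4, 1, 2, 5, 3]

SUSPICIOUS_RUN = 6  # assumed cutoff; tune it to the length of your grid

print(longest_stepwise_run(tree))    # 9
print(longest_stepwise_run(normal))  # 2
print(longest_stepwise_run(tree) >= SUSPICIOUS_RUN)  # True
```

Short stepwise runs occur naturally in honest answers, so treat a high value as a prompt to review the response rather than as proof.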

Take Your Time

Another method for identifying invalid responses is to look at response timers; that is, the amount of time it takes a respondent to complete a page of your survey, or the entire survey. Most survey software will allow you to export this information to Excel with your survey data. You can then evaluate whether a respondent took too little time to read the questions and complete their responses.

Don’t know how long it should take someone to complete a page, or the entire survey? There is a quick way to evaluate this! Time yourself reading the survey, but be sure to read out loud, as this will simulate what it is like for someone to read your survey as if they were seeing it for the first time.
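The timing check can also be scripted once you have exported the data. The sketch below is only an illustration under stated assumptions: the column names `response_id` and `seconds`, the sample rows, and the rule of flagging anyone faster than a third of your own read-aloud time are all hypothetical. Pick a fraction that suits your survey.

```python
import csv
import io

# Hypothetical export: one row per respondent with total completion time.
EXPORT = """response_id,seconds
r1,412
r2,38
r3,305
r4,95
"""


def find_speeders(rows, baseline_seconds, min_fraction=0.33):
    """Return IDs of respondents who finished in less than `min_fraction`
    of the baseline (your own read-aloud time). Both values are assumptions."""
    cutoff = baseline_seconds * min_fraction
    return [row["response_id"] for row in rows if float(row["seconds"]) < cutoff]


rows = list(csv.DictReader(io.StringIO(EXPORT)))

# Suppose reading the survey out loud took you six minutes (360 seconds).
speeders = find_speeders(rows, baseline_seconds=360)
print(speeders)  # ['r2', 'r4']
```

With a real export you would open the CSV file instead of the inline string; the flagged IDs are candidates for review, not automatic deletions.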


Other Signs to Look Out For

When evaluating your complete data set, there are a few other items to look out for:

  • Checkbox questions with all of the answers selected, including “Not applicable”, “Other” and “None of the above”. This is a sign that a respondent has not read the question and has simply selected every possible response.
  • Nonsense answers to open-ended questions. These can indicate that a bot, not a human being, is responding to your survey.
  • Duplicate responses by the same respondent. Sometimes this is an attempt to collect duplicate incentives, and sometimes a person has accidentally taken your survey more than once.
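The three checks in this list could be sketched in code as follows. Everything here is hypothetical: the field names (`email`, `channels`, `comment`), the sample responses, the use of email address to identify a respondent, and especially the vowel-ratio test, which is a crude stand-in for reading the open-ended answers yourself.

```python
from collections import Counter

# Hypothetical answer options for one checkbox question.
OPTIONS = ["Email", "Phone", "Other", "None of the above"]


def selected_everything(selected, all_options):
    """True if the respondent ticked every option, contradictory ones included."""
    return set(selected) == set(all_options)


def looks_like_nonsense(text, min_vowel_ratio=0.2):
    """Crude heuristic: keyboard mashing tends to contain few vowels.
    Text with no letters at all is left for a human to judge."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return False
    vowels = sum(c in "aeiou" for c in letters)
    return vowels / len(letters) < min_vowel_ratio


def duplicate_respondents(responses, key="email"):
    """Return identifier values that appear more than once (case-insensitive)."""
    counts = Counter(r[key].strip().lower() for r in responses)
    return sorted(k for k, n in counts.items() if n > 1)


responses = [
    {"email": "ann@example.com", "channels": ["Email", "Phone"],
     "comment": "The checkout was slow"},
    {"email": "bob@example.com", "channels": list(OPTIONS),
     "comment": "asdfghjkl"},
    {"email": "Ann@example.com ", "channels": ["Phone"],
     "comment": "Great support team"},
]

print([selected_everything(r["channels"], OPTIONS) for r in responses])
# [False, True, False]
print([looks_like_nonsense(r["comment"]) for r in responses])
# [False, True, False]
print(duplicate_respondents(responses))  # ['ann@example.com']
```

If your survey does not collect an email address, another identifier from your export could serve as the `key`, provided it reliably maps to one person.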

Finally, when cleaning data, take care not to introduce your own bias. When looking at your data you might start interpreting results as the answer to a question you didn’t quite ask. Resist the temptation to tweak the question or answer options after the fact to make the results make more sense.

It is perfectly fine to speculate about possible misinterpretation of the question on the part of the respondent and editorialize in your report or presentation, but changing the actual question is just not ethical.

¹ Source: SurveyGizmo Market Research Benchmark Guide: 2012, Survey Techniques Survey, “Do you clean your data before reporting on it?”, n=1,070 Total Sample
