Diagnosis of Last Week’s Downtime

Filed in: News

Last Tuesday, SurveyGizmo experienced a severe server failure resulting in an hour of downtime. In our minds, this is completely unacceptable. You deserve an explanation of exactly what happened, what we are going to do about it, and how you can continue to feel safe with SurveyGizmo.

First, please know that SurveyGizmo has been fully functional and stable for data collection and surveying since 1:20 p.m. Eastern Standard Time on Tuesday afternoon. We have also contacted everyone for whom we had additional data or information.

So what happened last Tuesday?
We were down from approximately 12pm – 1pm EST (1 hour and 15 minutes). Our primary database cluster locked up because of a corrupted database table and a flood of simultaneous connections. The database table in question was not part of SurveyGizmo’s application itself, but rather a component of our content management system used by marketing and our website. However, once the connections backlogged, they began to affect all database connections across our servers, including our primary application.
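To illustrate how a backlog in one component can be kept from using up connections that another component needs, here is a minimal sketch in Python. It assumes each component gets its own capped connection budget and fails fast when the cap is reached. The class name, the budget sizes, and the string standing in for a database handle are all hypothetical; this is not SurveyGizmo's actual configuration.

```python
import threading
from contextlib import contextmanager


class ConnectionBudget:
    """Caps simultaneous connections for one component (hypothetical helper)."""

    def __init__(self, name, max_connections):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_connections)

    @contextmanager
    def connection(self):
        # Fail fast instead of queueing, so a backlog cannot pile up.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError(f"{self.name}: connection budget exhausted")
        try:
            yield f"{self.name}-connection"  # stand-in for a real DB handle
        finally:
            self._slots.release()


# Separate budgets: a flood on the CMS side cannot use up the app's slots.
cms_budget = ConnectionBudget("cms", max_connections=2)
app_budget = ConnectionBudget("app", max_connections=2)
```

With this arrangement, a third simultaneous CMS connection is rejected immediately, while the application's budget is untouched.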

After consulting with our database administrators and our customer support manager, we decided to switch over to our backup databases and ran on backup until later that evening. We then took the site offline intentionally at 7pm EST to switch back to the primary databases. We sent out an email notifying everyone that we needed to take the server down for 15 minutes, but we were actually able to switch back in under 6 minutes.

Impacts on your survey & data collections
- Surveys could not be served during the downtime, but resumed normally once we came back up.
- If your users started taking a survey prior to the downtime, they were able to finish their survey fine without noticing anything out of the ordinary.
- No data was lost, though in some cases it was delayed until we restored it from our redundant backups.
- The only problems some survey takers would have seen were in surveys using one of our API integration technologies (ExactTarget or Salesforce). Those integrations require the database while processing the page they are on, so only surveys with ExactTarget or Salesforce actions could have been affected. Currently, those responses are in a Partial status and can be completed now. Contact support if you think this might have affected you.

What we are doing for the future

We are meeting with our DBAs and hardware vendors this week to begin implementing an automated failover and high-availability solution in our datacenter. While this will take several weeks to implement, it will protect us against this type of failure by automatically cycling database servers with table issues out of use until the issue has been resolved on that server. All of this will be seamless and go unnoticed by you and your clients because it will happen automatically, rather than through the manual process that had to be used in last week’s incident. We will blog more as soon as that configuration is up and running.
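The general idea of cycling an unhealthy server out of use, and back in once it recovers, can be sketched as follows. This is a simplified illustration in Python under stated assumptions: the server names, the `is_healthy` check, and the `FailoverPool` class are hypothetical, and a real high-availability setup would rely on the database vendor's own tooling rather than application code like this.

```python
class FailoverPool:
    """Routes to the first healthy server in priority order (illustrative only)."""

    def __init__(self, servers, is_healthy):
        self.servers = list(servers)   # priority order, e.g. ["primary", "backup"]
        self.is_healthy = is_healthy   # callable: server name -> bool (assumed)
        self.out_of_rotation = set()

    def refresh(self):
        # Run the health check on every server and update the rotation.
        for server in self.servers:
            if self.is_healthy(server):
                self.out_of_rotation.discard(server)
            else:
                self.out_of_rotation.add(server)

    def pick(self):
        # Return the highest-priority server still in rotation.
        for server in self.servers:
            if server not in self.out_of_rotation:
                return server
        raise RuntimeError("no healthy database server available")


# Simulated health status; a real check might run a test query on each server.
status = {"primary": True, "backup": True}
pool = FailoverPool(["primary", "backup"], lambda name: status[name])
pool.refresh()
```

When the health check starts failing for the primary, the next `refresh()` takes it out of rotation and `pick()` returns the backup; once the check passes again, the primary is restored without manual intervention.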

In addition, we will increase the frequency and the complexity of our own fire drills to ensure that everyone on our staff is comfortable with switchovers if they need to occur.

Thank you all for being so patient and understanding during our downtime. Any downtime for us is very rare, and we take it quite seriously. More than anything, please accept our apologies if this outage caused you or your clients any frustration.

Sincerely,

Christian Vanek
Co-Founder & CTO
SurveyGizmo

  • Geneviève Gélinas

    Thanks for the report and explanation of the measures that are being put in place to counter this problem in the future.

About the Author

Christian Vanek
Christian is the CEO and co-founder of SurveyGizmo. Before building SurveyGizmo 1.0, he came from an 11-year consulting background focusing on marketing and content management tools. When not working on new ways to gather data, he spends time developing games and actively supports innovative youth education programs. In spite of living in Boulder, he does not ski.
