How OneLogin maintained 100% uptime during the Dyn DDoS attack

October 25th, 2016 / smarter identity, product and technology

On Friday, October 21, a distributed denial-of-service (DDoS) attack on Dyn, a major provider of DNS services, made many websites unavailable, including Spotify, Twitter, Reddit, and The New York Times. Fortunately for our thousands of customers, OneLogin stayed up throughout the attack because of three values that we practice:

  1. Culture of Integrity
  2. Investment in Redundancy
  3. Continuous Deployment

Culture of Integrity

At OneLogin, we display our actual uptime on our trust page. As I write this, it is 99.997% for the past 12 months. We strive for zero downtime, but realistically, incidents do happen that affect uptime, and we learn from each one to improve our service. Obscuring the impact of these incidents would be unfair to our customers and would not help us improve as a company. By maintaining a culture of integrity, we build trust with our customers and hold ourselves to a mantra of continuous improvement.
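To put an uptime percentage in perspective, it helps to convert it into the downtime it allows over a period. The snippet below is a minimal illustrative sketch, assuming a 365-day measurement window; the function name `allowed_downtime_minutes` is hypothetical, and its result is the downtime budget implied by a percentage, not a record of actual outages.

```python
def allowed_downtime_minutes(uptime_percent: float, days: float = 365.0) -> float:
    """Return the downtime, in minutes, implied by an uptime percentage.

    Assumes a measurement window of `days` days (365 by default,
    i.e. a non-leap 12-month period).
    """
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = days * 24 * 60  # 525,600 minutes in 365 days
    return total_minutes * (1.0 - uptime_percent / 100.0)


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.997):
        print(f"{pct}% uptime -> {allowed_downtime_minutes(pct):.1f} minutes per year")
```

Under this assumption, 99.997% over a year corresponds to roughly 16 minutes of total downtime (525,600 × 0.00003 = 15.768 minutes), compared with nearly 9 hours at 99.9%.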

Investment in Redundancy

Holding ourselves accountable for uptime naturally leads us to build redundancy into our architecture. Simply put, redundancy means that if one component fails, another can take over. Some of the steps we take to provide high availability are:

  • We use multiple DNS providers, so when Dyn came under attack last week, our other provider, NS1, automatically handled requests to our service.

  • Our service runs in multiple AWS datacenters, spread across both multiple Availability Zones and multiple regions.

  • Our Active Directory Connector lets you run multiple instances for higher uptime.

  • We provide redundant sites for service status since DNS attacks could potentially render our trust page inaccessible. These are onelogin.status.io for our US service and onelogineu.status.io for our EU service.
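Multiple DNS providers help because a domain's NS records can list nameservers operated by different companies, and a recursive resolver that gets no answer from one nameserver retries another. The sketch below simulates that failover in plain Python with no real network calls. The provider functions, the labels `provider-a` and `provider-b`, the hostname `app.example.com`, and the address `203.0.113.10` (from a documentation-only IP range) are all hypothetical stand-ins, and real resolvers use their own server-selection and timeout logic rather than this simple ordered loop.

```python
from typing import Callable, List, Sequence, Tuple

# A simulated "nameserver" is a (provider_name, query_function) pair.
# query_function takes a hostname and returns an IPv4 address string,
# or raises QueryTimeout to simulate an unreachable server.
Nameserver = Tuple[str, Callable[[str], str]]


class QueryTimeout(Exception):
    """Raised by a simulated nameserver that does not answer in time."""


def resolve_with_failover(hostname: str, nameservers: Sequence[Nameserver]) -> Tuple[str, str]:
    """Try each nameserver in order; return (provider, address) from the first that answers."""
    failures: List[str] = []
    for provider, query in nameservers:
        try:
            return provider, query(hostname)
        except QueryTimeout:
            failures.append(provider)  # note the failure and move on to the next provider
    raise LookupError(f"no nameserver answered for {hostname}; failed: {failures}")


def provider_under_attack(hostname: str) -> str:
    # Hypothetical: a provider overwhelmed by a DDoS never answers.
    raise QueryTimeout(hostname)


def healthy_provider(hostname: str) -> str:
    # Hypothetical: a second provider serving the same zone data.
    return {"app.example.com": "203.0.113.10"}[hostname]


if __name__ == "__main__":
    servers = [("provider-a", provider_under_attack), ("provider-b", healthy_provider)]
    print(resolve_with_failover("app.example.com", servers))
```

Here the first provider times out, so the answer comes from the second; only if every provider failed would the caller see a `LookupError`.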

Investment is a key concept here. All these architectural choices cost us time and effort to deploy and maintain, and we are not done yet. These efforts are worth it because we know how critical our uptime is for our customers.

Continuous Deployment

Agility is crucial for a modern engineering organization to respond quickly to bugs, security vulnerabilities, or the next attack on Internet infrastructure. For this reason, our team and processes are set up to deploy new versions of our service on short notice. We regularly deploy several times a week, and sometimes multiple times a day.

Conclusion

When your identity provider is down, your business is down, and you may incur financial losses. Beyond the lost revenue, an outage damages your reputation and erodes your brand.

Even though the recent DNS attack did not affect our customers, we remain vigilant so that our service stays available during future attacks, through rigorous accountability paired with thoughtful technical investments and agile practices.

If you’d like to speak with us further or see our solution in action, please contact us.

About the Author

Dragan Milanovich is Vice President of Engineering and Technical Operations at OneLogin, and specializes in building highly available, scalable systems. His many years of experience include VP and Director level positions at HP, Palm, PayPal, and eBay, where he developed mobile and cloud service infrastructures that serve millions of users.
