OneLogin recently released some major improvements to our user provisioning experience and I thought I’d take some time to talk a bit about the process we went through to develop these.
By way of background: in addition to offering Single Sign-On, SAML, role-based access permissions, multi-factor authentication (MFA), and all those other good application access features, OneLogin has been providing the ability to create user accounts in applications for quite a while now.
With this Provisioning feature, we don’t just offer the ability to say “Bob’s in engineering, so according to these rules and mappings, Bob has access to the following 15 apps.” We take things a step further: we reach into the APIs of those applications and make sure we’ve created an account for Bob with all his information pre-populated, so he’s ready to go on Day 1.
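To make the rules-and-mappings idea concrete, here’s a minimal sketch. Everything in it (the `User` class, the `DEPARTMENT_APPS` mapping, the `provision` function) is a hypothetical illustration of the general pattern, not OneLogin’s actual implementation:

```python
from dataclasses import dataclass

@dataclass
class User:
    name: str
    email: str
    department: str

# Hypothetical mapping rules: which departments get which apps.
DEPARTMENT_APPS = {
    "engineering": ["github", "jira", "pagerduty"],
    "sales": ["salesforce", "zoom"],
}

def provision(user: User) -> list:
    """Create pre-populated accounts in every app the user's rules grant."""
    created = []
    for app in DEPARTMENT_APPS.get(user.department, []):
        # In a real system this step would call each application's API;
        # here we just record what would be created.
        created.append({"app": app, "email": user.email, "name": user.name})
    return created
```

So a new engineer would get accounts queued for all three engineering apps on Day 1, with profile data already filled in.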
And this worked great. Occasionally there’d be a hiccup where a user’s application profile data was off or an email address was already in use, but since provisioning was working 99% of the time, IT was more than happy to have this automation in place; they only had to step in with a manual fix to deal with an error once in a while.
Not your typical Scaling Problem
As OneLogin’s customer base grew, so did the size of our newer customers - and some of our existing customers grew from small start-ups into 40,000 seat global unicorns.
We came to realize, as we on-boarded larger clients or as our now-much-bigger customers rolled out a new application to all their users, that our existing user experience and processes for provisioning users into applications just weren’t scaling.
And oddly, this wasn’t the typical software scaling problem; this was a user experience scaling problem.
Our provisioning system was happily chugging along, pushing out hundreds of thousands of account creations a day, but this produced more notifications, records, and details than our end users could keep up with.
Our old user experience just wasn’t up to the task, even if our back-end servers were.
150,000 users x 1% failure rate = Pain
We started off by gathering feedback from our largest customers, with particular focus on those that are heavy users of our provisioning features (think “150K updates on a weekly basis”).
The number one pain point they reported was that it was difficult to find specific details on just the users that had failed in a provisioning task. They also told us they prioritized the failures based on the provisioning activity that was taking place: new accounts were the most important for them to fix, followed by account updates, followed by account deletions.
We also talked through how they cleared the errors once they identified a failure within the current UI. More often than not, the failure was due to bad user data being pushed into the application; occasionally it was due to a transient problem with the cloud application they were provisioning into.
As part of their process for clearing the errors, they’d make some small tweaks to the user record or mappings to clear up bad data, and then they’d retry provisioning for all their users in the application rather than go through the pain of individually retrying each failed user.
While this worked to clear the problem, they had to wait for all the provisioning tasks to go through before they could get feedback on just the failed users. And to be honest, why would we want them to fire off 100K+ provisioning tasks when they really only needed a few hundred? While our Provisioning service was scaling quite nicely, nobody wants to use more processing and bandwidth than they need.
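The gap between those two retry strategies is easy to quantify. Using the rough figures this post cites (150,000 users, a ~1% failure rate) as illustrative numbers, not measured data:

```python
# Back-of-the-envelope arithmetic with the rough figures cited in this post
# (150,000 users, ~1% failure rate); illustrative only, not measured data.
total_users = 150_000
failure_rate = 0.01

failed_users = int(total_users * failure_rate)   # ~1,500 users actually need fixing

# "Retry all" re-queues every user; "retry failed only" queues just the failures.
tasks_retry_all = total_users
tasks_retry_failed_only = failed_users

print(f"failed users: {failed_users}")                                # 1500
print(f"tasks avoided: {tasks_retry_all - tasks_retry_failed_only}")  # 148500
```

In other words, retrying everyone does roughly 100x more work than the problem requires.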
Scaling the user process
Armed with that information, we came back to our customers with some proposed changes to see if these offered them the functionality they needed.
First and foremost, we proposed the ability to filter the users according to their status - this would provide a way for admins to narrow down the users in an application that had failed on a particular operation.
Next we knew we needed to offer bulk operations on just a subset of users - this would give the same convenience of retrying all the users in the application while limiting this to just those users that actually needed to be retried. We knew this would help more quickly sort out which failures were temporary and which would require more attention. This would only put the users that needed to be provisioned back into our provisioning service queue.
Finally we proposed offering more details at a glance for the errors that had occurred while provisioning the users. While generally any given application would only fail for one or two reasons, we knew that we eventually wanted to expand these UI improvements to offer a dashboard that covered all of a company’s applications. And relating common errors between applications would go a long way to fixing the underlying cause of these errors.
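The proposals above (filtering by status, prioritizing by operation type, and bulk-retrying just the failed subset) can be sketched roughly as follows. All the names here, such as `ProvisioningResult` and `OPERATION_PRIORITY`, are hypothetical illustrations rather than OneLogin’s actual API:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ProvisioningResult:
    user_id: int
    operation: str       # "create", "update", or "delete"
    status: str          # "succeeded" or "failed"
    error: str = ""

# The priority customers reported: fix failed creates first,
# then updates, then deletes.
OPERATION_PRIORITY = {"create": 0, "update": 1, "delete": 2}

def failed_results(results: List[ProvisioningResult],
                   operation: Optional[str] = None) -> List[ProvisioningResult]:
    """Filter down to failures, optionally narrowed to one operation type."""
    failures = [r for r in results if r.status == "failed"]
    if operation is not None:
        failures = [r for r in failures if r.operation == operation]
    return sorted(failures, key=lambda r: OPERATION_PRIORITY[r.operation])

def retry_queue(results: List[ProvisioningResult]) -> List[int]:
    """Re-queue only the users that actually failed, highest priority first."""
    return [r.user_id for r in failed_results(results)]
```

With 150,000 results and a ~1% failure rate, `retry_queue` would put roughly 1,500 users back into the provisioning queue instead of all 150,000.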
Long story short, our customers loved our proposals and offered some valuable insights as well (for example, the ability to filter by the operation type that was failing).
Rolling it out
I’m happy to say that, thanks to our awesome Development, Design, and QA teams, we were able to turn around these UI changes in just a few weeks, and the response so far has been very positive.
We’re now looking to expand this functionality as a company-wide view across all applications.
We also received feedback that sometimes the problem isn’t within their OneLogin configuration, but is actually due to bad data from directory sources outside of the OneLogin admin’s control. So we’re working next to streamline the process of tracking down and reporting on where bad user details are coming from. This will let admins provide a concise report on bad user data directly to their Active Directory/LDAP/HR directory managers.
How to Get Started
Want to learn more about bulk user provisioning? Click here to request a free demo.