Azure AD B2C: SLA and estimated availability

Welcome back to the Azure AD B2C series!

Introduction

Service availability is a very important factor. Azure AD B2C, like any other service, comes with an SLA. The number is published, so it can be included when estimating the overall availability of a solution, and the case should be fairly simple. But is it really?

This time the post covers not only custom policies but also built-in ones, as well as solutions using the AAD B2C service in general. I planned to write about something different at this point in the series, but while that work was in progress I was inspired by a video on estimating service availability (in Polish only) by Microsoft Azure MVP Marek Grabarz to look at service availability in the context of AAD B2C.

NOTE
Please remember that custom policies are still not generally available and are not covered by the SLA, so my thoughts on them are semi-theoretical: they are more about pointing out areas of concern and giving examples than about making accurate estimates.

What is SLA?

Service Level Agreement, and "Agreement" is the keyword here. It is neither where you or your cloud vendor want to be (SLO, service level objective) nor where you actually are (SLI, service level indicator), but in my opinion it is relatively safe to assume that the value given by the vendor is a good estimate of how things look on average. Every vendor that considers itself serious will probably offer a value high enough to stay competitive and, at the same time, low enough to reflect the real capabilities of the service. Too high a value would mean the agreement is broken too often, and no one wants that, right?
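Azure SLAs are usually expressed as a monthly uptime percentage: the share of a month's minutes during which the service was available. A minimal sketch of measuring that indicator (the SLI) and comparing it with an agreed value follows; the downtime figure is a made-up input, and each SLA document defines precisely what counts as downtime.

```python
def monthly_uptime_percentage(total_minutes: int, downtime_minutes: int) -> float:
    """Share of the month's minutes during which the service was available, in percent."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def sla_met(measured_percentage: float, agreed_percentage: float) -> bool:
    """True when the measured indicator (SLI) reaches the agreed value (SLA)."""
    return measured_percentage >= agreed_percentage


# A 30-day month has 43,200 minutes; the 30 minutes of downtime are hypothetical.
minutes_in_month = 30 * 24 * 60
sli = monthly_uptime_percentage(minutes_in_month, 30)
print(f"SLI: {sli:.3f}%, meets a 99.9% SLA: {sla_met(sli, 99.9)}")
```

At 99.9%, a 30-day month allows about 43 minutes of downtime before the agreement is broken.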

I understand that this approach and simplification may prove naive (if not foolish), but I still think it will work most of the time, at least with providers strongly rooted in the market like Azure, AWS or GCP.

What is the SLA for AAD B2C?

Well, the documentation states that it's 99.9% (but you need to read it carefully and check other resources, see Appendix), so if we use AAD B2C as the identity service for our application, we just need to include AAD B2C in the service chain and use that value, right? Not exactly, at least probably not in most real business cases.

Let's assume for a while that you are using the simplest possible approach: the built-in policies (excluding ROPC policies, which are in preview). You want users of your system to go through the sign-in process and then be shown the application, which acts according to the identity/access information it receives.

A very (very!) simplified chain could look like this: user → AAD B2C (built-in sign-in policy) → application.

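With the estimated availability being the product of the availabilities of both services: availability(AAD B2C) × availability(application).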

Assuming that we use a regular App Service on a Standard tier App Service Plan with an SLA of 99.95% and nothing else, that makes 99.85% in total. But this is already not true in most business cases. When you integrate B2C with your application, you want the policy pages properly styled so they fit your clients' or your own corporate branding, right? No problem with policies: you just point them to a custom HTML page and voilà. But have you considered where you host this content?
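The 99.85% comes from treating the services as a serial chain: every one of them must be up, so, assuming independent failures, their availabilities multiply. A small sketch comparing that product with the shortcut of subtracting each service's unavailability from 100% (the shortcut is slightly pessimistic, but for values this close to 100% the two agree to about two decimal places):

```python
from math import prod


def serial_availability(slas_percent: list[float]) -> float:
    """Availability (in %) of a chain where every component must be up.

    Assumes independent failures, so the availabilities multiply.
    """
    return 100.0 * prod(s / 100.0 for s in slas_percent)


def serial_availability_approx(slas_percent: list[float]) -> float:
    """Shortcut: subtract each component's unavailability from 100%."""
    return 100.0 - sum(100.0 - s for s in slas_percent)


chain = [99.9, 99.95]  # AAD B2C, App Service on a Standard plan
print(f"product: {serial_availability(chain):.3f}%, "
      f"shortcut: {serial_availability_approx(chain):.3f}%")
```

Adding a component to the list (a storage account, a database, an API) can only lower the result, which is the point of the rest of this post.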

You need to know that a policy which can't reach its configured custom page content fails, returning an error to the redirect_uri passed when the policy was launched. So something as simple as custom HTML/CSS styling for the sign-in form may seriously affect the availability of your whole solution.
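To see where such a failure surfaces, here is a minimal Python sketch of launching a user flow and reading the callback. It assumes the `https://{tenant}.b2clogin.com/{tenant}.onmicrosoft.com/{policy}/oauth2/v2.0/authorize` endpoint format and the authorization code flow with `response_mode=query`; the tenant, policy, client ID and redirect URI used below are hypothetical placeholders. Errors come back as the standard OAuth 2.0 `error` and `error_description` query parameters, and the application has to handle them rather than assume a successful sign-in.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_authorize_url(tenant: str, policy: str, client_id: str,
                        redirect_uri: str, state: str) -> str:
    """Authorization request launching a B2C user flow; results and errors go to redirect_uri."""
    base = (f"https://{tenant}.b2clogin.com/{tenant}.onmicrosoft.com/"
            f"{policy}/oauth2/v2.0/authorize")
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "response_mode": "query",
        "redirect_uri": redirect_uri,
        "scope": "openid",
        "state": state,
    })
    return f"{base}?{query}"


def read_callback(callback_url: str) -> dict:
    """Split a redirect_uri callback into a success result or an OAuth 2.0 error."""
    params = {k: v[0] for k, v in parse_qs(urlparse(callback_url).query).items()}
    if "error" in params:
        return {"ok": False, "error": params["error"],
                "description": params.get("error_description", "")}
    return {"ok": True, "code": params.get("code"), "state": params.get("state")}


url = build_authorize_url("contoso", "B2C_1_signin", "<client-id>",
                          "https://app.example.com/callback", "xyz")
print(read_callback("https://app.example.com/callback?error=server_error&state=xyz"))
```

The `server_error` value in the last line is only a placeholder; the exact code B2C returns when the page content can't be loaded is not shown here.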

You can go with an uncomplicated solution like hosting it in Azure Storage, which has an SLA of 99.9%.

With the estimated availability being 99.9% (AAD B2C) × 99.9% (Azure Storage) × 99.95% (App Service) ≈ 99.75%.

This may not be possible, however, if you face a requirement to host it differently, e.g. in a CMS-like application serving UI templates from a database (and there you go: now the database SLA comes into play, possibly along with others), and so on.
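Whichever hosting option you pick, it is worth monitoring the template URL itself. The B2C documentation asks for CORS to be enabled on the host for your tenant's origin, because the page content is fetched from the user's browser, so besides answering at all, the host must return a matching `Access-Control-Allow-Origin` header. The sketch below sends an `Origin` header using only the standard library and reports a non-200 status or a missing/mismatched CORS header; the storage URL and the `contoso` tenant origin in the usage note are hypothetical.

```python
import urllib.request


def evaluate_template_response(status: int, headers: dict, origin: str) -> list[str]:
    """Return problems found in a custom UI template response (empty list = looks usable)."""
    problems = []
    if status != 200:
        problems.append(f"unexpected HTTP status {status}")
    lowered = {k.lower(): v for k, v in headers.items()}
    allowed = lowered.get("access-control-allow-origin")
    if allowed not in ("*", origin):
        problems.append(f"CORS header does not allow {origin}")
    return problems


def probe_template(url: str, origin: str, timeout: float = 5.0) -> list[str]:
    """Fetch the template the way a browser would, sending an Origin header."""
    request = urllib.request.Request(url, headers={"Origin": origin})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return evaluate_template_response(response.status, dict(response.headers), origin)
    except OSError as exc:  # URLError, HTTPError and timeouts are all OSError subclasses
        return [f"request failed: {exc}"]
```

Usage, e.g. from a scheduled job: `probe_template("https://<storage-account>.blob.core.windows.net/b2c/signin.html", "https://contoso.b2clogin.com")`. If the template comes from a CMS backed by a database, the same probe exercises that whole chain.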

What is quite peculiar is that, in theory, you can have different estimates for different flows in a single policy, as different flows (e.g. sign-in and sign-up in a combined policy) can have their UI files hosted by different services. This probably won't happen in real use cases, as whatever hosts the custom UI will most likely be the same service for every flow/page of a policy. It can still happen, though.

Another piece of functionality that may yield different availability estimates within the same policy/application setup is external identity providers, which may fail like any other service. Although you are not directly responsible for them and they are not covered by the B2C SLA, it is good to be aware of this.
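A per-flow estimate makes both effects visible. In the sketch below every figure is hypothetical: the CMS serving the sign-up page is an invented example, and the external identity provider's value is a pure assumption, since it publishes no SLA towards your tenant.

```python
from math import prod

# Hypothetical availability figures in percent.
SERVICES = {
    "aad_b2c": 99.9,
    "app_service": 99.95,
    "storage_ui": 99.9,    # custom UI for the sign-in page
    "cms_ui": 99.5,        # hypothetical CMS serving the sign-up page
    "external_idp": 99.9,  # assumed value, not an SLA
}

# Each flow lists the services that must all be up for it to succeed.
FLOWS = {
    "local sign-in": ["aad_b2c", "storage_ui", "app_service"],
    "local sign-up": ["aad_b2c", "cms_ui", "app_service"],
    "social sign-in": ["aad_b2c", "storage_ui", "external_idp", "app_service"],
}


def flow_availability(flow: str) -> float:
    """Serial-chain estimate (in %) for one flow, assuming independent failures."""
    return 100.0 * prod(SERVICES[name] / 100.0 for name in FLOWS[flow])


for flow in FLOWS:
    print(f"{flow}: {flow_availability(flow):.3f}%")
```

The same policy and application can thus have a different estimate for each flow, depending on which UI host and which identity provider the flow touches.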

Custom policies SLA

Custom policies haven't reached GA yet, so it is too soon to discuss real numbers, but whatever the SLA turns out to be (I guess it will be the standard AAD B2C SLA of 99.9%), they are affected by the same issues as built-in policies, and a few more.

Not only can you use custom UI, potentially decreasing availability, but you can also integrate with services external to AAD B2C, like your own REST API feeding the user journey with data. This works much like custom UI: a failed request breaks the user journey, so if your API becomes unresponsive for any reason, your identity solution is effectively unavailable too, preventing users from authenticating.

What can you do? Well, keep it in mind and design the infrastructure hosting your API to be as highly available as possible in the given circumstances. Neglecting it may have a very negative impact on your solution.
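On the API side, besides hosting it redundantly, one possible design choice (my suggestion, not something B2C requires) is to fail soft on non-critical data: if the store behind the API is down, answer with defaults instead of an error so the user journey can continue. Below is a sketch of that decision logic, with a hypothetical `email` input claim, a hypothetical `loyaltyTier` output claim and a pluggable `lookup` function standing in for the database; wiring it into a web framework is left out.

```python
import json
from typing import Callable

# Hypothetical defaults used when the user-data store is unreachable.
# Only sensible for non-critical claims: never fall back on anything that grants access.
DEFAULT_CLAIMS = {"loyaltyTier": "standard"}


def build_claims_response(body: bytes, lookup: Callable[[str], dict]) -> tuple[int, dict]:
    """Return (HTTP status, JSON body) for a claims request sent by a custom policy."""
    try:
        request = json.loads(body)
        email = request["email"]
    except (ValueError, KeyError, TypeError):
        # Malformed input is a caller bug, so report it instead of hiding it.
        return 400, {"error": "malformed request"}
    try:
        claims = lookup(email)
    except Exception:
        # Data store down: answer with defaults instead of breaking the user journey.
        claims = dict(DEFAULT_CLAIMS)
    return 200, claims
```

Whether a degraded token is acceptable is a business decision; for claims that drive authorization, failing the journey is the safer option.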

Let’s take a look at an example:

The AAD B2C service is used as the identity service protecting access to an application. The application is an App Service with an Azure SQL Database for data storage. The sign-in process uses a custom policy to pull additional user data into the flow and place it in the token. The user data service is also an App Service with Azure SQL. Additionally, there is a custom UI HTML file hosted in Azure Storage.

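A simple diagram of the services involved: user → AAD B2C (custom policy) → custom UI in Azure Storage → user data API (App Service + Azure SQL) → application (App Service + Azure SQL).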

And the numbers are (assuming custom policies get the regular AAD B2C SLA of 99.9%):

100 – 0.1 (AAD B2C) – 0.1 (Azure Storage) – 0.05 (App Service) – 0.01 (Azure SQL) – 0.05 (App Service) – 0.01 (Azure SQL) = 99.68%, which means an estimated unavailability of about 2.3 hours every month.
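The same arithmetic in code, comparing the subtraction shortcut used above with the product of availabilities (assuming independent failures) and converting the result into hours per 30-day month. The component list mirrors the example; the 99.9% for custom policies is the assumption stated above.

```python
from math import prod

# Component SLAs from the example, in percent.
CHAIN = {
    "AAD B2C (custom policy, assumed)": 99.9,
    "Azure Storage (custom UI)": 99.9,
    "App Service (user data API)": 99.95,
    "Azure SQL (user data)": 99.99,
    "App Service (application)": 99.95,
    "Azure SQL (application)": 99.99,
}


def downtime_hours(availability_percent: float, hours_in_month: float = 30 * 24) -> float:
    """Expected unavailable hours in a month for a given availability."""
    return hours_in_month * (100.0 - availability_percent) / 100.0


approx = 100.0 - sum(100.0 - s for s in CHAIN.values())
exact = 100.0 * prod(s / 100.0 for s in CHAIN.values())
print(f"shortcut {approx:.2f}%, product {exact:.3f}%, "
      f"~{downtime_hours(exact):.1f} h of downtime per month")
```

Both methods land at roughly 99.68%, i.e. a little over two hours per month, compared with about 43 minutes for AAD B2C on its own.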

Key takeaway

This post is not meant to be a very accurate explanation of system availability estimation. Its purpose is rather to point out issues worth remembering when using AAD B2C in a solution. And the single most important thing to remember is:

Don’t treat your AAD B2C as a single, closed, standalone service with an SLA of 99.9% because it will hardly ever be so.

What’s next

Well, as I planned after the previous post, I have already started making notes and drafts on both policy parameters and using Application Insights to track user journeys. This time I will not let anything unplanned jump in before publishing one of these 😉

Appendix

The AAD B2C SLA page states that it is 99.9%, but there is no SLA for the free tier. This may sound bad but actually makes sense: an SLA generally gives you a credit, a percentage off your regular price, and a discount on zero (the free tier) isn't much of a benefit.

It turns out, however, that the SLA description is a bit misleading, because the pricing page describes it not as “a free tier” (which would suggest something you choose) but as “the time when the service is free”, which is when your tenant has both fewer than 50,000 users and fewer than 50,000 tokens issued per month.
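The free-month rule and the reason a credit on a free month is worthless can be sketched as follows. The 50,000 limits are the ones described above at the time of writing (check the current pricing page), and the credit percentage passed in is a hypothetical input, not a value taken from the B2C SLA.

```python
FREE_USERS_LIMIT = 50_000
FREE_TOKENS_LIMIT = 50_000


def is_free_month(users: int, tokens_issued: int) -> bool:
    """The service is free (and outside the SLA) only when BOTH counts stay below the limits."""
    return users < FREE_USERS_LIMIT and tokens_issued < FREE_TOKENS_LIMIT


def credit_for_month(users: int, tokens_issued: int,
                     monthly_bill: float, credit_percent: float) -> float:
    """SLA credit as a share of what you paid; a free month earns nothing."""
    if is_free_month(users, tokens_issued):
        return 0.0
    return monthly_bill * credit_percent / 100.0


# Hypothetical: a paid month with a 200 bill and a 10% credit for a missed SLA.
print(credit_for_month(60_000, 1_000, 200.0, 10.0))
```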

This is actually a beautiful example of why you should pay very careful attention to what SLA pages (and other resources) say, and of how the SLA value itself is, in the end, only a billing-affecting threshold.

Anyway, if you don't expect that many users or that much traffic in your application, you can only assume that in practice small (and thus free) tenants receive the same level of service and availability as paid ones, and use the 99.9% value in your availability estimations.

One thought on “Azure AD B2C: SLA and estimated availability”

Tomasz Onyszko says:

7 November 2018 at 06:49

Good stuff. My vote is on Application Insights and user journeys as this is where people struggle.