This post explains how to achieve REAL high availability for ADFS (Active Directory Federation Services) using Azure Traffic Manager, without a load balancer. The same setup is possible using AWS Route 53 instead of Traffic Manager.
Azure Traffic Manager provides a very simple and solid form of fault tolerance. Once it is set up there are few components to manage, since everything is driven by DNS CNAME mapping. The setup is also very useful for seamless server maintenance, as you can easily fail over to one ADFS/WAP string while you upgrade the servers in the other string.
Basic Redundant ADFS Setup Using Two ADFS/WAP Strings
To securely expose ADFS to outside clients we will need to install WAP (Web Application Proxy) on a DMZ server. Similar to ADFS, WAP is a built-in Windows component. A step-by-step guide on installing WAP is found here.
In order to achieve high availability in our setup we need (at least) two strings of ADFS and WAP servers. In front of these we need some mechanism to ensure communication is routed to a healthy ADFS/WAP string in case one of them is down. In this scenario we will use Azure Traffic Manager. The basic communication steps are as follows:
- The client contacts a third-party cloud service, which asks the client to obtain an access token from ADFS
- The client resolves the ADFS server DNS name in Traffic Manager which, using a DNS CNAME record, maps the ADFS server name to a healthy WAP server name
- The client (resolves the WAP server name to an IP address and) contacts the WAP server to obtain the access token for the cloud service
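The DNS part of this flow can be observed directly from a client. A minimal sketch using PowerShell, where `adfs.yourcompany.com` and the endpoint names are placeholders for your own federation service name and Traffic Manager profile:

```powershell
# Follow the CNAME chain: the ADFS name aliases the Traffic Manager
# profile, which in turn aliases whichever WAP endpoint is healthy
Resolve-DnsName -Name adfs.yourcompany.com -Type CNAME

# A full lookup shows the final A record of the selected WAP server
Resolve-DnsName -Name adfs.yourcompany.com -Type A
```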
Ideally the ADFS/WAP strings are placed in separate availability zones to ensure survival in case of breakdown of a complete availability zone.
Inner Workings of Azure Traffic Manager
To get a better understanding of the logic inside Traffic Manager, let’s zoom in and see how it leverages DNS CNAME record mapping and health checking to achieve high availability:
When a client asks Traffic Manager to resolve the DNS name of the ADFS server, Traffic Manager only returns the DNS name of a healthy WAP. If both WAPs are healthy (or both are unhealthy), Traffic Manager returns either of them at random.
The official Microsoft guide for setting this up is found here, and that could have been the end of this post. Unfortunately the setup recommended by Microsoft has a really big problem, as many people have found out the hard way (e.g. this guy).
Azure Traffic Manager Health Probe Is Not… Healthy!
Or actually, the health check itself is fine, but the recommended health probe setup on the ADFS/WAP side is not. Let's look at how Microsoft has implemented the health probe:
Both the ADFS back end server and the WAP server listen on two ports: the actual ADFS communication port (TCP 443) and a probe port (TCP 80). Inbound requests are processed as follows:
- The ADFS back end server responds with HTTP 200 OK to requests for /adfs/probe on port TCP 80 to show it’s alive (says nothing about WAP)
- The ADFS back end server is processing inbound ADFS requests on port TCP 443
- The WAP server responds with HTTP 200 OK to requests for /adfs/probe on port TCP 80 to show it’s alive (says nothing about ADFS back end)
- The WAP server is proxying inbound TCP 443 to the ADFS back end server (also TCP 443)
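You can exercise both probe endpoints by hand to see exactly what Traffic Manager sees. A sketch, assuming hypothetical host names `adfs01` (back end) and `wap01` (proxy):

```powershell
# Probe the ADFS back end directly; a healthy server returns HTTP 200 OK
Invoke-WebRequest -Uri "http://adfs01.internal.yourcompany.com/adfs/probe" -UseBasicParsing

# Probe the WAP server; note that a 200 here only proves WAP itself
# is alive, not that the ADFS back end behind it is healthy
Invoke-WebRequest -Uri "http://wap01.yourcompany.com/adfs/probe" -UseBasicParsing
```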
The problem with this setup is that the probe on the WAP server only reflects the health of the WAP server itself, not of the ADFS back end server. Since Traffic Manager only checks the health of the WAP server, it will happily direct clients to a healthy WAP server fronting a faulty ADFS back end server, which can hardly be considered high availability:
Magic Trick To The Rescue
Fortunately, working for a client, we came up with a neat and simple workaround on this issue which gives you true high availability with Azure Traffic Manager.
The trick is to create a new forwarding rule on the WAP server on a spare port, e.g. TCP 81, and proxy it to the probe port TCP 80 on the ADFS back end server. You then configure the probe in Traffic Manager to point at this new port, TCP 81, on the WAP server. The probe now shows as unhealthy in Traffic Manager if either the WAP fails to proxy the request (WAP is down) or the ADFS back end server does not reply to the probe request (ADFS is down). This effectively makes your ADFS/WAP installation truly fault tolerant.
The setup in WAP looks like this:
(The reason you see both probe redirects in the WAP configuration is that you configure both rules on the primary WAP server; the published rules are stored in the ADFS configuration database, from which all other WAP servers pick them up.)
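The forwarding rule can also be created from PowerShell on the primary WAP server as a pass-through published application. A sketch, assuming hypothetical external and internal names; adjust ports and URLs to your environment:

```powershell
# Publish the ADFS back end probe (TCP 80) externally on TCP 81.
# Pass-through preauthentication means WAP simply forwards the request,
# so an HTTP 200 on port 81 proves both WAP and the ADFS back end are up.
Add-WebApplicationProxyApplication `
    -Name "ADFS backend probe" `
    -ExternalPreauthentication PassThrough `
    -ExternalUrl "http://adfs.yourcompany.com:81/adfs/probe/" `
    -BackendServerUrl "http://adfs01.internal.yourcompany.com:80/adfs/probe/"
```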
The Traffic Manager configuration is seen here:
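If you prefer scripting over the portal, roughly the same configuration can be sketched with the Az PowerShell module (profile name, resource group and endpoint names below are placeholders). The important detail is that the monitor points at the new TCP 81 probe:

```powershell
# Create a profile whose health monitor targets the pass-through probe
New-AzTrafficManagerProfile -Name "adfs-tm" -ResourceGroupName "rg-adfs" `
    -RelativeDnsName "adfs-yourcompany" -Ttl 30 `
    -TrafficRoutingMethod Weighted `
    -MonitorProtocol HTTP -MonitorPort 81 -MonitorPath "/adfs/probe"

# Register each WAP string as an equally weighted external endpoint
New-AzTrafficManagerEndpoint -Name "wap01" -ProfileName "adfs-tm" `
    -ResourceGroupName "rg-adfs" -Type ExternalEndpoints `
    -Target "wap01.yourcompany.com" -EndpointStatus Enabled -Weight 1
New-AzTrafficManagerEndpoint -Name "wap02" -ProfileName "adfs-tm" `
    -ResourceGroupName "rg-adfs" -Type ExternalEndpoints `
    -Target "wap02.yourcompany.com" -EndpointStatus Enabled -Weight 1
```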
Make sure you test the setup thoroughly by shutting down each of the nodes in turn. Check that the health status is reflected in Traffic Manager and that proper failover takes place. You can test Traffic Manager by making consecutive DNS requests and verifying which DNS record is returned, using nslookup.exe or PowerShell (Resolve-DnsName).
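A simple way to watch the failover happen is to poll the federation name in a loop while you shut nodes down (the domain name below is a placeholder):

```powershell
# Resolve the federation name every 30 seconds (roughly the profile TTL)
# and log which WAP the CNAME chain currently points at; shut down a
# node and watch the answer flip once the probe reports it unhealthy
while ($true) {
    $answer = Resolve-DnsName -Name adfs.yourcompany.com -Type CNAME
    "{0}  {1}" -f (Get-Date -Format T), ($answer.NameHost -join " -> ")
    Start-Sleep -Seconds 30
}
```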
If you’re using AWS for cloud services you can easily use AWS Route 53 instead of Traffic Manager to achieve the same setup. Just keep a few things in mind:
- You must create an AWS hosted zone; we recommend delegating a subdomain, e.g. aws.yourcompany.com, to AWS
- Health checks are a separate Route 53 component, as opposed to Traffic Manager, where health checking is an integrated part of the profile settings
Apart from this the concepts are very similar and it also works well once you get it right.
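As a rough sketch of the Route 53 equivalent using the AWS CLI (host names, zone ID and file name below are placeholders), the health check again targets the TCP 81 pass-through probe:

```powershell
# Create a standalone health check against the pass-through probe on WAP
aws route53 create-health-check --caller-reference "adfs-wap01-probe" `
    --health-check-config "Type=HTTP,FullyQualifiedDomainName=wap01.yourcompany.com,Port=81,ResourcePath=/adfs/probe,RequestInterval=30,FailureThreshold=3"

# Attach the health check to the record sets in the hosted zone
# (change-batch.json holds the ResourceRecordSet entries referencing
# HealthCheckId, SetIdentifier and Weight)
aws route53 change-resource-record-sets --hosted-zone-id Z0000000000000 `
    --change-batch file://change-batch.json
```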
A sample AWS Route 53 configuration is seen here:
And that’s it for now… We hope you’re able to use the information in this post either to correct your existing configuration and obtain true high availability, or to build a brand new fully redundant ADFS setup.