Late November saw Microsoft Azure hit by a major global outage, disrupting enterprises worldwide that had shifted their workloads to Microsoft’s public cloud. Local Australian businesses were among the few not to see their websites, storage and critical apps like email go down – at least this time.
Let’s say you’re one of these Australian IT leaders who’s done the right thing and moved your organisation to the cloud. Have you considered what happens to critical IT systems like email when your service provider has an outage? How will your organisation keep working? Answering these simple questions can be both complex and troubling for many enterprises. It is delaying many from moving to the cloud completely – yet it doesn’t have to.
IT leaders must hold the cloud to the same standards of continuity, compliance, and security as they would their on-premises infrastructure – or risk incurring major financial and reputational damage. It is also worth noting that organisations often have limited legal or financial recourse when their provider suffers an outage.
The cloud’s value proposition, however, is its “no worries” approach to IT infrastructure: fully outsourcing your systems and services to benefit from major economies of scale. So how can IT leaders get the reassurance they need? The answer is familiar – don’t put all your eggs in one basket. For years IT leaders have built expensive backup systems on-premises, and the cloud requires the same basic redundancy: build a blend of overlapping cloud services that gives you that critical plan B when your primary service is not available.
Failing to plan
Public cloud services typically offer minimal protection in the event of an outage. And when an outage occurs, the damage can be swift and widespread. Aside from last week’s global Azure outage, Exchange Online, Adobe, local hosting provider Ninefold, and Google’s public DNS have all experienced significant outages this year, leaving end-users unable to access data, apps, and even (in the case of Azure and Google) large numbers of websites. For businesses relying on the cloud for critical services like hosting and mail, cloud outages typically result in a heavy backlash from customers and employees, which IT has to deal with while simultaneously fixing the problem. Yet even when it comes to SaaS and private cloud services, internal IT managers often have little recourse apart from retrospective SLAs that rarely cover the loss of productivity and sales incurred.
While the real-world costs of a cloud outage can be significant, fixing this has remained a relatively low priority on the IT manager’s agenda. This is a mistake. Testing cloud services for downtime, failover processes, and other critical elements remains rare; yet the same testing would be de rigueur for on-premises systems. In essence, enterprises are playing Russian roulette with their IT infrastructure. What’s going on?
One reason is the “no worries” view of cloud services that vendors and IT managers like to believe. The main drivers to cloud remain cost-effectiveness and agility: in our industry discussions, we often hear about a “she’ll be right” attitude to disaster recovery and continuity, because outages always seem unlikely until they happen. A continuity strategy based on the hope that “it won’t happen to me” won’t cut it with the boardroom when critical IT systems are unavailable.
Another reason is that IT managers lack visibility into how cloud services operate: even a seemingly functional continuity solution may, for example, have gaps in data restoration which an IT manager might not notice until it’s too late.
Agility is the point of the cloud, but you can’t be agile if you collapse during a disaster. As IT professionals start to go from “lights-on” maintenance roles to integrating services and consulting their executive counterparts, they’ll need to champion a new approach to risk and compliance: the service portfolio.
Many cloud services are better than one
Why should you combine cloud service providers? Most enterprises come unstuck because they rely on only a single cloud platform or environment – and everyone gets downtime at some point or another. If all your data or apps are hosted in the one place, your organisation is fully at the mercy of external factors. This all-or-nothing approach to availability isn’t sustainable: in fact, it’s the opposite of how IT managers have traditionally approached on-premises systems design, where redundancy and failover are built into, and tested in, every component.
IT leaders need to translate that on-premises rigour into the cloud. First off, they should test potential cloud providers against new and existing compliance benchmarks for factors like availability zones, backup processes, and real-time visibility when an outage occurs (Microsoft, for example, now publishes quarterly statistics on Office 365 availability). They should look to use not just one provider, but at least two which rely on entirely different infrastructure. Mimecast’s email management cloud, for example, runs in entirely different datacentres and networks to the Office 365 and Exchange services which it supports, meaning no overlap of factors which might lead to downtime.
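To put rough numbers on why a second, independent provider matters, here is a back-of-the-envelope sketch in Python. The 99.9% availability figures are purely illustrative and are not taken from any provider's SLA or published statistics. The calculation also assumes that the two services fail independently of each other (no shared datacentres, networks or DNS) and that failover between them is instant and always works. Both assumptions are optimistic, which is exactly why failover needs to be tested.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def expected_downtime_hours(availability: float) -> float:
    """Expected hours per year a service is unavailable at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


def combined_availability(availabilities: list[float]) -> float:
    """Availability of a set of redundant services.

    Assumes failures are independent and failover is instant: the
    combination is only down when every service is down at once.
    """
    all_down = 1.0
    for availability in availabilities:
        all_down *= 1.0 - availability
    return 1.0 - all_down


if __name__ == "__main__":
    primary = 0.999    # hypothetical primary cloud service
    secondary = 0.999  # hypothetical continuity service on separate infrastructure

    both = combined_availability([primary, secondary])
    print(f"Primary only:        {expected_downtime_hours(primary):.2f} hours down per year")
    print(f"Primary + secondary: {expected_downtime_hours(both) * 3600:.0f} seconds down per year")
```

If the two services share infrastructure, their failures are correlated and the real benefit shrinks towards that of a single provider, which is the reason to insist on entirely separate datacentres and networks.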
Just as with all risk-related decisions, IT leaders will need to work closely with their executive counterparts to figure out which services would have the most impact on the business if they were lost, and put in place a strategy to ensure continuity. Email typically ranks at the top of this “zero tolerance for failure” list.
A lot of our customers now approach email continuity in the same way as e-retailers have been approaching peak uptime for their websites: the dramatic recovery of Click Frenzy, which crashed in its first year due to demand but has performed without a hitch since then, shows both the feasibility and benefits of holding the cloud to on-premises standards. As IT professionals take on a more advisory and consultative role in their organisations, they’ll be exposed to a range of business requirements that inform where uptime and disaster recovery are needed most, and why.
What happens when the cloud service goes down? Every IT leader should be able to answer that question immediately and show their continuity strategy: a strategy based on planning and technology, not hope. If you have that, then it really is “no worries”.
Nick Lennon is the ANZ country manager of Mimecast