Like other early adopter Asia Pacific countries, New Zealand has a high level of virtualisation, and its businesses are widely reaping the benefits, from cheaper, greener IT to improvements in IT responsiveness. However, these benefits are not without risk.
Virtualisation benefits derive from the consolidation of workloads. However, consolidation increases exposure because all of a business’s eggs are in one basket. Given this, effective disaster recovery is a core component of ensuring that adverse circumstances are handled with minimal damage to the business, its reputation and its bottom line.
At this stage of the maturation of virtualisation, we can take for granted that effective disaster recovery is integral to business continuity. This should not be new information.
For businesses new to virtualisation, the old physical methods of backup and recovery often prevail. Old habits die hard, but people need to move away from the old model, start thinking about disaster recovery in a virtualised environment, and have plans and tools tailored to its specific constraints.
It is not enough for businesses to simply implement a backup tool and assume the work is done. In reality, businesses need to develop a comprehensive plan for disaster recovery as well as a nuanced understanding of their own virtual environment, its fallibilities and the threats posed to it.
Four key components define a comprehensive approach to disaster recovery. This approach best positions a business to minimise downtime in its virtualised working environment.
Avoid an outage
The first step in minimising downtime to your virtualised network is to avoid an outage in the first place.
While outages are traditionally thought of as beyond a business’s control, monitoring tools and a sound knowledge of the environment can help a business dodge the bullet by predicting trouble before it arrives.
Hardware and software often give signs that they might fail. For example, they will write errors to a disk, rapidly fill data stores or have high CPU wait times. Heed these signs.
When these signs do occur, a business can choose to perform a planned failover. This has much less of an impact than the unplanned downtime an actual outage will bring.
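As a rough illustration of heeding those signs, the sketch below checks one host sample for the three warnings mentioned above: disk errors, a rapidly filling data store and high CPU wait. The field names and thresholds are assumptions made for this example, not values from any particular monitoring product, and would need tuning for a real environment.

```python
from dataclasses import dataclass


@dataclass
class HostSample:
    """One monitoring sample for a host (hypothetical field names)."""
    host: str
    disk_errors: int                     # errors logged since the last sample
    datastore_used_pct: float            # share of data store capacity in use
    datastore_growth_pct_per_day: float  # recent growth rate
    cpu_wait_pct: float                  # time spent waiting for CPU


# Illustrative thresholds only; assumptions for this sketch.
MIN_DAYS_UNTIL_FULL = 7
MAX_CPU_WAIT_PCT = 20.0


def warning_signs(sample: HostSample) -> list[str]:
    """Return the warning signs present in a sample (empty if healthy)."""
    signs = []
    if sample.disk_errors > 0:
        signs.append("disk errors")
    if sample.datastore_growth_pct_per_day > 0:
        free_pct = 100.0 - sample.datastore_used_pct
        days_left = free_pct / sample.datastore_growth_pct_per_day
        if days_left < MIN_DAYS_UNTIL_FULL:
            signs.append("data store filling rapidly")
    if sample.cpu_wait_pct > MAX_CPU_WAIT_PCT:
        signs.append("high CPU wait")
    return signs


if __name__ == "__main__":
    sample = HostSample("host-01", disk_errors=3, datastore_used_pct=90.0,
                        datastore_growth_pct_per_day=5.0, cpu_wait_pct=5.0)
    print(sample.host, warning_signs(sample))
```

A host that reports any of these signs is a candidate for a planned failover rather than a wait-and-see approach.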
Understand the impact of an outage on your infrastructure
A business should have a thorough understanding of the impact an outage will have on its infrastructure. Before a planned failover (let alone an actual outage) occurs, a business should already have predicted the specific workloads that can successfully be reassigned. It should also know the impact those workloads might have on surrounding infrastructure.
An issue many businesses confront is that they can only perform a failover on a subset of workloads because hardware at the disaster recovery site is either not comparable with production or is already being used by other virtual machines.
One effective way of getting around this problem is to plan a ‘capacity shift’, which allows a business to select workloads that can be powered on at the disaster recovery site without causing a further outage.
This kind of planning and activity is critical to determining which workloads can actually be brought up with priority in the event of a failover.
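A capacity shift can be pictured as a simple selection exercise. The sketch below is one possible approach, not a prescribed method: it walks through workloads in priority order and powers on each one only if the disaster recovery site still has enough free vCPUs and memory. The workload names, priority scheme and capacity figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    priority: int   # 1 = most critical (assumed convention)
    vcpus: int
    memory_gb: int


def plan_capacity_shift(workloads, free_vcpus, free_memory_gb):
    """Greedily pick workloads by priority that fit the spare DR capacity.

    Returns (selected, deferred) lists of workload names.
    """
    selected, deferred = [], []
    for w in sorted(workloads, key=lambda w: w.priority):
        if w.vcpus <= free_vcpus and w.memory_gb <= free_memory_gb:
            selected.append(w.name)
            free_vcpus -= w.vcpus
            free_memory_gb -= w.memory_gb
        else:
            # Powering this on would overcommit the DR site.
            deferred.append(w.name)
    return selected, deferred


if __name__ == "__main__":
    workloads = [
        Workload("db", priority=1, vcpus=8, memory_gb=64),
        Workload("web", priority=2, vcpus=4, memory_gb=16),
        Workload("reporting", priority=3, vcpus=8, memory_gb=32),
        Workload("mail", priority=2, vcpus=2, memory_gb=8),
    ]
    selected, deferred = plan_capacity_shift(workloads, free_vcpus=16,
                                             free_memory_gb=96)
    print("power on:", selected)
    print("defer:", deferred)
```

A greedy pass like this is easy to reason about but is not guaranteed to make the best use of capacity; a real plan would also need to account for storage, network and dependencies between workloads.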
Know your virtual environment
Capacity planning for a failover is the kind of activity that gives a business the necessary intimate knowledge of its infrastructure.
This knowledge is important and a business should take regular measures to ensure it has accurate and up-to-date information about the exact configuration of its virtual environment. Mapping out the virtual environment at a host, storage and network layer is a powerful exercise. Tools like Visio play a critical role in a disaster recovery scenario as they allow quick, graphical analysis of the playing field before and after any outage.
Do an autopsy
Once a business has prepared for a failover and has enough knowledge to identify what changed in the production environment to spark it, it must be ready to do an autopsy. This involves going through audit events to identify user- and system-generated changes, and analysing performance statistics to see if failing hardware (or something else) caused the outage.
This is not about apportioning blame, but rather about understanding the circumstances that brought about the outage so that it can be avoided in the future when all the workloads are ‘failed-back’ into production.
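To make the autopsy concrete, the sketch below filters a list of audit events down to those recorded in the window before an outage and groups them into user and system-generated changes. The event structure (a dictionary with `time`, `source` and `change` keys) and the 24-hour default window are assumptions for illustration; real audit logs will differ by platform.

```python
from datetime import datetime, timedelta


def changes_before_outage(events, outage_time, window=timedelta(hours=24)):
    """Group audit events from the window before the outage by source.

    Each event is assumed to be a dict with 'time' (datetime),
    'source' ('user' or 'system') and 'change' (description).
    """
    start = outage_time - window
    recent = [e for e in events if start <= e["time"] <= outage_time]
    recent.sort(key=lambda e: e["time"])  # oldest first

    by_source = {"user": [], "system": []}
    for event in recent:
        by_source.setdefault(event["source"], []).append(event["change"])
    return by_source


if __name__ == "__main__":
    outage = datetime(2024, 1, 10, 12, 0)
    events = [
        {"time": datetime(2024, 1, 10, 11, 30), "source": "system",
         "change": "snapshot consolidation started"},
        {"time": datetime(2024, 1, 10, 9, 0), "source": "user",
         "change": "VM memory reservation changed"},
        {"time": datetime(2024, 1, 5, 9, 0), "source": "user",
         "change": "new port group added"},
    ]
    print(changes_before_outage(events, outage))
```

The resulting shortlist is a starting point for comparison against performance statistics, not proof of cause.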
Ultimately, developing a disaster recovery strategy requires more than just implementing a backup tool. It requires planning and forensic knowledge of your business’s virtualised landscape.
Virtualisation downtime can bring a business to a grinding halt, and the consequences go beyond lost data to lost business and a damaged reputation.
The reality is that every organisation needs to define and, importantly, continually refine its disaster recovery strategy. When considering disaster recovery in a virtualised environment, the old adage ‘failing to plan is planning to fail’ has never been more applicable.
Charles Clarke is the systems engineering manager for Veeam Asia Pacific