We’ve seen it many times: an organisation believes its data is safe because it has carefully set up backups and now diligently monitors them for completion.
Unfortunately, that’s only half the story: unless those backups are tested, recovery could take significantly longer than planned, sometimes days. Or never.
Without testing your backup and restore plans, full recovery may not be possible at all. For example, nuanced systems such as databases and their transaction logs need careful analysis to ensure that every restore scenario has been considered. Without testing, there’s little reason to assume that the restored database will contain the same records as the one that crashed, and that is before Recovery Point Objectives and Recovery Time Objectives are even taken into account.
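To illustrate why a restored database has to be checked rather than assumed correct, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for a production database; that is an assumption, since systems such as SQL Server restore from full backups plus transaction log backups. The sketch takes an online backup with `Connection.backup`, "restores" it into a separate database, then compares row counts and a content fingerprint. The `invoices` table and the `fingerprint` and `backup_and_verify` helpers are hypothetical.

```python
from __future__ import annotations

import hashlib
import sqlite3


def fingerprint(conn: sqlite3.Connection, table: str) -> tuple[int, str]:
    """Return (row count, SHA-256 of all rows in primary-key order) for a table.

    Hypothetical helper: assumes a trusted table name with an integer
    primary key called "id".
    """
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY id").fetchall()
    digest = hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()
    return len(rows), digest


def backup_and_verify(source: sqlite3.Connection, table: str) -> bool:
    """Back up `source` into a fresh database and check the table matches."""
    restored = sqlite3.connect(":memory:")  # stands in for the restore target
    source.backup(restored)                 # sqlite3's online backup API
    ok = fingerprint(source, table) == fingerprint(restored, table)
    restored.close()
    return ok


if __name__ == "__main__":
    live = sqlite3.connect(":memory:")
    live.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
    live.executemany("INSERT INTO invoices (amount) VALUES (?)", [(10.0,), (25.5,)])
    live.commit()
    print(backup_and_verify(live, "invoices"))  # True
```

A real restore test would compare against figures recorded when the backup was taken, because the live database keeps changing after that point.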
RTO and RPO
Usually, we discuss these with the people who use and ‘own’ the individual systems within your organisation, to determine which RTOs and RPOs they need, and at what point the cost of missing these objectives outweighs the cost of reducing them.
Recovery Point Objectives
How much recently changed data can you afford to lose, measured as the time since the last recoverable copy: 900 milliseconds, 2 minutes, 3 hours, 4 days? We analyse how frequently backups run and which solutions need to be in place to prevent the organisation from losing more than it can cope with.
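The arithmetic behind "how frequently backups run" can be sketched as follows, under simplifying assumptions: each backup captures data as of its start time and only becomes usable once it finishes. On that basis the worst case is a failure just before a run completes, which loses one full interval plus the backup's run time, and every silently missed run adds another interval. The function names and figures are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, duration: timedelta,
                         missed_runs: int = 0) -> timedelta:
    """Longest window of changes that could be lost with a periodic backup.

    Assumes each backup captures data as of its start time and only becomes
    usable once it finishes, so a failure just before a run completes loses
    everything since the previous good backup started. Each silently missed
    run adds another full interval.
    """
    return interval * (missed_runs + 1) + duration


def meets_rpo(interval: timedelta, duration: timedelta, rpo: timedelta,
              missed_runs: int = 0) -> bool:
    """True if the worst-case loss stays within the Recovery Point Objective."""
    return worst_case_data_loss(interval, duration, missed_runs) <= rpo


if __name__ == "__main__":
    nightly, run_time = timedelta(hours=24), timedelta(hours=2)
    print(worst_case_data_loss(nightly, run_time))               # 1 day, 2:00:00
    print(meets_rpo(nightly, run_time, rpo=timedelta(hours=4)))  # False
```

A nightly backup therefore cannot meet a four-hour RPO however reliably it completes. It also shows why monitoring for completion matters: one unnoticed failure roughly doubles the exposure.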
Recovery Time Objectives
How quickly must ‘service’ be restored (to the Recovery ‘Point’ above)? A week might be acceptable for rarely used archive data; an hour if an outage stops the accounts department from raising invoices; or mere seconds if operational governance or profitability demands it.
The cost of reducing RPOs and RTOs
In recent years, the cost of reducing RPOs and RTOs has dropped hugely; however, it’s still true that costs rise exponentially as you approach zero (i.e. no loss of service or data following a failure). Just ask high-frequency traders, for whom nanoseconds may count.
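One way to frame this trade-off is to add each option's annual cost to the expected annual cost of downtime and lost data, then choose the cheapest total. The sketch below does that with a deliberately simple linear model. Every option name, price, incident rate and hourly cost is a hypothetical figure for illustration, and a real assessment would also weigh reputational and regulatory impact.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    annual_cost: float  # licences, hardware, management (hypothetical)
    rto_hours: float    # time to restore service
    rpo_hours: float    # window of data that may be lost


def expected_annual_cost(opt: Option, incidents_per_year: float,
                         downtime_cost_per_hour: float,
                         data_loss_cost_per_hour: float) -> float:
    """Solution cost plus the expected cost of outages and lost work."""
    per_incident = (opt.rto_hours * downtime_cost_per_hour
                    + opt.rpo_hours * data_loss_cost_per_hour)
    return opt.annual_cost + incidents_per_year * per_incident


if __name__ == "__main__":
    options = [
        Option("Nightly backup", 2_000, rto_hours=24, rpo_hours=26),
        Option("Hourly backup", 6_000, rto_hours=8, rpo_hours=1.5),
        Option("Replication", 15_000, rto_hours=0.5, rpo_hours=0.1),
    ]
    # Hypothetical: 2 incidents a year, £500/hour downtime, £300/hour of lost work.
    best = min(options, key=lambda o: expected_annual_cost(o, 2, 500, 300))
    print(best.name)  # Hourly backup
```

With these invented numbers the middle option wins: replication cuts expected losses further, but not by enough to cover its higher price. The steep rise in cost towards zero shows up in `annual_cost`, which is why the fastest option is not automatically the best value.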
Test, Test and Test again
Running a restore test is the only way to determine what will happen, rather than what you suspect, or hope, will happen. This way, the process can be refined, improved, documented and made repeatable for when it’s needed urgently. If a system is critical, our advice is to Test, Test, Test.
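A restore test can be partly automated. Here is a minimal sketch, assuming the backup is a file-level `.tar.gz` archive, which is a simplification of most real backup products. When the backup is taken, it records a SHA-256 manifest of every file. The test then restores the archive into a scratch directory, times the restore and checks it against both the manifest and a Recovery Time Objective. The helper names (`make_backup`, `restore_test`, `sha256_tree`) are hypothetical.

```python
from __future__ import annotations

import hashlib
import tarfile
import tempfile
import time
from pathlib import Path


def sha256_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def make_backup(source: Path, archive: Path) -> dict[str, str]:
    """Archive every file under `source`; return a manifest recorded at backup time."""
    with tarfile.open(archive, "w:gz") as tar:
        for p in sorted(source.rglob("*")):
            if p.is_file():
                tar.add(p, arcname=p.relative_to(source).as_posix())
    return sha256_tree(source)


def restore_test(archive: Path, manifest: dict[str, str], rto_seconds: float) -> dict:
    """Restore `archive` to a scratch directory, then check content and timing."""
    with tempfile.TemporaryDirectory() as scratch:
        target = Path(scratch)
        start = time.monotonic()
        with tarfile.open(archive) as tar:
            # Use the safer 'data' extraction filter where this Python supports it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(target, filter="data")
            else:
                tar.extractall(target)
        elapsed = time.monotonic() - start
        content_ok = sha256_tree(target) == manifest
    return {"elapsed_s": elapsed, "content_ok": content_ok,
            "within_rto": elapsed <= rto_seconds}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        src = Path(work, "data")
        src.mkdir()
        (src / "ledger.csv").write_text("id,amount\n1,10.00\n")
        archive = Path(work, "backup.tar.gz")
        manifest = make_backup(src, archive)
        print(restore_test(archive, manifest, rto_seconds=3600))
```

Comparing against a manifest captured at backup time, rather than against the live data, keeps the check meaningful even though the source continues to change. Recording `elapsed_s` on each run gives a measured restore time to set against the RTO, instead of an estimate.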
Disaster Recovery and Hyper-V Replication
One solution is High Availability. Its cost once confined it to enterprise-level organisations, but Intersys now regularly configures ‘replication partners’: we set up and maintain multiple, continually synchronised copies of individual servers, automatically kept as ‘verbatim’ copies, which can be ‘failed over’ so that backup plans can be tested regularly.
This can be done using Microsoft Hyper-V replication or VMware vSphere Replication but, as always, there’s not much point if failover is never tested, so we like to schedule these tests for our clients to help ensure they can sleep well at night.
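For Hyper-V, a scheduled test might wrap the Hyper-V PowerShell module's `Start-VMFailover -AsTest` cmdlet, which creates a temporary test copy of a replica VM on the replica server, and `Stop-VMFailover`, which removes that copy afterwards. The Python sketch below builds those commands and runs them with `subprocess`. It defaults to a dry run that only prints them. The VM and host names, the `verify` callback and the helper names are hypothetical, and the exact cmdlet parameters should be checked against Microsoft's documentation for your Hyper-V version before use.

```python
from __future__ import annotations

import subprocess
from typing import Callable


def failover_test_commands(vm_name: str, replica_host: str) -> tuple[list[str], list[str]]:
    """Build PowerShell commands that start, then stop, a Hyper-V test failover.

    Both commands target the replica server. Names are assumed not to
    contain single quotes.
    """
    ps = ["powershell.exe", "-NoProfile", "-Command"]
    target = f"-VMName '{vm_name}' -ComputerName '{replica_host}'"
    start = ps + [f"Start-VMFailover {target} -AsTest -Confirm:$false"]
    stop = ps + [f"Stop-VMFailover {target} -Confirm:$false"]
    return start, stop


def run_failover_test(vm_name: str, replica_host: str,
                      verify: Callable[[], bool], dry_run: bool = True) -> bool | None:
    """Start a test failover, call `verify`, and always clean up afterwards."""
    start, stop = failover_test_commands(vm_name, replica_host)
    if dry_run:
        # Print the commands instead of executing them.
        for cmd in (start, stop):
            print(" ".join(cmd))
        return None
    subprocess.run(start, check=True)
    try:
        return verify()  # hypothetical check, e.g. boot the test VM and probe its services
    finally:
        subprocess.run(stop, check=True)


if __name__ == "__main__":
    # "accounts-db" and "replica01" are hypothetical names.
    run_failover_test("accounts-db", "replica01", verify=lambda: True)
```

The `try`/`finally` ensures the test copy is removed even if the verification step fails. A scheduled test therefore does not leave stray VMs consuming resources on the replica server.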