August 13, 2012

I survived the crash of ’97

1997 seems so long ago now, but the lesson from one of our client experiences is still relevant today. One day I received a call from one of our ERP software clients who found themselves in a desperate situation. Their redundant backup system had failed them and they were looking for help.

The company had invested in a server with a RAID 5 architecture that also had each drive mirrored. As many of you probably know, if one of the RAID 5 drives fails, you should be able to simply replace the drive and have it rebuilt automatically. With drive mirroring, the mirrored drive should also have been able to take over from the original. In this case, however, one of the drives was still functioning but writing garbage, which was then duplicated on its mirror. Both hardware safeguards were rendered useless.

Next the company went back to their tape backup. A complete backup was faithfully run every night and verified by someone in the accounting department, who inspected the backup logs each morning. Unfortunately, the tape backup was useless too. About eighteen months earlier, the tape backup unit had come back from repair at around the same time that the person responsible for IT left the firm; because of downsizing in that department, that person was never replaced. The repair obviously hadn’t worked, as the tape heads were wandering up and down randomly across the tape. Although the backup logs said the backups had worked, the tapes were unreadable. We tried sending them to a data recovery specialist, but they were unable to recover any information.

This was a rather complex business involving service, inventory, and manufacturing, and at that point we really had no idea what we were going to do next. We were able to find a CD backup, close to two years old, that we had created as a test for a reporting solution and that happened to still be in my desk drawer. This gave us a head start on populating master files such as customers and inventory items, and gave us historical financial information up to that point in time. Everything else had to be re-created manually from paperwork, which took months. Fortunately, the company was in a relatively strong financial position and was able to survive, unlike many companies. A recent article that I read suggested that the majority of companies that go through something like this shut their doors an average of six years after the incident.

After everything was done, we had a celebration and t-shirts were handed out with the phrase “I survived the crash of ’97” printed on the front. I could go into numerous details here about what could have been done differently, but I will keep it short. Make sure you test your backups on a regular basis, ideally by restoring them to a duplicate server and backup system that you keep offsite. You should be able to take your offsite backup and restore your system within twenty-four hours. And make sure you talk to someone who knows what they are doing so you can devise a proper disaster recovery strategy. That same article, by the way, said that companies that had gone through this type of experience were willing to pay almost anything to make sure they never found themselves in this situation again.
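
For readers who want something concrete, here is a rough sketch of what "testing a backup" can look like beyond reading the backup log. It is only an illustration, not what we ran in 1997: it assumes you have already restored last night's backup to a scratch location, and it compares every file in the original folder against its restored copy using SHA-256 checksums. The function names `file_digest` and `verify_restore` and the example paths are made up for this post.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: str, restored_dir: str) -> list[str]:
    """List files that are missing from, or differ in, the restored copy."""
    source, restored = Path(source_dir), Path(restored_dir)
    problems = []
    for original in sorted(source.rglob("*")):
        if not original.is_file():
            continue  # skip directories
        relative = original.relative_to(source)
        copy = restored / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif file_digest(original) != file_digest(copy):
            problems.append(f"differs: {relative.as_posix()}")
    return problems


if __name__ == "__main__":
    # Hypothetical paths: the live data folder and a test restore of the backup.
    for problem in verify_restore("/data/erp", "/mnt/restore-test/erp"):
        print(problem)
```

An empty result means every file came back from the backup byte for byte. On a busy system, files that changed after the backup ran will show up as differences, so in practice you would more likely record checksums at backup time and compare against those. Note also that a check like this would catch an unreadable tape, but not a drive that is already writing garbage to the source, which is one more reason to talk to someone who does this for a living.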