Smoke inside a federal data centre this morning has shut down email, some government websites and even the maligned Phoenix payroll system.
The government’s central IT department, Shared Services Canada, says there was no fire in the Ottawa-area data centre, even though smoke was detected inside.
The centre was evacuated and went into an emergency shutdown that crippled workers across multiple departments, including Transport Canada, where inspectors and special agents were unable to send or receive emails on their BlackBerry devices.
ING Bank’s main data center in Bucharest, Romania, was severely damaged over the weekend during a fire extinguishing test. In a very rare but known phenomenon, it was the loud sound of inert gas being released that destroyed dozens of hard drives. The site is currently offline, and the bank is relying solely on its backup data center, located a couple of miles away.
“The drill went as designed, but we had collateral damage,” ING’s spokeswoman in Romania told me, confirming the inert gas issue. Because of the test, local clients could not use debit cards or carry out online banking operations on Saturday between 1 PM and 11 PM. “Our team is investigating the incident,” she said.
The purpose of the drill was to see how the data center's fire suppression system worked. Data centers typically rely on inert gas to protect equipment in the event of a fire: the gas does not chemically damage electronics, and it lowers the temperature inside the data center only slightly.
The gas is stored in cylinders and released at high velocity through nozzles spread uniformly across the data center. According to people familiar with the system, the pressure at ING Bank's data center was higher than expected, and the gas produced a loud sound as it was rapidly expelled through the tiny holes (think of the noise a steam engine makes when it vents).
The bank monitored the sound, and it was very loud, a source familiar with the system told us. “It was as high as their equipment could monitor, over 130dB.”
Sound is vibration, and vibration is what damaged the hard drives. The HDD cases started to vibrate, and the vibration was transmitted to the read/write heads, pushing them off the data tracks.
“The inert gas deployment procedure has severely and surprisingly affected several servers and our storage equipment,” ING said in a press release.
Little is still known about how sound can cause hard drive failure. One of the first such experiments was carried out by engineer Brendan Gregg in 2008, while he was working on Sun's Fishworks team. He recorded a video in which he explains how shouting in a data center can cause hard drives to malfunction.
In ING Bank’s case, it was “like putting a storage system next to a [running] jet engine,” a source told me.
Researchers at IBM are also investigating noise-related problems caused by inert gas discharges in data centers. “[T]he HDD can tolerate less than 1/1,000,000 of an inch offset from the center of the data track—any more than that will halt reads and writes”, experts Brian P. Rawson and Kent C. Green wrote in a paper. “Early disk storage had much greater spacing between data tracks because they held less data, which is a likely reason why this issue was not apparent until recently.”
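To put the IBM researchers' tolerance in metric terms, here is a simple unit conversion (using the exact definition 1 inch = 25.4 mm). It shows that the permitted head offset is on the order of tens of nanometres:

$$
\frac{1}{1{,}000{,}000}\ \text{in} = 1\ \mu\text{in} = 25.4 \times 10^{-3}\ \text{m} \times 10^{-6} = 2.54 \times 10^{-8}\ \text{m} = 25.4\ \text{nm}
$$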
Siemens also published a white paper a year ago stating that its tests show “excessive noise can have a negative impact on HDD performance”. Researchers said this negative impact may begin at levels even below 110dB.
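The 20 dB gap between the 130dB reading at ING and Siemens' 110dB threshold is larger than it may look, because decibels are logarithmic. The sketch below assumes both figures are sound pressure levels (dB SPL) measured against the standard 20 µPa reference in air, which the sources quoted here do not state explicitly. Under that assumption, every 20 dB corresponds to a tenfold increase in sound pressure:

```python
# Sketch: convert sound pressure levels (dB SPL) to pressure in pascals.
# Assumption: the dB figures are SPL relative to 20 micropascals (air).

P_REF = 20e-6  # reference sound pressure in pascals (20 µPa)


def spl_to_pascals(level_db: float) -> float:
    """Convert a sound pressure level in dB SPL to RMS pressure in pascals."""
    return P_REF * 10 ** (level_db / 20)


def pressure_ratio(level_a_db: float, level_b_db: float) -> float:
    """Return how many times larger the sound pressure at level_a is than at level_b."""
    return 10 ** ((level_a_db - level_b_db) / 20)


if __name__ == "__main__":
    for level in (110, 130):
        print(f"{level} dB SPL = {spl_to_pascals(level):.1f} Pa")
    print(f"130 dB vs 110 dB: {pressure_ratio(130, 110):.0f}x the sound pressure")
```

Run as a script, this prints roughly 6.3 Pa for 110 dB and 63.2 Pa for 130 dB, a factor of 10 in pressure (and, since intensity scales with pressure squared, about 100 times the acoustic intensity).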
“It can now be established with a high degree of certainty that the faults in storage systems as a result of an inert gas extinguishing systems discharge were caused by the impact of high noise levels on the hard disk drives,” according to Siemens.
The bank said it needed 10 hours to restart its operations because of the magnitude and complexity of the damage. A cold start of the systems at the disaster recovery site was required. “Moreover, to ensure full integrity of the data, we’ve made an additional copy of our database before restoring the system,” ING’s press release reads.
Over the next few weeks, every single piece of equipment will need to be assessed. ING Bank’s main data center is compromised “for the most part”, a source told us.