To panic, or not to panic
An article by Bruce Perens just popped up on Slashdot discussing what he is calling a "Cyber-Attack" on April 9 in Santa Clara and Santa Cruz counties in Northern California. A slightly less breathless account of the incident can be read in the Mercury News. They go with "phone outage" rather than "Cyber-Attack" although in this day and age it's difficult to separate the two.
The short version of the story is that unknown intruders opened up a manhole cover, entered the vault beneath, and cut a number of fiber-optic cables, which happened to carry a significant amount of traffic including voice and Internet serving some important locations, such as a local hospital and 911 center (in addition to a swath of private businesses and homes). Some local cell service also went down as the trunks to towers were severed. An unanticipated side effect was failure of software systems internal to the hospital, which apparently required outside connectivity to function.
All this causes understandable concern over the capability of significant public safety providers to deliver services, but I am having trouble seeing how it rises to a new level of threat, as Perens seems to suggest. At the conclusion of his piece, he asks "Will there be another Morgan Hill [the largest town affected]? Definitely," but it seems to me there have already been plenty of Morgan Hills, many with much worse effects than the actual Morgan Hill. Major power outages have disrupted more services, for a longer period, over far larger areas. The September 11th attacks had similar effects on electronic communications systems and exposed systemic failings in IT systems in a similar manner. Were those cyber-attacks?
Perens suggests that the implications of this particular attack are cause for a major re-evaluation of the use of SaaS, VoIP, and any software which relies on Internet connections. I think that approach is excessively reactionary, and ignores many of the redundancy benefits of those technologies that led us to them in the first place. Consider the many business organizations which cannot afford fully geographically redundant application servers to host their internal mail. Perens' suggestion is that all critical applications and e-mail be based at the business' site, and one is left to assume that this means more than it says, since locating services in a single physical location without offsite backups is already known to be an equally bad idea. But should you simply go without e-mail if you can't afford a full main site and a geographically diverse backup, and the staff to operate it? It is the economics of centralization that allows a large measure of the safety that service subscribers already enjoy.
I'm not suggesting there aren't lessons to be learned from outages, but I think we're already well on the way to learning them, and that looking backward in fear is not a helpful addition to the process. Morgan Hill reacted well and handled the issue with aplomb, and while there is some truth to Perens' assertion that disaster-prone Western towns are naturally better prepared for these issues, I think he denigrates other agencies and organizations by suggesting they aren't ready for similar outages, whether natural or man-made. I think most are prepared for much worse.
Having said that, his article is full of more practical suggestions for preparation and testing and is worth a read. My own preference is not to avoid new technologies and outsourced services, but instead to focus on independent and redundant lines of communication with which to reach them. This approach is much less reactionary, is less costly overall, and pays much greater dividends in the event of trouble than does basing all services at your own site. Having good reliable external communications is going to provide advantages far beyond simply keeping your e-mail and access to Google Docs up and running in an emergency.
My main problem was that the _hospital_ network could not operate without being connected to the Internet, and they were down to a paper system for the day. This is not acceptable. The hospital must be able to run its network and all critical network services in stand-alone mode.
Of course there is a lot of complication regarding database replication and synchronization which I did not place in an article for more general readers.
Bruce
Perhaps because I come from a background in healthcare IT, and was involved with it during that painful period when providers first began to move significant patient care operations into it, this didn’t strike me as being quite so unacceptable. It’s unusual today for a hospital to have to go back to paper systems in an emergency, but hardly unheard of, for a variety of reasons. Staff are usually well-trained with contingency systems and IMHO it doesn’t affect quality of care nearly as much as, say, electrical problems (even given electrical backup systems).
Now, obviously someone in IT dropped the ball at the hospital, and hospitals should be held to a different standard of performance than most businesses. But I think you are overestimating the importance of IT in delivering patient care at such facilities. IT is a boon, but it’s NOT critical, and there are certainly other acceptable solutions besides providing all IT services on-site. It sounds like the facility used one that is still very common in healthcare circles. While it’s understandable that you would skip over all the technicalities of redundancy in an article for laypersons, what I really found lacking was a comparison of the risks and rewards, which surely are important to even the non-technical reader.
Scott,
Morgan Hill demonstrated only half of the problem. What the hospital needs to be ready for is an external communications outage combined with a mass-casualty incident.
In triage, we know that some patients can’t wait, some can, and some _must_ wait even though their outcome depends on treatment and on how timely that treatment is. It’s that last category who will be affected by your infrastructure performance. Some may die.
The cost is significant. Not particularly the hardware cost, but the management cost and the additional potential for failure due to increased complexity. Someone has to design and manage all of the replicated services, and there is potential for service outage when fail-over triggers inappropriately. Testing tends to awaken lurking IT problems and thus brings its own potential to create outages.
I might sit down and design a good replicated email package when I’m done with other work – this is an eminently solvable problem with existing Open Source tools. Replicated web applications are easily possible but are complicated by the vast number of web development platforms. It wouldn’t be difficult to do for the Ruby on Rails – MySQL combination, if I wanted to make a start.
Bruce
You’ll get no argument from me on that. It seems to me there are three phases in the deployment of any technology:
- Bang it together so that it works
- Put it together so that it works for less
- Assemble it thoughtfully so that it is robust to the point of ubiquity, for still less money
It’s about time the IT industry started moving on toward phase three. Though I guess phase four might be “Start taking it for granted without realizing you’ve slipped back into phase three” as the phone system seems to have done in places.