SaaS takes a dive

It's been a while since Salesforce.com had a significant service disruption, so it made the news that several of their server clusters were out, on and off, over the course of about six hours yesterday. Time was when this was a regular feature for the Salesforce user, an opportunity to grab a cigarette break or a cup of joe and shoot the breeze for a while until the Salesforce gremlins could be exorcised. But for almost a year, service has been reliable and responsive at the SaaS CRM company.
Some people think the current outage is a big deal, others not so much, but the interesting part for me was that Salesforce's distant second competitor, Salesboom.com, was experiencing outages at almost the same time. This didn't get coverage anywhere that I could find, and in truth I would never have known myself if a client had not been in the process of rolling the service out right as it was happening.
The twin episodes illustrated two things for me. One, it's clear why Salesforce is the industry leader: with constant updates and clear communications via the dedicated monitoring site http://trust.salesforce.com/, their response was open, professional, and very effective at quelling discontent (witness the non-reaction at Salesforcewatch), while Salesboom simply stopped responding to support e-mails and returning phone calls. Two, it calls into question whether any SaaS product can deliver the goods that the proponents (and I am one of them) have been selling. As Michael Krigsman (also linked above) points out, "SaaS customers pay to be insulated from release problems, and that's what they should get. With no reasons, excuses, or explanations." While I don't believe that SaaS is somehow magically immune to all the issues endemic to software development and deployment in our world today, it's true that it has been billed as making those issues Someone Else's Problem. Technically that remains true (if you don't like Salesforce, you can always move to Salesboom! and vice versa), and it's also true that in-house deployments still suffer these problems in spades, but technicalities don't make fuming sales staff any happier as their work piles up in front of them.
I am still confident the world is moving primarily toward a SaaS existence, but the question remains open as to whether we will learn to be as forgiving of glitches in Someone Else's System as we are of glitches in our own, or whether higher standards will apply (and, if so, whether they can be met).
This incident highlights two key points for companies that provide SaaS:
(1) The quality of the service is a direct result of the quality of systems management. Clearly, Salesforce.com has done a great job in this area, which allows it to distance itself from Salesboom.com by alerting customers, reacting to problems, and dealing with them, where Salesboom.com simply went into blackout mode.
(2) In spite of the superior service of Salesforce.com, incident troubleshooting (which is described quite well in the service status portal http://trust.salesforce.com/trust/status) still takes time. (Some incidents took two hours, and still there was no clear root cause.) It’s pretty safe to assume that this troubleshooting requires a high level of expertise. Salesforce.com could benefit, and distance itself even further from the competition, by taking a more proactive and predictive approach to systems management, so that it can foresee service-impacting issues as they “brew” and avert their impact before it is too late.
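
To make the "predictive" idea in point (2) a little more concrete, here is a minimal Python sketch of one very simple way to spot a problem while it is still brewing: fit a straight line to recent readings of some health metric and estimate how long until it crosses an alert threshold. This is an illustration only, not a description of how Salesforce.com or anyone else actually monitors their systems. The function name minutes_until_breach, the choice of metric (say, response time in milliseconds), the evenly spaced sampling, and the threshold are all assumptions made up for the example.

```python
from typing import Optional, Sequence


def minutes_until_breach(
    samples: Sequence[float],
    threshold: float,
    interval_minutes: float = 1.0,
) -> Optional[float]:
    """Project when a rising metric will cross `threshold`.

    `samples` are assumed to be evenly spaced readings (oldest first),
    `interval_minutes` apart. Returns 0.0 if the metric is already at or
    above the threshold, and None if there are too few samples or the
    trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    if samples[-1] >= threshold:
        return 0.0

    # Ordinary least-squares slope, using the sample index 0..n-1 as x.
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    covariance = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    variance = sum((i - mean_x) ** 2 for i in range(n))
    slope = covariance / variance
    if slope <= 0:
        return None

    # Project forward from the fitted value at the most recent sample.
    intercept = mean_y - slope * mean_x
    fitted_now = intercept + slope * (n - 1)
    if fitted_now >= threshold:
        return 0.0
    return (threshold - fitted_now) / slope * interval_minutes


if __name__ == "__main__":
    # Hypothetical response times in ms, one reading per minute.
    recent = [100, 110, 120, 130, 140]
    print(minutes_until_breach(recent, threshold=200))
```

A straight-line projection like this is about the crudest form of prediction there is, and real workloads are noisy and seasonal, so treat it as a sketch of the idea (raise the alarm on the trend, not on the breach) rather than a recipe.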