Duplicate data: the bane of cloud computing?

IBM announced last week that it has acquired Diligent Technologies, a Framingham, Massachusetts-based company that develops software for de-duplicating backup data, which shortens backup times and reduces storage capacity requirements. In large environments, even with the relatively low cost of storage, this can add up to substantial savings.
It got me thinking, however, about data duplication and the challenges of reducing it in cloud computing environments. Duplication is no small matter in the enterprise: at best, it simply consumes extra storage space (which, as noted, is still cheap); at worst, it produces out-of-sync copies of information that can disrupt operations or cost a great deal to de-duplicate. Solutions such as Diligent's can prevent a considerable amount of duplication, but in-line processing, which de-duplicates data before it is written to storage, isn't really an option in the cloud model, at least not at this point. Conventional de-duplication processes, which are either manual or batch-oriented, can of course still be run. But for the same reason that duplication is more expensive in the cloud, so is the work of eliminating it: both storage and processing capacity cost more there.
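To make the batch-oriented approach a little more concrete, here is a minimal Python sketch of one common technique: split the data into fixed-size chunks, hash each chunk, and store each unique chunk only once. This is an illustration under stated assumptions, not Diligent's or IBM's method; the function names `dedupe_chunks` and `rebuild` and the 4 KiB chunk size are hypothetical. Note that the hashing pass is itself processing work, which is exactly the cost that rises when it runs on metered cloud capacity.

```python
import hashlib


def dedupe_chunks(data: bytes, chunk_size: int = 4096):
    """Hypothetical batch de-duplication over fixed-size chunks.

    Returns (store, recipe): store maps a SHA-256 digest to the chunk bytes,
    and recipe is the ordered list of digests needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        # Only the first occurrence of a chunk consumes storage;
        # later duplicates are recorded as a reference to it.
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from the chunk store and recipe."""
    return b"".join(store[d] for d in recipe)


if __name__ == "__main__":
    # Hypothetical backup stream: one 4 KiB block repeated 100 times,
    # followed by one distinct block.
    block = b"A" * 4096
    data = block * 100 + b"B" * 4096
    store, recipe = dedupe_chunks(data)
    stored = sum(len(c) for c in store.values())
    print(f"logical bytes: {len(data)}, stored bytes: {stored}")
    assert rebuild(store, recipe) == data
```

Run as a script, this prints `logical bytes: 413696, stored bytes: 8192`. Real backup data rarely repeats this neatly, and fixed-size chunking misses duplicates that are shifted by a few bytes, which is one reason commercial products use more sophisticated schemes.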
This issue isn't yet prominent enough to significantly affect the adoption of cloud computing models, but it's one more thing to consider if you are evaluating them.