Now that we all know that cloud computing is "Multi-Tenancy, On Demand, Scalable, Pay as You Go" computing, it follows that one might want to own only the minimal amount of infrastructure for their computing needs. Perhaps private data stays on site, but processing peaks can be farmed out to rental CPUs: "Cloud Bursting."
Does this mean data replication or improved data reach? Will we need data storage vendors to replicate data, or to facilitate caching (extending the reach of the data)?
What do data storage providers need to provide to facilitate "cloud bursting"?
Gary - Some good responses already. I think the biggest challenge is not in the storage itself: physical, virtual, provisioning, on-demand provisioning, general n-tiered storage management, etc. Most of the storage management pieces can be taken care of. What you specifically mention is the "processing", and that's the real challenge, mostly because of what happens to the actual data. There are many layers of overhead that have to be peeled back before the data can be accessed and "processed" by a program. Encrypted data has to be decrypted, compressed data has to be re-inflated, de-duplicated data has to be reconstructed, and the list goes on. Cloud vendors could cache some of the data and/or retrieve the subsets needed for processing and warm them up, but caching will usually present a security hole, and you have to deal with another level of synchronization, which gets messy, especially with real-time OLTP applications. I think the answer could be found in massive parallelism, using idle CPU time across the customer's own organization's CPUs. The dependencies then become the network and some better-known technologies (e.g. Java) that have to come together (the parallel processing components and the applications). The burst data could remain resident with the cloud provider, and the processing could be shared across a wide array of PCs that are actively participating in the cloud's network. It's a thought.
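To make the "layers of overhead" point concrete, here is a minimal Python sketch of two of those layers, de-duplication and compression, and what it takes to peel them back before a program can read the bytes. The names `CHUNK_SIZE`, `store_blob` and `restore_blob` are made up for illustration and are not any vendor's API. Encryption is left out because the standard library has no cipher; in a real system a decrypt step would wrap the outside of this.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen only for illustration


def store_blob(data: bytes, chunk_store: dict) -> list:
    """Split data into chunks, keep one compressed copy per unique chunk,
    and return the 'recipe' (ordered chunk hashes) needed to rebuild it."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunk_store:
            # De-duplication: identical chunks are stored only once.
            chunk_store[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return recipe


def restore_blob(recipe: list, chunk_store: dict) -> bytes:
    """Peel the layers back: look up each chunk, re-inflate it, and
    reassemble the original byte stream in order."""
    return b"".join(zlib.decompress(chunk_store[digest]) for digest in recipe)


if __name__ == "__main__":
    chunk_store = {}
    original = b"A" * (3 * CHUNK_SIZE) + b"B" * 100
    recipe = store_blob(original, chunk_store)
    print(len(recipe), "chunks referenced,", len(chunk_store), "actually stored")
    print("round trip ok:", restore_blob(recipe, chunk_store) == original)
```

Even in this toy form, any CPU that wants to process the data needs both the recipe and the chunk store, which is why it is simpler to keep the reconstruction close to where the data lives.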
Good Luck,
Peter
May 2009