Storage Industry: We Have a Problem

In our first post we’d like to make a controversial statement sure to anger many of our readers: the storage industry has failed to meet market needs over the last 10 years. Why do we say that? Because top companies are building their own storage solutions instead of using available technologies.

The most famous and interesting example is, of course, Google. Google has developed its own in-house data storage infrastructure, providing it with a global, scalable, easy-to-manage solution. As far as we can tell, no commercial product today offers comparable capabilities.

Another example is Amazon, which developed its own in-house storage system, later offering it as the S3 service. And there are more examples.

These companies did not develop new network hardware in-house; they used existing off-the-shelf technology from Cisco and other networking companies. They did not develop their own operating system; they used Linux, or some other operating system. They did not develop new motherboards or processors; they used the best available options on the market. They did not invent a new network layer; they used TCP/IP.

We can already hear your objections, all variations on the theme that “Google did not implement a generic storage solution; it implemented a unique solution that tightly integrates storage and application.” That is true, of course: when one has to develop one’s own solution, one tunes it to one’s own requirements. If Google had been forced to develop its own operating system, we are sure that it would not have been a generic, all-purpose one, but one designed for its own needs. But Google found that the current offerings in the operating system world were “good enough,” and it used the generic technology. The same applied to network hardware, network protocols and other technologies; current market offerings were “good enough.”

Are generic storage solutions adequate to the needs of these giants? We think not. Ask storage administrators who manage one petabyte or more (and most of those who manage 50 TB or more): we are sure they would be delighted to pay astronomical prices for an appropriate solution, but no salesperson has ever knocked on their door and offered one.

So networking technology has been catching up to customer needs, operating system technology provides adequate solutions, server virtualization technologies are providing amazing benefits, but storage lags behind.

Where did storage fail? Here are some guesses:

  • Scalability and global name space: there is no practical solution for a global name space in the SAN world, spanning multiple systems. When a volume is migrated from one system to another, hosts must be reconfigured, switch zones must be set, and application downtime is required. NAS is getting a bit better with time, but is still nowhere near as good as networking technology.
  • We still have to assign logical entities to physical entities, that is, volumes to controllers/spindles. The server virtualization companies have shown us how space, power, manageability, and utilization all improve when one virtualizes the logical server over a bunch of physical resources.
  • Management standards have not matured yet, and there is no standard interface that allows easy management of heterogeneous environments.
  • Host software is still proprietary and lagging behind. Microsoft’s blessed VSS and MPIO frameworks are a step in the right direction, and we really miss a VSS-like solution for Unix environments.
  • Disaster Recovery (DR) methodologies are still project/solution oriented and not a packaged product. Although everyone needs one of a limited number of DR solutions, everyone has to implement an in-house integration project.
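To make the first two points concrete, here is a minimal Python sketch of the kind of indirection we have in mind. It is a toy, not a description of any real product: the class GlobalNamespace, its methods, and the names db-logs, array-a and array-b are all invented for illustration. The host refers to a volume only by name, so migrating the volume changes one mapping entry and nothing on the host.

```python
class GlobalNamespace:
    """Toy directory mapping volume names to the system that holds them.

    Hypothetical illustration only; no real storage product is modeled.
    """

    def __init__(self):
        self._location = {}  # volume name -> storage system name

    def create(self, volume, system):
        # Register a new volume on a given storage system.
        self._location[volume] = system

    def resolve(self, volume):
        # Hosts call this to find where a volume currently lives.
        return self._location[volume]

    def migrate(self, volume, new_system):
        # Moving a volume only updates the directory entry.
        if volume not in self._location:
            raise KeyError(volume)
        self._location[volume] = new_system


ns = GlobalNamespace()
ns.create("db-logs", "array-a")

host_mount = "db-logs"            # the host configuration stores only the name
before = ns.resolve(host_mount)   # where the volume lives before migration

ns.migrate("db-logs", "array-b")  # no change to host_mount is needed
after = ns.resolve(host_mount)    # where the volume lives after migration
```

In a real SAN the hard part is everything this sketch leaves out: moving the data itself, keeping I/O paths consistent during the move, and getting every vendor to agree on the directory.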

So what’s next? We don’t know. All these issues require new technologies or new industry-wide standards. We don’t know who will implement them, but we’re sure the market really needs them.

10 comments to this post:

  • Welcome to the blogosphere :)

  • Moshe

    Congratulations on joining the blogosphere. Your comments about the industry failing are interesting. I think that storage meets the needs of 80% of companies out there and the issues relate to the other 20% (which may be 90/10 or 95/5, I don’t know). Only the big guys like Google and Amazon, with large, distributed, highly data-driven organisations, are the ones with the problem, in particular those with 100% of their business being web-based. Many organisations (think financial) have no desire to have so much data heavily distributed due to the risk profile that brings. In addition, their demands on data don’t allow there to be multiple semi-synchronous copies in existence.

    I understand the position you are taking but I think you’re slicing only one part of the IT industry and each vertical has a different requirement.

    Good to stimulate debate though!

    Regards
    Chris

  • Welcome indeed! As a fellow storage blogger, I can tell you there’s a nice combination of camaraderie and competition here, and your input is most welcome. Your name certainly gets thrown around enough!

  • Moshe: I’m looking forward to reading more on this. I understood and agreed with most of your post… up to the DR part:

    “Disaster Recovery (DR) methodologies are still project/solution oriented and not a packaged product. Although everyone needs one of a limited number of DR solutions, everyone has to implement an in-house integration project.”

    Could you expand on this? How else would you approach this? Cloud DR?

    -James
    StorageMonkeys.com

  • Answer to Chris Evans: It is true that Google and Amazon have unique needs which are different from most other customers. Nevertheless, networking technology, operating systems technology and other infrastructure technologies scaled well to their needs, whereas storage did not.

  • Answer to James Orlean: Today, when customers want to build a DR solution, they have to take care of the network, client, server and storage layers, with very loose integration between them. Someday, we hope there will be an integrated solution that handles all layers. We don’t yet know what approach will do this, we only know that current solutions fall short.

  • Perhaps you should take a look at VMware’s SRM (Site Recovery Manager), which offers automatic DR failover for virtual machines and their storage; it’s even supported by IBM storage.

  • This is a very interesting problem statement because you guys wrap both operational and business issues into it. I will stay tuned for more on this.

  • well, this is spoken from my heart.

    one comment on pricing:
    while in every IT business prices go rapidly down, i think storage is still astonishingly expensive (i.e. TOO expensive), especially when you want to scale and want things like mirroring and that stuff.

    so, compare the pricing of a single TB disk to the pricing for the same TB provided by a nas or san storage system. it’s typically one or two orders of magnitude above that and this simply sucks. even pricing for software-only solutions like datacore simply sucks.

    think we need some revolution in the storage business. thin provisioning, dedupe, mirroring, snapshots - that needs to go mainstream.

    look at the new features of zfs and port zfs to linux. that would induce some movement into the fossilized enterprise storage market…..

  • [...] have to agree with a lot of Moshe’s statements, that the storage industry hasn’t exactly cracked the code of commodity storage in the last [...]
