Throttling Up Performance: More Drives or Faster Drives?

“Disk drives are the key component driving any storage system — a system can perform only as fast as its disk drives.”

While true, this kind of thinking can lead to the wrong conclusion… especially when the issue is one of the industry’s most debated: 15K RPM vs. 7200 RPM drives (also known as FC and SATA drives, respectively).

Engineering and IT storage architects tend to have a common line of reasoning on this topic, namely:

1. 15K RPM drives give more performance than 7200RPM drives

2. Some data needs high performance, some doesn’t

3. Let’s find a way to put data requiring high performance on 15K RPM drives and the rest on 7200RPM drives

About now, you may feel like countering with: “You’re missing the point. SATA drive reliability is the real issue.” True, reliability needs to be addressed; we’ll cover it in a future post. Today, let’s focus on drive performance.

The traditional thinking about disk drives and their speed limits has led the storage industry to create a myriad of solutions, paradigms, and products. Underlying them all is the truth that prioritizing applications and their data is a daunting task. For instance, should we classify today’s email as critical, high-performance data? Certainly. But what about last month’s email? Archival storage is enough. So what do we wind up doing? Adding complexity to the email application by differentiating its storage, just to accommodate the performance issue.

Here are a few paradigms that storage teams apply in trying to do just that:

  • Maintain different storage system tiers, with different tiers for different applications
  • Maintain different disk types within the same storage systems, assigning different volumes to different tiers
  • Automatically migrate storage blocks between different disk types within the same system
  • Migrate data at the application level from volume to volume, based on its importance

But these storage acrobatics raise the question: Is getting high performance out of 7200RPM drives really a problem?

Let’s look at the pure technological performance differences between 15K RPM drives and 7200RPM drives. There are two:

  1. 7200RPM drives can perform about 100 transactions per second, while 15K RPM drives can perform about 200 transactions per second.
  2. Latency of a 15K RPM drive is about half the latency of a 7200RPM drive.
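
The latency gap follows from rotational speed alone. Here is a minimal sketch, assuming that average rotational latency is the time for half a revolution and ignoring seek and transfer time (the function name is illustrative, not from any vendor tool):

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: time for half a revolution, in milliseconds."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    seconds_per_revolution = 60.0 / rpm
    return 0.5 * seconds_per_revolution * 1000.0


for rpm in (7200, 15000):
    print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")

# Ratio of the two latencies; equals 15000 / 7200, a little over 2.
ratio = avg_rotational_latency_ms(7200) / avg_rotational_latency_ms(15000)
print(f"7200 RPM latency is {ratio:.2f}x the 15K RPM latency")
```

Under this simplified model the ratio is just 15000/7200, which is where the "about half" figure comes from.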

Now let’s consider how this translates into the real-life needs of a storage system, not an individual disk, starting with transactions per second. Take a specific storage system; say, one with 100 disk drives. A 15K RPM-based system can obviously perform twice as many transactions as a 7200RPM system. But what if we use 200 7200RPM disks instead? The answer: We get the same transaction count and about six times the capacity. That is because a 7200RPM drive has about three times the capacity of a 15K RPM drive.

Is this a good approach?

  • Capital expense-wise: 7200RPM drives are about one-third the price of a 15K RPM drive, so 200 of them cost about two-thirds as much as 100 15K RPM drives. The two systems therefore have roughly the same transaction count, with the 7200RPM system costing somewhat less. (This, of course, relates to the storage vendor’s cost; a storage vendor’s price might be a completely different story. But let’s assume the margin is fixed.)
  • Capacity-wise: We get six times more capacity.
  • Operations expense-wise: We need about double the power and floor space but get a whopping six times more capacity, so we are three times more efficient per unit of capacity.

So, at about two-thirds of the capital expenditure, we get the same transaction count, six times more capacity, and are three times more efficient in terms of operating expenses for that capacity.
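
The comparison above can be reproduced with a few lines of arithmetic. This is a rough sketch that takes the post’s round numbers as inputs (100 vs. 200 transactions per second, three times the capacity, one-third the price) and assumes ideal linear scaling with drive count. The equal per-drive power and floor space, and all names in the code, are assumptions made for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    tps: float        # transactions per second per drive (round figure from the post)
    capacity: float   # capacity per drive, relative to a 15K RPM drive
    price: float      # price per drive, relative to a 15K RPM drive
    footprint: float  # power and floor space per drive (assumed equal for both types)


def system_totals(drive: Drive, count: int) -> dict:
    """Totals for `count` identical drives, assuming ideal linear scaling."""
    return {
        "tps": drive.tps * count,
        "capacity": drive.capacity * count,
        "cost": drive.price * count,
        "footprint": drive.footprint * count,
    }


FAST = Drive(tps=200, capacity=1.0, price=1.0, footprint=1.0)        # 15K RPM
SLOW = Drive(tps=100, capacity=3.0, price=1.0 / 3.0, footprint=1.0)  # 7200 RPM

fast_system = system_totals(FAST, 100)
slow_system = system_totals(SLOW, 200)

# Ratio of each total: 7200RPM system relative to the 15K RPM system.
for key in ("tps", "capacity", "cost", "footprint"):
    ratio = slow_system[key] / fast_system[key]
    print(f"{key}: 7200RPM system is {ratio:.2f}x the 15K RPM system")

# Capacity delivered per unit of power and floor space, as a ratio.
capacity_per_footprint = (slow_system["capacity"] / slow_system["footprint"]) / (
    fast_system["capacity"] / fast_system["footprint"]
)
print(f"capacity per unit of power and floor space: {capacity_per_footprint:.2f}x")
```

Changing the price or capacity ratios in `SLOW` shows how sensitive the conclusion is to those inputs.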

Can it really be so simple? Yes and no. No, because in most storage architectures, just adding more spindles will not simply double performance. Many other bottlenecks exist, such as cache and controller performance. In addition, with most storage architectures, transaction count is not simply the number of spindles times the average transactions per spindle. In most real-life cases, 20% of the spindles are fully utilized, while 80% are practically idle.
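
To see how much uneven utilization costs, consider a toy model in which a "hot" group of spindles receives a fixed share of all transactions and the system saturates as soon as either group reaches its per-drive limit. The 20%/80% split below is an assumed illustration of skew, not a measurement, and the function name is hypothetical:

```python
def saturation_throughput(
    drives: int,
    tps_per_drive: float,
    hot_drive_share: float,
    hot_load_share: float,
) -> float:
    """System throughput at which the first group of drives saturates.

    hot_drive_share: fraction of drives in the hot group.
    hot_load_share:  fraction of all transactions sent to the hot group.
    """
    if not (0 < hot_drive_share < 1 and 0 < hot_load_share < 1):
        raise ValueError("shares must be strictly between 0 and 1")
    hot_drives = drives * hot_drive_share
    cold_drives = drives - hot_drives
    # Each group caps total throughput at (group capacity) / (its share of load).
    hot_limit = hot_drives * tps_per_drive / hot_load_share
    cold_limit = cold_drives * tps_per_drive / (1.0 - hot_load_share)
    return min(hot_limit, cold_limit)


ideal = 200 * 100  # 200 drives at 100 transactions per second each
skewed = saturation_throughput(200, 100, hot_drive_share=0.2, hot_load_share=0.8)
print(f"ideal: {ideal} tps, skewed: {skewed:.0f} tps ({skewed / ideal:.0%} of ideal)")
```

In this model, sending 80% of the load to 20% of the drives caps the system at a quarter of its ideal transaction count, no matter how fast the idle drives are.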

What about latency? Many storage performance measurements are focused on latency. Due to their rotational speed, 7200RPM disks have approximately double the latency of 15K RPM disks. But what are the end-user implications of latency? Let’s consider this realistically:

  • End-users couldn’t care less if their request is handled in 1 or 20 milliseconds.
  • The latency problem really lies in the application/OS queue. If an application is configured to allow only 32 unacknowledged requests, and the storage response time is 5 milliseconds, then throughput will be limited to 6400 transactions/second (32 times 1/0.005). Given a queue limit, storage with half the response time doubles the performance.
  • Write transactions are not a problem, since write transactions are written to the cache and acknowledged immediately.
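
The queue-limit arithmetic in the second point is an application of Little’s law: with a fixed number of outstanding requests, maximum throughput is the queue depth divided by the response time. A small sketch, with an illustrative function name:

```python
def queue_limited_tps(queue_depth: int, response_time_s: float) -> float:
    """Upper bound on transactions per second for a fixed outstanding-request limit."""
    if queue_depth <= 0 or response_time_s <= 0:
        raise ValueError("queue_depth and response_time_s must be positive")
    return queue_depth / response_time_s


base = queue_limited_tps(32, 0.005)            # the example from the text
faster_storage = queue_limited_tps(32, 0.0025)  # half the response time
deeper_queue = queue_limited_tps(64, 0.005)     # double the queue depth

print(f"32 outstanding, 5 ms:   {base:.0f} tps")
print(f"32 outstanding, 2.5 ms: {faster_storage:.0f} tps")
print(f"64 outstanding, 5 ms:   {deeper_queue:.0f} tps")
```

Halving the response time and doubling the queue depth raise the bound by the same factor, which is why the queue setting matters as much as drive latency.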

So when is there a problem? When applications have an intrinsic limit on the queue size of read requests. In some cases there is also a limit on the queue size for writes, to ensure consistency (database transaction logs are a typical case). In most other cases, reconfiguring the queue size is not a problem.

Here are our final conclusions:

  1. 7200RPM drives are much more cost efficient, energy saving, and dense.
  2. There is no real problem in satisfying overall business needs with 7200RPM drives, assuming the proper architecture is in place.
  3. We need a storage architecture that will enable us to really exploit all drives equally, without manual tuning. We can’t afford to have 80% of our transactions handled by only 20% of our disk drives.

Storage Industry: We Have a Problem

In our first post we’d like to make a controversial statement sure to anger many of our readers: the storage industry has failed to meet market needs over the last 10 years. Why do we say that? Because top companies are building their own storage solutions instead of using available technologies.

The most famous and interesting example is, of course, Google. Google has developed its own in-house data storage infrastructure, providing it with a global, scalable, easy-to-manage solution. As far as we can tell, no commercial product today offers comparable capabilities.

Another example is Amazon, which developed its own in-house storage system, later offering it as the S3 service. And there are more examples.

These companies did not develop new network hardware in-house; they used existing off-the-shelf technology from Cisco and other networking companies. They did not develop their own operating system; they used Linux, or some other operating system. They did not develop new motherboards or processors; they used the best available options on the market. They did not invent a new network layer; they used TCP/IP.

We can already hear your objections, all variations on the theme that “Google did not implement a generic storage solution — they implemented a unique solution that tightly integrates storage and application.” That is true, of course, since when one has to develop one’s own solution, it is tuned to one’s own requirements. If Google had been forced to develop its own operating system, we are sure that it would not have been a generic, all-purpose one, but designed for its own needs. But they found that the current offerings in the operating system world were “good enough” for them, and they used the generic technology. The same applied to network hardware, network protocols and other technologies; current market offerings were “good enough.”

Are generic storage solutions adequate to the needs of these giants? We think not. Ask storage administrators who manage one petabyte or more (and most of those who manage 50TB or more), and we are sure they will tell you they would be delighted to pay astronomical prices for an appropriate solution, but no salesperson has ever knocked on their door and offered one.

So networking technology has been catching up to customer needs, operating system technology provides adequate solutions, server virtualization technologies are providing amazing benefits, but storage lags behind.

Where did storage fail? Here are some guesses:

  • Scalability and global name space: there is no practical solution for a global name space in the SAN world, spanning multiple systems. When a volume is migrated from one system to another, hosts must be configured, switch zones must be set, and application downtime is required. NAS is getting a bit better with time, but is still nowhere near as good as networking technology.
  • We still have to assign logical entities to physical entities, that is, volumes to controllers/spindles. The server virtualization companies have shown us how space, power, manageability, and utilization all improve when one virtualizes the logical server over a bunch of physical resources.
  • Management standards have not matured yet, and there is no standard interface that allows easy management of heterogeneous environments.
  • Host software is still proprietary and lagging behind. Microsoft’s blessed VSS and MPIO frameworks are a step in the right direction, and we really miss a VSS-like solution for Unix environments.
  • Disaster Recovery (DR) methodologies are still project/solution oriented and not a packaged product. Although everyone needs one of a limited number of DR solutions, everyone has to implement an in-house integration project.

So what’s next? We don’t know. All these issues require new technologies or new industry-wide standards. We don’t know who will implement them, but we’re sure the market really needs them.

Welcome to ThinkStorage

Welcome to the launch of ThinkStorage, the XIV blog.

As you see in the About section, the focus of this blog is the storage community: administrators, IT professionals, and technology developers. We aim to keep the focus on storage technology. We warmly invite others to join the dialog.

The first entry is by Moshe Yanai, a familiar name in storage. It starts with a mea culpa for us all: why are storage customers building their own solutions? Please share with us what you think.