Throttling Up Performance: More Drives or Faster Drives?

“Disk drives are the key component driving any storage system — a system can perform only as fast as its disk drives.”

While true, this kind of thinking can lead to the wrong conclusion… especially when the issue is one of the industry’s most debated: 15K RPM vs. 7200 RPM drives (also known as FC and SATA drives, respectively).

Engineering and IT storage architects tend to have a common line of reasoning on this topic, namely:

1. 15K RPM drives give more performance than 7200RPM drives

2. Some data needs high performance, some doesn’t

3. Let’s find a way to put data requiring high performance on 15K RPM drives and the rest on 7200RPM drives

About now, you may feel like countering with: “You’re missing the point. SATA drive reliability is the real issue.” True, reliability needs to be addressed; we’ll cover it in a future post. Today, let’s focus on drive performance.

The traditional thinking about disk drives and their speed limits has led the storage industry to create a myriad of solutions, paradigms, and products. Underlying them all is the truth that prioritizing applications and their data is a daunting task. For instance, should today’s email be classified as critical, high-performance data? Certainly. But what about last month’s email? Archival storage is enough. So what do we wind up doing? Adding complexity to the email application by differentiating its storage, just to accommodate the performance issue.

Here are a few paradigms that storage teams apply trying to do just that:

  • Maintain different storage system tiers, with different tiers for different applications
  • Maintain different disk types within the same storage systems, assigning different volumes to different tiers
  • Automatically migrate storage blocks between different disk types within the same system
  • Migrate data at the application level from volume to volume, based on its importance

But these storage acrobatics raise the question: Is there really a problem getting high performance out of 7200RPM drives?

Let’s look at the pure technological performance differences between 15K RPM drives and 7200RPM drives. There are two:

  1. 7200RPM drives can perform about 100 transactions per second, while 15K RPM drives can perform about 200 transactions per second.
  2. Latency of a 15K RPM drive is about half the latency of a 7200RPM drive.

Now let’s consider how this translates into the real-life needs of a storage system, rather than an individual disk, starting with transactions per second. Take a specific storage system; say, one with 100 disk drives. A 15K RPM-based system can obviously perform twice as many transactions as a 7200RPM system. But what if we use 200 7200RPM disks instead? The answer: We get the same transaction count and about six times the capacity. This is because a 7200RPM drive has about three times the capacity of a 15K RPM drive.

Is this a good approach?

  • Capital expense-wise: A 7200RPM drive is about one-third the price of a 15K RPM drive, so 200 of them cost roughly two-thirds as much as 100 15K RPM drives while delivering roughly the same transaction count. (This, of course, relates to the storage vendor’s cost; a storage vendor’s price might be a completely different story. But let’s assume the margin is fixed.)
  • Capacity-wise: We get six times more capacity.
  • Operations expense-wise: We need about double the power and floor space but get a whopping six times more capacity, so we are three times more efficient per unit of capacity.

So, at about two-thirds of the capital expenditure, we get the same transaction count, six times more capacity, and are three times more efficient in terms of operating expenses for that capacity.
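The comparison can be checked with a few lines of arithmetic. The sketch below is only an illustration in relative units: the per-drive figures are the approximations quoted above, the one-third price is read as a per-drive price, power per drive is assumed equal for both drive types (from the "about double the power" figure), and the name `system_totals` is made up for this example.

```python
# Back-of-envelope comparison in relative units (not vendor data).

def system_totals(drives, iops_per_drive, capacity_per_drive,
                  price_per_drive, power_per_drive=1.0):
    """Return aggregate IOPS, capacity, drive cost and power for a system."""
    return {
        "iops": drives * iops_per_drive,
        "capacity": drives * capacity_per_drive,
        "cost": drives * price_per_drive,
        "power": drives * power_per_drive,
    }

# 15K RPM baseline: 100 drives, ~200 IOPS each, capacity and price set to 1.
fast = system_totals(100, 200, 1.0, 1.0)
# 7200 RPM: 200 drives, ~100 IOPS each, ~3x the capacity, ~1/3 the price.
slow = system_totals(200, 100, 3.0, 1.0 / 3.0)

for key in ("iops", "capacity", "cost", "power"):
    print(f"{key}: 7200 RPM / 15K RPM = {slow[key] / fast[key]:.2f}")

# Capacity delivered per unit of power, relative to the 15K RPM baseline.
efficiency = (slow["capacity"] / slow["power"]) / (fast["capacity"] / fast["power"])
print(f"capacity per unit of power: {efficiency:.2f}x")
```

Under these assumptions the 7200RPM system matches the transaction count, holds six times the capacity, costs about two-thirds as much in drives, and delivers three times the capacity per unit of power.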

Can it really be so simple? Yes and no. No, because in most storage architectures, just adding more spindles will not simply double performance. Many other bottlenecks exist, such as cache and controller performance. In addition, with most storage architectures, transaction count is not simply the number of spindles times the average transactions per spindle. In most real-life cases, 20% of the spindles are fully utilized, while 80% are practically idle.
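To see how much uneven load costs, here is a minimal sketch. It assumes the roughly 100 transactions per second per 7200RPM drive quoted earlier and treats "practically idle" as contributing nothing by default; the function name and the `idle_load` parameter are illustrative, not part of any product.

```python
def effective_iops(spindles, iops_per_spindle, busy_fraction, idle_load=0.0):
    """Aggregate IOPS when only a fraction of the spindles carry the load.

    idle_load is the share of its capability that each "idle" spindle
    still delivers (0.0 means it contributes nothing).
    """
    busy = spindles * busy_fraction
    idle = spindles - busy
    return busy * iops_per_spindle + idle * iops_per_spindle * idle_load

ideal = 200 * 100                                       # every spindle fully used
skewed = effective_iops(200, 100, busy_fraction=0.2)    # the 20/80 case

print(f"ideal:  {ideal} IOPS")
print(f"skewed: {skewed:.0f} IOPS ({skewed / ideal:.0%} of ideal)")
```

With a 20/80 split and truly idle spindles, the 200-drive system delivers only about a fifth of its nominal transaction count, which is why spindle count alone does not settle the question.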

What about latency? Many storage performance measurements are focused on latency. Due to their rotational speed, 7200RPM disks have approximately double the latency of 15K RPM disks. But what are the end-user implications of latency? Let’s consider this realistically:

  • End-users couldn’t care less if their request is handled in 1 or 20 milliseconds.
  • The latency problem really lies in the application/OS queue. If an application is configured to allow only 32 unacknowledged requests, and the storage response time is 5 milliseconds, then throughput will be limited to 6400 transactions/second (32 times 1/0.005). Given a queue limit, storage with half the response time doubles the performance.
  • Write transactions are not a problem, since write transactions are written to the cache and acknowledged immediately.
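The queue-limit arithmetic above is the familiar relationship between outstanding requests, response time, and throughput. A minimal sketch, using the 32-request and 5-millisecond figures from the example; the function name is hypothetical and the model ignores everything except the queue limit:

```python
def max_throughput(queue_depth, response_time_ms):
    """Upper bound on transactions/second with a fixed number of
    outstanding (unacknowledged) requests."""
    return queue_depth * 1000 / response_time_ms

for response_time_ms in (5, 2.5):
    tps = max_throughput(32, response_time_ms)
    print(f"queue depth 32, {response_time_ms} ms response: {tps:.0f} transactions/second")

# Raising the queue limit has the same effect as cutting response time.
print(f"queue depth 64, 5 ms response: {max_throughput(64, 5):.0f} transactions/second")
```

Halving the response time and doubling the queue depth give the same ceiling, which is why a reconfigurable queue limit takes much of the sting out of higher drive latency.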

So when is there a problem? When applications have an intrinsic limit on the queue size of read requests. Though in some cases, to ensure consistency, there is a limit on the queue size for writes (database transaction logs are a typical case), in most other cases, reconfiguring the queue size is not a problem.

Here are our final conclusions:

  1. 7200RPM drives are much more cost efficient, energy saving, and dense.
  2. There is no real problem in satisfying overall business needs with 7200RPM drives, assuming the proper architecture is in place.
  3. We need a storage architecture that will enable us to really exploit all drives equally, without manual tuning. We can’t afford having 80% of our transactions handled by only 20% of our disk drives.

45 comments to this post:

  • You seem to do a little hocus-pocus over the notion of IO request queues. While I concur that with deep queues you can get IO requests widely distributed, the fact is an Oracle database random read cache miss request is going to be limited to the speed of the disk, no matter how many spindles you have spread your data across or how deep the queues.

    And I had to laugh at your assertion that “end-users couldn’t care less if their request is handled in 1 or 20 milliseconds.” While perhaps true for end-users running PowerPoint or Excel, the end-user waiting for her credit card transaction to be approved at Macy’s might - especially if her request was queued behind a million or so other transactions.

    Not to mention the end-users waiting for an SAP report. Or the broker trying to adjust a stock portfolio last week. Or the guys responsible for real-time routing of packages and planes over at FedEx. Or the folks responsible for logging phone call transactions at any one of the telecoms around the world…

    I’m pretty sure that they all really do care about milliseconds - which is why they buy 15K drives and flash EFDs for their applications.

    But then I also think people are starting to realize that XIV isn’t really targeting these types of performance-sensitive mission-critical applications anyway.

  • To “the storage anarchist”:

    I think you may have misinterpreted the meaning of “end-users couldn’t care less if their request is handled in 1 or 20 milliseconds.”.

    I say this because you said: “especially if her request was queued behind a million or so other transactions.”

    If millions of requests are on the queue, they can all have identical, relatively large latencies, yet all finish at the exact same time.
    In other words, latency of separate requests does not accumulate. Or did you mean “queued behind” outside of the storage system’s queue? In that case, deep queuing at the storage level prevents latency accumulation.

    10 extra milliseconds of latency of the storage system is unlikely to have an actual effect on the vast majority of applications, virtually all of those that communicate with humans at the other end.

    You also said: “But the fact is an Oracle database random read cache miss request is going to be limited to the speed of the disk, no matter how many spindles you have spread your data across or how deep the queues”

    If the cache miss is spread to multiple reads from multiple disks, your latency is the maximal of all these disks, but the actual I/O read speed of the disks accumulates.

    By spreading data from one fast disk to multiple slower ones, you get better bandwidth, more capacity and a lower price. The only disadvantage seems to be doubled latency, which I believe has a significantly smaller impact.

    I disagree with the examples you brought of latency-critical applications. A human cannot feel ~10 milliseconds of latency, and it seems all of your examples involve a human at the other end of storage requests.

    Real latency-sensitive applications would be ones that don’t interact with us slow humans, but with something faster. Perhaps automatic stock trading would fit this bill, but not a broker adjusting a stock portfolio.

  • I’m not an engineer and don’t keep up on drive performance characteristics, but I recall issues with SATA drives not having the same consistent response times as FC and SCSI. This is not a mechanical latency issue, but involves drive electronics taking time out for internal housekeeping. The times involved were measured in seconds, not milliseconds.

    A “sleeping drive” is not necessarily a problem with reads because read redundancy can compensate, but writes are another matter and write caching probably isn’t a solution.

    Drive response times notwithstanding, there is an argument for applying wide striping with 15K drives too. If you want top tier performance as part of your subsystem, why not have it available? Less critical applications can use lower tiers with slower devices.

    My question is why would you want 2 separate systems for this if you could get it in a single system and save floorspace, power and management time?

  • Omri, first of all welcome to the blogosphere. I do find it amusing that my first post was about SATA in the enterprise too… but somewhat more aimed at EMC’s claims that the SATA as a tier in a monolith was something that was a good thing.

    In general I see your point, and SVC has been proving that striping across multiple arrays/spindles is a good thing for 5 years. I therefore agree that striping can provide substantial performance gains. However, I’d argue that striping over 15K RPM vs striping over 7.2K RPM can still provide different tiers of storage. If underlying latency is better at 15K, then striped latency can be improved at 15K or 7.2K.

    I guess my point is that while XIV does bring SATA to an enterprise latency level, there are still benefits (should you decide to pay for them) from striping at FC or SAS level. While SVC stripes at larger ‘chunks’ than XIV, the said same benefits can be gained.

  • Eyal - very interesting perspective you bring to the discussion.
    -
    Perhaps you are correct - if the customer’s transaction indeed requires only a single independent I/O, and all of the other millions of I/Os queued are also independent of each other, AND you have enough disk drives to deliver every single I/O in a fraction of a second.
    -
    But alas, in practical applications there are very few independent I/Os - virtually every transaction requires multiple synchronous and dependent I/Os - I/Os that cannot be requested by the application until the prior I/O has been completed and returned to the host. Even a simple bank deposit requires something like 12 8K I/O’s, with at least 8 of them synchronous cache-miss reads.
    -
    As IBM Master Scientist Barry Whyte observes, this is why many applications in the real world today use wide-striping across 15K rpm FC drives to create a “performance tier.”
    -
    It really is about response time, and I’m sure that you and Moshe understand that response time is critical to many applications in every IT shop in the world - and the more mission-critical the application, the more significant the impact of latency is…
    -
    This is also why solid-state storage is getting so much interest today - it’s all about response time, and not maximum IOPS.
    -
    And even if your math works, if it takes 30 15K FC drives to replace the IOPS of 1 flash drive, it would require at least 60 SATA drives in an XIV just to keep up with the IOPS. But the BEST cache miss latency you could get from the SATA drives would still be the 20 milliseconds you used above…and probably far worse, given any spindle contention. Given the high frequency of dependent I/Os in real-world applications, that could translate into a four hour batch job on Enterprise Flash Drives taking more than THREE DAYS to complete on XIV SATA.
    -
    There’s no argument that there are plenty of applications and workloads where Flash or 15K response times aren’t required, but you’re being a bit disingenuous trying to insist that you’ve overcome the laws of physics and the inherent latencies of slowly spinning media.
    -
    Oh, and the XIV reference account Leumi actually confirms this, by explaining that when they run out of performance on one of their XIV arrays they stop putting new applications on it. Thus it is true that the XIV cannot deliver consistent performance independent of how much data is stored on the array…making customers purchase even MORE capacity that they cannot use and reducing capacity utilization well below any cost advantage of SATA over FC or SAS drives on both a $/IOP and a $/GB basis.

  • First, I’d like to respond to the rampant XIV bashing here- do all the latency math you want, but the people I’ve talked to who tried this thing say that for their performance tier apps, XIV is screaming fast.

    Second, Omri, I take issue with the following statement: “Operations expense-wise: We need about double the power and floor space but get a whopping six times more capacity, so are three times more efficient.”

    I’ve never met a storage admin who measures operational efficiency by capacity. A performance ratio is much more sensible since usually the storage they buy is sized based on the horsepower they need.

    That said, I think that the money saved by using 7.2k RPM spindles probably outweighs the extra floor space and electric costs compared to traditional 15k drives.

  • Oh yeah, one other thing: your comment engine mulches my paragraphs. Could you set it so that if I put two line breaks, it actually breaks both times?

    Thanks :)

  • ossg -
    -
    Sorry, but I don’t see any XIV bashing here.
    -
    Polite challenges to the assertion that a large number of 7200 RPM SATA drives can deliver the same response time as striping over faster 10K or 15K rpm disk drives or flash, yes.
    -
    But that’s hardly “rampant bashing.” Chill out!

  • Anarchist, could you share your line feed secrets with the rest of us?

    Maybe there are no secrets?

  • To “the storage anarchist”:

    I see your point.

    Extra latency may hurt you by stretching the length of chains of dependent I/O’s. I think we have both agreed that independent I/O’s (and also short dependent I/O chains) are insignificantly slowed.

    The question is how many “long I/O chains” (IOW, how many I/O dependencies) there actually are, and how significant their slowdown is to the applications using them.

    Your example is a 4-hour flash batch application. According to the huge slowdown you mention, it would mean that these 4 hours are virtually one big dependent I/O chain. I fail to see any application requiring this long a chain. Do you have any examples in mind? What kind of long-chain I/O dependencies exist in the real world?

  • @the storage anarchist
    I didn’t mean rampant impolite bashing- I’d certainly agree to calling it “Polite challenges” :)

  • All, we’ll try to resolve the double line feed issue, but until then, we’ve added extra paragraph spacing. HTH.

  • I think that the performance analysis presented in the comment above is not accurate. Let’s first review the scenario mentioned above:

    1. A customer is using his/her credit card at Macy’s

    2. A server accepts the customer’s transaction (as well as a million others) and registers them using storage transactions

    We all agree that the customer at Macy’s couldn’t care less about 5, 10 or 20 millisecond latency but, it was argued, the server processing a million such requests would care (together with servers running SAP, call logging and other applications). It sounds convincing but, in most cases, is wrong. Latency would impair the server’s performance if it has a long thread of dependent I/O – that is, I/O transactions that need to be executed in a certain order (database log transaction volume is the typical example). These servers all have such threads, but these are threads of write transactions, not read transactions. While for read transactions the latency of the storage system depends on the disk latency, for write transactions the latency is the cache latency.

    The example of the bank account transaction having 8 reads has the same characteristics: maybe these 8 dependent reads would take 24 more milliseconds, but a 24 millisecond change in response time is meaningless. The real question is: Can these database transactions be performed in parallel?

    Flash drives are a completely different story, and present performance that is greater by an order of magnitude, at a price that is also higher by an order of magnitude. This is still a niche market and not a solution that will replace disk drives.

  • I believe that the “sleeping drive” phenomenon mentioned is related to the days when SATA drives targeted only the consumer market. Now, when we have enterprise-level disk drives, I think that disk drive manufacturers can provide higher quality to 7200 RPM drives. In any case, as you note, read latency is not affected due to redundancy (at least, when using mirroring; RAID-5 and RAID-6 are a different story), and write latency depends on the cache, not spindle latency.

    Is there a case for 15K RPM drives? There probably is, but what I am saying here is that this case is much smaller than people think. I believe that most applications can be easily tuned to provide the required performance using massive striping on 7200 RPM drives, and that only a very small number of applications really need the low read latency of 15K RPM drives.

  • Marc - I inserted lines with just a hyphen in them to create the breaks…you can see them now.

    Eyal - batch jobs are by nature very sequential processes that rarely have any parallelism. Report generation, data consolidations, inventory management, etc. - the logic is usually “process Item A, then Item B, then Item NNN…”. Each step in the “process” is itself generally a string of dependent I/Os…thus the example I gave is very real. I know of a customer who replaced only 10% of their capacity with flash drives (rest on 15K) and reduced their batch time from 12 hours to just under 5. The 7 hours savings daily more than pays for the flash drives, freeing up IT resources for other operations and workloads.

    Omri - I think we’re just going to have to agree to disagree…it’s hard to debate with you when your math doesn’t add up - 8 dependent reads at 20ms each isn’t 24ms - it’s 160ms…

    Whether long threads of dependent writes or millions of short threads of dependent reads, there is always going to be queueing involved - there are practical limits to parallelism in every application. And even if all the requests could actually be posted to storage simultaneously, there would STILL be queueing, because as you’ve noted, each SATA drive can handle only about 100 IOPS - for a million simultaneous 4K block IO transactions, you’d need 10,000 SATA drives to deliver all the IOs in 20ms of elapsed time. But if you only have 1,000 SATA drives, it will take far longer than 200ms - head contention will actually reduce the total IOPS you can get from each drive. And with only 100 drives, you’re talking multiple seconds to service the million single-block IO requests.

    And since we’ve concurred that there really isn’t such a thing as a single block independent I/O for these transaction types, if each requires 12 4K random cache miss I/Os, the person at Macy’s is waiting minutes to complete her purchase.

    And that’s if the servers could actually post 1 million simultaneous transactions per second to the database…(OK - we can argue this might really require multiple separate databases and servers, but let’s stick with the simple example).

    And by the way - write latencies do become a factor when the write pendings exceed available cache…whether read or write, cache misses will cause heads to move and add even more latency - getting 100 IOPS from a SATA drive requires optimized seek ordering…make the IO stream totally random, and IOPS can drop down to 20 (or worse)…and response times will proportionally increase.

    But like I said, you’ll likely not agree with this perspective, and that’s OK.
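The dependent-read arithmetic debated in this exchange is simple serial addition: reads that must wait for one another cannot overlap, so their latencies add up. A minimal sketch, assuming the 20 ms per-read figure used in this thread for 7200 RPM drives and, as an assumption derived from the article's "about half the latency" claim, 10 ms for 15K RPM drives; `chain_time_ms` is a name invented here:

```python
def chain_time_ms(dependent_ios, latency_ms):
    """Elapsed time for I/Os that must be issued strictly one after another."""
    return dependent_ios * latency_ms

reads = 8  # dependent cache-miss reads in the bank-deposit example
for label, latency_ms in (("15K RPM (assumed 10 ms)", 10), ("7200 RPM (20 ms)", 20)):
    print(f"{label}: {chain_time_ms(reads, latency_ms)} ms for {reads} dependent reads")

extra = chain_time_ms(reads, 20) - chain_time_ms(reads, 10)
print(f"extra time on the slower drive: {extra} ms")
```

With these inputs the 8-read chain takes 160 ms at 20 ms per read and 80 ms at 10 ms per read; whether that difference matters depends on how many such chains run back to back rather than in parallel.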

  • To the storage anarchist:

    “the logic is usually “process Item A, then Item B, then Item NNN…”. Each step in the “process” is itself generally a string of dependent I/Os”

    If this is the case, then the real solution is for either the application, or the OS underneath it, to do more aggressive pre-fetching.
    It knows which items it’s going to be processing, so it can fetch them ahead of time and negate any extra latency.

    If we are dealing with broken software that creates artificial and unnecessary dependencies between reads, then one solution is to pay for much more expensive hardware for less capacity. However, I believe the right solution is to use proper software that uses independent reads for independent operations.

  • Simple matter of programming?

    FWIW, Rip Van Winkle (yours truly) woke up and read the drive literature to find the problem of on-drive error correction was solved about 4 years ago.

  • Eyal -

    As Marc observes, it’s always a SMOP.

    Unfortunately, I think the last of the COBOL programmers are living comfortably in the tropics, after the killing they made fixing old code for Y2K.

    Seriously…if the application or operating system could prefetch everything, then we wouldn’t need ANY caching or striping in our arrays…fact is that all too often data working sets far exceed the limits of practical prefetching and write buffering…and thus, response time really matters.

  • Here is my conclusion:

    If you run software that doesn’t unnecessarily create very long dependency chains between reads: The cheaper disks’ solution is better on all counts: Bandwidth, price, capacity and power use (the cheaper disks are also roughly half as power hungry).

    If you run old, poorly-designed software, and you cannot fix its bugs: then you will have to slow your batch processes, and possibly suffer through doubled latencies, in order to gain the cheaper storage of more capacity and bandwidth.

    Note that even in this case, these doubled latencies can at worst double batch/request times - but due to doubled or tripled bandwidths, the effect isn’t likely to go that far. So it’s likely to be somewhere between x1 and x2 latency/batch times.

    I don’t think this points to a major flaw in the highlighted strategy. I think it all points towards the replacing of poorly designed software, so you can enjoy the benefits on all counts. Not to mention, that being locked into a static piece of software that you cannot in any case modify to suit new needs is probably a bad idea.

    Can you agree with these conclusions?

  • I disagree that it is just a case of poor software. If my problem is inherently parallel (say for example image processing), I will break up the problem and prefetch the data to the servers separately - I don’t care about storage performance at all.
    -
    As soon as I have a case of dependent data that is shared between the independent processing units, then the latency of the shared data access becomes paramount.
    -
    i.e. TIMEtotal = % PARALLEL * (LATENCY/number units (drives in this case)) + % SERIAL * LATENCY + CONTENTION_OVERHEAD * number of units
    -
    The strategy works provided the CONTENTION_OVERHEAD of using the multiple drives is kept small enough and the percentage that can be parallel is large enough. This is true in many cases, but not all, but blaming the programmer will not fix the problem if that is what makes the array slow.
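The formula in the comment above can be tried out directly. This is a minimal sketch of that expression as written, with one added assumption that the serial share is simply whatever is not parallel; the function and parameter names, and the sample numbers, are illustrative only.

```python
def total_time(latency, units, parallel_fraction, contention_overhead):
    """TIMEtotal = %PARALLEL * (LATENCY / units) + %SERIAL * LATENCY
                   + CONTENTION_OVERHEAD * units

    Assumes %SERIAL = 1 - %PARALLEL.
    """
    serial_fraction = 1.0 - parallel_fraction
    return (parallel_fraction * latency / units
            + serial_fraction * latency
            + contention_overhead * units)

# 90% parallel work, 20 ms latency, small per-drive contention overhead.
for units in (1, 10, 200):
    t = total_time(latency=20.0, units=units, parallel_fraction=0.9,
                   contention_overhead=0.001)
    print(f"{units:>3} drives, low contention:  {t:.3f}")

# The same workload when contention per drive is large.
for units in (1, 10, 200):
    t = total_time(latency=20.0, units=units, parallel_fraction=0.9,
                   contention_overhead=0.5)
    print(f"{units:>3} drives, high contention: {t:.3f}")
```

With low contention, adding drives pushes the total toward the serial floor (here 10% of the latency); with high contention, the overhead term grows with the drive count and eventually outweighs the gain, which is the condition the comment describes.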

  • To “Tech Enki”:

    Can you please provide some example cases which would form very long dependent read chains?

    I am having trouble seeing how those could form in practice.

  • Interesting thread thus far. I have a different thread that is less quantifiable in nature. Early literature from XIV/IBM thus far describes this as a combined tier 1/2 storage solution. Under the assumption that the data is internally managed such that large deposits of infrequently referenced data have no negative impact on the generally smaller category of highly referenced and updated data, I would buy the assertion of tier 1/2. I believe that simplifying storage management is arguably in the top 2 priorities of storage in general. I am seeing a dramatically shrinking workforce that is still capable of solving the complex IT issues. Interestingly, while the skills of people I can hire have gone down, the complexity of our environments has gone out of sight. Obviously not a good trend, and there is nothing laudable about requiring gurus in one’s IT architecture.

    Beyond tier 1/2 is the question of full automation vs. ILM. I would opine that business requirements are capable of requiring processes that cannot be inferred through the operational behaviour of the physical system. If one wanted to have a ‘big picture’ storage architecture from the XIV perspective, what would it add on (theoretically of course) to this one piece of admittedly impressive technology?

    My org has been in the virtual server environment for well over 4 years now, and it was abundantly clear in the very first year that we only had half of the equation. I’ve been interested in and following the, for lack of a better term, small innovator storage companies. Most of us know their names, so I’ll avoid blatant advertisement. A small number of them, XIV included, caught my attention as having solved part of the issues. Interestingly, they had created largely non-overlapping best of breed solutions. I have refrained from moving forward with the purchase of any of them, since this time I want it all, and it seems we have enough technology for that to be a reasonable request.

    Can you now suggest the proposition for a ‘single pane of glass’ storage management solution that is also by definition possessed of the various characteristics we see in these multiple virtual storage solutions? By example, thin provisioning is a wonderful feature, but it is only one of many features in this class of storage. Therefore, is there a visionary roadmap beyond what we see today? IBM occasionally calls it investment protection and this resonates with me when I take it in the broadest possible context, rather than simply the writedown period for the box. In summary, I am personally tired of the incremental feature treadmill, and would like to see a real vision that doesn’t get retired to the recycler every 5 years or so. Call it a variant of sustainability. Thank you.

  • It’s hard to read the light grey comments on white background. nice post though :)

  • As one of those mythical beasts, an actual storage customer; it would be fantastic if we could get our development teams to rewrite their code to run more efficiently; unfortunately it is nearly always quicker for us to fix code problems by throwing infrastructure at it rather than refactoring code.

    Our application developers are focussed on adding functionality and delivering on promises made to the business. If the code works, it ships and it is only on odd occasions where it gets properly revisited and retuned. Batch is a great example where changes in underlying performance can have really dramatic impacts; if my storage cannot process its requests quickly enough, I’ll run out of the batch window and then I’ll get shot. I can point fingers at shoddy code (I do from time to time) but it’s still the storage’s fault.

    One tier fits all is a nice aspiration but I think it’s unrealistic at present.

    And depreciation/maintenance cycles are my largest driver for my refresh treadmill; it is very unusual for a feature to come along which I must have and means I have to refresh. Visionary roadmaps do exist tho’; I don’t actually see that much difference between the visions apart from mere implementation details.

  • I don’t think the “whopping six times more capacity” is as valuable to the situation as you propose.

    You doubled the 7200 RPM spindle count to match the IOPS of the 15,000 RPM drives. But now you have also increased the likely IOPS demand by 6 times.

    For example, now that I’ve got 6 times as much capacity I won’t just be storing 6 times as much email for my existing users, but instead I’ll be putting 6 times as many users on the storage array and trying to generate 6 times as many IOPS.

    The IOPS per GB of usable capacity issue is too often overlooked.

    Six times the capacity is great, but you may only get to use a small portion of it before you run out of performance on those drives.

    Do you agree this is a valid concern?

    -E
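The IOPS-per-capacity concern raised in the comment above can be put in numbers. A rough sketch in relative units, reusing the article's approximate figures (about 200 versus 100 transactions per second per drive, and about three times the capacity per 7200 RPM drive); the helper name is hypothetical:

```python
def iops_per_capacity(drives, iops_per_drive, capacity_per_drive):
    """IOPS available per unit of stored capacity if the array is filled."""
    total_iops = drives * iops_per_drive
    total_capacity = drives * capacity_per_drive
    return total_iops / total_capacity

fast = iops_per_capacity(100, 200, 1.0)   # 100 x 15K RPM drives
slow = iops_per_capacity(200, 100, 3.0)   # 200 x 7200 RPM drives

print(f"15K RPM:  {fast:.1f} IOPS per capacity unit")
print(f"7200 RPM: {slow:.1f} IOPS per capacity unit")
print(f"ratio:    {slow / fast:.3f}")
```

Under these figures a full 7200 RPM array offers about one-sixth of the IOPS per unit of capacity, so the extra space only helps if most of what fills it is rarely accessed.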

  • Almost all the commenters seem to be missing the basic point that it’s most often cheaper/faster/better to buy more slower drives than fewer faster ones. The price premium on the fast ones is huge; for the same money you can often get a lower RPM system that will not just equal, but outperform the higher RPM one.

    Forget arguing about the latency of a single, or even multiple related I/Os, because USUALLY the important thing is throughput of many concurrent transactions. The added GB capacity of the cheaper drives is just a bonus.

    “E” brings up an interesting point. Just because you have more GB capacity doesn’t mean you can fill it with big demand data. If you put six times the users on it instead of storing more old data, you are mismanaging your storage.

  • More to the point, people given 6x the space they currently use will use it to make copies. Something like 10% of any given dataset is accessed with any regularity.

    This has been common across all SATA solutions- the difference here is that assuming that 90% of the data you dump into the array will never be touched again, you still get the performance of the drives that data resides on.

  • I think it is time to add a new topic here, I believe it is well overdue.

    I also think that some of the things The Storage Anarchist is saying on his blog are not being negated in its fullest form. Does anyone agree?

    Don’t get me wrong, Mr. Pearson does his fair share, but shouldn’t this be the forum to “right the wrongs” of all the FUD being thrown around about XIV??? I’m not sure but I’m starting to believe the EMC hype…please someone set the record straight!

    Thanks

    -Bob

  • Lot of interesting posts here of SATA vs FC, striping here vs doing it another way. Has anyone brought up the discussion of what the market needs and what XIV is addressing? Of the 100% storage market, what % is the “true” tier one? The 200k or even the Million IOPS Database. I capitalize it, because it’s as rare as the honest politician (just to bust on them). Anyone want to discuss the true tier 1 %, vs everything else? I’ve been doing this for 11 years, and I’m going to throw it out there that the real tier one is 10%. You could make a case that it’s only 5% or you could really slant it and claim it’s 20%. I think you would get a lot of arguments that it’s anywhere near that 20% mark though.

    So why not have a storage array that could perfectly handle 80% 90% of the business out there without any technical argument needed about IOPS? Is this really about “one platform fits all” or is it “lets discuss the minutia of the highest IOPS and shortest latency upper class - that discussion and argument is getting tiring after all these years. Yes, Flash may be the answer to everyone’s dreams, I boot my home computer off an SSD 64GB C:\ drive and its extremely fast. So what, I’m crazy.

    I would offer to say the storage lay of the land is probably similar to the ratios of people in the USA who make less than $300k vs those above that number. Or the ratio of people who own $80k + cars vs those of us who bought and drive the not-so-hot used models of 4 years ago and paid $35k for what was $68k then.

    Business case vs engineer for a minute if you will. (I am the latter, but think in both realms)

  • Referring to Jay’s question, that is more in the direction of where I was going in my first post. At the business needs level, I idealize storage that would allow me to satisfy a wide range of service profiles. I would like it to be my highest performing database target and at the same time be the tier 3 seldom referenced data storage medium, and all that is in between those two points. I would like it to provide a single storage management interface that allows for stating my needs in business rules terms rather than storage technology terms. I would also like a deeper management interface implemented in the traditional technology terms that could be there for the extenuating circumstances that always occur when trying to use a business semantic for a description of a technology process. My goal is to have an architecture for storage that is so simple, one staff member can easily handle the entire enterprise. I assume all the typical characteristics of non-stop operation while upgrading, repairing or expanding, both at the physical and logical levels. Note that I’m not saying this exists. It is what I idealize.
    In my early career I was a CS and am quite comfortable with the science behind this interesting profession of ours. That said, as IT management, I long ago adopted the view that my most efficient roadmap was to do appropriate but not exhaustive diligence in the initial analysis of anyone’s product, and then passing that, bring it in house for proof of concept use. It seems to go quite a bit faster that way than going the route of a deep theoretical analysis. Obviously, all imo. Some vendors are not willing to go there at no cost to me, and on those, I smile and pass.
    As a postscript, I will note that some of the smaller innovators in storage out there seem to be approaching this idealization I outlined. I’ve come close to purchasing non-traditional designs, but no cigar just yet.

  • Can someone explain the hype to me here? Off the top of my head I can name at least three or four midrange solutions that, assuming I wanted to fully load them with SATA, will provide both better functionality and utilization than XIV, with the added bonus of being proven, deployed solutions. All provide wide striping and thin provisioning, with options for mirroring and other RAID levels, and should I choose, they can be deployed with additional drive tiers and in a modular fashion. All, due to inherent virtualisation, are also very easy to deploy and manage. So what’s the big deal here? It looks like nothing more than a loosely coupled whitebox cluster serving up some storage, with a UPS to protect the writeback cache. Hardly groundbreaking.

  • If there is one thing I can say I have seen over my years in storage (and nearly as long reading blogs and forums, doing SAN design, fixing problems by being onsite for extended periods or “shotgun pro svcs”, selling solutions and services to storage customers, and finally moving closer to an engineering type role), it is:

    Too many people suffer from analysis paralysis.

    and this blog shows it.

    Argue all you want for or against something, we could have 700 monkeys typing for 700 years and be not one step closer to an answer. Call IBM, get an XIV on your floor alongside any other storage systems, do the appropriate testing or put an application on it, run a cost analysis, see which system will be easier to deploy and train the storage team on, and see if it does what you need it to do. Sitting around on our morning coffee breaks talking about FC vs SATA, or 1 ms vs 30 ms response times, has next to nothing to do with the real-world implementation of any storage product, any more than discussions of network switch and router technology arguing over 1.2 microsecond vs 1.4 microsecond passthroughs of packets. Try to find a forum with any passion surrounding that, and the dates will be in the 90s. My viewpoint is that discussions like this blog will be looked back on as “oh yeah, that’s what they talked about back then” when we’re in 2011 to 2019 and beyond, and the new generation comes in looking at some of us like we’re experts on the steam locomotive when they’re being autopiloted around in floating ground cars that use the earth’s magnetism to move them.

    Let an IT team buy a perfect storage solution, then mess it up with a poor RAID and layout implementation. And/or let the network team, the server team and the database team make a multitude of misconfigurations that will all bit by bit slow down the total path of the IO response time. And then — blame the storage.

    This shouldn’t be a religious debate; it’s about what works for customers. I say religious debate because every time I read religious discussions, it’s the same thing repeated over and over, in the hope that one will follow the scriptures by brute force of repetition. What we’re repeating over and over is IOPS and response times.

    Storage Anarchist, I have read your posts over the years, and you have good points and you are certainly knowledgeable, but you are extremely limited in what you discuss. I don’t know if you don’t care to move outside that realm, or don’t know much past that realm, it doesn’t matter, but in my opinion, you need a new story to tell. This is not meant to be an insult, so please don’t take it that way.

  • John, can you be specific about what midrange solutions you are talking about? I get the feeling that we live in parallel worlds and the one you mention doesn’t have such capabilities yet.

    I just fail to think of any vendor (other than XIV) who would provide customers a Tier1 SATA-only solution.

    I might be wrong but I sense that your point of view is from a vendor perspective and is missing the point that a SATA only solution is much more cost effective for customers. Why wouldn’t you deploy a SATA only solution if it brings all the benefits you listed at a much better cost?

  • Andy,
    take your pick: EMC-CX, HDS-AMS, HP-EVA, SUN-Fishworks, 3Par-InServ, Netapp-FAS, IBM-NSeries, IBM-DS.

    I’d say 3Par, HP and SUN are the ones who could provide an equivalent technology out of the box based on their existing products, which incorporate wide striping capabilities on SATA. Although I would question the use of tier 1 and SATA in the same sentence. In fact, who designated XIV as tier 1? There’s no industry body for this, it’s just marketing and positioning.

    Customers need to be made aware of the potential shortfalls of SATA, and as far as I can see, IBM haven’t really overcome these issues. None of the technology I’m seeing in XIV is really new; it’s just a conglomerate of existing technologies in a rather big, inflexible configuration. How does that suddenly solve all of these issues and become a Tier 1 storage array?

  • Marc Farley here, from 3PAR. Customers can use almost any mix of SATA and FC drives (15K and 10K) in our systems. Some have gone with all SATA, others use all FC and the rest have a mix of drives. By default, the low-level volume manager in our systems uses wide striping for all volumes within a drive class, but it is possible to span drive classes if desired.

  • Can any of those vendors rebuild a full 1TB hard drive in 30 minutes max? That in itself should be a clue that things are different in the XIV.

  • JR, not sure about actual rebuild times, since all rebuilds are sensitive to front-end loading. However, many employ both proactive and virtual sparing as well as mirroring or double-parity RAID and, crucially, all are field proven. The XIV, as far as I can tell, can only rebuild a 1TB disk in 30 minutes so long as the disk has only been partially written to. If someone decides to write many or even every block at any point during the disk’s lifespan, then XIV must rebuild every block. It’s a bit like thin provisioning in a block environment: just because the O/S or application deleted the file doesn’t mean the array knows the block is free.
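The arithmetic behind this rebuild exchange can be sketched as follows. The throughput figures are assumptions chosen for illustration (they are not vendor numbers), and the model simply divides the data to be re-protected by the combined rate of the drives doing the work.

```python
def required_mb_per_s(data_gb, minutes):
    """Aggregate throughput needed to re-protect data_gb within the given minutes."""
    return data_gb * 1000 / (minutes * 60)


def rebuild_minutes(capacity_gb, fill_fraction, drives_sharing_work, mb_per_s_per_drive):
    """Estimated rebuild time when only written blocks are rebuilt.

    fill_fraction is the share of the failed drive that was ever written;
    drives_sharing_work and mb_per_s_per_drive are assumed values.
    """
    data_mb = capacity_gb * fill_fraction * 1000
    return data_mb / (drives_sharing_work * mb_per_s_per_drive) / 60


# Rebuilding a full 1 TB drive in 30 minutes needs about 556 MB/s in aggregate,
# which is more than one spare drive can absorb on its own.
print(f"{required_mb_per_s(1000, 30):.0f} MB/s")

# Assumed traditional case: one hot spare absorbing 50 MB/s under load.
print(f"{rebuild_minutes(1000, 1.0, 1, 50):.0f} minutes onto a single spare")

# Assumed distributed case: 100 drives each contributing 10 MB/s,
# with the failed drive full versus half written.
print(f"{rebuild_minutes(1000, 1.0, 100, 10):.1f} minutes, drive full")
print(f"{rebuild_minutes(1000, 0.5, 100, 10):.1f} minutes, drive half written")
```

The model shows both sides of the argument: spreading the work across many drives shortens the rebuild, and the time still scales directly with how much of the failed drive had been written.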

  • I see there’s too much opinion about SATA drives, like it happened many years ago when people thought it would be ridiculous to run a business on a “thing” called a personal computer.

    I also see the size of FC drives getting bigger and bigger, not smaller. Now they are 300, 400GB, yet with such rotational speeds there will soon be a limit. Also, with capacity increasing, the rebuild times of RAID5/6 will increase too. I wonder how the industry will react to this while still demanding tier 1 solutions.

    What are, from your perspective, the shortcomings of SATA drives? (1/2 the IOPS of an FC drive, 1/2 latency?) Both FC and SATA are slow as hell when compared to memory, bus and CPU manipulations, right? I’ve fixed plenty of performance problems and the real issue wasn’t the drives… but the quality of the layout and design that used the drives. And yes, when nothing else could fix the layout, I had to drop in some more FC drives, short-stroke them and spread the load.

    btw, as an example, a CX doesn’t provide true thin provisioning; you still need to configure and design for it… in addition to paying for it. Wide striping? Limited to the LUNs in a RG, then you would have to use metaLUNs, which are also limited. Limited CPUs (only 4), limited memory (from 4 to 16GB on each SP), and rebuild times take 8hrs, if not days. Yes, it’s a modular array (only 2); one blows and chances are that the whole array is gone. I don’t see how a CX or similar arrays could be compared to XIV.

    So, from my perspective, I respect your opinions, but I sense that you have not seen it yet. And I believe XIV has to earn its stripes and deliver what it promises, but if they do, then I believe they might have a strong case.
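The per-drive IOPS gap raised in this comment can be estimated from drive mechanics with the standard approximation that one random I/O costs an average seek plus half a rotation. The seek times below are assumed, typical-looking values rather than figures for any specific drive, and the model ignores transfer time, queuing, and cache.

```python
import math


def avg_rotational_latency_ms(rpm):
    """Average rotational latency: the time for half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def random_iops(rpm, avg_seek_ms):
    """Approximate random IOPS for one drive (seek plus rotational latency only)."""
    return 1000 / (avg_seek_ms + avg_rotational_latency_ms(rpm))


def drives_to_match(target_iops, per_drive_iops):
    """Number of drives needed to reach target_iops in aggregate."""
    return math.ceil(target_iops / per_drive_iops)


# Assumed average seek times: 3.5 ms for a 15K RPM drive, 8.5 ms for a 7200 RPM drive.
fc_iops = random_iops(15_000, 3.5)
sata_iops = random_iops(7_200, 8.5)

print(f"15K RPM:  {fc_iops:.0f} IOPS per drive")
print(f"7200 RPM: {sata_iops:.0f} IOPS per drive")
print(f"7200 RPM drives needed to match 30 15K drives: "
      f"{drives_to_match(30 * fc_iops, sata_iops)}")
```

With these assumptions a 7200 RPM drive delivers somewhat under half the random IOPS of a 15K drive, so matching a shelf of fast drives is a question of spreading the same load over more spindles.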

  • Marc says 3par can do it, I know EVA can do it, and I’m pretty sure the SUN kit can do it, as can a few other niche vendors. EMC and HDS midrange have more traditional architectures and don’t really do native wide striping, so maybe they’re not the best option. So again, really, I’m still not seeing the value prop of XIV vs the established competition?

  • Andy, I also respect your opinion and I’m not asking anyone to believe me or put blind faith in my comments or conclusions. I’m just providing a counterpoint to the hype that I perceive around XIV. I’m suggesting that people step outside the marketing presentation and ensure they don’t evaluate this box in isolation. One size most definitely never has fit all, nor will it. Take it or leave it, but in my humble opinion, at this point in time the XIV is a solution looking for a problem. This is partly why there’s so much confusion over its positioning, including my own.

  • I’m still not sure I follow what’s different about this technology. Is it the same type of solution as the Compellent technology?

  • I'm not sleeping  Friday, 12/06/09 at 01:13

    I am a huge fan of the interface and ease of management of the XIV!

    A couple of questions remain however that still are leaving me apprehensive:

    1) With all of the processing and SATA drives in a single rack, the heat output must be tremendous. How many BTU does the full rack XIV put out?

    2) I understand that upon a drive failure, the drive is not necessarily “rebuilt” but “redistributed” among the box in 30 minutes. When replacing the failed drive, how long does it take to rebuild the physical new drive?

    3) IBM is great about posting performance numbers for their products on Storage Performance Council. When can we expect XIV numbers to be posted?

    I’m really looking forward to seeing the results of this along with the half rack model!

  • Solutions Architect  Monday, 6/07/09 at 23:54

    Some comments about the real world.

    I am seeing a lot of comments here from various competitors to XIV, leading the conversation to where they think it will benefit themselves, instead of talking about what is best for the customers.

    I have been a server, storage, and SAN admin/customer for most of my career. I have also architected solutions using a lot of different vendors’ products in the past. Let’s discuss some of the things customers really want, and what I asked for year after year. XIV seems to have addressed all of my wants in relation to storage. Now, if it could get my son to mow the lawn on time, it might be the perfect piece of equipment in the world.
    1. Zero downtime - either planned or unplanned. I have spent many a night migrating data from one storage platform to another. The least disruptive to my users was usually host based migrations, which happened to be the most disruptive to my sleep and personal life. Yes, there is an outage to move to XIV, if using their migration tool, but their future direction is by far the best I have seen. Yeah, almost every vendor says 5 nines availability, but that only covers unplanned outages. To me the planned outages were more troublesome, since getting downtime sometimes required an act of congress to get all my end users to agree on when an outage could happen.
    2. Reliability - From what I have seen and heard, this box is extremely reliable. The box is designed to account for component failures, and also readjusts itself to perform optimally. If I lost a drive or DAE on another platform, performance generally degraded for some time, or data was lost (yes, it may have been due to the way I laid out my raid groups, but with XIV you don’t have to worry about that).
    3. Performance - From talking to XIV customers, it seems that they are getting equal or better performance than their previous solutions. It is very hard to tell these customers that they don’t know what they are talking about, since they are actually running real world apps. Yes, an imaginary workload could overrun an XIV, but that is true for every storage frame. Oh yeah, I used to be a programmer too, and I could write code that could cripple any storage frame, or make any frame look good. This is why I am not a fan of benchmarks. I would rather see a real app at work. Every company has to weigh the cost of changing code, to buying extra HW/SW. XIV seems to be very forgiving to the odd bad code. If you can’t change your code and it doesn’t work on XIV, then sure buy something else. My bet is that you will spend a lot more money though.
    4. Ease of management - I have worked on a large variety of storage systems, and XIV is by far the easiest one I have ever worked on. Also, I don’t need to take months of training classes to make sure I am setting it up correctly. It took me less than a day to be proficient. After many years of looking at heat graphs and spreadsheets to make sure I laid out my data correctly, XIV would have saved me a tremendous amount of time. I am sure I would have a lot fewer grey hairs, had it come out a decade earlier!

    ILM is something I have had to do for years. This to me is only because XIV didn’t exist at the time. Spending hours/days/weeks to guess what tier to put my data on is a ridiculous concept if you don’t need to do it. Yes, it is a guess. Every company will try to convince you that it is a science, by looking at stats and talking to app teams and DBAs to gather information, but it really is a guessing game. Every person that comes in contact with data has a different opinion on how important it is and how frequently it is going to be accessed. Also, over time that data changes its access patterns, which makes your layout obsolete. Don’t get me wrong here, there is still a need to use ILM when talking about archiving, but in the Tier 1 through Tier 3 range, XIV has removed that need. Someone once told me that ILM means “I like money” to the vendors, and “I’m losing money” to the customers. This is sad but true, and there really was no other option but to play the ILM game until XIV came out. Solution Architects have made a lot of money from going into customer sites to try to save them money by tiering. Trust me, I was one of them. I would rather spend my time on other tasks now!

    All the conversations about milliseconds really don’t matter if companies put their data on XIV and they are happy with the way it works! I am sure there is an app or two out there where XIV doesn’t work, but they only account for a small part of the upper end storage market. With IBM investing in more R&D, I am sure that number is going to continue to dwindle. If you really have a storage frame that requires extremely high I/O, then talk to IBM about a trial. I am sure they will work with you, and if XIV doesn’t meet your needs, they have other platforms that might. More than likely, you will not need to look further.

    Sorry for the long winded comments, but I thought I needed to voice them.

  • I was able to test out the XIV systems vs. 3 other vendors.
    The interface was slick. Hands down, no one else is close.
    As far as single points of failure, true, there are none (I actually physically unplugged a tray of disk, no hits).
    What I did find out though, is if you lose that tray (DAE or whatever), EVERY component becomes a single point of failure resulting in data loss.
    Rebuild times are VERY quick, no doubt, but that metric comes from a box that is less than 50% utilized. I won’t comment on what it is if you’re using 90% of your storage, ask IBM.
    Would really like to see some kind of reporting or SMI compliance so it works with other tools in the enterprise.
    It is a solid Tier II box. I’ll let it get a track record before I recommend for Tier I apps.
    The environmentals. Now that thing generates some heat. It actually overheated a competitor product in the next rack over (kinda funny actually).

  • I agree with John. EVA seems to have done the same job as XIV 5 years ago, if not 7 years: the first EVA was on the market in 2002, and some functions such as SnapClone were still not implemented yet.

    Regarding the rebuild time, I’ve encountered a case where an EVA with 80*73GB drives could be rebuilt in 30 min, believe it or not. But actually it was just a trick: a rebuild time only matters when you know how much data there is and what the workload is.
