If you happen to read the (interesting and entertaining) blog posts that both NetApp and EMC produce, or follow @chuckhollis and @Vaughn_Stewart, you might already have noticed their discussion about data deduplication and the merits of each other's storage arrays. If not, keep reading!
EMC and NetApp have been publicly trashing each other (or rather, comparing the merits of each other's products) online for a while now; but yesterday Vaughn Stewart, Virtualization Evangelist at NetApp, got into what he calls a “tweet-fight” with EMC’s Chuck Hollis. The subject was this year's hot topic, virtualization.
The main point being discussed is the performance of their respective systems when using deduplicated storage. The uncontested fact is that deduplicating storage saves disk space; but for many workloads, disk space is not a very important factor. The main factor determining your application’s speed will mostly be the number of I/O operations per second (IOPS) a system can handle. When you store twice the amount of data on a single disk because you’ve deduplicated or compressed it, that disk will have to handle roughly double the IOPS, because it now receives twice as many requests for the data stored on it.
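To put numbers on that argument, here is a minimal sketch in Python. All figures are made up for illustration (the workload size, the disk count, and the 2:1 deduplication ratio are assumptions, not vendor data), and it deliberately ignores caching and assumes the load spreads evenly over all disks.

```python
def per_disk_iops(workload_iops: float, disk_count: int) -> float:
    """Average IOPS each disk must serve if load is spread evenly."""
    if disk_count <= 0:
        raise ValueError("disk_count must be positive")
    return workload_iops / disk_count


# Hypothetical workload: the application's request rate does not change
# just because the data takes up less space on disk.
workload_iops = 3600.0

# Hypothetical array sized for capacity only.
disks_before = 20
dedup_ratio = 2.0  # assumed 2:1 space saving
disks_after = int(disks_before / dedup_ratio)

before = per_disk_iops(workload_iops, disks_before)
after = per_disk_iops(workload_iops, disks_after)

print(f"without dedup: {disks_before} disks, {before:.0f} IOPS per disk")
print(f"with dedup:    {disks_after} disks, {after:.0f} IOPS per disk")
```

If you buy half the disks because the data now fits on them, each remaining disk sees twice the request rate. Whether it can keep up depends on what a single spindle in your array can actually deliver.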
Both EMC and NetApp have their ways of dealing with this, but I’d like to point out that no matter how good your caching is, you’ll have to write the data to disk sooner or later. And if you’re dealing with “primary storage” for a large database system or file server, the access patterns will be rather random, which makes caching not completely useless, but certainly less effective. Discussing certain edge cases, such as starting up 1000 virtual desktops running identical images at the same time, is fun but only relevant to a small subset of users.
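The difference between those two access patterns can be illustrated with a toy simulation. This is only a sketch: it models a plain LRU read cache, which is not how either vendor's cache actually works, and the block counts, cache size, and the 90% "hot set" share are all invented numbers.

```python
import random
from collections import OrderedDict


def lru_hit_ratio(accesses, cache_blocks):
    """Replay a sequence of block reads through an LRU cache."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for block in accesses:
        total += 1
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


rng = random.Random(42)   # fixed seed so runs are repeatable
total_blocks = 10_000     # hypothetical data set size, in blocks
cache_blocks = 1_000      # hypothetical cache size: 10% of the data
reads = 200_000

# Random access over the whole data set, as with a busy database.
uniform = [rng.randrange(total_blocks) for _ in range(reads)]

# Skewed access: 90% of reads go to the same 500 blocks, loosely
# resembling many desktops booting from an identical image.
hot_blocks = 500
skewed = [
    rng.randrange(hot_blocks) if rng.random() < 0.9
    else rng.randrange(total_blocks)
    for _ in range(reads)
]

uniform_ratio = lru_hit_ratio(uniform, cache_blocks)
skewed_ratio = lru_hit_ratio(skewed, cache_blocks)
print(f"uniform random reads: {uniform_ratio:.1%} cache hits")
print(f"skewed reads:         {skewed_ratio:.1%} cache hits")
```

With uniformly random reads the hit ratio ends up close to the ratio of cache size to data size, so most requests still go to disk. With the skewed pattern the hot blocks fit in the cache and nearly all reads are served from memory, which is why the boot-storm case looks so good in demos.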
Anyway, the comments on Chuck’s blog entry are refreshing. There are several comments from customers; most indicate that for their workloads, spindle count (the number of disks) is still more important than any storage saved by deduplication. I’m sure this will change once SSDs become more common, but we’re not at that point yet. What I also like is that these customers have all run actual tests instead of blindly trusting the performance and ROI claims made by either vendor; taking the time to run some good, relevant tests will save you both time and money later! So bookmark this page at the EMC blog, and read it again when you’re considering buying a large storage system. It only takes a couple of minutes, and it’s full of important information and hints about the shortcomings of several of EMC’s and NetApp’s products.
