Wednesday, August 12, 2009

The ins and outs of Windows-based deduplication

All those copies of data increase hardware, power, and IT costs. Data deduplication is the answer, and Windows shops may already have the core technology in place.

"Storage is cheap!" you might say. Disk storage has certainly become cheaper over time, and all sorts of products on the market take advantage of the price drop, from virtual tape libraries to megastorage appliances. But keep in mind that more disk storage carries costs beyond the vendor price tag: power to run and cool it, space to house it, and IT administrators to oversee it. You have to factor in all of these. Products that provide single-instance storage (SIS) and data deduplication can help mitigate the expense of so-called cheap storage.


You may not realize that you have some of the necessary ingredients to take advantage of deduplication within your Microsoft server environment. But you do.


First, let's be clear on what deduplication is: Data deduplication is the process of eliminating redundant data in a storage repository or in network traffic. You can deduplicate either at the object (file) level, which is also called "single instancing," or at the block (subfile) level, which typically saves much more space because it also catches files that are only partly identical.
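To make the difference between the two levels concrete, here is a minimal sketch in Python. It is an illustration only, not how any Windows component or backup product implements deduplication: the function names, the fixed 4,096-byte block size, and the use of SHA-256 hashes to identify duplicates are all assumptions chosen for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


def file_level_stored_bytes(files):
    """Single instancing: keep one copy of each distinct whole file."""
    unique = {}
    for data in files:
        unique.setdefault(hashlib.sha256(data).hexdigest(), len(data))
    return sum(unique.values())


def block_level_stored_bytes(files, block_size=BLOCK_SIZE):
    """Block-level deduplication: keep one copy of each distinct block."""
    unique = {}
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            unique.setdefault(hashlib.sha256(block).hexdigest(), len(block))
    return sum(unique.values())


# Hypothetical data: two identical files plus a revision that differs
# only in its final block.
original = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
revised = b"A" * 4096 + b"B" * 4096 + b"D" * 4096
files = [original, original, revised]

raw_bytes = sum(len(f) for f in files)         # no deduplication
file_level = file_level_stored_bytes(files)    # exact duplicate removed
block_level = block_level_stored_bytes(files)  # shared blocks removed too

print(raw_bytes, file_level, block_level)
```

In this made-up data set, single instancing removes only the exact duplicate, while the block-level pass also collapses the two blocks that the revised file shares with the original.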

Data is naturally duplicated through mass distribution and routine data processing. Most IT organizations keep multiple copies of the same file in different repositories, or even several iterations of files that are still being worked on. In addition, backup applications produce and maintain multiple copies of files so that they are available for recovery. Backup processes have contributed greatly to the explosion of data in the datacenter.

Consider a simple scenario. An e-mail with a 10MB video attachment is sent to 100 people. If the e-mail platform doesn't have SIS capabilities and the backup product doesn't have a deduplication feature, you are looking at backing up roughly 1GB of data (which takes space, time, and money) as opposed to a single 10MB instance.
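The arithmetic behind that scenario can be spelled out in a few lines of Python. The variable names are just labels for this example, and the figures ignore message headers, mailbox overhead, and compression.

```python
ATTACHMENT_MB = 10  # size of the video attachment
RECIPIENTS = 100    # number of mailboxes that receive a copy

# Without SIS or deduplication, every mailbox's copy is backed up.
without_dedup_mb = ATTACHMENT_MB * RECIPIENTS

# With single instancing, the attachment is stored once.
with_sis_mb = ATTACHMENT_MB

saved_mb = without_dedup_mb - with_sis_mb
saved_pct = saved_mb * 100 / without_dedup_mb

print(f"{without_dedup_mb}MB without dedup, {with_sis_mb}MB with SIS, "
      f"{saved_pct:.0f}% saved")
```

That works out to 1,000MB (about 1GB) versus 10MB, a 99 percent reduction for this one message.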

J. Peter Bruzzese | InfoWorld
