Choosing the Best Data Deduplication Solution in Hard Economic Times

Written by Fadi Albatal
Thursday, 04 December 2008
Late in the third quarter of this year, the investment banking system in the United States experienced a financial crisis whose effects rippled beyond North America and the financial markets to Europe and Asia. Companies in all industries are seeing lower revenues and are imposing strict expense controls.
Every IT department in the world is feeling the pressure. The mandate now, and for the foreseeable future, is to reduce capital expenditures, lower operating costs and save energy. This is no longer just about being green; it is fiscal common sense in a slow economy. Now is the time for IT professionals to think outside the box and investigate technologies that can deliver greater efficiency and return on investment. This is nothing new to IT, but it is now a matter of survival. What may simply have been a good idea before is now a mandate, which is why the adoption of deduplication technology accelerated towards the end of this year.

Deduplication has become recognized as the next evolutionary step in backup technology. The benefits are tangible and extremely practical: eliminating duplicate data in secondary storage archives can slash media costs, streamline management tasks and minimize the bandwidth required to replicate data. In short, deduplication improves efficiency and saves money, which is exactly what is required when IT budgets are tight while mission-critical data continues its exponential growth.

There are many providers of deduplication solutions today, so how does one choose the right one? Each vendor lays claim to having the best approach to data deduplication, leaving customers to separate hype from reality and determine which factors really matter to their business. Because some vendors set unrealistic expectations by predicting huge reductions in data volume, some customers may ultimately be disappointed with their solution. Companies must consider a number of key factors in order to select a data deduplication solution that actually delivers cost-effective, high-performance and scalable long-term data storage. This article provides the background information required to make an informed data deduplication purchasing decision.

Data deduplication is now more than ever an operational requirement
Because secondary storage volumes are growing exponentially, companies need a way to reduce them dramatically. Regulatory requirements magnify the challenge, forcing businesses to change the way they look at data protection. By eliminating duplicate data and keeping data archives as compact as possible, companies can keep more data online longer at significantly lower cost. As a result, data deduplication is now a required technology for any company that wants to optimize the performance, efficiency and cost-effectiveness of its data storage environment.

Although compression technology can deliver an average 2:1 reduction in data volume, this is only a fraction of what is required to deal with the data deluge most companies now face. Only data deduplication technology can meet companies' requirements for far greater reductions in data volume.

Data deduplication can also minimize the bandwidth needed to transfer backup data to offsite archives. With the hazards of physically transporting tapes well established (damage, theft, loss, etc.), electronic transfer is fast becoming the offsite storage method of choice for companies concerned about minimizing risk and protecting essential resources.

Eight criteria for a robust data deduplication solution
1. Focus on the largest problem
The graphic in Figure 1, courtesy of the Enterprise Strategy Group (ESG), illustrates why a new technology evolution in backup is necessary. Incremental and differential backups were introduced to reduce the amount of data copied compared with a full backup, as the figure depicts. However, even incremental backups contain significant duplicate data when protection is based on file-level changes, because a file that changes even slightly is copied again in its entirety. Across multiple servers at multiple sites, the opportunity for storage reduction through data deduplication becomes huge.
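To make the file-level versus chunk-level distinction concrete, here is a minimal Python sketch of a toy deduplicating store. It is not any vendor's implementation: it assumes fixed-size 4 KiB chunks fingerprinted with SHA-256, a common textbook technique, and the names `ChunkStore`, `split_chunks` and `CHUNK_SIZE`, along with the one-byte-per-night change pattern, are hypothetical and chosen only for illustration. A file-level incremental backup would copy the whole 1 MiB file every night because it changed; the chunk store keeps only the one new chunk per night.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # hypothetical fixed chunk size (4 KiB), chosen for illustration


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ChunkStore:
    """Toy deduplicating store: each unique chunk is kept once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}        # digest -> chunk bytes (what is physically stored)
        self.logical_bytes = 0  # total bytes the backup application asked us to store

    def write(self, data: bytes) -> list:
        """Store data and return its 'recipe': the ordered digests needed to rebuild it."""
        recipe = []
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # a duplicate chunk is not stored again
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Reassemble the original data from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)

    @property
    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


# One week of nightly backups of a 1 MiB file in which a single byte changes each night.
store = ChunkStore()
data = bytearray(random.Random(42).randbytes(1024 * 1024))  # randbytes needs Python 3.9+
recipes = []
for night in range(7):
    if night > 0:
        data[night * 100_000] ^= 0xFF  # flip one byte in a different chunk each night
    recipes.append(store.write(bytes(data)))

print(store.logical_bytes)   # 7340032 (7 x 1 MiB sent by the backup application)
print(store.physical_bytes)  # 1073152 (256 original chunks + 6 changed chunks, 4 KiB each)
print(round(store.logical_bytes / store.physical_bytes, 2))  # 6.84
```

The resulting ratio depends entirely on how much the data changes between backups; change the flipped positions or the amount of churn and the figure moves accordingly.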
[Figure 1]

2. Integration with current environment
Solutions that require proprietary appliances tend to be less cost-effective than those that offer more openness and deployment flexibility. The ideal solution is available both as software and as turnkey appliances, giving the organization the maximum opportunity to use existing resources.

3. Virtual tape library capability
4. Impact of deduplication on backup performance
By comparison, data deduplication solutions that run after backup jobs complete, or concurrently with backup processes, avoid this problem and have no adverse impact on backup performance. This post-processing method reads the backup data from the backup repository after backups have been cached to disk, which ensures that backups are not throttled by any storage limitations. An enterprise-class solution that offers this level of flexibility is ideal for organizations looking for a choice of deduplication methods. For maximum manageability, the solution should allow granular (tape- or group-level) policy-based deduplication driven by a variety of factors: resource utilization, production schedules, time since creation and so on. In this way, storage efficiencies can be achieved while optimizing the use of system resources.

5. Scalability

A deduplication solution should provide an architecture that allows economic “right-sizing” for both the initial implementation and the long-term growth of the system. For example, a clustering approach allows organizations to scale to meet growing capacity requirements, even for environments with many petabytes of data, without compromising deduplication efficiency or system performance. Clustering enables a VTL to be managed and used logically as a single data repository, supporting even the largest tape libraries. Clustering also inherently provides a high-availability environment, protecting the backup repository interface (VTL or file interface) and the deduplication nodes by offering failover support.
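The article does not describe how a cluster keeps deduplication global while it grows. One common approach, assumed here purely for illustration, is to partition the fingerprint index by hash so that every fingerprint has exactly one owning node; a duplicate chunk is then detected no matter which backup stream or node receives it. The Python sketch below uses hypothetical `Node` and `DedupCluster` classes with simple modulo placement, and it shows only the single-repository behavior, not failover.

```python
import hashlib


class Node:
    """One deduplication node; owns a slice of the global fingerprint index."""

    def __init__(self, name: str):
        self.name = name
        self.chunks = {}  # digest -> chunk bytes stored on this node


class DedupCluster:
    """Hypothetical cluster that routes each chunk to the node owning its fingerprint.

    A given fingerprint always maps to the same node, so the cluster behaves as
    one logical repository with a single, global deduplication domain.
    """

    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]

    def owner(self, digest: bytes) -> Node:
        # Simple modulo placement on the digest. Consistent hashing is a common
        # alternative because adding a node then relocates only part of the index.
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, chunk: bytes) -> bool:
        """Store a chunk; return True if it was new, False if it was a duplicate."""
        digest = hashlib.sha256(chunk).digest()
        node = self.owner(digest)
        if digest in node.chunks:
            return False
        node.chunks[digest] = chunk
        return True


cluster = DedupCluster(["node-a", "node-b", "node-c"])
stream_1 = [b"alpha" * 1000, b"beta" * 1000, b"gamma" * 1000]
stream_2 = [b"beta" * 1000, b"delta" * 1000]  # shares one chunk with stream 1

new = [cluster.put(c) for c in stream_1 + stream_2]
print(new)  # [True, True, True, False, True]
print(sum(len(n.chunks) for n in cluster.nodes))  # 4 unique chunks cluster-wide
```

Because placement depends only on the fingerprint, adding nodes adds capacity without splitting the data into separate deduplication islands.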
[Figure 2]

6. Distributed topology support
For example, a company with a corporate headquarters, three regional offices and a secure disaster recovery (DR) facility should be able to implement deduplication in the regional offices to provide efficient local storage and replication to the central site. The solution should require only minimal bandwidth for the central site to determine whether remote data is already contained in the central repository. Only data that is unique across all sites should be replicated to the central site and subsequently to the DR site, to avoid excessive bandwidth requirements.

7. Highly available deduplication repository
8. Efficiency and effectiveness
If the “chunking” begins at the start of a tape (or of a data stream in other implementations), the deduplication process can be fooled by the metadata the backup software creates, even if the file itself is unchanged. However, if the solution can segregate the metadata and look for duplicate chunks within the actual data files, duplicate detection will be much higher. Some solutions even adjust chunk size based on information gleaned from the data formats. The combination of these techniques can lead to a 30 to 40 percent increase in the amount of duplicate data detected, which can have a major impact on the cost-effectiveness of the solution.

Focus on the total solution
Although the benefits of data deduplication are dramatic, organizations should not be seduced by the hype sometimes attached to the technology. Whatever the approach, the amount of deduplication that can occur is driven by the nature of the data and the policies used to protect it. To achieve the maximum benefit of deduplication, organizations should choose a data deduplication solution based on a comprehensive set of quantitative and qualitative factors rather than relying solely on statistics such as theoretical data reduction ratios.

About the Author: Fadi Albatal is the director of marketing at FalconStor Software.