Data Center Scalability … it’s not just equipment that impacts long-term reliability

In today’s global economy, with customers expecting on-demand availability of information, the data center has never been more vital to the ongoing success of an organization. Additionally, with the increase in regulatory requirements and other market drivers, we have seen the cost of downtime for some data centers reach millions of dollars per minute.

Perhaps most challenging is the impact high-density computing is having on the data center. Power and cooling remain the two biggest concerns. According to a recent Gartner prediction, over 50% of all data centers will lack sufficient power and cooling by 2008. Complicating matters is the drive toward implementing more efficient “green” technologies.

And now… you are about to build a new data center. What should you do? Should you overdesign it? With the required capital outlay for many data centers exceeding millions of dollars, how much growth do you build in? How much redundancy should you design?

The key to answering these questions is to determine your cost of downtime and project how that cost will evolve in the years to come. What applications will you utilize? What operations will you consolidate? What is the impact to your business today? What will be the impact to your business in 3 years and 5 years? These are tough questions to quantify, but designing without answers to them is like playing darts blindfolded: it is very hard to see the target.
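As a rough illustration of projecting cost of downtime forward, here is a minimal sketch. The function name, the hourly cost, the growth rate, and the expected outage hours are all hypothetical placeholders, not figures from any real facility, and compound annual growth is only one possible assumption about how that cost evolves.

```python
def project_downtime_cost(cost_per_hour, annual_growth, outage_hours_per_year, years):
    """Return (year, projected annual downtime cost) pairs.

    Assumes the hourly cost of downtime compounds at a fixed annual rate,
    which is a simplification chosen for illustration only.
    """
    projections = []
    for year in years:
        hourly_cost = cost_per_hour * (1 + annual_growth) ** year
        projections.append((year, hourly_cost * outage_hours_per_year))
    return projections


# Hypothetical inputs: $100,000 per hour today, 20% annual growth,
# and 4 hours of expected outage per year.
for year, cost in project_downtime_cost(100_000, 0.20, 4, [0, 3, 5]):
    print(f"Year {year}: ${cost:,.0f} expected annual downtime cost")
```

Replacing the placeholders with your own estimates for today, year 3, and year 5 gives a number to weigh against the capital cost of added redundancy.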

Regardless of how much money you want to spend and how redundant you want your data center to be, one of the most important aspects of any new design is scalability. A useful definition can be found on investorwords.com, which defines scalability as “the potential for a business or an aspect of a business to continue to function effectively as its size increases.” The operative word is effectively. Scalability directly impacts reliability: as resources get taxed and stretched, the potential for failure rises sharply.

So how can you apply that definition of scalability to your data center? There are several aspects to consider.

Power

Just a few years ago, a robust data center was one that was built to handle 50-75 watts per square foot of power. Today, we see that number growing in some cases to 200-300 watts per square foot, and in some facilities we have seen it go higher than that. While Intel, AMD, IBM, HP and others are all looking at ways to reduce power consumption, we have not yet seen demand crest. While you may not need 250 watts per square foot today, you might in two years. If you do not design your data center architecture so that its power capacity can expand, you could find yourself without enough power to properly operate your equipment or to cool it. Inadequate power is one of the leading causes of outages in a data center.
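To make the density figures above concrete, the following sketch converts watts per square foot into total load and compares it with provisioned capacity. The floor area and provisioned capacity are hypothetical, and the calculation assumes the density figure applies uniformly across the floor; it does not separately account for cooling load or distribution losses.

```python
def required_power_kw(floor_area_sqft, watts_per_sqft):
    """Total load in kW for a given floor area and power density."""
    return floor_area_sqft * watts_per_sqft / 1000.0


def power_shortfall_kw(floor_area_sqft, watts_per_sqft, provisioned_kw):
    """Additional kW needed at the given density; 0.0 if capacity is enough."""
    needed = required_power_kw(floor_area_sqft, watts_per_sqft)
    return max(0.0, needed - provisioned_kw)


# Hypothetical facility: 10,000 sq ft of raised floor, 1,000 kW provisioned.
for density in (75, 250):
    shortfall = power_shortfall_kw(10_000, density, 1_000)
    print(f"{density} W/sq ft: shortfall of {shortfall:,.0f} kW")
```

A facility sized comfortably for 75 watts per square foot can end up far short at 250, which is why the ability to expand matters as much as the capacity on day one.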

Cooling

As equipment becomes denser, adequate cooling becomes more and more imperative. The HVAC system you are installing today may be more than enough for today’s requirements, but what about 3 years from now? Will you be able to add more CRACs? Will you be able to add enough to handle the anticipated load? What about your chiller loops? What about cooling tower capacity? Can the architecture handle a 25% or a 50% increase in load when required? If not, that could present major challenges down the road. Without proper cooling, equipment will fail and outages will wreak havoc on your operations.
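The 25% and 50% growth question can be checked with simple arithmetic. The sketch below assumes that essentially all power drawn by the equipment ends up as heat to be removed, and uses the standard conversion of one ton of refrigeration to roughly 3.517 kW. The load and installed capacity are hypothetical, and the check ignores other heat sources (lighting, people, the building envelope) and any redundancy margin.

```python
KW_PER_TON = 3.517  # one ton of refrigeration removes about 3.517 kW of heat


def cooling_tons_needed(it_load_kw):
    """Cooling required, in tons, to remove the heat from a given IT load."""
    return it_load_kw / KW_PER_TON


def can_absorb_growth(it_load_kw, installed_tons, growth):
    """True if installed cooling covers the IT load after fractional growth."""
    grown_load_kw = it_load_kw * (1 + growth)
    return cooling_tons_needed(grown_load_kw) <= installed_tons


# Hypothetical facility: 500 kW of IT load, 200 tons of installed cooling.
for growth in (0.25, 0.50):
    ok = can_absorb_growth(500, 200, growth)
    print(f"{growth:.0%} growth: {'covered' if ok else 'NOT covered'}")
```

The same comparison would need to be repeated for each link in the chain, including CRACs, chiller loops, and cooling towers, because the tightest one sets the limit.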

Maintenance & Operations

One of the most overlooked aspects of data center build-outs is the ongoing maintenance and operations of the facility once it is completed. As you grow the amount of infrastructure and equipment in the facility, so too will the required preventative maintenance grow. Are you designing into your facility the ability to maintain the equipment? With the expectation of “7xForever” availability, as the Uptime Institute calls it, you may not have the luxury of “maintenance windows” down the road.

Without “maintenance windows,” you have to be able to conduct maintenance on portions of your data center infrastructure without bringing down the entire facility. Does your design allow for that? You might have “maintenance windows” today and think that this isn’t an issue, but will that be the case 3 years from now?

My wife, a seven-year customer of one cellular provider, switched to another provider because on the one day she wanted to upgrade her phone and plan, she was told the “system was down for maintenance… could you come back another time?” No, she couldn’t. Not with twin babies at home. Not with the limited time the babysitter was booked. So instead, she drove down the street and found another provider all too willing to accommodate her on that Saturday afternoon.

Your maintenance has to be as scalable as your equipment.

As for operations, one of the biggest mistakes companies make today is to have too few staff to handle a given facility. “Oh, we have a Tier III facility… we don’t have to worry about outages.” As you are reading this article, countless data centers have a single point of failure… one facilities manager. Regardless of whether you spent millions on a Tier IV facility or not, if you have just one person who knows the ins and outs of that facility, you have a single point of failure that can leave you vulnerable.

Forget the “what happens if he or she gets hit by a bus” scenario. What if your facilities manager gets offered a job by one of your competitors down the street for more money? How will you continue to maintain and operate your facility? As your requirements grow, so too should the staffing plans to accommodate that growth.
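One simple way to spot this kind of staffing exposure is to list who is trained on each system and flag anything covered by fewer than two people. The sketch below is illustrative only: the system names, staff names, and the two-person threshold are assumptions, not a standard.

```python
def single_points_of_failure(trained_staff, min_people=2):
    """Return systems with fewer than min_people trained staff, sorted by name.

    trained_staff maps a system name to the set of people trained on it.
    """
    return sorted(
        system
        for system, people in trained_staff.items()
        if len(people) < min_people
    )


# Hypothetical coverage matrix for a small facility.
coverage = {
    "UPS": {"alex"},
    "chillers": {"alex", "sam"},
    "generators": set(),
}
print(single_points_of_failure(coverage))
```

Rerunning a check like this each time equipment is added keeps the staffing plan tied to the growth of the facility.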

Documentation

As your facility grows, what are your plans for documenting that growth? How will you maintain current inventory lists? How will you update your “as-built drawings” and “electrical one-liners”? Without an up-to-date inventory of your equipment and operations, can you really have an effective disaster recovery plan? Without up-to-date drawings, how will you know that you didn’t overload a breaker or PDU?

Not too long ago, a major financial institution was performing some electrical work on the data center that supports its trading desk. The electrician doing the work inadvertently knocked out the circuit of one of the firm’s most active and successful traders. In just 20 minutes, the firm lost over $6 million. This could have been prevented if the electrician had been working from current drawings and following posted methods of procedure. The only way to know for sure, as you are expanding, that your infrastructure is coordinated appropriately is to maintain current documentation.
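A current inventory also makes the breaker and PDU question something you can check rather than guess. The sketch below sums the documented load on each circuit and flags any circuit above a planning threshold. The 80% limit is an assumed planning figure (confirm the correct limit with your electrical engineer and local code), and the circuit names and amperages are hypothetical.

```python
from collections import defaultdict

CONTINUOUS_LOAD_LIMIT = 0.80  # assumed planning threshold, not a code citation


def find_overloaded_circuits(circuits, equipment, limit=CONTINUOUS_LOAD_LIMIT):
    """Return sorted IDs of circuits whose documented load exceeds the limit.

    circuits:  {circuit_id: breaker rating in amps}
    equipment: list of (device_name, circuit_id, draw in amps)
    """
    load = defaultdict(float)
    for _name, circuit_id, amps in equipment:
        load[circuit_id] += amps
    return sorted(
        circuit_id
        for circuit_id, rating in circuits.items()
        if load[circuit_id] > rating * limit
    )


# Hypothetical inventory: two circuits on one PDU.
circuits = {"PDU-A/1": 20, "PDU-A/2": 30}
equipment = [
    ("srv1", "PDU-A/1", 9.0),
    ("srv2", "PDU-A/1", 8.5),
    ("sw1", "PDU-A/2", 6.0),
]
print(find_overloaded_circuits(circuits, equipment))
```

The result is only as good as the inventory behind it: a device missing from the list, or recorded on the wrong circuit, will not be caught.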

But that is just the tip of the iceberg. How are you going to document the maintenance being performed on your systems? How are you going to track required corrective actions? As your facility grows, so too will the amount of documentation you need to keep on it.

Procedures

The final area we will discuss as it relates to scalability in data centers is procedures. There is no such thing as a reliable data center without documented and tested standard operating procedures (SOPs) for the day-to-day operation of equipment and methods of procedure (MOPs) for the maintenance of that equipment. You may have documented and tested procedures for your facility as it is built today. However, what is your game plan for updating and retesting those procedures as your requirements grow? Procedures are not inherently scalable. Every time you alter the landscape of your data center, you need to revisit and retest the procedures in place to maintain and operate your facility.

Scalability = Reliability

In the end, just as there is no such thing as an unsafe reliable data center, so too is there no such thing as a non-scalable reliable facility. Scalability is directly related to resiliency. However, scalability is far more than just architecture or equipment. Be it documentation or ongoing maintenance and operations, the long-term uptime of your facility will be directly impacted by how you planned for growth across the entire spectrum of elements impacting resiliency.

For more information, contact Todd Bermont, Midwest Regional Director for Total Site Solutions at tbermont@totalsiteteam.com or visit www.totalsiteteam.com.
