• Structure 08

     Following CloudCamp last week was Structure 08, a new conference from GigaOM centered on the hyperscale infrastructure powering the clouds.  There is plenty of coverage of the event already out there so I won't waste digital ink here on everything that was discussed.   But the key takeaway: infrastructure matters.  A lot.  That was the message from speaker after speaker in the keynotes and panel discussions.   Why it matters:

    1. TCO-advantaged infrastructure is a competitive differentiator for cloud operators.

    2.  Infrastructure and application comprise one large machine and must be designed and managed together.  Infrastructure as an afterthought is a recipe for disaster.

    This event brought some well-deserved attention to the hard work going on in the boiler room of Web 2.0.  Hats off to the team at GigaOM and all the participants.

  • CloudCamp

    This week I attended CloudCamp, which came up "out of the blue" from a lively discussion forum on cloud computing.  I think it could be best described as a flashmob conference, conceived and organized in less than a month by Reuven Cohen and others from the forum.  It followed the "Unconference" format, which was essentially a face-to-face instantiation of the user-led online forum.   Attendees included a healthy mix of developers, young companies, serial entrepreneurs, veterans of Web 1.0 and corporate IT managers.  There were a few VCs slowly circling as well, as at Structure 08 the following day.  What struck me the most about this gathering was the grassroots enthusiasm, bordering on electric but with requisite techgeek nonchalance.   Maybe it was the open bar, but there seemed to be a genuine excitement born of being on the cusp of something big.   I overheard a lot of "back in the day" talk in the audience that compared today's cloud computing discussion in the industry with early web developer communities, or some of the first gatherings around virtualization.

    The breakout sessions varied from 100,000 ft discussions like "what is cloud computing" and "social impacts of the cloud" to meatier sessions on security and architectures.   The conversations underscored that cloud computing is evolving fast in what I would call consumer and public clouds: web-centric businesses utilizing services like Amazon EC2, internet search and social media sites, for example.   There is still a lot to figure out about how to harness the power of cloud computing for Enterprise IT.    For those who manage diverse platforms and navigate complex business and regulatory environments there is much to consider.    Participants wrestled with the implications for security, privacy and availability, but it was the discussions on application interoperability and mobility (between clouds or between internal applications and cloud-based services) that underscored what early days we're in.

    By virtue of the 250+ people that congregated on such short notice, CloudCamp was a notable endorsement of the diverse and intense interest in cloud computing.    The populist Unconference format eliminated a lot of the breathless marketing banter that often accompanies this topic and allowed for very down-to-earth conversations.   Kudos to Reuven for the concept and to Dave Nielsen, who proctored the session.   Another gathering is planned for July in London; watch the cloudcamp site for details.

  • The Cloud Physical Plant

    As we look further, I now think that completing the physical plant as part of the layered model (shown below) may not be the best use of our time, and that our efforts are better served by treating it as an independent model simply referenced from this stack.

    image

    As such I have decided to fill out the complete physical plant model (shown here) for this layer and let it serve as our discussion vehicle as we move forward.

    image

    We have discussed the overall power system to a great degree but still need to look at approaches that can be used for internal power distribution. This model is built around the concept of A.C. distribution. There have been many arguments describing the benefits of D.C. as a distribution vehicle, and there is a well-known study from Lawrence Berkeley National Laboratory (http://whitepapers.silicon.com/publisher/39038243/lawrence-berkeley-national-laboratory.htm) describing the advantages of D.C. While it is an excellent study, it compares D.C. with high-efficiency D.C. power supplies against A.C. with poor-efficiency A.C. power supplies. With A.C. power supplies in the 86% to 92% efficiency range today, these advantages are eliminated, and A.C. solutions are generally less expensive. In all, there are a few simple rules one should remember:

    · Keep transformations to a minimum

    · Make emergency power support components operate in a “bypass” mode, so that during normal operation they add no losses and do not reduce overall efficiency

    · Keep distribution voltages as high as safely and cost effectively possible

    · Avoid needless items in the power path that contribute to efficiency loss

    · Test all distribution advantage claims against your specific model and make sure you have not missed anything before making revolutionary decisions

    These will help you decide for yourself which is the best approach.
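
    The first rule can be made concrete with a little arithmetic: stages in series multiply, so every extra transformation drags down what reaches the load. The following is a minimal sketch only. The stage lists and their efficiency values are hypothetical numbers chosen for illustration (only the 92% power supply figure echoes the range quoted above), and `chain_efficiency` is a helper name invented here.

```python
from math import prod

def chain_efficiency(stage_efficiencies):
    """End-to-end efficiency of power stages in series (product of the stages)."""
    return prod(stage_efficiencies)

# Hypothetical stage efficiencies, for illustration only.
# Transformer, double-conversion UPS, second transformer, older power supply.
many_stages = [0.98, 0.94, 0.98, 0.90]
# Transformer, UPS running in bypass mode, 92% efficient power supply.
few_stages = [0.98, 0.99, 0.92]

for label, stages in (("many stages", many_stages), ("few stages", few_stages)):
    eff = chain_efficiency(stages)
    print(f"{label}: {eff:.1%} delivered, {1 - eff:.1%} lost")
```

    Plugging in measured figures for your own power path is the kind of test the last rule asks for before accepting any distribution advantage claim.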

    Let’s now turn our focus to cooling. One of the key principles on which this scheme is built is the idea of containment. Generally speaking, hot aisle (or hot air) containment provides some distinct advantages because it tends to reduce the overall area where the hot temperatures will exist. If you are getting the most out of your cooling dollar, the exhaust air will be pretty warm. In fact, if your inlet air is about 30°C (roughly 85°F), you can expect the exhaust to be 45°C or more (about 115° to 120°F), so keeping it contained makes the most sense. Now, this is not conventional cooling; it is built around using outside air, or “free” cooling, as much as possible. The idea is that if your exhaust air is hotter than your outside air, you are better off starting with the cooler air source than expending the energy to “recool” the exhaust air. (The effectiveness of this will vary in different geographies, and you need a wet bulb temperature of less than 85°F for this to work effectively.)

    There are a couple of things that must be considered. There must be filtration to clean the air to the point where it is usable (and you will need sensors to detect clogged filters), and there are times when the outside air will be unusable. During such times (in winter or in the presence of pollutants), the inside air must be recirculated and used for cooling. In the figure, you will see a cooled water system and a heat exchanger for “re-cooling” inside air. You can also see the use of evaporative cooling (or air-side economizers) and water-side economizers. Using the proper combination of these approaches (again based on your geography and particular model), you can achieve a PUE lower than 1.10.
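
    The decision between outside air and recirculation can be sketched as a simple rule. This is an illustration under stated assumptions, not a control system: the function names, the `outside_air_usable` flag (standing in for the winter and pollutant cases) and the sample readings are all made up here. The 85°F wet bulb limit comes from the text, and PUE is computed in the usual way as total facility power divided by IT power.

```python
def f_to_c(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0

def choose_air_source(outside_c, exhaust_c, wet_bulb_c,
                      outside_air_usable=True,
                      wet_bulb_limit_c=f_to_c(85.0)):
    """Return 'outside' for free cooling or 'recirculate' for re-cooled inside air."""
    if not outside_air_usable:          # e.g. pollutants, or air too cold to use
        return "recirculate"
    if wet_bulb_c >= wet_bulb_limit_c:  # wet bulb limit quoted in the text
        return "recirculate"
    if outside_c < exhaust_c:           # start from the cooler air source
        return "outside"
    return "recirculate"

def pue(total_facility_kw, it_kw):
    """Power usage effectiveness: total facility power over IT power."""
    return total_facility_kw / it_kw

# Hypothetical readings, for illustration only.
print(choose_air_source(outside_c=20.0, exhaust_c=45.0, wet_bulb_c=15.0))
print(choose_air_source(outside_c=20.0, exhaust_c=45.0, wet_bulb_c=15.0,
                        outside_air_usable=False))
print(f"PUE: {pue(total_facility_kw=1090.0, it_kw=1000.0):.2f}")
```

    In the last line, 90 kW of overhead on a 1,000 kW IT load is an invented figure that happens to land just under the 1.10 mark.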

    From here, we will begin looking at what I think is the optimum approach for the hardware so stay tuned.


     

    image

    Figure 1 – Cloud Computing Layered Model – Power

     

    image

    Figure 2 – High Level Schematic w/ Power

  • Within Layer 1 - Power

    As we begin to look at power, the first thing we need to do is to determine the desired availability.  In some models (and certainly the most inexpensive, Tier 0) there simply isn't any.  This essentially relies on the utility provider and the general utility practices (GUP) for power without any additional support means.  I don't know anyone who is doing this yet, but there are a couple of folks that are looking at it.  (Note: their availability is managed at a higher level even  with duplicate data centers (geographically separated) that can take over and deliver their service if one of the data centers goes down ... and yes there may be a reduction in their service, but it will stay up.)  If your business model allows you to do this, you can see some very interesting advantages.  Your centers can be very low frills which can save you a lot of money and the duplication can not only provide the availability but also serve to provide a data backup function, fulfilling another major objective. (I believe this will become a standard practice in the future as building blocks become even less expensive ...More on this later.)

    (A good reference for all this is the Tier definition from the Uptime institute which can be found at http://uptimeinstitute.org/.)

    The next step in availability is type N (Tier 1), N+1 (Tier 2), etc.  (This is probably a rehash of Power Availability 101 for some, but it is kind of nice to see it all put together.)  Here, there is a backup source of power (equal to what is considered the critical power part of N ... and this is probably not everything) in the event that the utilities go away.  Most folks in this space seem to be providing some form of N+1 (Tier 2 - a backup source and 1 additional source in the event that some of their primary backup fails).  You may be able, depending on your circumstances and your utility provider, to create a model for the maximum duration of backup power that doesn't require you to cover a prolonged outage. (This can save a lot of money, especially if you can shift load to alternate locations as I mentioned above.)  Your planning should look something like this:

    • 1. Determine the amount of power needed to back up only the essential systems. (BTW, don't forget to isolate this from the general power, which will likely include some things you don't want to pay to back up, but don't forget about cooling. We'll discuss that in the next installment.) This is "Pcritical KVA", or the critical load, and you will need to outfit this amount of alternate power. If you have the space, I favor a simple diesel or LP generator for the backup power (GenerationDiesel). (Also note that this may be the longest lead time item you need to procure as you are building a data center; in some cases it approaches 48 weeks.) Depending on the size (usually bigger is better), these retail for about $150 to $200 / KVA. (Note: this is just a rule of thumb for the materials and does not include installation cost. Plus, the actual prices may vary from your suppliers.) For example, if your Pcritical is 3MVA, you could use four 1000KVA generators (M x UPSKVA). This covers your Pcritical and provides an additional unit, giving you N+1. For each generator, you will need a transfer switch, which will move your load from the utility power to your generators. As a rule of thumb, these retail for about $20 - 30 per KVA. This example would cost you somewhere around $800K for the generators and $30K for switching equipment (not including installation).

      Generator sets require two things that you need to consider: the storage of fuel on site, which will determine the "TBUP (Hrs)", or how long backup power will last without intervention (with today's environmental rules, this is not something that can be taken lightly), and monthly maintenance. Once a month or so, the generators will need to be checked to ensure they are working properly and are set up correctly for the current season. There is a lot of automation here, but it is pretty expensive, and some of it can be avoided by simply having a "trusted" human :o) perform these regular checks.

    • 2. Now let's focus on the UPS. Some form of UPS is required to hold the load up while the generators are becoming operational. The sequence of events goes something like this: it can take about 4.5 seconds for a generator to crank up and become stable. (This is one of the main reasons for monthly maintenance.) If it fails, there may be a pause / purge of about 3 seconds followed by an additional 4.5 second start up time. This is just under 15 seconds to bring them up. Most folks add some margin here, but this is one place where time is money, and 30 seconds for "TUPS (Sec)", the duration of power provided by the UPS, seems appropriate. By this point, if you are not up, you have a bigger problem. If you are planning to use a conventional UPS with VRLA type batteries (UPSVRLA), you should expect to pay something around $50 to $75 per KVA for each 30 seconds of backup time. For the example configuration of 3MVA, if you use 500KVA UPS systems, you will need 6 of them (M x UPSKVA), and at $75 / KVA, this will cost you somewhere around $225K for the UPS systems alone. (Note: This is a very soft estimate and not all sizes and durations are available, so you will have to work within the constraints of your vendor when you are sizing and estimating the UPS. All estimates are just to give you an idea about some of the cost. These will vary as technology changes and with specific vendors.)

      There is a lot of other material required to hook this up. I would add 10% to the total to give you a good rule of thumb for materials cost. At this point, our stack and model look like the following with configuration guidelines to come.

      There are a couple of other good resources I would refer to at this point, and both come from the Uptime Institute. They are: Cost Model: Dollars per kW plus Dollars per Square Foot of Computer Floor and A Simple Model for Determining True Total Cost of Ownership for Data Centers. Note: These are created along the lines of conventional data centers, but they do contain some good information.
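
      The two planning steps above boil down to a few lines of arithmetic, sketched here with the example's own numbers (3MVA critical load, 1000KVA generators, 500KVA UPS systems, top-of-range prices). The helper `units_needed` and the variable names are invented for illustration. The $30K switching figure is taken from the text as-is rather than derived, and reading the 10% adder as applying to the generator, switching and UPS subtotal is an assumption.

```python
import math

def units_needed(load_kva, unit_kva, spares=0):
    """Units required to carry load_kva (N), plus any spare units."""
    return math.ceil(load_kva / unit_kva) + spares

P_CRITICAL_KVA = 3000            # 3 MVA critical load, from the example

# Generators: N+1 using 1000 kVA sets at the top of the $150-$200/kVA range.
GEN_UNIT_KVA = 1000
gen_count = units_needed(P_CRITICAL_KVA, GEN_UNIT_KVA, spares=1)
gen_cost = gen_count * GEN_UNIT_KVA * 200

# Transfer switching: figure quoted in the text, not derived here.
switch_cost = 30_000

# UPS hold-up: first start attempt, pause/purge, second start attempt.
start_sequence_s = 4.5 + 3.0 + 4.5
T_UPS_S = 30                     # hold-up time suggested in the text

# UPS: 500 kVA systems at $75/kVA per 30-second block, sized to N.
UPS_UNIT_KVA = 500
ups_count = units_needed(P_CRITICAL_KVA, UPS_UNIT_KVA)
ups_cost = ups_count * UPS_UNIT_KVA * 75 * math.ceil(T_UPS_S / 30)

subtotal = gen_cost + switch_cost + ups_cost
materials_total = subtotal * 1.10   # assumed: 10% adder on the whole subtotal

print(f"generators: {gen_count} x {GEN_UNIT_KVA} kVA, ${gen_cost:,}")
print(f"UPS:        {ups_count} x {UPS_UNIT_KVA} kVA, ${ups_cost:,}")
print(f"worst-case generator start: {start_sequence_s} s (UPS holds {T_UPS_S} s)")
print(f"materials with 10% adder:   ${materials_total:,.0f}")
```

      Swapping in your own Pcritical, unit sizes and vendor quotes is the point; the dollar figures here are only the rules of thumb quoted above.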

     

  • XS23 Cloud Server

    There has been some recent press around some of the equipment we’ve developed in our cloud computing group. The core of our business is essentially a consulting and design service and developing new products for customers is a big part of the fun. Because these aren’t mainstream PowerEdge systems, we don’t get the chance to show them off as much as we’d like. Our group has been talking for some time about “optimized designs” for cloud and hyperscale computing without showing what that can really mean, so it’s time to unveil something that’s come out of the lab.  Pictured here is one of our favorites: the XS23.

    image

    XS23 front – twelve 3.5” SAS or SATA drives; 3 per server

    This product was designed for a customer that needed maximum compute density, a healthy amount of local disk and, of course, lowest power draw possible. Our architecture team threw all that in the blender and out came a 2U standard rack mount chassis that houses four dual-socket servers and twelve 3.5” hot plug drives.

    image

    XS23 exploded view: two dual-socket servers mounted in chassis bottom; two in a mezzanine above. Industry standard rack-mount chassis.

    Density of this type is certainly not unheard of (half depth or twin 1U’s), but by going to a 2U chassis we were able to fit it with larger, more efficient fans and stack 3 rows of full 3.5” drives across the front. So, even with a 25% higher density than general purpose blades, it provides three local spindles of 3.5” SAS/SATA disk to each server. Of course there are tradeoffs. This was expressly designed for an environment with high node failure tolerance - a cloud application. By designing out a lot of the capabilities that weren’t required (like redundant power) we were able to deliver the performance and power profile required. Efficiencies are gained by shared resources - as seen in a lot of general purpose designs available today. We think the key to designing the perfect cloud server is knowing where to stop and also what not to build in. This is a function of each customer’s unique design goals. Applications truly capable of forgoing high availability in hardware are somewhat rare, but customers in this space have them, as well as a laser focus on their business levers. So in this case we took the problem statement and made the tradeoffs to yield the highest efficiency and density within the performance parameters of the application.
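
    For readers who want to see how a density comparison like this works out, here is a small sketch. The XS23 figures (four servers and twelve drives in 2U) come from the description above. The blade comparison point of 16 blades in a 10U enclosure is an assumption made purely for illustration; the post does not say which blade system the 25% figure was measured against.

```python
def servers_per_rack_unit(servers, rack_units):
    """Server density expressed as servers per U of rack space."""
    return servers / rack_units

# XS23: four dual-socket servers and twelve 3.5" drives in a 2U chassis.
xs23_density = servers_per_rack_unit(servers=4, rack_units=2)
drives_per_server = 12 // 4

# ASSUMPTION (not from the post): a blade enclosure with 16 blades in 10U.
blade_density = servers_per_rack_unit(servers=16, rack_units=10)

advantage = xs23_density / blade_density - 1
print(f"XS23:  {xs23_density:.1f} servers/U, {drives_per_server} drives per server")
print(f"Blade: {blade_density:.1f} servers/U (assumed enclosure)")
print(f"Density advantage: {advantage:.0%}")
```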

    It’s important for me to emphasize that the XS23 is not generally available. This system is qualified and supported for only a handful of specific customer applications and locations; it’s not completely productized to bear a PowerEdge badge. I hope you’ll watch this space for more unique designs and the discussion on cloud taxonomy and architecture that Jimmy's leading.

  • Layer 1....

    I’d like to continue on our journey and build out the model that we have described, starting at the bottom of my model and moving towards the top. The first thing we should do is change the name of layer 1. Some have pointed out to me that while the facility is an important element, this block is going to cover a lot more than just the facilities, so we should change its name. I’d like to propose physical plant (which is a very familiar term to facilities folks) and see if this encompasses what lies ahead.

    image

    Figure 1 – Cloud Computing Layered Model

    The first aspect to consider as part of this layer is what I am going to call “macroscopic containment” or MC for short. Most folks would simply refer to this as the building, but I want to make a distinction here as there are many functions we can get from the MC.

    · The simplest form of MC is of course NONE. This is the case for equipment where the cabinetry is designed to sit out in the open. We see this in the telecom and perhaps the military industries, but not in this space. (There are, though, some interesting discussions ahead and a debate about where “container”-based solutions should go.)

    · Next we find a very simple MC or what I am going to refer to as temporary devices. The best example of this is a tent. Not very practical in most cases (in fact it almost sounds like a joke), but I know there are people considering them for areas where all they need is a bit of protection from the elements and some light physical security.

    · The next level is a fairly major transition to an actual building. This is probably where we are going to see most cloud installations and is what I think will ultimately prove to be most cost effective. I will refer to this as a utility building, which is best described as a simple shell with a concrete floor (no raised floor). It provides controlled separation between the IT environment and the outside environment. (I’ve seen these for about $38/sq ft., depending on how you want the building finished out.)

    · The final MC type is more along the lines of conventional data centers with raised floors and the works. This provides a very clean and well controlled solution and is probably overkill for most cloud environments. A reasonable rule of thumb for this type of MC is about $500 per sq ft.

    We may want to add something describing this as owned, leased, or co-located space, but I have omitted this for now. I have also added MC to the schematic model we are going to build, but it isn’t much to look at. We’ll have to get a bit further in the definition for it to start having meaning.
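
    To put the two per-square-foot figures side by side, here is a quick sketch. The $38 and $500 rates are the rules of thumb quoted above; the 10,000 sq ft floor area is a made-up number for illustration, and `shell_cost` is a helper name invented here.

```python
# Rule-of-thumb building costs per square foot, as quoted in the list above.
COST_PER_SQFT = {
    "utility building": 38,
    "conventional data center": 500,
}

def shell_cost(mc_type, floor_area_sqft):
    """Approximate MC cost for a given floor area."""
    return COST_PER_SQFT[mc_type] * floor_area_sqft

FLOOR_AREA_SQFT = 10_000  # hypothetical floor area, for illustration only

for mc_type in COST_PER_SQFT:
    print(f"{mc_type}: ${shell_cost(mc_type, FLOOR_AREA_SQFT):,}")

ratio = COST_PER_SQFT["conventional data center"] / COST_PER_SQFT["utility building"]
print(f"conventional is roughly {ratio:.0f}x the utility building rate")
```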

    As always, your comments are welcomed. Next up, Utilities!

  • Welcome to the party!

    Today marked the announcement of another entry into the fray of "new" systems designed for cloud computing.   I make light of "new" as this is a space Dell has been serving for over a year now.    And along the way we've found that the unique needs of hyperscale customers demand a hands-on (and often very discreet) co-development approach.   Power and space savings vs. general purpose servers of the magnitude quoted in the press today are really just the ticket to entry into these environments.   I'm glad IBM shares Dell's appreciation for that.

    One element of today's announcement I'd like to call into question is what is being presented to customers as "entirely new" - things like door panel cooling, half-depth servers and proprietary racks.    Who are these "innovations" really benefitting when they're not built on industry standards in the end? Is it the customer or the system provider's bottom line? Cooling is an incredibly complex topic - that heat has to be rejected somewhere and there are no silver bullets.   The best solutions are often rooted in the basics - hot/cold air containment, higher return temps etc... and we have found that a lot can be done even in hyperscale cloud computing environments without adding a lot of unnecessary complexity.

    From a marketecture standpoint I have to give a tip of the hat to the Blue Cloud initiative, although I don't sense tangible benefits for any customers yet.   Leadership is delivering.  The top 5 search engines (in the U.S. market) are Google, Yahoo, Microsoft, AOL and ASK.  It's widely known that Google builds their systems in-house.  Of the remaining four, three have worked with Dell in the past year to co-develop their servers with our Data Center Solutions team.   Not a bad start considering the dawn of customized, build-to-order cloud computing servers just came today....

     

  • Cloud Computing Model

    Probably the best next step for this discussion is to begin to build a top-to-bottom model of cloud computing.  I think there are about 12 major pieces to it, so this is going to take a while.  As I mentioned earlier, I believe “cloud computing” may in fact become the basis for most modern IT services in the next few years.  We also put forth this definition, with which most folks seem to agree: “Cloud computing” is the packaging of computing resources in a manner that provides lower acquisition cost of hardware and delivers a set of optimized services to the end user via the Internet in the most cost-effective, operationally efficient means possible.   So I took a stab at the model for this, which is shown here:

    Capture

    At this point it is certainly OK to disagree! In fact, I have found myself arguing with myself about it already. :o)  So we’ll pause here and let folks take this in.  Then we will go layer by layer to make sure it is correct.  My hope is that we can build not only an agreeable model at the technical level, but also a financial model from which we can get TCO and other information.  Feel free to comment…

     

  • Infrastructure challenges for cloud platforms

    One of the topic areas we'd like to talk about is the impact of power and cooling trends on hyperscale operations.  As the size of scale-out platforms grows ever larger, server designs continue to drive for increased density.   In this first post to our power and cooling section, Drew Schulke of the DCS Services team takes a look at some of the factors impacting organizations that house high-density systems in a co-location facility.  If you house a large compute pool based on blades or other high-density solutions in a colo facility please take a look at Drew's post and tell us what you're seeing.

  • Welcome to In the Clouds

    Hi, I’m Forrest Norrod - General Manager of the Data Center Solutions division here at Dell. We are very excited to launch this blog on the topic of cloud computing. In just one week we will celebrate the one year anniversary of the launch of our team – which is dedicated to the needs of those operating some of the largest computing platforms in the world. I hope this blog can become a forum for our customers, Dell’s technologists and those from across the industry to gather to talk about this new age of computing.

    Since we are still getting questions on what cloud computing is and what sort of conversations we’re hoping to have here, here's a vlog where I explain.

    <a href="http://media.dellone2one.com/dell/March2008/Forrest_Norrod_cloud_vlog.flv"><img src="http://direct2dell.com/photos/videos/images/51110/300x225.aspx" border = "0" width="300" height="225"></a><br /><a href = "http://media.dellone2one.com/dell/March2008/Forrest_Norrod_cloud_vlog.flv">View Video</a><br />Format: flv<br />Duration: 2:33

