Architecture

  • The Cloud Physical Plant

    As we look further, I now think that completing the physical plant as part of the layered model (shown below) may not be the best use of our time; our efforts are better served by treating it as an independent model that is simply referenced from this stack.

    image

    As such I have decided to fill out the complete physical plant model (shown here) for this layer and let it serve as our discussion vehicle as we move forward.

    image

    We have discussed the overall power system to a great degree but still need to look at approaches that can be used for internal power distribution. This model is built around the concept of A.C. distribution. There have been many arguments describing the benefits of D.C. as a distribution vehicle, and there is a well-known study from Lawrence Berkeley National Laboratory (http://whitepapers.silicon.com/publisher/39038243/lawrence-berkeley-national-laboratory.htm) describing the advantages of D.C. While it is an excellent study, it compares D.C. with high-efficiency D.C. power supplies against A.C. with poor-efficiency A.C. power supplies. With A.C. power supplies in the 86% to 92% efficiency range today, these advantages are eliminated, and A.C. solutions are generally less expensive. In all, there are a few simple rules one should remember:

    · Keep transformations to a minimum

    · Make emergency power support components operate in a “bypass” mode where they neither contribute to loss nor reduce overall efficiency during normal operation

    · Keep distribution voltages as high as safely and cost effectively possible

    · Avoid needless items in the power path that contribute to efficiency loss

    · Test all distribution advantage claims against your specific model and make sure you have not missed anything before making revolutionary decisions

    These will help you decide for yourself which is the best approach.
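    To make the first two rules concrete, here is a minimal sketch of why each extra transformation matters: the efficiency of a power path is the product of the efficiencies of its stages. The stage names and every number except the power supply figure (taken from the 86% to 92% range above) are assumptions chosen for illustration, not measurements of any real plant.

```python
from math import prod


def chain_efficiency(stage_efficiencies):
    """Overall efficiency of a power path: the product of its stage efficiencies."""
    return prod(stage_efficiencies)


# Hypothetical stage efficiencies, for illustration only.
ups_in_path = {
    "utility transformer": 0.98,
    "UPS always in the power path": 0.94,
    "PDU transformer": 0.98,
    "server power supply": 0.90,  # inside the 86% to 92% range cited in the text
}
ups_in_bypass = {
    "utility transformer": 0.98,
    "UPS in bypass mode": 0.99,
    "server power supply": 0.90,
}

for name, path in (("UPS in path", ups_in_path), ("UPS in bypass", ups_in_bypass)):
    eff = chain_efficiency(path.values())
    print(f"{name}: {eff:.1%} delivered, {1 - eff:.1%} lost in distribution")
```

    Swapping in your own measured stage efficiencies is the quickest way to test a distribution claim against your specific model.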

    Let’s now turn our focus to cooling. One of the key principles on which this scheme is built is containment. Generally speaking, hot aisle (or hot air) containment provides some distinct advantages because it reduces the overall area where the hot temperatures exist. If you are getting the most out of your cooling dollar, the exhaust air will be pretty warm. In fact, if your inlet air is about 30°C (about 85°F), you can expect the exhaust to be 45°C or more (about 115°F to 120°F), so keeping it contained makes the most sense.

    Now, this is not conventional cooling; it is built around using outside air or “free” cooling as much as possible. This is based on the idea that if your exhaust air is hotter than your outside air, you are better off starting with the cooler air source than expending the energy to “re-cool” the exhaust air. (The effectiveness of this will vary in different geographies, and you need a wet bulb temperature of less than 85°F for this to work effectively.) There are a couple of things that must be considered. There must be filtration to clean the air to a point where it is usable (and you will need sensors to detect clogged filters), and there are times when the outside air will be unusable. During such times (in winter or in the presence of pollutants), the inside air must be re-circulated and used for cooling. In the figure, you will see the presence of a cooled water system and a heat exchanger for “re-cooling” inside air. You can also see the usage of evaporative cooling (or air-side economizers) and water-side economizers. Using the proper combination of these approaches (again based on your geography and particular model), you can achieve PUEs lower than 1.10.
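    The decision described above can be sketched in a few lines. This is only an illustration of the reasoning, not a control algorithm: the function names, the ordering of the checks, and the sample readings are assumptions, while the 85°F wet bulb limit comes from the text. PUE is computed in the usual way, as total facility power divided by IT power.

```python
def choose_air_source(outside_temp_f, exhaust_temp_f, wet_bulb_f,
                      outside_air_usable=True, wet_bulb_limit_f=85.0):
    """Return "outside" for free cooling or "recirculate" for re-cooling inside air."""
    if not outside_air_usable:
        # e.g. winter conditions or pollutants, as described in the text
        return "recirculate"
    if wet_bulb_f >= wet_bulb_limit_f:
        # above the wet bulb limit, free cooling is not effective
        return "recirculate"
    if outside_temp_f < exhaust_temp_f:
        # start with the cooler air source instead of re-cooling the exhaust
        return "outside"
    return "recirculate"


def pue(total_facility_kw, it_load_kw):
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


# Hypothetical readings, for illustration only.
print(choose_air_source(outside_temp_f=75, exhaust_temp_f=115, wet_bulb_f=65))
print(choose_air_source(outside_temp_f=75, exhaust_temp_f=115, wet_bulb_f=65,
                        outside_air_usable=False))
print(f"PUE = {pue(total_facility_kw=1080.0, it_load_kw=1000.0):.2f}")
```

    A real plant would blend outside and re-circulated air rather than switch between them outright, but the comparison at the heart of the decision is the same.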

    From here, we will begin looking at what I think is the optimum approach for the hardware so stay tuned.

    The complete model is shown below. In some models (and certainly the most inexpensive, Tier 0), this is simply none. This essentially relies on the utility provider and the general utility practices (GUP) for power without any additional means of support. I don’t know anyone who is doing this yet, but there are a couple of folks who are looking at it. (Note: their availability is managed at a higher level, with duplicate, geographically separated data centers that can take over and deliver their service if one of the data centers goes down ... and yes, there may be a reduction in their service, but it will stay up.) If your business model allows you to do this, you can see some very interesting advantages. Your centers can be very low frills, which can save you a lot of money, and the duplication can not only provide the availability but also serve as a data backup function, fulfilling another major SLA. (I believe this will become a standard practice in the future as building blocks become even less expensive ... More on this later.)
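    As a rough illustration of managing availability at a higher level, here is a sketch of the arithmetic for duplicated sites. It assumes the sites fail independently and that either one can carry the service alone; the per-site availability figures are hypothetical and are not Tier ratings.

```python
HOURS_PER_YEAR = 8760


def combined_availability(site_availability, sites=2):
    """Probability that at least one of `sites` independent sites is up."""
    return 1.0 - (1.0 - site_availability) ** sites


# Hypothetical per-site availability values, for illustration only.
for site in (0.99, 0.995):
    pair = combined_availability(site)
    print(f"one site: {site:.3%} up, about {(1 - site) * HOURS_PER_YEAR:.1f} h/yr down; "
          f"two sites: {pair:.4%} up, about {(1 - pair) * HOURS_PER_YEAR:.2f} h/yr down")
```

    Correlated failures (a shared utility grid, a regional event, a common software fault) break the independence assumption, which is why geographic separation matters.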

    (A good reference for all this is the Tier definition from the Uptime Institute, which can be found at http://uptimeinstitute.org.)

    The next step in availability is type N (Tier 1), N+1 (Tier 2), etc. (This is probably a rehash of Power Availability 101 for some, but it is kind of nice to see it all put together.) Here, there is a backup source (equal to what is considered the critical power part of N ... and this is probably not everything) for power in the event that the utilities go away. Most folks in this space seem to be providing some form of N+1 (Tier 2 - a backup source and 1 additional source in the event that some of their primary backup fails). Depending on your circumstances and your utility provider, you may be able to create a model for the maximum duration of backup power that doesn’t require you to cover a prolonged outage. (This can save a lot of money, especially if you can shift load to an alternate location as I mentioned above.) Your planning should look something like this:

    1. Determine the amount of power needed to back up only the essential systems. (BTW, don’t forget to isolate this from the general power, which will likely include some things you don’t want to pay to back up, but don’t forget about cooling. We’ll discuss that in the next installment.) This is “Pcritical KVA”, or the critical load, and you will need to outfit this amount of alternate power. If you have the space, I favor a simple diesel or LP generator for the backup power (GenerationDiesel). (Also note, this may be the longest lead time item you need to procure as you are building a data center, and in some cases it approaches 48 weeks.) Depending on the size (usually bigger is better), these retail for about $150 to $200 / KVA. (Note: this is just a rule of thumb for the materials and does not include installation cost. Plus, the actual prices may vary from your suppliers.) For example, if your Pcritical is 3MVA, you could use four 1000KVA generators (M x UPSKVA). This covers your Pcritical and provides an additional unit, giving you N+1. For each generator, you will need a transfer switch, which will move your load from the utility power to your generators. As a rule of thumb, these retail for about $20 - 30 per KVA. This example would cost you somewhere around $800K for the generators and $30K for switching equipment. This will total up to somewhere around $830K for materials, not including installation.
    Generator sets require two things that you need to consider. The first is the storage of fuel on site, which determines “TBUP (Hrs)”, the duration that backup power will last without intervention. (With today’s environmental rules, this is not something that can be taken lightly.) The second is monthly maintenance. Once a month or so, the generators need to be checked to ensure they are working properly and are set up correctly for the current season. There is a lot of automation here, but it is pretty expensive, and some of it can be avoided by simply having a “trusted” human :o) perform these regular checks.

    2. Now let’s focus on the UPS. Some form of UPS is required to hold the load up while the generators are becoming operational. The sequence of events goes something like this: It can take about 4.5 seconds for a generator to crank up and become stable. (This is one of the main reasons for monthly maintenance.) If it fails, there may be a pause / purge of about 3 seconds followed by an additional 4.5 second start-up time. That is about 12 seconds in total, so just under 15 seconds to bring them up. Most folks add some margin here, but this is one place where time is money, and 30 seconds for “TUPS (Sec)”, the duration of power provided by the UPS, seems appropriate. By this point, if you are not up, you have a bigger problem. If you are planning to use a conventional UPS with VRLA type batteries (UPSVRLA), you should expect to pay something around $50 to $75 per KVA for each 30 seconds of backup time. For the example configuration of 3MVA, if you use 500KVA UPS systems, you will need 6 of them (M x UPSKVA), and at $75 / KVA, this will cost you somewhere around $225K for the UPS systems alone. (Note: This is a very soft estimate, and not all sizes and durations are available, so you will have to work within the constraints of your vendor when you are sizing/estimating the UPS. All estimates are just to give you an idea about some of the cost. These will vary as technology changes and with specific vendors.)
    There is a lot of other material required to hook this up. I would add 10% to the total to give you a good rule of thumb for materials cost. At this point, our stack and model look like the following, with configuration guidelines to come.
    There are a couple of other good resources I would refer to at this point, and both come from the Uptime Institute. They are: “Cost Model: Dollars per kW plus Dollars per Square Foot of Computer Floor” and “A Simple Model for Determining True Total Cost of Ownership for Data Centers”. Note: These are written along the lines of a conventional data center, but they do contain some good information.
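    The planning steps above reduce to a small amount of arithmetic, sketched below with the figures from the 3MVA example. The function and variable names are made up for this illustration. The switching cost is taken as the $30K quoted in the text rather than recomputed from the per-KVA rule of thumb, and applying the 10% allowance to the generator, switch, and UPS subtotal is an assumption about what “the total” means.

```python
import math


def units_needed(critical_kva, unit_kva, spares=0):
    """Units required to carry the critical load, plus `spares` extra (N+1 means spares=1)."""
    return math.ceil(critical_kva / unit_kva) + spares


def materials_cost(units, unit_kva, dollars_per_kva):
    """Rule-of-thumb materials cost, excluding installation."""
    return units * unit_kva * dollars_per_kva


P_CRITICAL_KVA = 3000  # the 3MVA example from the text

# Step 1: generators, N+1, priced at the upper $200 / KVA figure.
gen_units = units_needed(P_CRITICAL_KVA, unit_kva=1000, spares=1)  # 4 units
gen_cost = materials_cost(gen_units, 1000, 200)                    # 800,000
switch_cost = 30_000  # figure quoted in the text, not derived here

# Step 2: UPS, sized at N, priced at $75 / KVA per 30 seconds of backup.
ups_units = units_needed(P_CRITICAL_KVA, unit_kva=500)             # 6 units
ups_cost = materials_cost(ups_units, 500, 75)                      # 225,000

# Generator start sequence: start, failed start, purge, second start.
START_S, PURGE_S, T_UPS_S = 4.5, 3.0, 30.0
worst_case_start_s = START_S + PURGE_S + START_S                   # 12.0 seconds
ride_through_margin_s = T_UPS_S - worst_case_start_s               # 18.0 seconds

subtotal = gen_cost + switch_cost + ups_cost
total_with_allowance = round(subtotal * 1.10)  # assumed basis for the 10% allowance

print(f"generators: {gen_units} x 1000 KVA, ${gen_cost:,}")
print(f"UPS: {ups_units} x 500 KVA, ${ups_cost:,}")
print(f"worst-case generator start: {worst_case_start_s} s, "
      f"UPS margin: {ride_through_margin_s} s")
print(f"materials estimate with 10% allowance: ${total_with_allowance:,}")
```

    Changing `P_CRITICAL_KVA`, the unit sizes, or the per-KVA prices to match your own quotes is the whole point; the structure of the estimate stays the same.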

     

    image

    Figure 1 – Cloud Computing Layered Model – Power

     

    image

    Figure 2 – High Level Schematic w/ Power

  • Cloud Computing Model

    Probably the best next step for this discussion is to begin to build a top-to-bottom model of Cloud Computing. I think there are about 12 major pieces to it, so this is going to take a while. As I mentioned earlier, I believe “Cloud computing” may in fact become the basis for most modern IT services in the next few years. We also put forth this definition, with which most folks seem to agree: “Cloud computing” is the packaging of computing resources in a manner that provides lower acquisition cost of hardware and a set of optimized services to the end user via the Internet in the most cost-effective, operationally efficient means possible. So I took a stab at the model for this, which is shown here:

    image

    At this point it is certainly OK to disagree! In fact, I have found myself arguing with myself about it already. :o) So we’ll pause here and let folks take this in. Then we will go layer by layer to make sure it is correct. My hope is that we can build not only an agreeable model at the technical level, but also a financial model from which we can get TCO and other information. Feel free to comment...

     

  • Cloud computing and SaaS

    "What is the difference between cloud computing and software as a service?"

    This is a really good question. Let’s explore this space for a bit, and hopefully we can come to a good answer. (I am going to attempt to be brief here, so please forgive me if this is not an exhaustive study and lacks some appropriate references.)

    If we look back far enough, we find that most of the popular and modern terms describing advanced multi-computing are actually forms of distributed computing, which has been around since the early 1980s. (See A primer on distributed computing) There has been significant hype/spin as well as real advancements that have “clouded” :o) the whole concept. (In fact, I am fond of asking: “Please tell me what you mean when you refer to one of these terms, because I can’t figure them out any more!”) There are many facets that have evolved in distinct ways and represent real value, and when I get a chance, I am going to create some type of figure showing these interrelationships. For now, we’ll stick with the more recent concepts, and I will give you my opinion. (I will say that if you want to know more now, there is a good reference, albeit a couple of years old, from the GGF that I would send you to: The Different Faces of IT as a Service)
     
    Most discussion these days involves grid, utility, and cloud computing to which we will add software as a service (SaaS).
    • Grid computing is a fairly all encompassing concept and as you probably know, can be generally defined as:  “a system that uses open, general purpose protocols to federate distributed resources and to deliver nontrivial qualities of service.”  Or in other words, it uses standard “stuff” to make many distinct systems work together in a way that makes them useful.
    • Utility computing or on-demand computing is the idea of taking a set of resources (that may be in a grid) and providing them in a way in which they can be metered.  This is much the same as the way we buy electricity or any common utility today. It usually involves a computing or storage virtualization strategy.
    • Cloud computing is a subset of grid computing (can include utility computing) and as I mentioned in my opening post, is the idea that computing (or storage) is done elsewhere or in the clouds. In this model many machines (Grid) are orchestrated to work together on a common problem.  Resources are applied and managed by the cloud as needed.  (In fact this is a key characteristic of cloud computing.  If manual intervention is required for management or operations, then it probably doesn’t qualify as a cloud.)  Cloud computing provides access to applications written using Web Services and run on these Cloud Services.
    Now let’s add to this discussion the idea of Software as a Service (SaaS). Usually this means a model where diverse applications are hosted by a provider and users pay to use them. So I would say the key distinction between SaaS and cloud computing is that SaaS describes the service and business model provided, as opposed to the architectural mechanism used to deliver it. In fact, I think it is also fair to say that a cloud computing architecture may be the key/best mechanism for delivering Software as a Service.
    Let’s look at a couple of today’s trends and see if this all fits. Probably the best known examples are of course search and mail. There are several companies that offer both; they are available via the web, and they are written using web services. (There is a growing set of additional capabilities that are becoming available.) For the most part, these are all free (fee-based versions exist). Based on the scale and ubiquitous service they are able to deliver, it is fair to say that there is a cloud behind them. The Amazon Elastic Compute Cloud is noteworthy here. It is a virtual farm, allowing folks to host and run “their” diverse applications on Amazon's web services platform. It represents an excellent example of a business model where a company is providing “Cloud Services” to those who can and are willing to take advantage of them.
    Software as a Service is the logical next step in evolution. It is going to be very interesting to see how this motion will emerge. Ideally, users will be able to “rent” the applications and everything needed to apply them to their business in the form of Software as a Service. At some point, we should explore SaaS as it relates to Application Service Providers (ASP) and On-Demand computing, but enough for now. I welcome your thoughts or comments...
  • More Conversations: Dell Launches Cloud Computing Blog

    For those keeping score, we launched Direct2Dell back in July 2006. IdeaStorm roared onto the scene in February last year. From there, we began expanding into other languages: Direct2Dell Chinese in March 2007, Spanish in May last year, and Norwegian in September, and there will be more in the future. Most recently, our Investor Relations blog called DellShares went live in November 2007.

    From the beginning, the purpose of Direct2Dell has been to educate and to support our customers on a wide variety of topics that they care about. This blog has grown since those early days. And that growth has encouraged more Dell folks to want to have conversations with our customers. Up to now, I've added more categories on Direct2Dell to expand the topics of discussion. That strategy has worked to a point, but now it's time to evolve.

    Starting today, members from our Data Center Solutions (DCS) team will support a group blog called In the Clouds. It will focus on cloud computing and the backend server, storage and architecture required to make it work. If you're not familiar with the concept of cloud computing, think of using web-based e-mail from Yahoo, Google or AOL (see link for their slick integration with Silverlight), or uploading videos to YouTube, pictures to Flickr, or microblogging with Twitter. When you do those kinds of things, you aren't storing them on your local device; you're storing them "in the clouds," or at a remote location on the Internet.

    So, why start with Cloud Computing? The short answer is there's a lot happening in this space right now. Take a look at what Adobe's doing with their AIR product (go Twhirl!) that they recently brought to market. Google continues to surge forward with their Google document apps (Spreadsheet Forms and Google Calendar synch are two recent enhancements that rock), and this week at MIX08, Microsoft is rolling out some cool stuff with Silverlight 2.0 and Internet Explorer 8.

    What this all means is that we're at the beginning stages of a shift from the model of the past where applications and all the content created for them were stored locally. This shift has the potential to increase the types of Internet-connected devices we use to consume and create content (check out the good discussion Scoble has going about the battle for web-based content on mobile phones).  

    So, what does all this have to do with Dell and the kind of content you can expect to see in the cloud computing blog? These web-based activities require reams of server and storage hardware architected around complex custom networks. As such, these environments differ from traditional server/storage environments. Our DCS team's purpose is to help customers make sense of that complexity—see this PDF, or www.dell.com/cloudcomputing for more context. That's the kind of content you can expect from reading Dell's Cloud Computing blog.

    If this sounds interesting, I encourage you to subscribe to the Cloud Computing RSS feed. If you'd rather access it directly, go here:

    www.direct2dell.com/cloudcomputing