The recent utility failures in India show what can happen when a creaking infrastructure, with an expansion programme falling behind, collides with an unprecedented growth in demand.  The total grid failure, with some 600 million consumers in the dark for 2-3 days, is something that is hard to envisage here in Northern Europe and, I am sure, in North America.  OK, we have our localised cascade failures, and outages of up to 8-10 hours are not unknown – but they are very ‘localised’ compared with 600 million people and are generally quickly recovered from.

Against this news we have the data-centre industry, where any facility worth its salt has emergency generators, usually diesel powered, designed for exactly this eventuality – protection from long-term outages.  Our UPS systems give us power fidelity (high-quality voltage for the critical load) and have a certain amount of autonomy time built in – from 3s for some DRUPS, through 12-20s for flywheels, to several (tens of) minutes for batteries.  It’s probably worth noting here that usually not all of the facility load is protected by UPS and the critical cooling systems have to survive the vagaries of the raw utility power – so, in areas of poor power availability, a long UPS autonomy is of little use if the room becomes too hot before the batteries are depleted, the utility returns or the generators take over the load.
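To put rough numbers on that last point, here is a minimal back-of-envelope sketch (my own, in Python, rather than anything from published design guidance) comparing an assumed battery autonomy with how quickly a room heats up once cooling stops.  Every figure in it – IT load, room volume, allowable temperature rise, autonomy – is an illustrative assumption.

```python
# Back-of-envelope estimate: does UPS battery autonomy outlast the room?
# All figures below are illustrative assumptions, not measured data.

it_load_kw = 500.0            # assumed IT load, dissipated as heat in the room
room_volume_m3 = 1000.0       # assumed white-space volume
air_density_kg_m3 = 1.2       # approximate density of air
cp_air_kj_kg_k = 1.005        # specific heat of air
allowable_rise_k = 10.0       # assumed headroom before inlet temperatures are too hot
ups_autonomy_min = 15.0       # assumed battery autonomy

air_mass_kg = room_volume_m3 * air_density_kg_m3

# Worst case: only the air absorbs the heat (racks, floor and walls would slow this down).
rise_k_per_s = it_load_kw / (air_mass_kg * cp_air_kj_kg_k)
time_to_limit_min = allowable_rise_k / rise_k_per_s / 60.0

print(f"Room reaches its limit in ~{time_to_limit_min:.1f} min; "
      f"UPS autonomy is {ups_autonomy_min:.0f} min")
```

With these particular assumptions the air alone overheats in well under a minute, long before a 15-minute battery is exhausted – which is exactly the point about cooling continuity made above.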

Having said all that, we can return to the matter of generators and the continuing surprises sprung upon us when news arrives of ‘data-centre failures’ in which the utility fails (or lightning strikes, etc.) and the gensets fail to protect the load.  I am not sure if it’s just me being sceptical, but those types of failure seem to be increasing in frequency and nearly always come from ‘cloud’ types of organisation: is it just that the installed base is growing fast, or is it that operators are cutting corners in pursuit of lower-cost cloud services?  I suspect it is a blend of these two root causes, as I am sure that I (and most of my electrical chums) can design an electrical infrastructure that is immune to grid outages, even without resorting to a Tier IV type of architecture.

Then we come to one of my favourite hobby-horses – the rating of gensets – and the arguments that can ensue between independent design engineers and the ‘guidelines’ of the Uptime Institute.  Now, clearly, if the client wants certification to a particular Tier Classification from TUI then the designer has no option but to follow TUI guidelines/rules/advice and, on genset rating, that advice has been unwavering: the facility infrastructure should be designed so that the gensets are capable of continuous use (by implication at full load) whilst the utility is treated as the ‘economic alternative’ source of power.  I don’t know if it’s pedantic interpretation of sloppy English or just plain cussedness but, despite many public denials by TUI, I know a lot of people who assume that this means you should run off the gensets all of the time.  That would be expensive (CapEx & OpEx) and quite daft to do, but the concept still causes confusion – matched by the similar misunderstanding that to achieve Tier IV you must have feeds from two separate utility providers: complete tosh (never part of TUI thinking) and practically impossible in 99.99..9% of all locations in the world.

So what is the problem with continuous genset rating and why does it vex me so much?  We just need to understand how gensets are sold for given duties, why the ratings differ between those duties and what negative impacts that can have on our data-centre designs and budgets.  At the heart of the matter is the commercial diesel engine.  A given capacity (bore, stroke and number of cylinders) can be rated for an output power based on hours run per year under an acceptable maintenance regime.  If the duty is expected to be light – say 70% average load for no more than a week or two of operation per year (the ‘standby’ duty rating) – then the power rating can be higher than for the identical engine used ‘pedal-to-the-metal’ at 100% load for the entire year (generally called the ‘prime’ rating).  The kW difference is usually in the 20-25% range.  No such de-rating applies to the alternator, as anything more than 20 minutes or so of operation (at any load) is deemed to be continuous.
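As a hedged illustration of that rating ladder – using the 20-25% figure above; actual derates vary by manufacturer and model:

```python
# Illustrative only: the same engine sold at two duty ratings.

standby_kw = 2000.0    # nameplate at 'standby' duty (limited hours per year, partial load)
derate = 0.20          # assumed standby-to-prime derate, per the 20-25% range above

prime_kw = standby_kw * (1 - derate)
print(f"Same engine: {standby_kw:.0f} kW standby ~ {prime_kw:.0f} kW prime")
# The alternator gets no such relief: beyond ~20 minutes it is treated as continuous.
```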

So, for example in the UK, we have a utility which can be relied upon to deliver better than 99.95% availability, where (in urban locations) you will wait on average eight years for an outage lasting longer than 6 hours, and where an outage due to a utility-owned distribution transformer failure will not keep you off-line for more than 36-48 hours.  So why-oh-why are we encouraged to use oversized gensets capable of continuous duty at full load?
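For what it is worth, the arithmetic behind that availability figure is simple enough; the sketch below just converts 99.95% into expected outage hours, assuming availability is measured over a full year:

```python
# Convert an availability percentage into expected outage hours per year.

availability = 0.9995
hours_per_year = 8760

expected_outage_hours = (1 - availability) * hours_per_year
print(f"{availability:.2%} availability ~ {expected_outage_hours:.1f} h of outage per year")
```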

But it generally gets worse than that.  Data centres are renowned the globe over for running at partial load and that, combined with redundancy in the design, means that ‘standby’ sets effectively become ‘prime’.  Consider a 4MVA N+1 system comprising 3x2MVA sets running a data-centre (PUE=2) which is loaded to a typical 60%: the load is 2.4MVA and each of the gensets carries 0.8MVA, 40% of its capacity.  Even a standby-rated 2MVA set would have a ‘prime’ rating in the order of 1.6MVA – way above the likely loading, even if one set in the group fails to start and the remaining two carry 1.2MVA each.  An additional problem, if we needed one, is that diesel sets have to run at more than 25-30% of their design load to avoid coking/glazing and increased maintenance costs – so low loads and highly redundant systems are incompatible.  On the bright side, if the gensets are over-rated they at least have some chance of handling the leading power factor of IT loads when the UPS is in bypass!
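The numbers in that example can be laid out as a small calculation.  The figures are those quoted above; the 20% standby-to-prime derate is my assumption, carried over from the earlier discussion:

```python
# Worked version of the 4MVA N+1 example above.

design_mva = 4.0          # design (N) capacity
sets = 3                  # N+1 achieved with 3 x 2MVA sets
set_rating_mva = 2.0
load_fraction = 0.60      # typical partial load
standby_to_prime_derate = 0.20   # assumed, per the earlier 20-25% range

actual_load_mva = design_mva * load_fraction                         # 2.4 MVA
per_set_all_running = actual_load_mva / sets                         # 0.8 MVA each
per_set_one_failed = actual_load_mva / (sets - 1)                    # 1.2 MVA each
prime_rating_mva = set_rating_mva * (1 - standby_to_prime_derate)    # ~1.6 MVA

print(f"All sets running: {per_set_all_running:.1f} MVA each "
      f"({per_set_all_running / set_rating_mva:.0%} of nameplate)")
print(f"One set failed:   {per_set_one_failed:.1f} MVA each, "
      f"vs a ~{prime_rating_mva:.1f} MVA prime rating")
```

Note that at 40% of nameplate the sets in this example sit above the 25-30% coking/glazing floor, but a lower facility load or a deeper level of redundancy would quickly push them below it.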

So have the Indian power cuts changed my mind?  Certainly not.  The recent outage only lasted 70-100 hours, and the most basic standby-rated genset can be relied upon to cover that two or three times over in a given year, even at 100% load – which almost never turns up…  A redundant array of standby sets could cope with this once per month.  There are places in the world where I would not specify standby-rated sets, but they are few and far between.
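As a rough sanity check on that claim – the annual standby allowance below is an assumed, conservative manufacturer limit rather than a figure from the article:

```python
# How many outages of the quoted length fit inside a typical standby-duty allowance?

outage_hours = (70, 100)          # the range quoted above for the Indian blackout
standby_allowance_hours = 200     # assumed conservative annual standby-duty allowance

for h in outage_hours:
    times_covered = standby_allowance_hours // h
    print(f"A {h} h outage could be covered ~{times_covered} times "
          f"within a {standby_allowance_hours} h/yr allowance")
```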

The key point is that the engines are the same for standby or prime duty – only the combination of load and running hours per year makes the difference.

Unlike in the USA, TUI has seen weak take-up of its accreditation in the UK although, on a rather positive note I think, they appear to be moving from prescriptive ‘rules’ to more ‘sustainable’ business objectives – albeit still based on the twin pillars of concurrent maintainability and fault-tolerance (or not, as the budget may dictate) but certainly more flexible.  Maybe the ‘shopping list’ days of data-centre design are nearly over?