UPS Question


So I have a bunch of servers at work that are on dual UPSes (one for each of two power supplies). It’s mostly HP gear, but also some Sun and IBM.
Ideally I’d like a system to not go into shutdown unless BOTH UPSes that a particular box is plugged into have failed (there are actually 4 UPSes, 2 for each of 2 racks). Has anyone ever configured anything like that with NUT or whatever?

Right now, I’m running without any UPS monitoring software, as the whole building is on a generator that comes on 60 seconds after a power failure, so my 7kVA(!) UPSes don’t see much use. Also, I just installed these new racks and UPSes and haven’t really had the time to sit down and figure it all out.

14 thoughts on - UPS Question

  • John R Pierce wrote:
    Nope. I’d think to have something that checked both UPSes, and if both of them were showing no line power for over x sec (60? 70?), start a shutdown.

    Here, as I just mentioned, it’s only a second or two, so I have to edit
    /etc/apcupsd/apccontrol to change SHUTDOWN to SHUTDOWN=/dev/null

    mark
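
The rule suggested above (start a shutdown only when both UPSes have shown no line power for longer than some grace period) can be sketched in Python as below. This is a minimal illustration, not anything from NUT or apcupsd: the function names, the UPS labels and the 70-second default are all assumptions, and how the on-battery flags are obtained is left to whatever monitoring tool is in use.

```python
from typing import Dict, Optional


def update_outage_start(on_battery: Dict[str, bool],
                        since: Dict[str, Optional[float]],
                        now: float) -> Dict[str, Optional[float]]:
    """Record when each UPS lost line power (None = currently on line power)."""
    updated: Dict[str, Optional[float]] = {}
    for name, lost in on_battery.items():
        if not lost:
            # Line power is back, so forget any earlier outage start time.
            updated[name] = None
        else:
            previous = since.get(name)
            # Keep the original start time if the outage is ongoing.
            updated[name] = previous if previous is not None else now
    return updated


def should_shutdown(since: Dict[str, Optional[float]],
                    now: float,
                    grace_seconds: float = 70.0) -> bool:
    """True only if every UPS has been without line power longer than the grace period."""
    if not since:
        return False  # No data: never shut down on missing information.
    return all(start is not None and now - start > grace_seconds
               for start in since.values())


if __name__ == "__main__":
    # Hypothetical readings: ups_a fails at t=0, ups_b at t=100.
    state = update_outage_start({"ups_a": True, "ups_b": False}, {}, now=0.0)
    state = update_outage_start({"ups_a": True, "ups_b": True}, state, now=100.0)
    print(should_shutdown(state, now=150.0))  # ups_b has only been out 50 s
    print(should_shutdown(state, now=171.0))  # both out for more than 70 s
```

Setting the grace period a little above the generator start-up delay (60 seconds in the original post) keeps a normal generator handover from triggering a shutdown.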

  • I do this with APC UPSes and apcupsd. I find it well worth the effort as I have a (separate) program that checks the UPS state every 30 seconds and emails me if there is a notable state change. This has caught multiple power issues coming from the mains.

    I show how I do this as a sub-section in a larger tutorial, if you are curious;

    https://alteeve.ca/w/AN!Cluster_Tutorial_2#Setting_Up_UPS_Monitoring

    I had to make some changes to the apcupsd init.d script to read/use multiple UPS config files. I also disabled the automatic power off features as, by default, it would have triggered a shutdown as soon as one UPS dropped. This is useless if the other UPS is fine, of course. That separate scanner program handles the smarts for deciding when to shut down based on the strongest of the available UPSes… but that’s not really covered here.

    hth
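
The poster's scanner program is not shown, so the following is only a guess at the kind of "strongest available UPS" decision described above. It assumes each UPS reports text in the `KEY : value` style printed by `apcaccess`, with a `STATUS` field that starts with `ONLINE` on mains power and a `TIMELEFT` field in minutes. The function names and the 5-minute threshold are hypothetical.

```python
from typing import Dict, List


def parse_status(text: str) -> Dict[str, str]:
    """Parse 'KEY : value' lines into a dict, splitting on the first colon."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def minutes_left(fields: Dict[str, str]) -> float:
    """Turn a value such as '4.0 Minutes' into 4.0; a missing field counts as 0."""
    return float(fields.get("TIMELEFT", "0").split()[0])


def on_mains(fields: Dict[str, str]) -> bool:
    """Assumption: the status string begins with ONLINE while on mains power."""
    return fields.get("STATUS", "").startswith("ONLINE")


def should_shutdown(reports: List[Dict[str, str]], min_minutes: float = 5.0) -> bool:
    """Shut down only when no UPS is on mains and even the strongest one is nearly empty."""
    if not reports:
        return False
    if any(on_mains(report) for report in reports):
        return False
    strongest = max(minutes_left(report) for report in reports)
    return strongest < min_minutes


if __name__ == "__main__":
    weak = parse_status("STATUS   : ONBATT\nTIMELEFT : 4.0 Minutes\n")
    strong = parse_status("STATUS   : ONBATT\nTIMELEFT : 22.5 Minutes\n")
    print(should_shutdown([weak, strong]))  # the stronger UPS still has runtime
    print(should_shutdown([weak, weak]))    # both are below the threshold
```

Deciding on the strongest UPS rather than the weakest means a single drained or failed unit never takes the servers down while the other feed can still carry them.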

  • John R Pierce wrote:

    Damn, hit send, and forgot to ask: feel free to answer offlist, but what make and model are the 7KVAs, and what ballpark price were they? We need two of something like that (at least 6KVA) for the SGI supercomputer….

    mark

  • they are HP R7000, made by Eaton. 4U 7KVA rackmount. they need 208V 40A(?) service. ethernet interface is standard. They support up to 4 extended runtime modules (4U each). at 80% usage, HP claims 6 minutes with the base unit and 57 minutes with 4 ERMs (3U each).

    These things are 165 lbs, they were a gut buster to load into the racks, heh. after the 1st one, I removed the batteries from the other 3, and it was way easier. They use 18 12V 45W UPS batteries, in two packs of 9 each.

    quick specs, http://www8.hp.com/h20195/v2/GetDocument.aspx?docname

  • The only thing that stopped me from doing that was: you can only use each UPS up to half of its spec’ed current drain.

    Thinking along these lines: with 2 UPSes per rack, how do you distribute power from them to machines which luckily have 2 power supplies each (for redundancy)? Having one PS wired to one UPS and the second PS to the second UPS will save you if one of the UPSes fails (and does not provide any output AC). But in this case each UPS needs to be powerful enough (current drain wise) to power the whole rack. So you are paying for reliability by using UPSes of double capacity. Did I miss anything? I guess I almost did: if you have 4 UPSes for 2 racks, you can wire them so that if only one of the 4 fails, the draw on each of the remaining UPSes increases by only 1/3…

    I decided _we_ are not that rich anyway…

    Valeri

    ++++++++++++++++++++++++++++++++++++++++
    Valeri Galtsev Sr System Administrator Department of Astronomy and Astrophysics Kavli Institute for Cosmological Physics University of Chicago Phone: 773-702-4247
    ++++++++++++++++++++++++++++++++++++++++
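
The sizing arithmetic in this comment can be checked with a few lines of Python. This is a simplified model, not a wiring plan: it assumes the total load is spread evenly across all UPSes and redistributes evenly over the survivors when one fails, which with 4 UPSes requires cabling the dual power supplies across different UPS pairs. The function names are illustrative.

```python
from fractions import Fraction


def per_ups_share(n_ups: int, failed: int = 0) -> Fraction:
    """Fraction of the total load carried by each surviving UPS."""
    survivors = n_ups - failed
    if survivors <= 0:
        raise ValueError("at least one UPS must survive")
    return Fraction(1, survivors)


def relative_increase(n_ups: int, failed: int = 1) -> Fraction:
    """How much each survivor's draw grows after a failure (1 means +100%)."""
    before = per_ups_share(n_ups)
    after = per_ups_share(n_ups, failed)
    return after / before - 1


def max_normal_utilization(n_ups: int, failed: int = 1) -> Fraction:
    """Highest everyday load, as a fraction of rating, that still survives a failure."""
    if n_ups - failed <= 0:
        raise ValueError("at least one UPS must survive")
    return Fraction(n_ups - failed, n_ups)


if __name__ == "__main__":
    for n in (2, 4):
        print(n, relative_increase(n), max_normal_utilization(n))
```

Under these assumptions, 2 UPSes means each may run at no more than 1/2 of its rating day to day (the survivor's draw doubles), while 4 UPSes allow 3/4 of rating, since a single failure raises each survivor's draw by 1/3.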

  • John R Pierce wrote:

    Thanks, muchly. I keep hoping to be closer to $3k than $5k (we’re a US federal agency, non-DoD, and our budget has been negative for years…).

    I take the sled out when I mount a SmartUPS 3000 – that’s about 43lbs of batteries, and about the same of the unit. For what you’ve got… I go over to the data center and borrow The Answer: a Blue Genie
    <http://vistamation.com/products/lifts/hand-winch-lift-trucks/> Let me tell you, you need to rack or unrack a blade enclosure, or a RAID box….

    mark

  • In our case, that is what we do. We’re an HA shop, first and foremost, so *everything* has to be redundant. So yes, each UPS, on its own, has to be able to hold up all equipment for the minimum specified runtime. That said, it’s not really a “waste”, because when there is a total power outage, we get twice the minimum hold-up time, which comes in very handy at times.

    For example, we had a client who runs Windows VMs on our system. There was a major power outage that we knew was going to outlast the backup power (they’re a manufacturing facility, so if the machinery isn’t up, then the servers aren’t doing much). After we determined that we had to shut everything down, we found that someone had not turned off MS’s automatic updates. So one server decided that it was a great time to install updates during a critical outage.

    Thanks to having the extended runtime, we were able to shed some load and hold up the host node and the server long enough for Windows to finish all of its OS updates. Obviously, this should never have happened in the first place, but it’s an example of how extra runtime can come in super helpful.

  • same as any other HA solution. active/standby servers, you’re only using the compute power of the active server, the standby server is sitting there waiting for a catastrophe. redundant LAN or SAN switches, etc etc.

  • our big DC’s probably have those, but they are in other states or countries. the small DC at my office where my lab is located, we get to use brute force and muscle.

    last big RAID boxes I racked, I pulled all the drives and PSUs out first, and it was easy.

  • Exactly… HA has to cover the full stack.

    That said, most of our installs have loads on both nodes (multiple VMs, some on node 1, some on node 2). Like the UPSes though, each node itself has to have sufficient capacity to run everything. :)

  • Which calls for one more piece of equipment (hopefully you are not on a high floor…): a diesel generator. That kicks in if the power doesn’t return after some short outage. (I was almost sure you would mention it somewhere closer to the end…)

    Valeri


  • Some of our clients have that, but not all of them. Even the ones that do, though, can have trouble. A different recent client had a major incident in their power distribution room (fire and/or explosion). They had enough fuel for 6~8 hours of operation. They couldn’t get more fuel in time and ended up having to shut down their production facility until they could arrange a more long-term solution.

    You can go to great lengths to protect yourself, but there will always be limits. It’s a question of where your paranoia and budgets meet.

  • Absolutely true.

    An HA platform is not beautiful when there is nothing left to add. It is beautiful when there is nothing left to take away.

    The minimum complexity needed to have no single points of failure is inherently high. Doing everything possible to minimize that complexity is a worthy goal.