Contributed article by Andrew, Director of Facilities
The servers must always have power, and they must always be kept cool. It’s simple, really…
A well-designed data center can handle a multitude of issues, from a utility power interruption, to a broken air conditioning unit, to a major storm. Proper maintenance practices are crucial for the data center to run smoothly and without interruption.
Many pieces of equipment keep a data center up and running: the main switchgear where power enters the building, the Uninterruptible Power Supply (UPS), the Power Distribution Units (PDUs), chillers, air handlers, generators, the fire suppression system, building management software, and the list goes on. Each piece of equipment must be maintained so that in an emergency the system performs as designed and the servers don’t go down.
The UPS is the backbone of the facility. It conditions the power from the utility and, if electrical service is interrupted, bridges the gap until the backup generators can take over and power the site. Depending on the size of the data center, there can be hundreds of batteries keeping those servers online during a utility interruption. Those batteries must be checked and maintained regularly to be sure they are up to the task should they be needed. At NationalNet we perform quarterly maintenance on UPS batteries and semiannual inspections of the UPS circuitry.
Backup generators are run once per week to ensure they are ready to start and accept the load when necessary. Block heaters run 24 hours a day to keep the engines “warm,” and maintenance is performed twice yearly by a certified Caterpillar technician. Fuel samples are also taken twice per year to monitor fuel quality. In addition, every six months it is beneficial to transfer the data center to generator power to give the generators a “workout,” and once per year it is necessary to perform load bank testing. Load bank testing entails connecting the generators to equipment that draws 100% of their available power. Generators need a few hours of running at full capacity, or they won’t perform at their best.
Water chillers and Computer Room Air Handlers (CRAH units) receive quarterly maintenance, from cleaning the coils, to changing air filters and fan belts, to checking electrical connections. Water samples are taken and sent to a laboratory to ensure that no corrosive chemicals have made their way into the system. A cool environment is crucial to maintaining a data center’s uptime and availability.
The fire suppression system must be tested for proper functionality, along with all smoke detectors. Building management software must be upgraded and backed up without affecting the site, and all electrical breakers must be checked with infrared equipment to be sure they aren’t overheating.
A lot goes into maintaining a high-availability data center, and every system is crucial. In a well-run data center there is always something to do, but as long as the servers have power and are kept cool, everything is okay. It’s simple, really…