Some services hosted on machines located at SafeHost, including the IT Service Status Board, were affected by a partial power failure.
Most services were restored within one hour.
MORE DETAILS:
The root cause was a power failure on the enclosure encPB0701, which contains the following machines:
DRUPALFE01
DRUPALFE02
DRUPALFE03
DRUPALFE04
This crashed everything connected to the same power bar, including the IP8 network switch (see the full list of machines below).
All affected equipment has since been restarted.
- Physical servers affected:
LEMONSRV01
LEMONSRV02
LXREMEDY10
LXSERV20
DRUPALFE01 to DRUPALFE08
ITRAC1265 to ITRAC1270
- Known services affected:
IT Service Status Board
CERN IT web page
LEMON
some itrac12
possibly others
- Virtual machines affected:
DRUPALBE0102
DRUPALBE0304
DRUPALBEND
DRUPALDB
DRUPALDEV
DRUPALFE01V
DRUPALFE02V
DRUPALFE03V
DRUPALFE04V
DRUPALFE05V
DRUPALFE06V
DRUPALFE07V
DRUPALFE08V
DRUPALLOG
DRUPALPROD
More details are available in the post-mortem: https://twiki.cern.ch/twiki/bin/view/CCOperations/PartialPowerFailureSH120726