A set of hypervisors lost connection to shared storage

 
Date & time of incident: 
Friday, September 14, 2012 - 14:00
Incident Description: 

Another 16 hypervisors within the Service Consolidation Service lost their connection to the shared storage, leaving 29 virtual machines hanging.

The affected virtual machines have already been brought back up. This is a recurrence of the incident that occurred yesterday (https://test-static-03.web.cern.ch/service-incident/16-hypervisors-lost-connection-share-storage/13-09-2012).

It affected one of the hypervisor clusters where the Hyper-V patch that fixes this problem had not yet been applied.

The affected VMs are: bdii210 boinc06 boinc09 boinc10 dashboard48 dashboard50 dashboard61 dssbuild4 indicosearch2 fts501 lxcscweb02 lxjira05 lxtopo3 voatlas234 voatlas236 voatlas241 voatlas261 voatlas276 voatlas282 voatlas287 voatlas299 vmcloudman01 vmcloudman02 vocms164 vocms165 vocms221 vopartner06 wowzauds14 wowzauds17

 

Service Element Affected: 
Multiple Services
Impact: 
Some applications linked to services are unavailable
Status: 
Resolved
Resolution date: 
Friday, September 14, 2012 - 14:30
Expected resolution or Next Update Time: 
Friday, September 14, 2012 - 15:07
Posted by: 
IT-PES
Unit responsible for resolution: 
IT Department