CERN Accelerating science

16 hypervisors lost connection to shared storage

 
Date & time of incident: 
Thursday, September 13, 2012 - 11:00
Incident Description: 

16 hypervisors within the Service Consolidation Service lost their connection to the shared storage, leaving over 100 Virtual Machines hanging.

Service managers are bringing back affected Virtual Machines and investigating why the hypervisors lost access to the shared storage.

The Virtual Machines affected by this incident include:

atlaslogbook atlaslogbookt batchmon02 bukowiecvm01 bukowiecvm02 cmscert cmscollstat cmsdasvm1 cmsdasvm2 cmsdasvm3 cmsdasvm4 cmsdasvm5 cmsdasvm6 cmslogbook cmslogbookt cmsperfpubvm cmsperfvm cmsperfvmdev cmspubperfvm cmssdtdev01 cmstrko2ovm cmstrko2ovm02 creamtest001 dashboard24 dashboard25 dashboard26 dashboard27 dashboard28 dashboard29 dashboard30 dashboard31 evoportal fts301 fts302 gridmsg003 historydqmweb lcggenser3 lemon2build03 lemon2build04 lfcatlas01 lfclhcbro01 lfclhcbro02 lfclhcbrw01 lfclhcbrw02 lfclhcbrw03 lfcshared01 lfcshared02 lxcvm001 lxcvm002 lxcvm003 lxcvmfs01 lxdev61 lxdev62 lxdev63 lxdev64 lxlahey03 lxlic02 lxlic06 lxlic07 lxsvn01 mcwin01 musclefit osadmin01 pcnds01 pcudsdev2 phedex-web-dev sindesdev02 sindesdev03 slsdev02 smtarch02 tsmmsdev01 vmeos01 voatlas150 voatlas166 voatlas167 voatlas168 voatlas169 voatlas170 voatlas171 voatlas172 voatlas173 voatlas195 voatlas196 voatlas197 voatlas198 voatlas199 voatlas200 voatlas201 voatlas202 voatlas203 voatlas204 voatlas209 vocms01 vocms07 vocms12 vocms129 vocms130 vocms131 vocms132 vocms133 vocms134 vocms135 vocms137 vocms152 vocms153 vocms154 voms304 voms306 voms308 vona6101 vpcgiordano

All Virtual Machines are now up.


Service Element Affected: 
Multiple Services
Impact: 
Some applications linked to services are unavailable
Status: 
Resolved
Resolution date: 
Thu, Sep 13, 14:00
Posted by: 
IT-PES
Unit responsible for resolution: 
IT Department