CERN Accelerating science

71 Virtual Machines rebooted following storage incident

 
Date & time of incident: 
Thursday, March 7, 2013 - 13:30
Incident Description: 

A storage cell for virtual machines stopped responding for a few minutes around 13:30, causing 71 virtual machines to crash with I/O errors and reboot unexpectedly.

All affected machines are back online as of 14:10.

The complete list of affected machines follows:

bdii211
boinc01
boinc04
boinc06
boinc08
boinc10
boinc11
boinc12
boinc13
c2adm02
dashboard47
dashboard48
dashboard49
dashboard63
dashboard70
dashboard74
dashboard76
dssbuild4
gridmsg007
gridmsg010
indico-wk3
lfcatlasro02
lxargus04
lxjira03
lxjira04
lxjira06
lxjira07
lxjira08
lxjira10
lxlic09
lxlic11
lxservb07
pccis04
pccis85
procdev11
procdev12
procdev13
procdev14
slc6hepos
vmcdbweb1
voatlas210
voatlas231
voatlas239
voatlas240
voatlas270
voatlas276
voatlas284
voatlas286
voatlas289
voatlas299
voatlas301
voatlas303
voatlas308
voatlas328
vocms225
voms303
vopartner05
webafs18
wowzauds14
wowzauds15
wowzauds16
wowzauds17
xrdfed02
xrdfed04
xrdfed06
xrdfed08
xrdfed10
xrdfed12
xrdfed14
xrdfed18
xrdfed20

Service Element Affected: 
Multiple Services
Impact: 
Some applications linked to services are unavailable
Status: 
Resolved
Resolution date: 
Thu, Mar 7, 14:05
Posted by: 
IT-PES
Unit responsible for resolution: 
IT Department