IT Service Status Board

Storage connection problems affecting Virtual Machines hosted in SafeHost

Date & time of incident:

Friday, November 9, 2012 - 13:30

Post date:

Friday, November 9, 2012 - 14:32

Incident Description:

Intermittent storage connection problems are affecting Virtual Machines hosted in SafeHost. Affected machines become unavailable at random for a while and may then become responsive again.

The list of potentially affected VMs follows:

acron01
cmstrko2ovm03
gridmsg103
gridmsg107
gridmsg111
isfadmin03
lemon2build011
lemon2build012
lemon2build02
lfcatlasmig1
lxadm14
lxcvs12
lxcvsfs8
lxlic03
lxsvn02
lxtnadm03
lxvoadm03
lxzero02
pcitfio32vm
pcvm7gregory
pslinux01
sindesdev01
snowmidsrv01
srmmon03
vm2acarneir
vmacarneir
vmefazendawin7
vmitdi02
vmlgoguey
voatlas129
voatlas131
voatlas132
voatlas133
voatlas134
voatlas143
voatlas144
voatlas151
voatlas152
voatlas153
voatlas155
voatlas156
voatlas161
voatlas266
voatlas277
vocms124
vocms126
vocms127
vocms136
vocms156
vocms16
vocms160
vocms161
vocms211
vocms212
vocms219
vocms227
volcd03
webafs10
webafs15
webafs16
wod05-002

Service Element Affected:

Multiple Services

Impact:

Some applications linked to the affected services are unavailable

Status:

Resolved

Resolution date:

Friday, November 9, 2012 - 16:00

Expected resolution or Next Update Time:

Friday, November 9, 2012 - 16:00

Posted by:

IT-PES

Unit responsible for resolution:

IT Department

Updates

Posted November 9, 2012 - 3:55pm

Incident resolved at 16:00

The following Virtual Machines had to be rebooted around 15:45 in order to successfully come back online: vocms160, sindesdev01

The following explanation for this issue was given by IT/CS:

The LCG router at SafeHost seemed to be dropping traffic for random hosts connecting from the GPN, while the LCG traffic was not affected. We have re-established connectivity with the GPN by applying a workaround, but we do not yet understand what triggers the problem. We will follow up on this case with the manufacturer. We will probably have to schedule an upgrade of the router firmware.

www.cern.ch

CERN Accelerating science