Date & time of incident:
Wednesday, November 23, 2011 - 21:45
Post date:
Wednesday, November 23, 2011 - 22:16
Incident Description:
23/11 at 21:45: The ADCR database is unavailable because of an underlying storage issue.
The problem was solved at 23:45; see the updates below for more details on this incident.
Service Element Affected:
DB & App Platform for Accelerators
Any other affected service(s):
Affected services: ADCR_DQ2, ADCR_DQ2_LOCATION, ADCR_DQ2_TRACER, ADCR_PANDA, ADCR_PANDAMON, ADCR_PRODSYS, ADCR_ACCOUNTING, ADCR_AGIS, ADCR_AMI
Impact:
Service is unavailable
Status:
Resolved
Resolution date:
Thu, Nov 24, 08:55
Posted by:
IT-DB
Unit responsible for resolution:
IT Department
Updates
Problem solved at 11:45PM:
Full history:
Around 9:45PM the ADCR database went down and was restarted on standby hardware around 10:30PM. The root cause of the problem was a multiple-disk failure (a disk failed while the underlying storage system was still rebalancing after a previous disk failure).
Unfortunately, after the emergency failover to standby hardware, the ADCR database reported a corrupted block which affects one table of the PANDA application. Therefore, all services except PANDA were started around 11PM.
After some investigation and consultation with application experts, a workaround that skips reading the corrupted block was applied. The remaining services were started around 11:45PM.
Update at 8h56:
The adcr_lfc service, which was not available, is accessible again.
Update at 8h47:
At least one of the services is not available; we are looking at this now.
Update at 8:46:
At least one service is not reachable; we are looking at this.