Date & time of incident:
Friday, April 27, 2012 - 16:21
Post date:
Friday, April 27, 2012 - 16:29
Incident Description:
When a batch worker node runs out of memory, a protective process is started that kills the largest or most rapidly growing process on the node.
In such a case, the owner of the killed process is notified by e-mail. On some batch nodes the offending process did not exit, so the user was notified over and over again.
The affected hosts have been identified and paused, and the issue is under investigation.
They will be made available again as soon as the root issue is fixed.
Service Element Affected:
Batch Service
Impact:
Service is degraded
Status:
Resolved
Resolution date:
Friday, April 27, 2012 - 19:00
Posted by:
IT-PES
Unit responsible for resolution:
IT Department
Updates
Affected nodes have been out of production since Friday evening. Users are no longer affected by the issue.