There was serious DB contention on CASTORPUBLIC, and the service was severely degraded. Initial diagnostics point to a massive load from several public users staging files in a tight loop, at rates close to 200 Hz per user (some of these files were still being recalled from tape). This put heavy pressure on the DB, which became progressively slower as requests piled up, to the point that internal CASTOR processes also got stuck on the DB side. This aggravated the situation further, and the stager response time became extraordinarily high.
Follow-up: the offending users have been banned and the experiments' computing contacts have been informed. The stuck internal CASTOR processes (mainly tape-related workflows) were regenerated and released from the DB. The situation is returning to normal, but we are monitoring it closely until the service is fully restored and the cause is fully understood.