
TN Disconnection Test

 
Description: 

On March 27th, the annual TN Disconnection Test will take place from 10:00 to 13:30 (a second test will be scheduled around May 2014 to verify the lessons learned from the 2013 Disco Test).

Reason for this intervention: 

The CERN network has a two-tier architecture: the office network serves as a General Purpose Network (GPN) for office computers, laptops and computing services hosted in the CERN Computer Centre, while the so-called Technical Network (TN) hosts control systems used for infrastructure monitoring/control and the operation of accelerators. Both networks are interconnected by a redundant pair of routers, the so-called TN-GPN gate. In parallel to the TN, there is a series of similar “Experiment Networks” (ENs) hosting the operational equipment for, e.g., the LHC experiments.

  • Objective: Understand the extent to which control systems connected to the TN depend on external services (GPN, CERN CC) and how far these systems are able to run autonomously if the GPN is not available. Given past experience, the dependency should be rather low, but hidden dependencies might exist. Once identified, they should be mitigated before the second TN Disco Test in 2014.
  • Impact: The TN Disconnection Test will inhibit any connection between the TN and the GPN for the full length of the test. Some systems might need a restart to re-establish connections after the test. The connections between the TN and the LHC experiments are not affected. However, the automatic updates of the whole centrally managed CERN network infrastructure will be halted and switched to manual. This implies that network updates will only happen irregularly.
  • Schedule: On March 27th, the annual TN Disconnection Test is scheduled to take place from 10:30 to 13:00. Additional time might be needed to restore systems that were inhibited during the test.
  • Coordination: The test will be coordinated by the CERN CSO from the CCC, with members of BE/OP and BE/CO monitoring a range of control systems and their functioning during the test. IT/CS will be present to oversee network manipulations. Additional parties are warmly welcome to join and monitor their systems, too.
  • Procedure: The disconnection procedure is step-wise: initially, so-called LANDB sets of computer services hosted on the GPN and serving the TN will be hidden from the TN one after the other. Most of these sets provide auxiliary computing services to the TN. Once done, the TN-GPN gates will be completely blocked. At each step, system behavior should be checked for operation, starting of applications, booting of servers, logging in of operators, and installation of systems. DIAMON and LASER (alarms) might be able to give additional information on dependencies. Details are given above.
  • Fall-Back: The aforementioned disconnection procedure is gradual. If severe problems are detected and communicated to the CCC, the test can and will be halted. If needed, full GPN-TN connectivity can be re-established within minutes.
  • Notification: These proceedings have already been presented to the FOM (2013/01/15), the LS1 coordination meeting (2013/01/31), the TIOC (2013/02/06), the IEFC (2013/02/08), and the LMC (2013/02/13). All members of the CNIC User Exchange (2013/01/31), LCSP (2013/01/28), and IT GLM/ISM meetings (2013/01/31) as well as all owners of “TN Trusted” or “TN Exposed” devices (apart from those managed by the IT department) and all LANDB set owners (2013/02/06) have been informed. A “Note d’Intervention” (IT-DI-CSO-2013-001) was distributed on 2013/03/05. No objections have been received so far (as of 2013/03/22).
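
For readers who prefer to see the Procedure and Fall-Back items as control flow, the following is a minimal Python sketch of the step-wise logic only. It does not talk to any real network equipment or to LANDB. The class `DiscoTest`, the `check` callback and the set names are hypothetical and introduced purely for illustration. As a simplification, any failed check is treated as a "severe problem" that triggers the fall-back.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# The five checks listed in the Procedure item; the string labels are illustrative.
CHECKS = [
    "operation",
    "application start",
    "server boot",
    "operator login",
    "system installation",
]


@dataclass
class DiscoTest:
    """Toy model of the gradual disconnection (hypothetical, not a CERN tool)."""
    landb_sets: List[str]                  # placeholder names of LANDB sets
    check: Callable[[str, str], bool]      # (step, check name) -> True if OK
    hidden: List[str] = field(default_factory=list)
    gate_blocked: bool = False
    log: List[str] = field(default_factory=list)

    def _failures(self, step: str) -> List[str]:
        # Run every check after a step and collect the ones that fail.
        return [c for c in CHECKS if not self.check(step, c)]

    def fall_back(self) -> None:
        # Re-open the gate and restore the sets in reverse order.
        self.gate_blocked = False
        while self.hidden:
            self.log.append(f"restore {self.hidden.pop()}")

    def run(self) -> bool:
        # Step 1: hide the LANDB sets from the TN one after the other.
        for name in self.landb_sets:
            self.hidden.append(name)
            self.log.append(f"hide {name}")
            if self._failures(name):
                self.fall_back()
                return False
        # Step 2: block the TN-GPN gate completely.
        self.gate_blocked = True
        self.log.append("block TN-GPN gate")
        if self._failures("gate"):
            self.fall_back()
            return False
        return True
```

In this sketch, `DiscoTest(["set-a", "set-b"], check=my_check).run()` returns `True` only if every check passes at every step; otherwise the steps already taken are undone in reverse order, which mirrors the statement that the test can be halted and connectivity re-established.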

 

Effective from date: 
Wednesday, March 27, 2013 - 10:00
End date: 
Wednesday, March 27, 2013 - 13:30
Service Element Affected: 
Multiple Services
Impact on service: 
Service will be unavailable during the intervention
Posted by: 
IT-DI
Unit responsible for resolution: 
CERN Computer Security