Data Center Network Issues on November 7, 2019
On Thursday, November 7, 2019, at around 10am, a problem in Cornell's Rhodes Hall Data Center disrupted all services that are housed in the data center or rely on it. Network issues continued until the source of the problem was discovered and removed around 6:20pm that evening. A few services that use the data center required additional attention, and all of them were operating normally again by the next morning.
The problem was caused accidentally when a campus customer of the CIT "Co-located Servers" service, while doing maintenance work on their own equipment in the data center, created a "network loop." Problems of this type are difficult to identify and diagnose because they present inconsistent symptoms and often cause many other network functions to fail intermittently.
Because CIT was not doing any systems work at the time that might have caused the problem, CIT staff had to look elsewhere for the cause. They eliminated several possible causes before an analysis of data that had been looping around the network for more than an hour pointed to the source. The location causing the issue was shut down, and the network immediately stabilized.
CIT is in the process of improving prevention and detection of this type of issue. These improvements will help limit the effect of similar disruptions and decrease the time it takes to fix the problems they cause. CIT is also tightening access to co-location facilities to avoid unexpected impacts to data center infrastructure, operations, and services.
If you have any questions, please contact the IT Service Desk.