Network Outage on Campus
BY RAY STILL
An unplanned network outage could be a college’s worst nightmare. Students would be unable to do their online work, teachers would be cut off from communicating with their students, and the main college website would be down for hours on end. Luckily, The Evergreen State College’s Network Services noticed and resolved the bug in their system before this hypothetical mess could become a reality.
“Network Services discovered an issue with the primary switch,” said Dylan Houston, a manager of the computing center at Evergreen. “At home, that would be your router you’d use to get online.”
The issue affected the primary switch’s configuration files, which tell the switch how to connect to the Evergreen website, email accounts, and the rest of the internet.
Without the bug, a rebooting primary switch would look for its existing configuration files and reconnect to the email accounts and website, said James Gutholm, the associate director of computing and communications.
The bug in the primary switch caused it to lose the configuration file. “So something that caused the primary switch to reboot—it crashing, power spike, whatever—it would start up in a blank state,” Gutholm said. “There would be no connectivity to campus or in campus. It would just stop functioning.”
Rip Heminway, the associate director of Academic Computing, alerted students by email of a planned network outage on February 20, starting at 10:30 p.m. Heminway wrote that the best-case scenario would be a 15- to 20-minute outage, but there was also a chance that the outage would continue into the next morning.
Network Services scheduled the outage for the evening because all programs would be over and most students would be off campus. Network Services’ first plan of action was to copy and paste a backup configuration file to point the primary switch in the right direction, which would take only a few minutes.
Gutholm said that Network Services had several fallback plans, which ranged from having the network down for an hour to having the outage last through the next morning if they had to rebuild the system from scratch.
“I never count on anything going to plan,” Gutholm explained. “That is why I have a plan A, plan B, and plan everything goes to you-know-where.”
There were two outages during the night of the reboot, each four minutes long and 30 minutes apart. “What we were originally planning didn’t work out, but folks adapted on the fly and found a new strategy. It was better than hoped for,” Gutholm said.
In the event of an extended or unplanned outage, Network Services would have switched from the main Evergreen website to a smaller copy, hosted by Amazon, with a message explaining that systems were down.
Network Services would also contact the University of Washington. “They kind of behave like Comcast does for your home—they do that for the entire state K-20 education network,” Gutholm said. He explained that UW would make the configuration changes so people would get directed to the smaller, Amazon-hosted copy of the Evergreen website when they searched online.
“The purpose of that is for emergency notification—if we had an earthquake or something that really took us out—that is how we could communicate,” Gutholm said.
Gutholm said that all the issues relating to this specific bug have been solved, and that there is no chance that it will cause an unplanned outage in the future. “Now we are going to do some updates in a nice, controlled fashion, now that we have everything back to where we want it to be,” he said.