Rogers blames massive outage on error during network update
CBC
Rogers Communications Inc. says an update to its network caused a malfunction that "very quickly" took down telecom services earlier this month, according to a letter sent to Canada's broadcasting regulator that was released on Friday.
The outage, which started early on July 8 and for some customers lingered for days, left millions without cellphone and internet service — prompting questions from the federal government and the Canadian Radio-television and Telecommunications Commission (CRTC).
"An update in our core IP [internet protocol] network … caused our IP routing network to malfunction," the letter read.
The letter, posted on the CRTC website, met the regulator's deadline for Rogers to answer questions about the outage. However, it contains many redactions where Rogers is believed to have offered more specific details about the problem and its plans to prevent something similar from occurring again.
The CRTC said Rogers submitted two versions of the letter, one unabridged and the other with redactions, and that it released the latter to protect "highly sensitive information" about Rogers' operations.
Among other things, the CRTC had demanded Rogers explain why 911 services went down in some areas, and how it plans to honour CEO Tony Staffieri's promise to proactively credit customers' accounts.
"In order to regain the trust of Canadians, it is important that we provide open answers to the questions that they have about the outage," the Rogers letter read. "That is why when answering the CRTC … Rogers is being as transparent as possible."
Rogers also said it has hired a third party to review and provide insights on what happened.
Officials from Rogers and a slew of other stakeholders are set to appear at a parliamentary committee on Monday in Ottawa to further explain the cause of the outage, and to outline the steps they are taking to make sure it won't happen again.
In its letter, Rogers said coding from the update deleted a routing filter, which "allowed for all possible routes to the Internet to pass through the routers." That flooded and overwhelmed the core network, causing it to stop processing internet traffic altogether.
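The failure mode Rogers describes can be illustrated with a toy model. This is only a sketch under stated assumptions, not Rogers' actual configuration, which is redacted in the public letter: the function names, the allow-list, the prefixes and the table capacity below are all hypothetical. It shows how a filter keeps the number of accepted routes small, and how removing it lets every advertised route through until a router's table is overwhelmed.

```python
from ipaddress import ip_network


def accept_routes(advertised, allowed=None):
    """Return the advertised prefixes that pass the routing filter.

    `allowed` is a hypothetical allow-list of covering networks.
    Passing None models a deleted filter: every route is accepted.
    """
    routes = [ip_network(prefix) for prefix in advertised]
    if allowed is None:
        return routes
    permitted = [ip_network(prefix) for prefix in allowed]
    return [r for r in routes if any(r.subnet_of(net) for net in permitted)]


def table_overflows(routes, capacity):
    """True when the accepted routes exceed the router's (assumed) capacity."""
    return len(routes) > capacity


# Illustrative data only: 200 unwanted prefixes plus one expected prefix.
advertised = [f"10.{i}.0.0/16" for i in range(200)] + ["192.0.2.0/24"]

with_filter = accept_routes(advertised, allowed=["192.0.2.0/24"])
without_filter = accept_routes(advertised, allowed=None)

print(len(with_filter), table_overflows(with_filter, capacity=100))
print(len(without_filter), table_overflows(without_filter, capacity=100))
```

In this model the filtered table holds a single route, while the unfiltered table accepts all 201 advertised routes and exceeds the assumed capacity of 100.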
"As a result, the Rogers network lost connectivity to the Internet."
It said many Rogers employees looking to fix the problem were affected and could not connect to the company's IT and network systems. As a result, only those "equipped with emergency SIMs on alternate carriers" could initially triage the outage.
"While every effort was made to prevent and limit the outage, the consequence of the coding change affected the network very quickly," Rogers said.