Telstra is expected to fully restore its services in south-west Victoria by the end of this week after the telco’s boss, David Thodey, promised a thorough investigation into the fire that took out the Warrnambool exchange in late November.
A suitably contrite Thodey was at pains this week to point out that he and his team understood the gravity of the situation and that the restoration of services was of paramount importance.
"In fact, it would normally take two years to build an exchange and our incredible team is aiming to have completely restored the exchange in two weeks,” Thodey said.
Thodey’s compliment to the Telstra team is fully warranted given the nature of what is being attempted. Many of Telstra’s exchanges are more than 50 years old and contain a myriad of old interconnected systems, including many kilometres of copper cable, so restoring any one of them in such a short time is an excellent outcome.
The Warrnambool episode clearly illustrates the importance of robust continuity plans, and given that the NBN will be the backbone of our digital future, NBN Co’s response to the incident warrants a closer look.
The NBN is critical national infrastructure and it must be treated as such. Can business, industry, schools and hospitals survive a two-week outage of the network as they become more and more dependent on broadband applications?
In the week after the fire an NBN Co spokesperson told Technology Spectator that it had a contingency plan for such a scenario. However, the statement doesn’t, and shouldn’t, instil the necessary confidence.
"We have put in place extensive practices and processes to handle a network outage of this type, and to recover services as quickly as possible. As you know these are very rare events, but contingencies are planned.”
The last part of this statement is the easiest to dispel so it will be analysed first.
The fire in the Telstra Warrnambool exchange was not caused by a natural disaster so Telstra cannot claim a force majeure situation. Fires can and do occur in large network installations and for this reason anti-fire systems are commonplace in data centres that house critical information systems for government, business and organisations.
Will key NBN locations be protected by anti-fire systems? I certainly hope so, but we have nothing official to base this presumption on.
Not so rare after all
The statement by the NBN Co spokesperson that "these are very rare events” relies upon our short collective memories to provide a subtly misleading impression of the facts.
Melburnians will remember the chaos on the roads when CityLink’s network failed, preventing control of critical infrastructure within the tunnels and forcing CityLink to close the Burnley and Domain tunnels.
A quick Google search for Telstra ADSL outages returns 228,000 results. A quick review of some of the results should remind us that these "rare events” are in fact not rare at all.
In 2001, an authentication server failure caused a national Telstra ADSL outage that lasted seven hours. Comments on the SLUG email list were not complimentary, and a comment by "Alister” is just as relevant today as it was then: "It's not the one dollar but the lost business from having NO internet connection for a good part of Thursday. No chance of a refund on that.”
In 2004, a Telstra software upgrade that went wrong caused a national ADSL outage that lasted for almost a day. The event caused outrage that was reflected by posts on Whirlpool.
In 2006, a Telstra ADSL outage left New South Wales customers without ADSL access for almost a day.
In 2010, a Telstra Bigpond server failure caused a four-day Bigpond outage that "affected dial-up, cable, ADSL, satellite and Next G wireless services across the east coast and e-mail nationally”.
Network outages due to pests like ants and rats have occurred regularly over the years, and Telstra has a webpage listing some of the more interesting unexpected encounters in the field. Outages caused by pests can be anticipated, and action can be taken to minimise them.
Will the NBN be deployed utilising anti-pest techniques learnt as a result of Telstra’s vast pest experience? Again, we don’t know.
On June 9, 2012 it was reported that the company responsible for the NBN roll-out in the Darling Downs region failed to lay rodent-proof cables in a bid to cut costs. The Federal Member for Groom, Ian Macfarlane, said: "This just fits in with the whole shemozzle of the NBN.”
Practices and processes
The first part of the NBN Co spokesperson’s statement, "We have put in place extensive practices and processes to handle a network outage of this type, and to recover services as quickly as possible”, raises the question: "What practices and processes?”
NBN Co should support its spokesperson’s statement by immediately handing a copy of the NBN Co "failure event practices and processes” to the Australian Communications and Media Authority or to the Department of Broadband, Communications and the Digital Economy, which is conducting an inquiry.
The two-week outage suffered by businesses around Warrnambool is economically unsustainable and unacceptable in the 21st century. Building the NBN with a cavalier attitude to failure events must be avoided at all costs.
NBN Co should develop a section of the company website that explains how it intends to provide improved service levels to the Australian public, and that gives complete details of how it intends to deal with failure events, including anticipated outage times for different types of failure.
The Communications Minister Stephen Conroy should call NBN Co chief Mike Quigley and instruct NBN Co to carry out a failure event planning exercise in mid-2013. The exercise should include the possibility of the sudden demise of a major point of interconnect in Melbourne or Sydney. NBN Co should then exercise the "failure event practices and processes” that we have been assured have been prepared, and restore service utilising backup infrastructure brought to the location during the exercise. The results of the exercise should be reported fully. Perhaps it’s also worth pondering whether NBN Co should get assistance from defence and other national agencies in the event of a major failure event.
The NBN is vital for Australia’s future and it is therefore important that Australians be fully informed regarding every aspect of the network. The Warrnambool exchange fire must serve as a timely motivator for NBN Co to provide more information about the resiliency of the network and how long it would take to restore service in the event of a failure, big or small.
Mark Gregory is a Senior Lecturer in Electrical and Computer Engineering at RMIT University.