Large AT&T outage / no more IGPs (posted 2002-10-15)
On August 28th, AT&T had an outage in Chicago that affected a large part of their network. It took them two hours to fix this, and after this they released a fairly detailed description of what happened: network statements in the OSPF configuration of their backbone routers had been deleted by accident.
AT&T was praised by some on the NANOG list for their openness, but others were puzzled how a problem like this could have such wide spread repercussions. This evolved into a discussion about the merits of interior routing protocols. Alex Yuriev brought up the point that when IGPs fail, they do so in a very bad way, his conclusion being it's better to run without any. This led to some "static routing is stupid" remarks. However, it is possible to run a large network without an IGP and not rely on static routes. This should work as follows:
That way, you never talk (I)BGP with a router you're not directly connected to, so you don't need loopback routes to find BGP peers. Because of the next-hop-self on every session, you don't need "redistribute connected" either so you've eliminated the need for an IGP. Since the MED is increased at each hop, it functions exactly the same way as the OSPF or IS-IS cost and the shortest path is preferred.