BGP table hitting 512k limit in older routers (posted 2014-08-19)
It's never a good sign when the regular press reports about BGP-related issues, as happened last week: Browsing speeds may slow as net hardware bug bites (BBC). The problem is that the BGP table has started to hit the 512k FIB limit in some older routers. Numerous outages and slowdowns were reported to be caused by this, but it's unclear to what degree that's accurate.
So what's a FIB and why would it be limited to 512k (524288) prefixes?
BGP routers actually have (at least) three different tables where IP address prefixes are stored, along with a next hop address: the BGP RIB, the main routing table / RIB and the FIB. The BGP Routing Information Base collects all information received over BGP that's not immediately filtered out. So if you have two ISPs, they will both send all the prefixes for all networks in the world that are currently reachable—which means that your router has two copies of every prefix, one with a next hop address pointing to ISP A and one with a next hop pointing to ISP B.
For each prefix, BGP then decides which path is better and sends the prefix plus next hop address pointing to either A or B to the main routing table. The main routing table also holds non-BGP routing information. In large networks, all the internal stuff and routes for customers can add up to thousands or tens of thousands of routing table entries.
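To make that two-step flow concrete, here is a minimal Python sketch of a BGP RIB that holds one copy of each prefix per ISP, plus a selection step that installs a single next hop per prefix in the main table. The prefixes, next hop addresses and AS numbers are made-up example values, and shortest AS path is the only criterion used here. That is a simplification: real BGP implementations compare several other attributes (such as local preference) before they look at AS path length.

```python
from collections import defaultdict

# Hypothetical BGP RIB: every path learned from every ISP, keyed by prefix.
# Each stored path is a (next_hop, as_path) tuple.
bgp_rib = defaultdict(list)

def learn(prefix, next_hop, as_path):
    """Store a path received over BGP; nothing is filtered in this sketch."""
    bgp_rib[prefix].append((next_hop, as_path))

def best_paths(rib):
    """Pick one path per prefix. Assumption: shortest AS path wins."""
    return {prefix: min(paths, key=lambda p: len(p[1]))
            for prefix, paths in rib.items()}

# ISP A is reached through 192.0.2.1, ISP B through 192.0.2.129.
learn("198.51.100.0/24", "192.0.2.1", [64500, 64510, 64520])
learn("198.51.100.0/24", "192.0.2.129", [64501, 64520])
learn("203.0.113.0/24", "192.0.2.1", [64500, 64530])
learn("203.0.113.0/24", "192.0.2.129", [64501, 64511, 64512, 64530])

# The main routing table only gets the winning next hop for each prefix.
main_table = {prefix: next_hop
              for prefix, (next_hop, _) in best_paths(bgp_rib).items()}

print(sum(len(paths) for paths in bgp_rib.values()), "paths in the BGP RIB")
print(len(main_table), "prefixes in the main table")
```

With two ISPs the BGP RIB holds twice as many paths as the main table holds prefixes, which is why the BGP RIB is the larger of the two but also the one that lives in ordinary RAM.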
Finally, a Forwarding Information Base (FIB) is constructed from the main routing table. The FIB is what the router actually uses to forward packets to the router identified by the next hop address. Some routers use regular RAM to store the FIB, others use a Ternary Content Addressable Memory (TCAM). RAM sizes are pretty large these days and there typically isn't a fixed limit on the FIB, as the RAM is shared by many processes running on a router. But TCAMs are special memories with a tiny bit of processing power. Basically, you can show a prefix to a TCAM and then the TCAM will tell you the address where that prefix is stored, so you don't have to search through the memory one step at a time. This means TCAMs are very fast, but they are also more expensive than RAM and they run fairly hot. So TCAM sizes are limited.
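As a rough illustration of what a FIB lookup has to do, the Python sketch below finds the longest matching prefix for a destination address by walking through every entry, one step at a time. That linear walk is exactly the work a TCAM avoids by comparing all stored entries at once. The table contents and the function names build_fib and lookup are hypothetical, and real software routers use much smarter data structures than a plain list.

```python
import ipaddress

def build_fib(main_table):
    """Turn {prefix string: next hop} into a list of (network, next_hop)."""
    return [(ipaddress.ip_network(prefix), next_hop)
            for prefix, next_hop in main_table.items()]

def lookup(fib, destination):
    """Return the next hop of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for network, next_hop in fib:  # one entry at a time, unlike a TCAM
        if addr in network:
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, next_hop)
    return best[1] if best else None

# Made-up example table with a default route and two overlapping prefixes.
fib = build_fib({
    "0.0.0.0/0": "192.0.2.254",
    "198.51.100.0/24": "192.0.2.129",
    "198.51.100.128/25": "192.0.2.1",
})

print(lookup(fib, "198.51.100.200"))  # matched by the /25
print(lookup(fib, "198.51.100.5"))    # matched by the /24 only
print(lookup(fib, "203.0.113.9"))     # falls through to the default route
```

Every prefix in the table costs one entry, so a table with 512k prefixes needs 512k slots no matter how the lookup itself is implemented.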
Nothing new under the sun
Cisco 6500 and 7600 modular routers/switches used to have supervisor modules with a TCAM limit of 256k. And then in 2008 the routing table grew to 256k, so people had to upgrade in a hurry. If they bought new supervisor modules that can handle 512k, they got six years of use out of those, hence Geoff Huston's statement that "Nothing in BGP looks like it's melting".
Because different networks have different numbers of internal prefixes and there are also slight differences between the number of prefixes each ISP announces to its customers, different people get bitten by the issue at different times. Also, TCAMs are often partitioned into different parts: one for the IPv4 routing table, one for the IPv6 routing table, one for MPLS, one for filtering... In some cases, simply changing the partitioning is enough to get by for a while. Alternatively, it's always possible to filter BGP prefixes. As Randy Bush says:
❝half the routing table is deagg crap. filter it.❞
The trouble is, you then lose connectivity towards the filtered prefixes, and there is no obvious way to only filter out the prefixes that are unnecessary deaggregation. If your network is non-huge, the solution is to use a default route pointing to your ISP / one of your ISPs as a safety net. What I used to do many years ago when using severely underpowered routers to run BGP is simply filter out all AS paths longer than five ASes from both our ISPs. Then, if one ISP has a long path and one a short path, I'd still have the short path which I'd want to use anyway. If neither had a short path, chances were it was a non-critical prefix far away, so handling it through a possibly non-optimal default route was unlikely to be problematic.
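The AS path length filter described above can be sketched in a few lines of Python. This is only an illustration of the logic, not router configuration: the announcements, AS numbers and the name filter_long_paths are made up, and on a real router this would be done with an AS path filter on each BGP session.

```python
MAX_AS_PATH_LEN = 5  # keep paths of at most five ASes

def filter_long_paths(announcements, max_len=MAX_AS_PATH_LEN):
    """Drop every announcement whose AS path is longer than max_len."""
    return [a for a in announcements if len(a["as_path"]) <= max_len]

announcements = [
    # Long path via ISP A, short path via ISP B: the short one survives.
    {"prefix": "198.51.100.0/24", "isp": "A",
     "as_path": [64500, 64510, 64511, 64512, 64513, 64520]},
    {"prefix": "198.51.100.0/24", "isp": "B",
     "as_path": [64501, 64520]},
    # Long paths via both ISPs: the prefix disappears from the table.
    {"prefix": "203.0.113.0/24", "isp": "A",
     "as_path": [64500, 64510, 64511, 64512, 64513, 64530]},
    {"prefix": "203.0.113.0/24", "isp": "B",
     "as_path": [64501, 64514, 64515, 64516, 64517, 64518, 64530]},
]

kept = filter_long_paths(announcements)
kept_prefixes = {a["prefix"] for a in kept}

# Prefixes that no longer have any path are handled by the default route.
via_default = {a["prefix"] for a in announcements} - kept_prefixes

print("kept:", [(a["prefix"], a["isp"]) for a in kept])
print("via default route:", sorted(via_default))
```

Note that the filter is applied per announcement, not per prefix. A prefix only falls back to the default route when every path to it is too long.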
However, large networks don't have anyone they can point a default route to. So they have to have more recent routers, and they pretty much always do. Still, it's not unheard of for older network equipment to live out its final years in far-away corners of big networks, so they could still have minor issues.
Although during my training courses, I always warn people that they should buy routers big enough to hold enough prefixes for some years to come, I really should have been more explicit and posted a warning here on this site. At least Cisco did: The Size of the Internet Global Routing Table and Its Potential Side Effects.
Geoff Huston expects the IPv4 table to hit 1 million in 2019 and recommends buying routers that can handle at least 2 million prefixes. Unfortunately, it's not always obvious how many prefixes a router can handle, especially if the TCAM is used for more than just the IPv4 FIB. So make sure you know what the limits are before you spend your money. Also, keep an eye on the weekly routing table report so you can take action when the BGP table starts creeping up to your routers' limits.
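Keeping an eye on the limit can be as simple as the Python sketch below. The 512k limit is the one discussed in this post; the prefix counts, the 90% warning threshold and the function names are assumptions for illustration only. The sketch also assumes the entire FIB is available for IPv4, which isn't true when a TCAM is partitioned, so plug in the real limit for your hardware.

```python
FIB_LIMIT = 512 * 1024  # 524288 entries

def fib_headroom(bgp_prefixes, internal_prefixes, limit=FIB_LIMIT):
    """Return how many more entries fit before the FIB is full."""
    return limit - (bgp_prefixes + internal_prefixes)

def needs_action(bgp_prefixes, internal_prefixes,
                 limit=FIB_LIMIT, threshold=0.9):
    """True once FIB usage reaches the given fraction of the limit.

    The 0.9 default is an arbitrary assumption, not a recommendation.
    """
    return (bgp_prefixes + internal_prefixes) >= threshold * limit

# Hypothetical counts: BGP prefixes from the weekly report plus
# internal and customer routes from your own network.
print(fib_headroom(500_000, 20_000))
print(needs_action(500_000, 20_000))
print(needs_action(400_000, 20_000))
```

Because the internal prefixes count against the same limit, two networks receiving the same BGP table can run out of room at different times.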