BGPexpert


inet⁶ consult

If you could use some help with BGP, have a look at my business web site:

BGP routing courses

Several times a year I teach two training courses, one about BGP and one about IPv6. The BGP course is half theory and half hands-on practice, and so is the new IPv6 routing course. (Previously, the IPv6 course had no hands-on part.) The hands-on BGP course is taught in association with NL-ix. It consists of a theory part in the morning and a practical part in the afternoon, where the participants implement several assignments on a virtual router running the Free Range Routing software (based on Zebra/Quagga and configured much like a Cisco router).

The next dates are:

  • Monday, 3 June 2024
Go to the NL-ix website for more information and to sign up. The location will be Zoetermeer, the Netherlands.

Interdomain Routing & IPv6 News

  • BGP training course at NL-ix in Zoetermeer June 3 (posted 2024-05-27)

    NL-ix brings back its famous BGP (Border Gateway Protocol) course. The course is scheduled for 3 June and will be taught by Iljitsch van Beijnum. In one day (09.30 - 16.00), participants will learn the internals of the BGP protocol, which will give them a much better understanding of their current BGP work.

    In English. Hopefully soon in Dutch again, too.

    More information and registration

  • Upgrading Fiber To The Home to terabit speeds (posted 2024-04-09)

    Last week, Jaap van Till asked me if BGP would be capable of supporting the terabit class interconnectivity that he foresees we’ll need in the future, possibly due to the rise of artificial intelligence. He explains his reasoning in the blog post What Link speeds will we need for AI, where he quotes VAN TILL’s CONJECTURE:

    The network connection Wide Area access speed will grow in time until it matches the internal device BUS speed of the more and more complex processors and datastores.

    And then concludes that 14 Tbps external links will be required in 2039. Today I can get 4 Gbps where I live. So that means a 70% speed increase per year.
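The growth figure above can be checked with a few lines of arithmetic. This is a minimal sketch: the two speeds come from the text, while the 15-year span is an assumption (counting from the 2024 posting date to 2039).

```python
# Compound annual growth needed to get from today's speed to the projected one.
start_gbps = 4           # speed available today, from the text
target_gbps = 14_000     # 14 Tbps projected for 2039, from the text
years = 2039 - 2024      # assumption: counting from the 2024 posting date

# Solve start * growth**years == target for the yearly growth factor.
growth = (target_gbps / start_gbps) ** (1 / years)
annual_increase_pct = (growth - 1) * 100

print(f"about {annual_increase_pct:.0f}% per year")
```

The result lands a little above 70%, which matches the rounded figure in the text.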

    Let’s first get that BGP question out of the way: I see no problems. 25 years ago I ran BGP over 64 and 128 kbps links without trouble. Six orders of magnitude later, BGP is still fine, and there is no reason to believe that even faster speeds will be a problem, just as long as the packet loss rates remain minimal.

    But what would terabit class network connectivity at home look like?

    Actually, I think we have all the parts to build this today. With Wavelength Division Multiplexing (WDM), it’s possible to transmit multiple data streams through a single fiber by using slightly different wavelengths/frequencies of infrared laser light. Coarse WDM (CWDM) is relatively cheap and appropriate over shorter distances, with 18 wavelengths standardized over high performance fiber. (Fewer over most existing fiber.) For long distances, dense WDM (DWDM) can use as many as 160 wavelengths over a single fiber pair.

    Bandwidth per wavelength is now 100 or 200 Gbps, and is expected to increase in the future. So anything between, say, 10 x 100 Gbps = 1 Tbps and the 20 Tbps used by modern submarine cables should be possible. The catch is of course the cost.

    The difficulty is with the transmitting side, as this requires a tuned laser per wavelength. On the receiving side, the wavelengths can be split using a prism and hit a set of wideband receivers. As someone who is definitely not in the business of building this equipment, it seems to me that a system with one or a small number of transmitters, a passive optical bus, and a large(r) set of receivers is definitely something that could enjoy radical performance vs cost improvements over time. And it fits perfectly with the most efficient / high speed way to connect homes to the internet that we have today: PON (passive optical network). So just add additional wavelengths to existing PON installations to gain more bandwidth in the downstream direction.

    However, now we have a new challenge: TCP/IP is not a good fit for sending the massive data streams that would make good use of such a network. The problem is that TCP tries to adjust its end-to-end data transmission rate to the available bandwidth. This means it needs to wait for acknowledgments from the receiving side to know whether it can increase its transmission rate, maintain it, or slow it down. Downloading 100 MB over a 1 Tbps link takes less than a millisecond. But even over PON, the round-trip time is a millisecond or two. This means that the bottleneck is the number of round trips TCP requires to reach that full terabit speed. Even if that’s an extremely unrealistic 10 RTTs, the total transmission time is now 11 ms, effectively using less than a tenth of the available bandwidth.
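The numbers in that paragraph can be reproduced with a rough model. This is only a sketch under simplifying assumptions: the function name `transfer_time_s` is made up for illustration, the RTT is taken as 1 ms, and no data is counted as delivered during the ramp-up round trips (real TCP does send data while ramping up).

```python
def transfer_time_s(size_bytes, link_bps, rtt_s, ramp_up_rtts):
    """Rough transfer time: ramp-up round trips plus time on the wire.

    Simplification: nothing is delivered during ramp-up, after which the
    whole transfer runs at full line rate.
    """
    serialization_s = size_bytes * 8 / link_bps
    return ramp_up_rtts * rtt_s + serialization_s


size_bytes = 100e6   # 100 MB download, from the text
link_bps = 1e12      # 1 Tbps link, from the text
rtt_s = 1e-3         # assumed 1 ms round-trip time over PON

ideal_s = transfer_time_s(size_bytes, link_bps, rtt_s, ramp_up_rtts=0)
actual_s = transfer_time_s(size_bytes, link_bps, rtt_s, ramp_up_rtts=10)
utilization = ideal_s / actual_s

print(f"ideal: {ideal_s * 1e3:.1f} ms, with ramp-up: {actual_s * 1e3:.1f} ms")
print(f"effective link utilization: {utilization:.0%}")
```

With these inputs the wire time is 0.8 ms and the total is just under 11 ms, so the link is busy for well under a tenth of the transfer.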

    So we need to overhaul TCP/IP for the super high speed stuff and instead use something more like circuit switching / time division multiplexing / token passing. Yes, everything old is new again! So for instance reserve ten 100 μs timeslots and transmit ten 10 MB “megapackets”.

    So I think all of this is highly doable!

    Well, there is the slight challenge of how to pipe all that bandwidth into your laptop without connecting/disconnecting that fiber all the time. Maybe use eight Thunderbolt 5 interfaces in parallel to reach 960 Gbps?

  • → Enforcing First AS in BGP (posted 2023-10-08)

    The BGP RFCs state that external BGP peers should insert their own AS into the AS PATH advertised to eBGP peers. Some peers strip their AS, generally for commercial gain. Juniper and Cisco have opposite default behaviors for handling this. Make sure you set bgp enforce-first-as on Juniper routers. Caveats apply.

    The annoying part here is that you want to disable this check for internet exchange route servers, but keep it enabled for everything else for security reasons. But that's not universally possible, as on some routers this is a global setting, rather than a per-neighbor one.

    Read the whole article
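As an illustration of what the first-AS check does, here is a minimal sketch. It is not any vendor's implementation: the function names are hypothetical, the AS path is modelled as a plain list with the most recently added AS first, and the AS numbers are taken from the range reserved for documentation.

```python
def first_as_ok(as_path, neighbor_as):
    """True if the leftmost AS in the path is the eBGP neighbor's own AS."""
    return bool(as_path) and as_path[0] == neighbor_as


def accept_update(as_path, neighbor_as, enforce_first_as=True):
    """Decide whether to accept an update received from an eBGP neighbor."""
    if not enforce_first_as:
        return True
    return first_as_ok(as_path, neighbor_as)


# A regular peer (AS 64501) that put its own AS in front of the path.
print(accept_update([64501, 64502], neighbor_as=64501))   # True

# A route server (AS 64500) that does not insert its own AS: rejected
# when the check is on, so the check has to be off for this neighbor.
print(accept_update([64501, 64502], neighbor_as=64500))   # False
print(accept_update([64501, 64502], neighbor_as=64500,
                    enforce_first_as=False))              # True
```

The `enforce_first_as` parameter stands in for the per-neighbor knob discussed above; on routers where the setting is global, the same value would apply to every neighbor.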

  • BGP handling of obscure errors (posted 2023-10-02)

    I read Ben Cartwright-Cox's (extensive) blog post Grave flaws in BGP Error handling and then saw his talk about the same topic at NLNOG on YouTube.

    The necessary background

    In addition to the "well-known" BGP path attributes that we all know (because the RFC says we must) and love (because they make the internet work), it's also possible to define new attributes to provide new functionality. These can be "transitive" attributes, which means that a BGP router that doesn't recognize them propagates them to its BGP neighbors unchanged.

    The ability to create new optional transitive attributes has allowed us to run BGP version 4 for three decades without having to bump the version number because we had to make backward-incompatible changes that would make adoption all but impossible.

    For instance, 32-bit autonomous system numbers were added as the 16-bit BGP AS numbers started to run out. In addition to the well-known (mandatory) 16-bit AS path, an optional 32-bit AS path was added. If a router in the middle didn't understand the 32-bit AS path, it would update the 16-bit AS path and propagate the 32-bit AS path unchanged.

    The next 32-bit capable BGP router can then add back the AS numbers from the 16-bit path that are missing from the 32-bit path, and 32-bit AS numbers work even if routers in the middle don't understand them. (They just see "23456".)
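The reconstruction step can be sketched as follows. This is a simplified version of the procedure in RFC 6793: it treats both paths as flat lists of AS numbers (no AS_SETs or confederation segments), the function name `merge_as_paths` is hypothetical, and the AS numbers come from the documentation ranges.

```python
AS_TRANS = 23456  # placeholder shown in the 16-bit path for a 32-bit AS


def merge_as_paths(as_path, as4_path):
    """Rebuild the full AS path from the 16-bit and 32-bit versions.

    Both paths list the most recently added AS first. Simplified: flat
    sequences only.
    """
    if len(as_path) < len(as4_path):
        # Inconsistent input: fall back to the 16-bit path alone.
        return list(as_path)
    # ASes prepended by routers that only updated the 16-bit path.
    missing = len(as_path) - len(as4_path)
    return list(as_path[:missing]) + list(as4_path)


# Origin 65550 is a 32-bit AS. A 32-bit capable router in AS 64496 sent
# both paths to an old router in AS 64497, which only updated the 16-bit one.
as_path = [64497, 64496, AS_TRANS]
as4_path = [64496, 65550]

print(merge_as_paths(as_path, as4_path))  # [64497, 64496, 65550]
```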

    Of course you can read all about BGP attributes in my book Internet Routing with BGP. It's even in the sample chapters! (Page 15.)

    The error handling issue

    In his blog post, Ben describes how a Brazilian network included a malformed version of a still experimental attribute in its BGP updates. All the big routers in the core of the internet don't run experiments, so they just saw an attribute they didn't recognize, and propagated it as per the transitive setting. Eventually BGP updates with the broken attribute arrived at routers that did understand the attribute, but saw that it was broken.

    So as per the original BGP spec, they tore down the BGP session towards the router that sent them the broken attribute. And then, after a short delay, they tried to set up a new BGP session towards that neighboring router, only to encounter the same error again and tear down the BGP session again. And so on.

    Which is probably not what you want. This is nicely explained in RFC 7606, published in 2015, which suggests treating such errors as if the neighboring router had asked to withdraw the route containing the offending path attribute. So if a neighbor tells me that several prefixes are reachable through them, and one of those has a broken attribute, I just act as if my neighbor had told me that this one prefix is not reachable through them. But I don't bring down the BGP session, so the other prefixes remain reachable through the neighbor in question.
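The difference between the two behaviors can be sketched in a few lines. This is a toy model, not router code: the routing table is a plain dictionary, `process_update` and `SessionReset` are hypothetical names, validation is reduced to a boolean, and RFC 7606 itself only prescribes treat-as-withdraw for specific kinds of attribute errors. The prefixes are documentation prefixes.

```python
class SessionReset(Exception):
    """Raised when the BGP session to the neighbor is torn down."""


def process_update(rib, prefixes, attrs, attrs_valid, treat_as_withdraw=True):
    """Apply one update from a neighbor to the table of routes learned from it."""
    if attrs_valid:
        for prefix in prefixes:
            rib[prefix] = attrs
        return
    if treat_as_withdraw:
        # RFC 7606 style: drop only the routes carried in the bad update.
        for prefix in prefixes:
            rib.pop(prefix, None)
        return
    # Original behavior: reset the session, losing everything learned.
    rib.clear()
    raise SessionReset("malformed path attribute")


rib = {}
process_update(rib, ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"],
               attrs={"as_path": [64496]}, attrs_valid=True)

# A later update for one prefix arrives with a broken attribute.
process_update(rib, ["198.51.100.0/24"], attrs=None, attrs_valid=False)
print(sorted(rib))  # the other two prefixes are still there
```

With `treat_as_withdraw=False`, the same broken update empties the table and raises `SessionReset`, which is the loop described above once the session is re-established and the update arrives again.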

    Ben seems to be rather annoyed that many router vendors don't implement the RFC 7606 behavior, implement it but don't enable it by default, and/or don't have a bug bounty program to reward security researchers for pointing out these deficiencies. He spent a good amount of time evaluating different implementations and then "fuzzing" attributes to see what would happen, so that's somewhat understandable. His presentation slides include a score card of the results.

    My take

    I agree that the RFC 7606 handling by default is what you want. I also agree that changing a default here, something router vendors are loath to do, shouldn't be problematic.

    However, these are pretty obscure errors. This is not an internet extinction level issue.

    For my own network, I would strongly prefer a mechanism to turn off handling of these often rather frivolous new attributes. This would both avoid being bitten by buggy implementations elsewhere and avoid inflating BGP messages. As BGP updates propagate, the AS paths (the 16- and 32-bit versions) increase in length, so an update that was just under the limit at one point will eventually exceed the maximum size of 4096 bytes, and then bad things will definitely happen.

    However, it's important that new transitive attributes aren't filtered out wholesale, as that would make it impossible to add new features to BGP. I'm not sure if there is a workable way to put a stop to frivolous BGP path attributes being injected into the global routing system while at the same time not robbing BGP of its forward compatibility with future innovations.

Archive of all articles - RSS feed

My Books: "BGP" and "Running IPv6"

On this page you can find more information about my book "BGP". Or you can jump immediately to chapter 6, "Traffic Engineering", (approx. 150kB) that O'Reilly has put online as a sample chapter. Information about the Japanese translation can be found here.

More information about my second book, "Running IPv6", is available here.

BGP Security

BGP has some security holes. This sounds very bad, and of course it isn't good, but don't be overly alarmed. There are basically two problems: sessions can be hijacked, and someone who can either hijack a session or who has a legitimate BGP session can inject incorrect information into the BGP tables.

Session hijacking is hard to do for someone who can't see the TCP sequence numbers of the TCP session that BGP runs over, and with good anti-spoofing filters in place it is impossible. And of course using the TCP MD5 password option (RFC 2385) makes all of this nearly impossible even for someone who can sniff the BGP traffic.

Nearly all ISPs filter BGP information from customers, so in most cases it isn't possible to successfully inject false information. However, filtering on peering sessions between ISPs isn't as widespread, although some networks do this. A rogue ISP could do some real damage here.
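The kind of customer filtering described here boils down to checking each announced prefix against what the customer is known to hold. A minimal sketch, using Python's standard `ipaddress` module: the function name, the documentation prefixes and the /24 maximum length are all assumptions for illustration, and real routers express this as prefix lists or route policies.

```python
import ipaddress


def accept_announcement(prefix, allowed_prefixes, max_len=24):
    """Accept a customer announcement only if it falls inside an allowed block.

    max_len is an assumed limit on IPv4 prefix length to keep out overly
    specific routes.
    """
    net = ipaddress.ip_network(prefix)
    if net.prefixlen > max_len:
        return False
    return any(
        net.version == allowed.version and net.subnet_of(allowed)
        for allowed in allowed_prefixes
    )


# Address space registered to a hypothetical customer.
customer_prefixes = [ipaddress.ip_network("198.51.100.0/24")]

print(accept_announcement("198.51.100.0/24", customer_prefixes))  # True
print(accept_announcement("203.0.113.0/24", customer_prefixes))   # False
```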

There are now two efforts underway to better secure BGP:

  • Secure BGP (S-BGP) is developed by Bolt, Beranek and Newman (BBN). It has been around for several years and there is a proof-of-concept implementation. S-BGP tries to secure all aspects of the BGP protocol, and consequently needs several signature checks for each BGP update, making the protocol relatively heavy-weight. You can see my earlier rants on S-BGP at the top of this page. Note that I'm not as anti-S-BGP as I used to be, although I still think implementing the protocol will be expensive because routers will need lots of extra memory (up to four times as much) and CPU power (possibly dedicated crypto hardware), and this aspect deserves some serious attention.

    Secure BGP (S-BGP) index at BBN.

  • Secure Origin BGP (soBGP) has surfaced fairly recently and hails from Cisco. There are no implementations so far. soBGP mainly focusses on securing the relationship between prefixes and the source AS number, and doesn't need as many computationally expensive checks as S-BGP. However, the protocol can easily be expanded to perform more checks.

    draft-ng-sobgp-bgp-extensions-00.txt (main soBGP draft)
    draft-white-sobgp-bgp-extensions-00.txt (deployment considerations)

    (If the links don't work, the drafts have expired; you'll have to use a search engine to find them.)

There is now also a different approach to increasing BGP security using an "Interdomain Routing Validation" service that works independent from the BGP protocol itself. See what I wrote about this in interdomain routing news on this site, or jump immediately to the Working Around BGP: An Incremental Approach to Improving Security and Accuracy of Interdomain Routing paper.

The IETF RPSEC (routing protocol security) working group is active in this area.

BGPexpert is a website dedicated to Internet routing issues. What we want is for packets to find their way from one end of the globe to another, and to make the jobs of the people who make this happen a little easier.

Your host is Iljitsch van Beijnum. Feedback, comments, link requests... everything is welcome. You can read more about me here or email me at iljitsch@bgpexpert. or follow iljitsch on Twitter.

Ok, but what is BGP?

Have a look at the "what is BGP" page. There is also a list of BGP and interdomain routing terms on this page.

BGP and Multihoming

If you are not an ISP, your main reason to be interested in BGP will probably be to multihome. By connecting to two or more ISPs at the same time, you are "multihomed" and you no longer have to depend on a single ISP for your network connectivity.

This sounds simple enough, but as always, there is a catch. For regular customers, it's the Internet Service Provider who makes sure the rest of the Internet knows where packets have to be sent to reach their customer. If you are multihomed, you can't let your ISP do this, because then you would have to depend on a single ISP again. This is where the BGP protocol comes in: this is the protocol used to carry this information from ISP to ISP. By announcing reachability information for your network to two ISPs, you can make sure everybody still knows how to reach you if one of those ISPs has an outage.

Want to know more? Read A Look at Multihoming and BGP, an article about multihoming I wrote for the O'Reilly Network.

For those of you interested in multihoming in IPv6 (which is pretty much impossible at the moment), have a look at the "IPv6 multihoming solutions" page.

Are you a BGP expert? Take the test to find out!

These questions are somewhat Cisco-centric. We now also have another set of questions and answers for self-study purposes.
