BGPexpert

BGP and IPv6 routing courses

Several times a year I teach two training courses, one about BGP and one about IPv6. The BGP course is half theory and half hands-on practice, and so is the new IPv6 routing course. Previously, we did an IPv6 course without a hands-on part.

The courses consist of a theory part in the morning and a practical part in the afternoon, where the participants implement several assignments on a Cisco router (in groups of two participants per router).

Dates for upcoming courses in 2015 are:

  • October 5: BGP (probably in Dutch)
  • October 6: IPv6 routing (probably in Dutch)
Go to the NL-ix website to find more information and sign up. The location will be The Hague, Netherlands.

Interdomain Routing & IPv6 News

  • 5 minutes of BGP instability after leap second (posted 2015-07-06) article 167

    This June 30th, at 23:59:60, a leap second was added to Coordinated Universal Time (UTC). Dyn Research posted a graph on Twitter showing that there was significant BGP update instability for five minutes after the leap second occurred.

    Unfortunately, it's not clear why this happened. However, leap seconds have triggered all kinds of mishaps in the past. They're basically miniature Y2K problems. Time and time again, software engineers show that they can't be trusted to take corner cases into account properly.

    This does remind me of a situation about a decade ago, where I had a customer that experienced BGP instability every night at the same time. They used Quagga running on Linux machines. We couldn't figure out what the problem was, until we realized that at that very moment, the ntpdate command was run from the cron. ntpdate synchronizes the system clock with an NTP server. As the machine in question had a very poor system clock, this meant that the system's time was adjusted a lot every night, I think a minute or more, but definitely more than 30 seconds.

    Which meant that if Quagga had gotten a BGP keepalive message 8 seconds earlier, it now thought that was 38 seconds ago. If BGP is configured with a hold time of 30 seconds, this means that Quagga now thinks the other side has been quiet for longer than the hold time and it'll tear down the BGP session. This is what happened every night for a bunch of BGP sessions. We solved this by running the NTP daemon continuously, so there was never a big adjustment in system time. (Alternatively, just letting the time drift would also have worked.)
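    The effect of a clock step on a hold timer can be sketched in a few lines of Python. This is a minimal illustration, not Quagga's actual code: the `HoldTimer` and `FakeClocks` classes are hypothetical, and the assumption is simply that one timer measures elapsed time with the adjustable wall clock while the other uses a monotonic clock that ignores adjustments.

```python
HOLD_TIME = 30.0  # seconds, as in the example above


class HoldTimer:
    """Remembers when the last keepalive arrived, using an injectable clock."""

    def __init__(self, clock):
        self.clock = clock
        self.last_keepalive = clock()

    def keepalive_received(self):
        self.last_keepalive = self.clock()

    def expired(self):
        return self.clock() - self.last_keepalive > HOLD_TIME


class FakeClocks:
    """Hypothetical test clocks: a wall clock that can be stepped and a
    monotonic clock that only follows real elapsed time."""

    def __init__(self):
        self.elapsed = 0.0  # real time that has passed
        self.offset = 0.0   # adjustments applied to the wall clock

    def wall(self):
        return 1_000_000.0 + self.elapsed + self.offset

    def monotonic(self):
        return self.elapsed

    def advance(self, seconds):
        self.elapsed += seconds

    def step_wall_clock(self, seconds):
        self.offset += seconds


clocks = FakeClocks()
wall_timer = HoldTimer(clocks.wall)       # keepalive received "now"
mono_timer = HoldTimer(clocks.monotonic)

clocks.advance(8)            # 8 seconds pass without a new keepalive
clocks.step_wall_clock(30)   # the clock is stepped forward by 30 seconds

wall_expired = wall_timer.expired()  # sees 38 s of silence: session torn down
mono_expired = mono_timer.expired()  # sees 8 s of silence: session stays up
print(wall_expired, mono_expired)    # True False
```

    In real Python code, `time.monotonic()` would play the role of the monotonic clock and `time.time()` that of the wall clock.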

    The minimum BGP hold time is 3 seconds, so adjusting for an (improperly handled) leap second shouldn't be able to make BGP think the hold time for a session has expired. However, there could be a bug somewhere else that impacted BGP.

    I'm not sure whether these kinds of issues are a good argument in favor of abandoning leap seconds, as the bugs won't go away, they'll just show up at a less predictable time. But I don't like the current leap second practice, as leap seconds are unpredictable, and you can't calculate the time difference in seconds between two dates without taking the entire list of leap seconds into account. I think it would be better to save the leap seconds up and apply them all at the end of a century.
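    To illustrate why the list of leap seconds is needed, here is a rough Python sketch. The function name `elapsed_seconds` is made up for this example, and the table only contains the leap seconds inserted from 2005 through 2015, so it is only valid for dates in that range; a real implementation would need the complete, regularly updated list.

```python
from datetime import datetime, timezone

# Midnights (UTC) that were immediately preceded by an inserted leap second.
# Only the insertions from 2005 through 2015 are listed here.
LEAP_BOUNDARIES = [
    datetime(2006, 1, 1, tzinfo=timezone.utc),  # after 2005-12-31 23:59:60
    datetime(2009, 1, 1, tzinfo=timezone.utc),  # after 2008-12-31 23:59:60
    datetime(2012, 7, 1, tzinfo=timezone.utc),  # after 2012-06-30 23:59:60
    datetime(2015, 7, 1, tzinfo=timezone.utc),  # after 2015-06-30 23:59:60
]


def elapsed_seconds(start, end):
    """Seconds that really passed between two UTC timestamps."""
    naive = int((end - start).total_seconds())  # pretends every day has 86400 s
    leaps = sum(1 for boundary in LEAP_BOUNDARIES if start < boundary <= end)
    return naive + leaps


start = datetime(2015, 6, 30, 23, 59, 0, tzinfo=timezone.utc)
end = datetime(2015, 7, 1, 0, 1, 0, tzinfo=timezone.utc)

naive_diff = int((end - start).total_seconds())
real_diff = elapsed_seconds(start, end)
print(naive_diff, real_diff)  # 120 121
```

    Note that Python's `datetime` can't even represent 23:59:60, which is typical for how software treats leap seconds.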

  • IPv6 is faster than IPv4 on US mobile networks (posted 2015-06-17) article 166

    At the NANOG meeting in San Francisco two weeks ago, there was a session on The benefits of deploying IPv6 only. Someone from T-Mobile explained that the latest Windows Mobile and Android support 464XLAT to allow IPv4-only applications to work over IPv6 with NAT64, so those devices now only get IPv6. Other devices only get IPv4, there's no dual stack. At that point, the panelists didn't know yet that Apple is requiring iOS 9 apps to work over IPv6 so those can work through NAT64 without 464XLAT.

    Another interesting data point is the observation by Facebook that IPv6 tends to perform better than IPv4, with the margin being as large as 40%.

    However, it's unclear why this is: the RTTs are the same, yet the performance/bandwidth over IPv6 is better. There was some frustration because Apple's implementation of "happy eyeballs" only looks at the RTT to choose between IPv4 and IPv6, and thus lands on IPv4 a good deal of the time and doesn't enjoy the benefits of that better IPv6 performance.

  • → IPv4 Transfers in the RIPE NCC Service Region (posted 2015-05-28) article 165

    Earlier this month, RIPE Labs had a lengthy blog post about transfers of IPv4 addresses within the RIPE region. A lot of addresses went from Romania to Saudi Arabia, but the rest of Europe and the Middle East has been busy, too. However:

    In the subsequent months of January 2015 through to April 2015, levels of transfer were significantly lower. Because the RIPE NCC listing service continues to show strong demand, the lower amounts transferred may well be a sign that the market in the RIPE region is capped by availability; total demand cannot be met by available supplies. This may change after the recently accepted RIPE policy for inter-RIR transfers has been implemented.

    It probably wasn't an accident that two of the sponsors of the RIPE-70 meeting were businesses that facilitate IPv4 address trading.

    Read the whole article

  • RPKI is ready for real-world deployment (posted 2015-04-30) article 164

    For some years now, the Regional Internet Registries have been rolling out RPKI. The Resource Public Key Infrastructure allows holders of IP addresses to authorize an autonomous system to inject those addresses in BGP. (See here for an overview of how RPKI works and more links.)

    I've always thought it would be hard to deploy RPKI in the real world, because it's just way too easy for a certificate or ROA (route origin authorization) to expire. If that then leads to routes becoming invalid and the addresses in question being unreachable, that would be a good example of the cure being worse than the disease.

    Fortunately, that's not the case: RPKI is ready for real-world deployment today.

    The way to deploy RPKI that's suggested in RFC 6483 as well as the relevant Cisco and Juniper documentation is to assign different local preference values to the three possible RPKI states, such as:

    • Valid (RPKI checks out): local preference of 200 (highest)
    • Unknown (no RPKI for this prefix): local preference of 100 (normal)
    • Invalid (RPKI present but doesn't check out): local preference of 50 (lowest)

    So packets will follow a path that is RPKI-validated if available. If not, they follow a path that isn't covered by RPKI if that's available. Only if there are no "valid" or "unknown" paths will the packets be sent over an "invalid" path, one that is covered by RPKI but failed validation. The trouble with this approach is that it still allows invalid more specific prefixes to hijack traffic. For instance:

    RIPE has a ROA for prefix 193.0.0.0/21 that allows AS 3333 to originate that prefix, with a maximum prefix length of /21. So if AS 4444 originates 193.0.0.0/21, that will result in the following BGP table:

        Network       Next Hop       Metric LocPrf Weight Path
    >*  193.0.0.0/21  19.11.111.244      10    200      0 3333 i
     *                29.249.178.10      10     50      0 4444 i
    

    So effectively, the path through AS 4444 is ignored. However, AS 4444 could also do this:

        Network       Next Hop       Metric LocPrf Weight Path
    >*  193.0.0.0/21  19.11.111.244      10    200      0 3333 i
    >*  193.0.0.0/24  29.249.178.10      10     50      0 4444 i
    >*  193.0.1.0/24  29.249.178.10      10     50      0 4444 i
    >*  193.0.2.0/24  29.249.178.10      10     50      0 4444 i
    >*  193.0.3.0/24  29.249.178.10      10     50      0 4444 i
    >*  193.0.4.0/24  29.249.178.10      10     50      0 4444 i
    >*  193.0.5.0/24  29.249.178.10      10     50      0 4444 i
    >*  193.0.6.0/24  29.249.178.10      10     50      0 4444 i
    >*  193.0.7.0/24  29.249.178.10      10     50      0 4444 i
    

    So even though the path towards the /21 is still routed to AS 3333, the packets flow to AS 4444 because of the longest match first rule. Solution: filter out "invalid" prefixes completely.
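    The difference between the two policies can be sketched in Python. This is a simplified model, not router code: `rpki_state` and `forwarding_as` are hypothetical helper names, the validation logic is a bare-bones version of the valid/unknown/invalid classification, and the destination address 193.0.6.139 is just an arbitrary address inside the /21.

```python
import ipaddress

# The ROA from the example above: (prefix, maximum length, authorized origin AS).
ROAS = [(ipaddress.ip_network("193.0.0.0/21"), 21, 3333)]

# The BGP table from the example: the legitimate /21 plus eight /24s from AS 4444.
ROUTES = [(ipaddress.ip_network("193.0.0.0/21"), 3333)] + [
    (ipaddress.ip_network(f"193.0.{i}.0/24"), 4444) for i in range(8)
]


def rpki_state(prefix, origin_as, roas=ROAS):
    """Classify a route as 'valid', 'invalid' or 'unknown' (IPv4 only here)."""
    covered = False
    for roa_prefix, max_len, roa_as in roas:
        if prefix.subnet_of(roa_prefix):
            covered = True
            if origin_as == roa_as and prefix.prefixlen <= max_len:
                return "valid"
    return "invalid" if covered else "unknown"


def forwarding_as(destination, routes):
    """Longest match first: return the origin AS of the most specific route."""
    matches = [(prefix, asn) for prefix, asn in routes if destination in prefix]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


destination = ipaddress.ip_address("193.0.6.139")

# Policy 1: invalid routes only get a lower local preference. Local preference
# compares paths for the same prefix, so the /24s still win on prefix length.
with_invalids = forwarding_as(destination, ROUTES)

# Policy 2: invalid routes are filtered out completely.
filtered = [(p, asn) for p, asn in ROUTES if rpki_state(p, asn) != "invalid"]
without_invalids = forwarding_as(destination, filtered)

print(with_invalids, without_invalids)  # 4444 3333
```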

    But then, what happens when RIPE forgets to renew their certificate or ROA in time? If their prefix then reverted to "invalid", it would disappear from routing tables everywhere, and RIPE would be unreachable:

        Network       Next Hop       Metric LocPrf Weight Path
    

    In this scenario, it would be very dangerous to filter "invalid" prefixes, as RPKI is still relatively immature and mistakes will happen.

    However, it turns out that the results of expired certificates and ROAs aren't actually problematic. In a post to the NANOG list, Alex Band points out:

    ❝If ARIN (or another other RIR) went offline or signed broken data, all signed prefixes that previously has the RPKI status "Valid", would fall back to the state "Unknown", as if they were never signed in the first place. The state would NOT be "Invalid".❞

    So what would happen is this:

        Network       Next Hop       Metric LocPrf Weight Path
    >*  193.0.0.0/21  19.11.111.244      10    100      0 3333 i
    

    Obviously, in this case the protection against unauthorized origination of the prefixes in question would go away, but in the normal situation where nobody tries to hijack those prefixes, they would still be reachable and a mistake with certificate or ROA expiration wouldn't immediately lead to a network disappearing off of the internet.
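    A small Python sketch of this fallback, under the assumption that an expired ROA is simply no longer part of the validated set that the router sees. The `rpki_state` helper is a hypothetical, simplified classifier, and the local preference values are the ones from the list earlier in this article.

```python
import ipaddress

LOCAL_PREF = {"valid": 200, "unknown": 100, "invalid": 50}


def rpki_state(prefix, origin_as, roas):
    """Simplified origin validation against a list of (prefix, max_len, asn)."""
    covering = [roa for roa in roas if prefix.subnet_of(roa[0])]
    if not covering:
        return "unknown"  # no ROA covers this prefix at all
    for _, max_len, roa_as in covering:
        if roa_as == origin_as and prefix.prefixlen <= max_len:
            return "valid"
    return "invalid"


route = (ipaddress.ip_network("193.0.0.0/21"), 3333)

roas_current = [(ipaddress.ip_network("193.0.0.0/21"), 21, 3333)]
roas_expired = []  # assumption: the expired ROA has dropped out of the set

state_before = rpki_state(*route, roas_current)
state_after = rpki_state(*route, roas_expired)

# Even when "invalid" routes are filtered, the route is still accepted.
still_accepted = state_after != "invalid"

print(state_before, LOCAL_PREF[state_before])  # valid 200
print(state_after, LOCAL_PREF[state_after])    # unknown 100
```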

    In other words: deploy RPKI today. It doesn't protect against all forms of malicious address hijacking, but it does offer very robust protection against accidental unauthorized route origination, such as the infamous YouTube/Pakistan incident. Also, you can run an RPKI validator locally without the need for your upstream ISPs or peers to do the same.

My Books: "BGP" and "Running IPv6"

On this page you can find more information about my book "BGP". Or you can jump immediately to chapter 6, "Traffic Engineering", (approx. 150kB) that O'Reilly has put online as a sample chapter. Information about the Japanese translation can be found here.

More information about my second book, "Running IPv6", is available here.

BGP Security

BGP has some security holes. This sounds very bad, and of course it isn't good, but don't be overly alarmed. There are basically two problems: sessions can be hijacked, and anyone who can hijack a session, or who has a legitimate BGP session, can inject incorrect information into the BGP tables.

Session hijacking is hard to do for someone who can't see the TCP sequence number for the TCP session the BGP protocol runs over, and with good anti-spoofing filters in place it is impossible. And of course using the TCP MD5 password option (RFC 2385) makes all of this nearly impossible even for someone who can sniff the BGP traffic.

Nearly all ISPs filter BGP information from customers, so in most cases it isn't possible to successfully inject false information. However, filtering on peering sessions between ISPs isn't as widespread, although some networks do this. A rogue ISP could do some real damage here.

There are now two efforts underway to better secure BGP:

  • Secure BGP (S-BGP) is developed by Bolt, Beranek and Newman (BBN). It has been around for several years and there is a proof-of-concept implementation. S-BGP tries to secure all aspects of the BGP protocol, and consequently needs several signature checks for each BGP update, making the protocol relatively heavy-weight. You can see my earlier rants on S-BGP at the top of this page. Note that I'm no longer as anti-S-BGP as I used to be, although I still think implementing the protocol will be expensive, because routers will need lots of extra memory (up to four times as much) and CPU power (possibly dedicated crypto hardware). This aspect deserves some serious attention.

    Secure BGP (S-BGP) index at BBN.

  • Secure Origin BGP (soBGP) has surfaced fairly recently and hails from Cisco. There are no implementations so far. soBGP mainly focusses on securing the relationship between prefixes and the source AS number, and doesn't need as many computationally expensive checks as S-BGP. However, the protocol can easily be expanded to perform more checks.

    draft-ng-sobgp-bgp-extensions-00.txt (main soBGP draft)
    draft-white-sobgp-bgp-extensions-00.txt (deployment considerations)

    (If the links don't work, the drafts have expired; you'll have to use a search engine to find them.)

There is now also a different approach to increasing BGP security using an "Interdomain Routing Validation" service that works independent from the BGP protocol itself. See what I wrote about this in interdomain routing news on this site, or jump immediately to the Working Around BGP: An Incremental Approach to Improving Security and Accuracy of Interdomain Routing paper.

The IETF RPSEC (routing protocol security) working group is active in this area.

What is BGPexpert.com?

BGPexpert.com is a website dedicated to Internet routing issues. What we want is for packets to find their way from one end of the globe to another, and to make the jobs of the people who make this happen a little easier.

Your host is Iljitsch van Beijnum. Feedback, comments, link requests... everything is welcome. You can read more about me here or email me at iljitsch@bgpexpert. or follow iljitsch on Twitter.

Ok, but what is BGP?

Have a look at the "what is BGP" page. There is also a list of BGP and interdomain routing terms on this page.

BGP and Multihoming

If you are not an ISP, your main reason to be interested in BGP will probably be to multihome. By connecting to two or more ISPs at the same time, you are "multihomed" and you no longer have to depend on a single ISP for your network connectivity.

This sounds simple enough, but as always, there is a catch. For regular customers, it's the Internet Service Provider who makes sure the rest of the Internet knows where packets have to be sent to reach their customer. If you are multihomed, you can't let your ISP do this, because then you would have to depend on a single ISP again. This is where the BGP protocol comes in: this is the protocol used to carry this information from ISP to ISP. By announcing reachability information for your network to two ISPs, you can make sure everybody still knows how to reach you if one of those ISPs has an outage.

Want to know more? Read A Look at Multihoming and BGP, an article about multihoming I wrote for the O'Reilly Network.

For those of you interested in multihoming in IPv6 (which is pretty much impossible at the moment), have a look at the "IPv6 multihoming solutions" page.

Are you a BGP expert? Take the test to find out!

These questions are somewhat Cisco-centric. We now also have another set of questions and answers for self-study purposes.
