My new BGP book: 'Internet Routing with BGP' by Iljitsch van Beijnum BGPexpert My BGP book from 2002: 'BGP' by Iljitsch van Beijnum

Home · BGP Expert Test · What is BGP? · BGP Vendors · Links · Archives · Books · My New BGP Book

BGP (advertisement)
  • DNS and routing of IPv6 micro allocations (posted 2003-12-09)

    Currently, there are very few people who want to run an IPv6-only network. And that's a good thing too, as presently, there is no way to do this. One of the big hurdles is the DNS. Right now, very few, if any, top level domains accept IPv6 glue records. However, there are no technical reasons why those can't be added. Unfortunately, there is a technical reason why making the existing root nameservers perform their function over IPv6 is problematic. When a nameserver starts up, it looks at a local file for root servers. However, it will only use this list of root servers for a single query: one that results in the list of current root servers. In order to avoid problems, it's important that the answer for this query contains all the addresses for the root servers as additional information. The problem is that the original DNS specifications allow a relatively short packet size (around 512 bytes). This allows for the current 13 root servers and their IPv4 addresses with little room to spare.

    But in the mean time some root server operators are experimenting with making the root service available over IPv6. (See for more information.) At the time of this writing, four root servers have IPv6 addresses:

    • B: 2001:478:65::53
    • F: 2001:500::1035
    • H: 2001:500:1::803f:235, and
    • M: 2001:dc3::35

    However, only B and M are reachable (for me). A closer look at the addresses used provides the following information:

    • B: 2001:478:65::53 -> 2001:478::/32 @ ARIN: EP.NET
    • F: 2001:500::1035 -> 2001:500::/48 @ ARIN -> Internet Software Consortium
    • H: 2001:500:1::803f:235 -> 2001:500:1::/48 @ ARIN: U.S. Army Research Laboratory
    • M: 2001:dc3::35 -> 2001:dc3::/32 @ APNIC: M-ROOT-DNS-IPv6-20030619

    The plot thickens... Since everyone and their little sister can easily obtain a /48 worth of IPv6 address space (I have two of those for personal use), it's expected that the global IPv6 routing table will suffer a lot of pollution from /48s, much like what happens with /24s in the IPv4 routing table, only worse. So it's unavoidable to filter on prefix length and not accept /48s.

    (Additionally, it looks like the H /48 isn't announced at all: the route doesn't show up on the AMS-IX IPv6 looking glass, which does show the F /48 and other more specifics.)

    When this issue came up on the IETF mailinglist, Paul Vixie, operator of the F root server, indicated that he had simply followed ARIN guidelines and obtained a /48 "micro allocation" from ARIN. It turns out ARIN has set aside a some address space for internet exchanges and "critical infrastructure". This address space is given out as /48s, see List of IPv6 Micro-allocations. (RIPE has a somewhat similar page at Smallest RIPE NCC Allocation / Assignment Sizes but it doesn't mention micro allocations.) All of this seems perfectly reasonable, except for one thing:

    the existence of micro allocations is never mentioned in the RIR's IPv6 policy document.

    This document, which is available in slightly different layouts and versions from LACNIC, APNIC, RIPE, ARIN and, for good measure, from IANA, says:

    "4.3. Minimum Allocation

    RIRs will apply a minimum size for IPv6 allocations, to facilitate prefix-based filtering.

    The minimum allocation size for IPv6 address space is /32."

    And this is exactly what many ISPs that offer IPv6 service do: they filter on a prefix length of 32 bits as indicated above, or 35 bits, the old allocation size. Obviously someone dropped the ball big time here, and this needs to be fixed in one way or another. Watch this space for more information. In the mean time, be sure to selectively relax your filters if you do prefix based filtering in IPv6. Gert Döring maintains a set of IPv6 BGP filter recommendations.

  • NetworkWorldFusion on fortifying BGP and IPv6 being cheap and easy (posted 2003-12-15)

    I ran into two interesting articles at NetworkWorldFusion:

    A while ago, they ran an article about BGP and BGP security under the title Fortifying BGP: No quick fix. The article doesn't go into much depth, but it has some interesting quotes. Fred Baker from Cisco says S-BGP is dead in the water, while S-BGP proponent Steve Kent at BBN speaks harsh words about Cisco's soBGP, indicating there are options in soBGP that are disastrous from a security standpoint and architectural problems. (The exact nature of the problems remains unclear, though.)

    BGP Security at

    The other article, IPv6 fears seen unfounded, reports that early adopters find the transition to IPv6 both cheaper and easier to do than expected. Since the protocol has been in development for so long (8 to 10 years, depending on the IPv6 epoch of choice), most hardware and software vendors have now implemented the protocol so it's available (at no additional cost) now that people start adopting IPv6. Turning on the protocol turned out to be a fairly simple affair as well for the people quoted in the article.

  • no ip unreachables (posted 2003-12-30)

    In an article earlier this year I talked about problems with path MTU discovery (PMTUD). (You may want to look at the links.) Quick recap:

    • There are network segments with widely differing maximum packet sizes connected to the internet
    • If a router receives a packet that is too big to forward over the next link, the router must fragment the packet
    • Fragmentation is hard work and should be avoided where possible
    • RFC 1191 says: let hosts discover the MTU for the entire path and send packets that are small enough. This is done by setting the "don't fragment" bit in the IP header and listening for ICMP messages saying the packet was too big
    • Although nearly all links use the ethernet 1500 byte MTU or a larger one, tunneling, which decreases the MTU, has become more and more common
    • The designers and implementers chose to set the DF bit in ALL packets
    • Result: when using an MTU smaller than 1500 bytes (which is often hard to avoid when using some form of tunneling), unrecoverable reachability problems ensue

    One thing that many people may not realize is that the Cisco "no ip unreachables" directive turns off generation of ICMP "packet too big" or "fragmentation needed and DF bit set" messages for a router interface. ICMP messages have a type and a code. (IANA has the full list.) Type 3 covers different kinds of destination unreachable errors, including "packet too big" which is code 4 under type 3. Unfortunately, the Cisco documentation isn't particularly helpful. Under Configuring IP Services:

    Enabling ICMP Protocol Unreachable Messages

    If the Cisco IOS software receives a nonbroadcast packet destined for itself that uses an unknown protocol, it sends an ICMP protocol unreachable message back to the source. Similarly, if the software receives a packet that it is unable to deliver to the ultimate destination because it knows of no route to the destination address, it sends an ICMP host unreachable message to the source. This feature is enabled by default.

    The fact that the ip unreachables command also affects sending of packet too big messages isn't even hinted at... The same goes for the description of the command in the command reference section. Only an ICMP Services Example explains a little more:

    Disabling the unreachables messages will have a secondary effect—it also will disable IP Path MTU Discovery, because path discovery works by having the Cisco IOS software send Unreachables messages.

    However, there still is no warning that disabling unreachables will make anything connected to links with reduced MTUs virtually unreachable, as nearly all hosts send all their packets with the DF bit set. And there are many people recommending "no ip unreachables": a Google search reveals this combination of words shows up, ironically, a little more than 1500 times.

    There is a good use for this command, however: when a range of addresses is routed to the null interface, and "no ip unreachables" in configured for the null interface, any packets to the destinations in question will be dropped at the CEF level. Note that "no ip unreachables has affect on packets routed to the null interface, which is different from the behavior on other interfaces, where the command determines whether unreachables are sent back in response to packets received on the interface.

    Most link types have a fairly fixed MTU (such as ethernet with its 1500 bytes) or support negotiation of the MTU (such as PPP). However, some link types make it very easy for both ends to set different MTUs. This regularly happens with Cisco's HDLC on serial links. In this case, the packets can't be received successfully at the end using the smaller MTU. Fortunately, debugging is easy: if the MTU is 1500, set it much higher, if it's larger than 1500, set it to 1500. In most cases this will clear up the problem. Or just switch to PPP... The same problem can happen on tunnels, but from what I've seen many systems just accept the too-large packets. This leads to strange path MTU discovery behavior as the link then has different MTUs in both directions, but this shouldn't be much of a problem.

    I think the moral of this story is that it's probably not worth the trouble to run path MTU discovery on systems that have a 1500 byte MTU. Since all systems behind links that don't support 1500 bytes need to implement ugly hacks such as clearing the DF bit or rewriting the TCP MSS option anyway, the resulting increase in fragmentation on the network will be negligible and it should save significantly on debugging.

    Can't get enough of fragments? Have a look at a CAIDA analysis.