Linux Network Administration

Building High Performance Linux Routers


I recently attended an interesting and thought-provoking short course on IP router architecture led by Gísli Hjálmtýsson. Gísli is engaged in research in the field of active networks and has developed a Linux-based prototype active router called "Pronto." In describing this and his other work, Gísli offered some insight into the issues impacting router performance, especially in a Linux environment. In this article I'll take a couple of Gísli's key observations and translate them into some practical guidelines to assist in the construction of Linux-based routers with a focus on performance rather than functionality.

The fast path

The basic process of routing the IP protocol is deliberately simple. For high performance routing, you want the datagram to be passed as quickly as possible from one interface to another. The process that does this forwarding for the vast majority of datagrams is sometimes called the "fast path."

  • The hardware receives the data from the transmission medium, stores it in a buffer, and signals the device driver that it is available to be read. The signalling is invariably performed using hardware interrupts. In the case of Ethernet hardware, there is often a single interrupt generated for each received packet; in the case of a PPP link with serial hardware, there can be as many as one interrupt per received character. This is important, as we'll see later.

  • The hardware device driver is responsible for reading the data from the hardware. Often the hardware or its device driver will do some checks to ensure that the data is not corrupt. Ethernet cards, for example, verify a checksum (the frame check sequence, a CRC) in hardware, discarding any packets that have been corrupted in transmission. The SLIP device driver, on the other hand, has no means of knowing whether the data is corrupt, as the SLIP protocol does not provide any error detection capability, relying instead on that provided by the IP protocol.

  • The device driver will then call the netif_rx() function in the core Linux networking code, which will check the protocol identifier of the received data and forward it to the appropriate kernel protocol stack. This article focuses only on version 4 of the IP protocol, but much the same process applies to other supported protocols such as IPv6, IPX, AX.25, and DECnet.

  • The IP protocol stack will first do some rudimentary checks to ensure that the datagram is OK.

    The most basic tests that are performed are sanity checks, including ensuring that the length of the IP datagram is at least as long as an IP header and satisfies the IP header length field, and that the version number in the IP header is version 4. Finally, the IP header is checked for corruption by testing the IP header checksum against the IP header data. It's worth noting here that the IP header checksum protects only the header of the IP datagram, not the payload, which allows this check to be performed very quickly. If any of these tests fails, the datagram is simply thrown away.

    If the IP header contains any options fields, these are processed next. Some options that might be used are the "router alert" option and the "source route" option.

  • The destination address of the IP header tells the router where the datagram is to go. The first test that must be performed is one that determines whether the datagram is destined for us (i.e., this host), or whether it is destined for some other host and needs to be forwarded. It is simple enough to determine if the datagram is for us; the destination address will match an address of one of our active network interfaces.

    If the datagram is for us, it is processed, and the data is ultimately passed to a local socket for an application to use. If the datagram is not for us, it is passed to the IP forwarding engine.

  • If the datagram is to be forwarded to another host, our router must do two things. Firstly it must decrement the IP time-to-live field in the datagram and discard the datagram if the result is zero. This mechanism helps limit the damage caused by routing loops. Naturally, we need to recalculate the IP header checksum for any datagram we keep, because the header has been modified.

    Secondly, and much less trivially, our router must determine where to transmit the datagram next. It does this using the IP routing table. The IP routing table is usually built automatically using a routing daemon like Zebra, supporting routing protocols like OSPF, BGP, or RIP. Sometimes the routing table will have routes that have been entered manually, called static routes. You can display the routing table using the commands route or ip route list.

    Routes are found by searching the routing table, looking for the best match. A match occurs when the destination field of the route matches the destination address of the IP datagram to be routed over the number of bits described by the network mask of the route. The network mask is the Genmask field of the route command output, or the /nn field of the ip command output. A match can be anything from no bits (the default route) to the full 32 bits (a host route). The best match is the one that has the greatest number of matching bits. This search requires a certain amount of CPU power to perform and many, many bit test operations. Various tricks are used to reduce the time and effort taken to perform the search, such as caching and hash tables. Suffice it to say for this column that the task of identifying the best matching route is computationally intensive with lots of bit tests and comparisons, especially as the number of routes increases.

  • Finally, the IP datagram can be sent to the device driver of the hardware that will carry it to its next hop. The data will be placed in a buffer where the hardware may read it for transmission, but not before one last operation occurs.

    Each network interface on an IP router has a value associated with it called the MTU, the maximum transmission unit, which represents the largest chunk of data that the interface can transmit in a single transmission. For Ethernet interfaces this value is typically 1500 bytes, but some network technologies support larger or smaller MTUs. If the datagram we're forwarding to an interface is larger than the MTU of that interface, we are obligated to cut it up into pieces that are at most MTU-sized. This process is called fragmentation. If, for example, our interface is a high-speed serial interface supporting PPP with an MTU of 576 bytes and we have a 1500 byte datagram to send to it, we break it up into two datagrams that nearly fill the 576-byte MTU (572 bytes each, assuming a 20-byte IP header with no options, because every fragment carries its own IP header and the payload of every fragment but the last must be a multiple of 8 bytes), and the remaining data we place in a third smaller datagram. We then send all three of these datagrams to the interface for transmission.
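
The fragmentation arithmetic in the last step can be checked with a short sketch. This is an illustration in Python, not the kernel's code: the function name fragment_sizes is hypothetical, and it assumes a 20-byte IP header with no options. It relies on one fact about IPv4: fragment offsets are expressed in 8-byte units, so every fragment except the last must carry a payload that is a multiple of 8 bytes.

```python
IP_HEADER_LEN = 20  # assumption: IPv4 header without options


def fragment_sizes(total_len, mtu, header_len=IP_HEADER_LEN):
    """Return a list of (payload_offset, fragment_length, more_fragments).

    total_len is the length of the original datagram including its header.
    Each fragment carries its own copy of the IP header.
    """
    if total_len <= mtu:
        # Fits in one transmission: no fragmentation needed.
        return [(0, total_len, False)]

    # Largest payload per fragment, rounded down to a multiple of 8 bytes
    # because the IPv4 fragment offset field counts 8-byte units.
    max_payload = (mtu - header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")

    payload = total_len - header_len
    fragments = []
    offset = 0
    while offset < payload:
        chunk = min(max_payload, payload - offset)
        more = offset + chunk < payload  # "more fragments" flag
        fragments.append((offset, header_len + chunk, more))
        offset += chunk
    return fragments


if __name__ == "__main__":
    for frag in fragment_sizes(1500, 576):
        print(frag)
```

For the 1500-byte datagram and 576-byte MTU discussed above, this yields three fragments of 572, 572, and 396 bytes: the first two fall just short of the MTU because of the 8-byte rounding.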

This process occurs for each and every datagram forwarded by an IP router. The time taken for this process to complete is critical to the overall performance of the router. This process is called the fast path because there are slower processes possible. In practice, in modern routers there are a number of other tests that may be performed within the fast path. Features like firewalling and network address translation each have tests associated with them.
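
To make the per-datagram work concrete, here is a minimal user-space sketch of the IP-layer steps described above: sanity checks, header checksum verification, the local-delivery test, the TTL decrement, and a best-match route search. It is written in Python for readability and is not how the Linux kernel implements forwarding. The names forward, checksum, routes, and local_addrs are hypothetical, option processing and fragmentation are left out, and the route search is a plain linear scan rather than the caches and hash tables a real router uses.

```python
import ipaddress
import struct


def checksum(header: bytes) -> int:
    """One's-complement checksum over 16-bit words of the IP header."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward(packet: bytes, routes, local_addrs):
    """Return ("drop", reason), ("local", packet) or ("forward", (hop, packet)).

    routes is a list of (ipaddress.IPv4Network, next_hop) pairs and
    local_addrs is a set of ipaddress.IPv4Address (both hypothetical inputs).
    """
    # Sanity checks: minimum length, version, header length, total length.
    if len(packet) < 20:
        return ("drop", "too short")
    version, ihl = packet[0] >> 4, (packet[0] & 0x0F) * 4
    if version != 4 or ihl < 20 or len(packet) < ihl:
        return ("drop", "bad header")
    total_len = struct.unpack("!H", packet[2:4])[0]
    if total_len < ihl or len(packet) < total_len:
        return ("drop", "bad length")
    # A valid header, checksum field included, sums to zero.
    if checksum(packet[:ihl]) != 0:
        return ("drop", "bad checksum")

    # Is the datagram for us?
    dst = ipaddress.IPv4Address(packet[16:20])
    if dst in local_addrs:
        return ("local", packet)

    # Decrement TTL; discard if the result would be zero.
    ttl = packet[8]
    if ttl <= 1:
        return ("drop", "ttl expired")
    header = bytearray(packet[:ihl])
    header[8] = ttl - 1
    header[10:12] = b"\x00\x00"  # zero the checksum field, then recompute
    header[10:12] = struct.pack("!H", checksum(bytes(header)))

    # Best match: the matching route with the longest prefix wins.
    best = None
    for net, next_hop in routes:
        if dst in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    if best is None:
        return ("drop", "no route")
    return ("forward", (best[1], bytes(header) + packet[ihl:]))
```

Even in this toy form, the route search is the only step whose cost grows with the size of the routing table, which is why it gets most of the optimization attention.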
