Full view or not full view: the benefits and dangers of the full BGP table

facebooktwittergoogle_plusredditpinterestlinkedinmail

bgp table
One can easily find dozens of threads on the choice of equipment for the BGP peering with an option to keep two, three, five, twenty-five full views on any networking-related forum. Most of these threads turn into heated discussions on the topics like Cisco vs. Juniper, or something even worse. Their offline evolution is ridiculous in general.

However, the question of the full view’s necessity arises quite rarely.

A little bit of “theory”

Full view is virtually the “Yellow Pages” directory for the entire Internet. Thus, it must be perceived as “Yellow Pages” and not as a personal address book. The key difference lies in the fact that apart from letters to numbers conversion, which is accomplished by DNS, not full view, temporary directories alert us about objects’ appearance and disappearance in the internet. After all, we need a way of knowing whether certain address is present in the internet. Moreover, it would be useful to be aware if the address suddenly becomes available only via link A, and not available via link B. Essentially, this is the reachability signal. Without getting into too much abstraction, it is worth stressing the importance of realization that reachability and routing are completely different things. Full BGP, Full BGP Table or Full Feed terms are often used instead of the Full View.

Routing – identification of the best way for sending traffic to a given destination. This process can hardly be compared to a search in the phone book; it has more similarities with a city map: if the same x.x.x.x address is available through both the link A and the link B, it is necessary to decide where to send the packets.

Assuming that the reader is familiar with the IP protocol, knows what is a prefix and its length, and has a notion of what the longest match rule is:

As it can be seen from the famous graph taken from bgp.potaroo.net and shown above, full Internet routing table (hereinafter, IPv4 will be the key point of discussion, although almost everything except numbers is also true for IPv6) now contains nearly 600,000 records. Here is the updated data on the current size of the table: bgp.potaroo.net/index-bgp.html. This number grows exponentially and quite fast. Each of these entries is actually a route: the destination IP prefix (subnet and mask), next-hop (next node aka “where to send”) and various other parameters that determine the value of this route. When there exist two routes for the same prefix (received from different neighboring routers), these attributes play a significant role. They determine what next-hop will be used for sending packets.

Example:

rviews@route-server.as8218.eu> show route 8.8.8.8    
inet.0: 343453 destinations, 1643368 routes (343453 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

/*  The active patch (i.e. the best BGP path) is marked with a star, 
    and this path is only used for traffic routing */
 >8.8.8.0/24         *[BGP/170] 6w2d 06:12:47, MED 200, localpref 3200, from 83.167.56.18
                      AS path: 15169 I
                    > to 83.167.56.240 via ge-0/0/0.0
                     [BGP/170] 4w1d 04:00:53, MED 200, localpref 3200, from 83.167.56.6
                      AS path: 15169 I
                    > to 83.167.56.240 via ge-0/0/0.0
                     [BGP/170] 4w2d 04:00:03, MED 200, localpref 3200, from 83.167.56.5
                      AS path: 15169 I
                    > to 83.167.56.240 via ge-0/0/0.0
[...]

Shown here are three routes to the 8.8.8.0/24 prefix, received from three different neighboring BGP routers. For some reason, non-trivial in this case, the first was selected as the best one (the reason is not shown in the example). One must not be confused by the fact that all three routes have the same next-hop: the data is gathered from a very specific router, which is not transmitting transit traffic. Apart from that, it is worth to note that the neighboring router and next-hop are not the same thing.

The routes between routers are advertised via BGP protocol, which is, practically an RSS-translation.

In other words, the protocol itself is not complicated – it is just a way of automated data exchange, wrapped in the best programming practices like in some sort of containers, which in compliance with protocol standard are more or less scalable if the necessity of their expansion arises. Its search mechanism for the best path (routing) and the connection control (reachability) is quite simple.

In particular, BGP considers that the traffic is transmitted between aggregated entities: autonomous systems (AS) instead of routers and knows almost nothing about their internal structure. BGP considers the way through two transit autonomous systems with, for example, ten internal hops (routers) each, more preferable than the path through five autonomous systems, with two internal hops each. Moreover, BGP hardly knows anything about the bandwidth of the links, which is why it is fair to say that this aspect is not taken into account at all when choosing the route.

In comparison to other protocols, it is not too fast in detecting connectivity changes and not always looking for the best optimal path. However, all these are not simply drawbacks; they are also advantages of BGP, because due to its relative straightforwardness, the protocol is well suited for the transmission of a large volume of routing information.

Therefore, the table of 600 thousand records with a bunch of attributes – is quite a lot. In addition, there usually has to be at least two of such tables. Simple storing of all this stuff requires many hundreds of megabytes of memory, but apart from the storing and exchanging them, the tables need to be processed in order to obtain a set of the best (active) routes.

For example, a neighboring router announced that it knows the route to, say, Google, and another neighbor announced it as well. Now we know that Google is accessible through each of the neighbors (reachability) and we need to decide which of the two routes to use for packet transmission (routing).

To do this, BGP compares indicators (such attributes as: Local Preference, AS-PATH, etc.) and makes a decision (on the basis of some criteria that does not matter for now) that, for example, the first neighbor is preferable. And so on for each prefix. Thus, several tables (each consisting of about 600 thousand entries) received from different neighbors, are “compiled” one table of about 600 thousand active routes:

route-server>show ip bgp summary 
BGP router identifier 12.0.1.28, local AS number 65000
BGP table version is 22316128, main routing table version 22316128

339895 network entries using 41127295 bytes of memory   // Active prefixes: 340 K
6244036 path entries using 324689872 bytes of memory     // Active prefixes: 6,2 MM

420973/58130 BGP path/bestpath attribute entries using 58936220 bytes of memory
82551 BGP AS-PATH entries using 2164884 bytes of memory
150 BGP community entries using 3600 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 426921871 total bytes of memory
Dampening enabled. 1728 history paths, 1519 dampened paths
BGP activity 3285644/2945748 prefixes, 85118539/78874498 paths, scan interval 60 secs
[…]
Or the same, plus IPv6 on Juniper:
rviews@route-server.as8218.eu> show route summary    
Autonomous system number: 8218
Router ID: 83.167.63.120
 inet.0: 343437 destinations, 1643129 routes (343437 active, 0 holddown, 0 hidden)
              Direct:      3 routes,      3 active
               Local:      1 routes,      1 active
                  BGP: 1642930 routes, 343238 active
              Static:      2 routes,      2 active
               IS-IS:    193 routes,    193 active
[…]
 inet6.0: 4796 destinations, 20733 routes (4796 active, 0 holddown, 0 hidden)
              Direct:      4 routes,      4 active
               Local:      2 routes,      2 active
                  BGP:  20577 routes,   4640 active
              Static:      1 routes,      1 active
               IS-IS:    149 routes,    149 active

The construction of a few more uncalculated BGP feeds (not only them, but also data of other protocols, to be more precise) is called RIB (routing information base). It is kept in an ordinary RAM and is processed by an ordinary CPU. Correspondingly, these two elements must meet certain requirements when it comes to the amount of full BGP tables that can be crammed into the RIB. The total number of entries here is defined as the sum of all received routes from neighbors: two full views – almost 1.2 million prefixes, three – about 1.9 million and so on.
 

The first conclusion. Loading full view makes no sense if you only have one session with the outside world. Possession of this mass of information will not provide anything but the additional strain on the equipment, because the traffic can only be send through one way.

 
One does not need an unreasonable amount of memory for a RIB with several full views, however hundreds of megabytes – few gigabytes are still required (depending on the number of full views, implementation, and many other aspects). Processor also often comes under a strain, especially in implementations with BGP-scanner. For instance, up/down session, which the complete table is transmitted through, may lead to a hundred percent CPU usage during a half hour period on certain platforms.

However, it is clear that gigabytes of memory and gigahertz of processor frequency has long ceased to be something special. These numbers do not look that bad even in the context of network equipment, where manufacturers are known for the ability to sell an ordinary DRAM like the one in a computer at a price, equal to the cost of a spacecraft, pretending 2GB to be the pinnacle of progress. Forum discussions’ participants, mentioned at the beginning of the article, often come to this conclusion. They are confident that a great amount of memory is the key. This statement is generally true, but unfortunately, it does not solve the case completely.

Let us see what happens with the full view further on. But before that, one more small but very important remark.
 

Routing information is always transmitted towards the traffic that is switched on its basis. In other words, the outbound traffic from your AS is sent on the basis of full view. Despite of the obviousness of this thesis, its practical value is not always fully realized.

 
 

Forwarding

Further, this “compiled” table of the active routes is used to forward traffic. It is called FIB (forwarding information base), and the number of entries in it, roughly equal to the number of entries in one full view (almost 600 thousand).
 

Lyrical digression. Generally speaking, Full View (unlike Full BGP Feed) – it’s just the FIB. In most cases, it is better to say, “I need a router that can hold three peers with full view” instead of “I need a router that can hold three full view.”
And, by the way, the signature on the Y-axis on the great and terrible picture shown at the beginning of topic is wrong. This is not a RIB, but a FIB. The headline on inside page actually hints on that too.

 
Most modern routers capable of transmitting traffic at a speed of the couple of gigabit per second and higher are “hardware speed”. Their “hardware speed” lies in the fact that the FIB and its attributes are placed not in the ordinary memory but in a special, roughly speaking, a “fast” switching memory (SRAM, TCAM, RLDRAM, etc.), which is referenced by the special packet processors. This memory is probably the most expensive resource on the router. Moreover, the skill, needed to work with it, certainly is the most important of the factors that affect the price of the hardware.

For example, a switch with 24 Gigabit ports, capable of transmitting traffic with unbelievable force (simultaneously at full speed on all the interfaces), nowadays costs few thousand dollars or even less. It also has a “hardware speed” approach and likely the CPU power and RAM is enough to easily thresh four full views in RIB. Moreover, its software often supports a large variety of complex features.

However, “a full-fledged router” capable of doing, it would seem the same, is worth fifteen times more. That is because in addition to all sorts of marketing subtleties, the switch can hold in its switching table up to 10-15 thousand routes, and a list of actions that can be performed with its table is much longer. For example, if a record for each packet must be looked up in FIB not once, but two or three times (it is needed more often than you might think) – $ 2 000 switches cannot handle this.

There are also software boxes (with up to one-two gigabits productivity, roughly speaking) that have the FIB, as well as the RIB, stored in a conventional RAM. They often have too much stuff stored in that RAM, but this is a topic for another discussion, essentially – RAM has its limits. Moreover, the speed of software search through the array for finding the best match (longest match) decreases with the addition of new entries in the array, no matter what algorithm is used.
 

The second conclusion. When choosing a specialized router for BGP peering with full view, ask the seller about the maximum size of the FIB. If the seller does not know – it is worth to think about his competence. Possible options:

  • <500 thousand for IPv4 – it’s not worth to take this nowadays. It will not be enough in the nearest future. Think about IPv6.
  • ~ 500 thousand – This figure is popular for so-called large L3 switches, that have a lot of unnecessary features, and a rather mediocre list of switching procedures. Although there are nice exceptions. “Big” switches are distinguished from “small” ones mainly by the size of the box and model superiority in the line, and more importantly by switching memory capacity: a “small” 24-port switch rarely supports half a million entries in the FIB. Thus, although the switching memory of the “big” L3 switches seems to be enough for today’s full view, they are almost never capable of handling complicated operations with the packet (in fact, this is their main difference from the routers). On the one hand, they can be used for this task, but on the other hand, it is better not to use them. There are all sorts of particularities. To sum up, think and question the seller carefully before buying a L3 switch for BGP peering with full view.
  • A million or more – a decent figure.

If the seller can confidently consult you on the RIB size as well, (how many peers with full view can be kept on the machine you chose) it is a sure sign of the seller, who knows what he is offering. If he is also able to support a conversation about the difference between big L3 switch and high-grade routers and is ready to tell you about the possible consequences of using a big L3 switch for BGP peering – you are on the right track, it is time to discuss the price.

Other arguments for and against BGP full view read in the article Full view or not full view – Part 2.

facebooktwittergoogle_plusredditpinterestlinkedinmail

Leave a Reply

Your email address will not be published. Required fields are marked

Your Name

Your email