BGP handling bug causes widespread internet routing instability (benjojo.co.uk)
321 points by robin_reala 1 day ago | 153 comments





The standard approach is to be liberal in what you accept and be specific in what you emit.

You could

1) Filter the broken message

2) Drop the broken message

3) Ignore the broken attributes but pass them on

4) Break with the broken attributes

To me, only 4 (Arista) is the really unacceptable behaviour. 3 (Juniper) isn't desirable but it's not a devastating behaviour.

EDIT: Actually rereading it, Arista did 2 rather than 4. I think it just closed the connection as being invalid rather than completely crash. That's arguably acceptable, but not great for the users.


There is already RFC 7606 (Revised Error Handling for BGP UPDATE Messages), which specifies in detail how broken BGP messages should be handled.

The most common approach is 'treat-as-withdraw', i.e. handle the update (announcement of a route) as if it was a withdraw (removal of a previously announced route). You should not just drop the broken message, as that would lead to keeping old, no-longer-valid state.
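
As a rough illustration of 'treat-as-withdraw' (a sketch only; the names and the simplified UPDATE structure are made up, not any vendor's or the RFC's API):

    # Illustrative sketch of RFC 7606 "treat-as-withdraw": on a malformed path
    # attribute, drop the routes the UPDATE announces instead of the session.
    import logging
    from dataclasses import dataclass

    log = logging.getLogger("bgp-sketch")

    @dataclass
    class Update:            # hypothetical, heavily simplified UPDATE
        announced: list      # prefixes being announced
        withdrawn: list      # prefixes being withdrawn
        attributes: dict     # attribute type code -> raw value bytes

    def attribute_is_valid(type_code, value):
        """Stand-in for real per-attribute syntax checks."""
        return len(value) > 0          # placeholder check only

    def handle_update(peer, upd, rib):
        bad = [t for t, v in upd.attributes.items() if not attribute_is_valid(t, v)]
        if bad:
            log.warning("peer %s sent malformed attrs %s: treat-as-withdraw", peer, bad)
            to_withdraw = upd.announced + upd.withdrawn   # announcements become withdrawals
        else:
            for prefix in upd.announced:
                rib[(peer, prefix)] = upd.attributes      # accept the route
            to_withdraw = upd.withdrawn
        for prefix in to_withdraw:
            rib.pop((peer, prefix), None)                 # note: the session stays up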


> The standard approach is to be liberal in what you accept and be specific in what you emit.

What you're paraphrasing here is the so-called "robustness principle", also known as "Postel's law". It is an idea from the ancient history of the 1980s and '90s Internet. Today, it's widely understood that it is a misguided idea that has led to protocol ossification and countless security issues.


Postel's Law certainly has led to a lot of problems, but is it really responsible for protocol ossification? Isn't the problem the opposite, e.g. that middleboxes are too strict in what they accept (say only the HTTP application protocol or only the TCP and UDP transport protocols)?

Overly strict and overly liberal both lead to ossification. That's merely the observation that buggy behavior in either direction can potentially come to be relied on (or to be unpredictably forced on you, in the case of middleboxes filtering your traffic).

I'd only expect security issues to result from being overly liberal but 1. I wouldn't expect it to be very common and 2. I'm not at all convinced that's a compelling argument to reduce the robustness of an implementation.


"Overly strict" only leads to ossification if the designers of the system forget to build extensibility into their system design from the beginning.

what does ossification mean?

open source something?


Ossification comes from os, ossis: bones in Latin. Turning into bones. Stops being flexible. Common behavior becomes de facto specification. There's stuff that's allowed by the specification but not expected by implementations because things have always worked like this.

It's not related to open source software. The seemingly matching prefix is coincidence :-)

https://en.m.wiktionary.org/wiki/ossification


Pretty much when something in the spec in theory could change, but in practice never does. So software and hardware gets built around the assumption that it never changes.

For example for networking you can have packets sent using TCP or UDP, but actually there could be any number of protocols used. But for decades it was literally only ever those two. Then when QUIC came about, they couldn't implement it at the layer it was meant to be because all the routers and software were not built to accept anything other than TCP or UDP.

There's been a bunch of thought put into how to stop this, like making sure anything that can change regularly does, or using encryption to hide everything from routers and software that might want to inspect and tamper with it.


It's when existing implementations' inflexibility prevent protocol evolution.

https://en.wikipedia.org/wiki/Protocol_ossification


literally it means that something is slowly turning into stone, like dinosaur bones. protocols and standard libraries suffer from this in a figurative sense.

Sclerotization.


The trouble is it fails to specify what you're supposed to be liberal with.

Suppose you get a message that violates the standard. It has a length field for a subsection that would extend beyond the length of the entire message. Should you accept this message? No, burn it with fire. It explicitly violates the standard and is presumably malicious or a result of data corruption.

Now suppose you get a message you don't fully understand. It's a DNS request for a SRV record but your DNS cache was written before SRV records existed. Should you accept this message? Yes. The protocol specifies how to handle arbitrary record types. The length field is standard regardless of the record type and you treat the record contents as opaque binary data. You can forward it upstream and even cache the result that comes back without any knowledge of the record format. If you reject this request because the record type is unknown, you're the baddies.
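
A toy illustration of that distinction (a hypothetical helper, not a real DNS implementation): reject a record whose declared length runs past the end of the message, but treat an unknown record type as opaque bytes you can forward and cache.

    import struct

    def parse_rr_body(msg, offset):
        """Parse TYPE/CLASS/TTL/RDLENGTH/RDATA of one resource record.
        Hypothetical sketch; assumes the owner name was already consumed."""
        rtype, rclass, ttl, rdlength = struct.unpack_from("!HHIH", msg, offset)
        offset += 10
        if offset + rdlength > len(msg):
            # Explicit violation of the wire format: reject, don't guess.
            raise ValueError("RDLENGTH runs past the end of the message")
        rdata = msg[offset:offset + rdlength]
        # An unknown rtype (e.g. SRV to a pre-SRV cache) is fine: rdata stays
        # opaque and can be forwarded upstream and cached without understanding it.
        return rtype, rdata, offset + rdlength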


I would say the proper way to apply Postel's law is to reasonable interpretations of standards. Internet standards are just text documents written by humans, and often they are underspecified or have multiple plausible interpretations. There is no IETF court that would give a canonical interpretation (well, the appropriate working group could revise the standard, but that is usually a multi-year effort). So unless we want to break up into multiple non-interoperable implementations, each strictly adhering to its own interpretation, we should be liberal about accepting plausible interpretations.

That's not really the issue though.

There are many cases where the RFC is not at all ambiguous about what you're supposed to do, and then some implementation doesn't do it. What should you do in response to this?

If you accept their garbage bytes, things might seem less broken in the short term, but then every implementation is stuck working around some fool's inability to follow directions forever, and the protocol now contains an artificial ambiguity because the bytes they put there now mean both what they're supposed to mean, and also what that implementation erroneously uses them to mean, and it might not always be detectable which case it is. Which breaks things later.

Whereas if you hard reject explicit violations of the standard then things break now and the people doing the breaking are subject to complaints and required to be the ones who stop doing that, rather than having their horkage silently and permanently lower the signal to noise ratio by another increment for everyone else.

One of the main problems here is that people want to be on the side of the debate that allows them to be lazy. If the standard requires you to send X and someone doesn't want to do the work to be able to send X then they say the other side should be liberal in what they accept. If the standard requires someone to receive X and they don't want to do the work to be able to process X then they say implementations should be strict in what they accept and tack on some security rationalization to justify not implementing something mandatory and thereby break the internet for people who aren't them.

But you're correct that there is no IETF court, which is why we need something in the way of an enforcement mechanism. And what that looks like is to willingly cause trouble for the people who violate standards, instead of the other side covering for their bad code.


> If you accept their garbage bytes, things might seem less broken in the short term, but then every implementation is stuck working around some fool's inability to follow directions forever, and the protocol now contains an artificial ambiguity because the bytes they put there now mean both what they're supposed to mean, and also what that implementation erroneously uses them to mean, and it might not always be detectable which case it is. Which breaks things later.

And, if your project is on GitHub, gets your Issues page absolutely clowned on because you're choosing to do the right thing technically and the leeching whiners shitting up the Issues don't want to contribute a goddamn thing other than complaints, and they definitely don't want to go to the authors of the thing that doesn't work with your stuff and try and get that fixed either.


That is the only context in which Postel's Law actually works, but it is obviously not the world of the internet.

It's a description of how natural language is used, so what you'd expect is constant innovation, with protocols naturally developing extensions that can only be understood within local communities, even though they aren't supposed to.

Something like "this page is best viewed in Internet Explorer" as applied to HTML.


> Today, it's widely understood that ...

Widely claimed by some but certainly not "widely understood" because such phrasing implies a lack of controversy regarding the claim that follows it.


It's kind of common sense, though. Look at HTML. So badly/under defined that it wasn't even testable for close to 2 decades.

The sane approach is to be strict and provide great error messages.


This "sane" approach lost to HTML.

It's called "backwards compatibility/legacy systems inertia" and plenty of bad but old techs will never die. It doesn't make them good.

A good HTML might not even look like HTML.


You're referring to XHTML 2?

Postel's law is absolutely great if you want to make new things and get them going in a hurry, and I think it was one of the major reasons the TCP/IP stack beat the ISO model. But as you say, it's a disaster if you want to build large robust systems for the long term.

1970s was also just a different time: documentation was harder to get, it was harder to do quality implementations for protocols, people had less of an idea what may or may not work because everyone was new at this (both in terms of protocols and implementations), shipping bugfixes took a lot longer, few people were writing tests (and there wasn't a standard test suite), few people had long experience with these protocols, and general quality of software was a lot lower.

The problem is that folks took advantage of the behavior of BGP where it would forward unknown attributes that the local device didn't understand, as a means to do all sorts of things throughout the network. People now rely on that behavior.

Now, we're experiencing the downside of this "feature"


BGP has classes of attributes that it forwards. While it is true that it forwards route attributes it doesn't know about, this was an attribute that it DID know about and knows it shouldn't forward.

In fact it's a bit strange just how lenient Juniper's software was here. If a session is configured as IBGP on one end and EBGP on the other end, it should never get past the initial message. Juniper not only let it get past the connection establishment but forwarded obviously wrong routes.


Yes but you are seeing a symptom of what I believe is a fundamental design decision to be liberal in passing on data and then _later_ go through and build logic that stops certain things from being forwarded, and the result is that things slip through the cracks that shouldn't.

Rather than the inverse where you only forward things explicitly and by default do not forward.


As far as I'm aware "a session is configured as IBGP on one end and EBGP on the other end" isn't possible.

You can't configure it like that; most of the BGP implementations I'm familiar with automatically treat a same-AS neighbor as iBGP and a different-AS neighbor as eBGP.

Juniper explicitly has 'internal' and 'external' neighbors, but you can't configure a different peer AS than your own on an internal neighbor or the same peer AS on an external neighbor.

BGP sessions also have the AS of the neighbor specified in the local config, and will not bring up the session if it's not what's configured.
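
Roughly the checks being described, as a sketch (hypothetical names, not any particular implementation):

    # The session type falls out of the configured peer AS, and the peer's OPEN
    # message must carry the AS we expect or the session never comes up
    # (RFC 4271 calls the mismatch a "Bad Peer AS" OPEN error).

    def session_type(local_as, configured_peer_as):
        return "iBGP" if configured_peer_as == local_as else "eBGP"

    def accept_open(configured_peer_as, as_in_open):
        return as_in_open == configured_peer_as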


Without this behavior it would be impossible to deploy newer BGP attributes globally.

I understand that, but it's a double edged sword. We enjoyed that flexibility for a long time, but lately we are now experiencing the downsides of this flexibility.

Author makes this point in a related post:

At a glance this “feature” seems like an incredibly bad idea, as it allows possibly unknown information to propagate blindly through systems that do not understand the impact of what they are forwarding. However this feature has also allowed widespread deployment of things like Large Communities to happen faster, and has arguably made deployment of new BGP features possible at all.


I disagree with this approach. Being very very specific in what you accept and very very specific in what you emit seems better to me.

Being that prescriptive is unworkable in practice. Propagating unknown attributes is fundamentally what made it possible to deploy 32-bit AS numbers (originally RFC 4893; unaware routers pass the `AS4_PATH` attribute without needing to comprehend it), large communities (RFC 8092), the Only To Customer attribute (RFC 9234), and others.

A BGP Update message is mostly just a container of Type-Length-Value attributes. As long as the TLV structure is intact, you should be able to just pass on those TLVs without problems to any peers that the route is destined for.

The problem fundamentally is three things:

1. The original BGP RFC suggests tearing down the connection upon receiving an erroneous message. This is a terrible idea, especially for transitive attributes: you'll just reconnect and your peer will resend you the same message, flapping over and over, and the attribute is likely to not even be your peer's fault. The modern recommendation is Treat As Withdraw, i.e. remove any matching routes from the same peer from your routing table.

2. A lack of fuzz testing and similar by BGP implementers (Arista in this case)

3. Even for vendors which have done such testing, a number have decided (IMO stupidly) to require you to turn on these robustness features explicitly.
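
For reference, a sketch of the path-attribute TLV layout from RFC 4271 that the comment above leans on (parsing only; the helper names are mine):

    import struct

    # Attribute flag bits (RFC 4271, section 4.3).
    OPTIONAL, TRANSITIVE, PARTIAL, EXT_LENGTH = 0x80, 0x40, 0x20, 0x10

    def parse_path_attributes(data):
        """Yield (flags, type_code, value) for each TLV in a path-attributes blob."""
        i = 0
        while i < len(data):
            flags, type_code = data[i], data[i + 1]
            if flags & EXT_LENGTH:                       # 2-byte length field
                (length,) = struct.unpack_from("!H", data, i + 2)
                i += 4
            else:                                        # 1-byte length field
                length = data[i + 2]
                i += 3
            value = data[i:i + length]
            if len(value) != length:
                raise ValueError("attribute length runs past the message")
            yield flags, type_code, value
            i += length

    def keep_unknown_attribute(flags):
        """Per RFC 4271: an unrecognised optional transitive attribute is kept and
        passed on (with the Partial bit set); an unrecognised optional
        non-transitive one is quietly ignored."""
        return bool(flags & OPTIONAL and flags & TRANSITIVE)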


PNG solved this problem when BGP was still young: each section of an image document is marked as to whether understanding it is necessary to process the payload or not. So image transform and palette data is intrinsic, but metadata is not. Adding EXIF for instance is thus made trivial. No browser needs to understand it so it can be added without breaking the distribution mechanism.

This is also how BGP (mostly) solved it. Each attribute has a 'transitive' bit. Unknown attributes with the 'transitive' bit set are passed on; ones without are discarded.

... Except for acTL, which is a special exception because it turns out that wasn't sufficient to ensure consistency in 100% of cases.

I was never that enthusiastic about motion PNGs in the first place. We have so many other ways to achieve that now.

You're suggesting that being liberal in what you accept is necessary for forward evolution of the protocol, but I think you're presenting a false dichotomy.

In practice there are many ways to allow a protocol to evolve, and being liberal in what you accept is just about the worst way to achieve that. The most obvious alternative is to version the protocol, and have each node support multiple versions.

Old nodes will simply not receive messages for a version of the protocol they do not speak. The subset of nodes supporting a new version can translate messages into older versions of the protocol where it makes sense, and they can do this because they speak the new protocol, so can make an intelligent decision. This allows the network to function as a single entity even when only a subset is able to communicate on the newer protocol.

With strict versioning and compliance to specification, reference validators can be built and fitted as barriers between subnetworks so that problems in one are less likely to spread to others. It becomes trivial for anyone to quickly detect problems in the network.
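
An entirely hypothetical sketch of that versioned-message idea (this is not how BGP works; the names are made up):

    SUPPORTED_VERSIONS = {1, 2, 3}        # versions this node implements

    def negotiate(peer_versions):
        """Pick the highest version both sides speak; None means don't peer."""
        common = SUPPORTED_VERSIONS & set(peer_versions)
        return max(common) if common else None

    def downgrade(msg, to_version):
        """A node that speaks the newer version can translate a message for an
        older peer, dropping fields the old version cannot carry."""
        if to_version >= msg["version"]:
            return msg
        keep = {"version", "prefix", "next_hop"}          # fields every version has
        out = {k: v for k, v in msg.items() if k in keep}
        out["version"] = to_version
        return out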


That's in conflict with the philosophy behind the internet. If you just dropped anything with some part you don't understand, you'd lose a lot of flexibility. You have to keep in mind that some parts of the internet are running on 20 year old hardware, but some other parts might work so much better if some protocol is modified a little. Just like with web browsers, if everything is a little bit flexible in what they accept, you both improve the smoothness of the experience and create room for growth and innovation.

Postel's Law is important, but it creates brittle systems. You can force them further from the ideal operating state before failure, but when they fail they tend to fail suddenly and catastrophically. I like to call it the "Hardness Principle" as opposed to the "Robustness Principle" in analogy to metallurgy.

Surely the opposite? If everything was very pedantic and strict, the 'net would be so brittle as to be non-functional.

You're imagining a world where things get specified and implemented completely correctly. Which does not exist and probably can't!


That's what Postel thought. He was wrong. Allowing everything creates a brittle system because the system has to accept all the undocumented behaviour that other broken systems emit. If broken files were rejected quickly, nobody would generate them.

There's a difference between unknown extensions following a known format, and data that's simply broken (e.g. offset pointer past end of data).


There is a place for both. The accept everything model made some extensions better, but it also allowed for various malware when junk was accepted.

Postel's law doesn't mean "accept everything", but that you should accept de-facto rules people have created. If everyone says, "this is how we do it", you should ignore the RFC and just copy what others do.

There are several problems with that.

One, if everyone is doing something different from the spec it is hard to figure out what they are really doing and what they mean. Long term, following the spec gives you confidence things will continue to work even when someone else writes their own implementation, which might otherwise also deviate from the spec.

Two, it is easier to modify the spec as more features are dreamed up if you have confidence that the spec is boss meaning someone else didn't already use that field for something different (which you may not have heard about yet).

Three, if you agree to a spec you can audit it (think security), if nobody even knows what the spec is that is much harder.

Following the spec is harder in the early days. You have to put more effort into the spec because you can't discover a problem and just patch it in code. However the internet is far past those days. We need a spec that is the rule that everyone follows exactly.


This is so wrong, read up on https://datatracker.ietf.org/doc/html/rfc9413

The internet is ossified because middleboxes stick their noses where they shouldn't. If they just route IP packets, we could have had nice things like SCTP...


Alright. See you over on the XHTML Internet. Oh, wait.

Browsers are permissive not because it's technically superior but as a concession for the end user who still wants to be able to use a poorly built website, and they're competing with browsers who will bend over backwards to render that crappy website so that they look good and your browser looks bad.

It's not a concession you want to make unless you really have to.


In other words, because the early Web followed Postel's law, we're now stuck in this local maximum.

If the iPhone had come out just a little bit later I think xhtml-basic would have gotten more traction. It was pretty nice to implement.

XHTML still lives on in the epub spec. I kinda wish we had an "epub web".

HTML is a nightmare that had to be reverse engineered as in, rebuilt with proper engineering standards in mind, several times. HTML and CSS are both quite horrible.

All of this is understood and has been discussed to death, it's just that Arista didn't implement the agreed-best approach (RFC7606) correctly.

I would perhaps argue that juniper's behavior is the preferable one.

Remember, the definition here is "drop the message I think is broken," not inherently "drop the broken message"; it's entirely plausible that the message is fine but you have a bug which makes you THINK it's a broken message.

There is also a huge difference between considering it a broken message and a broken session, which is what Arista did.


Arista did 2, but it also dropped the whole connection which was probably bad.

IMHO, just drop the broken attributes in the message and log them, and pass on the valid data if there's any left. If not, pretend you did not receive an UPDATE message from that particular peer.

Monitoring will catch the offending originator and people can deal with this without having to deal with any network instability.


In case you want to calibrate your sense of armchair-ness: you have completely missed the point that discarding an individual attribute can quite badly change the meaning of a route, and since we're talking about the DFZ here, such breakage can spread around the planet to literally every DFZ participant. The only safe thing you can do is to drop the entire route. Maybe there was a point to this being discussed at quite some length by very knowledgeable people, before 7606 became RFC ;)

(I haven't downvoted your comment, but I can see why others would — you're making very simple and definite statements about very complicated problems, and you don't seem to be aware of the complications involved. Hence: your calibration is a bit off.)


Funny enough, I actually have a few routers with a DFZ, so I have an idea or two about how BGP works.

My point is that:

- if you drop a connection, especially one through which you announce the full routing table, it is going to create a lot of churn to your downstreams. Depending on the kind of routers they use, it can create some network instability for quite a while. And if you drop it again when you receive that malformed route, the instability continues

- removing only the malformed attribute maybe changes the way you treat traffic but you still route it. OK, you send it to maybe another interface, but no biggie

- if you’re using a DFZ setup, dropping that single route could blackhole traffic to that destination if you’re the only upstream to another router


> Funny enough, I actually have a few routers with a DFZ, so I have an idea or two about how BGP works.

And I'm TSC emeritus and >10 year maintainer on FRRouting, and active at IETF. Yet I hugely respect the other people there, all of whom have areas of expertise where they far outrank my own.


Nice! (no sarcasm)

I have very strong opinions about some subjects, one of them being BGP.

I believe sessions should not be torn down just because you receive malformed data. You should be able to remove just the corrupt data. Or treat it as a withdraw message, like one of the RFCs recommends.

I for one would like knobs to match on any attribute and value and remove/rewrite them at will. Imagine something akin to a very smart HTTP proxy.


Blackholes are acceptable when it comes to broken attributes. Blackholes spreading is not.

If you just drop malformed attributes, there is no blackhole spreading.

If the attribute says "encapsulate this", dropping just the attribute will create a blackhole: you will attract traffic that should be encapsulated, and packets following this route will be dropped if it is not.

I guess you're referring to RFC9012.

Yes, but then again since you have logs of why it was dropped (like I suggested in my first post, to log everything dropped), you can easily troubleshoot the problem. A much better outcome than flapping a BGP session for no good reason and creating route churn and network instability.


Or just drop the announced route (not the session) with the attribute you can't work with

I still remember the mad scramble we had to fix CVE-2023-4481 across our entire network. This class of bugs is going to be an absolute nightmare to deal with, and because of the way BGP has been designed & implemented, it is going to take a _long_ time to fix these kinds of behaviors.

I was developing BGP features at a telco vendor, though it's decades ago.

Still think BGP is too complex, and people keep adding new features and vendors keep implementing them based on RFC standards or drafts.

And it seems BGP will never be deprecated, so this sort of bug will continue to be found again and again...


There was certainly a period of time where folks like AT&T alongside Juniper and Cisco drove BGP into crazytown by way of MPLS and VPN related features. Terrifyingly complex (imo) but lucrative for some.

HGC Global Communications Limited, formerly known as Hutchison Global Communications Limited (abb. HGC), is an internet service provider of Hong Kong.

https://en.wikipedia.org/wiki/HGC_Global_Communications


It appears to this reader that BGP would be a lot more stable if the various hardware vendors agreed on a standard for handling these types of things.

Is the real issue that each vendor wants lock-in, so won't standardise?

DISCLAIMER: My understanding of BGP is hollow and shallow, I am not an expert.


Is it just me, or is BGP something I never learnt about until I heard about it causing issues? It seems it's essential to the internet, just like TCP/IP, but nevertheless I learnt about the latter in university and during my career I read many books about TCP/IP... but nothing about BGP (not in university, not at work, not in books, nothing).

I can "play" with TCP/IP at home in dummy projects and learn more about it... but I have no idea how to "play" with BGP. In that regard, how does one learn about it at home?


Buy some routers that have BGP implementations (there are some cheap ones, Mikrotik for example), or use open source implementations. The article lists bird, another very popular one is FRR (free range routing). You can trivially stand up two docker containers, stand up a BGP session between them, and - for example - propagate static routes you set up within them.

If you like guided tutorials, https://blog.ipspace.net/2023/08/bgp-labs-basic-setup/ is rather good and has been extended to somewhat advanced topics. Everything needed to follow along is free software.


A good tool to try this stuff is containerlab: https://containerlab.dev/

It lets you setup multiple containers with direct connections between them in whatever topology you want. It allows you to run both Linux containers (with FRR for example) and emulated versions of popular router platforms (some of the ones mentioned in the article).


OpenBSD include bgpd(8) out of the box:

  DESCRIPTION
     bgpd is a Border Gateway Protocol (BGP) daemon which manages the network
     routing tables.  Its main purpose is to exchange information concerning
     "network reachability" with other BGP systems.  bgpd uses the Border
     Gateway Protocol, Version 4, as described in RFC 4271.

At least to me one of the challenges is relating to the problems that BGP solves. You can get pretty far in network complexity before BGP (or OSPF etc) really does anything for you. What would be good scenarios one could encounter in "homelab" situation where BGP would be beneficial?

There are no scenarios where BGP contained within your home lab is beneficial for anything other than learning BGP. It's the routing protocol for the Internet. Its whole point is scaling globally, and - crucially - enabling making routing decisions that aren't just based on path weights. OSPF, IS-IS, EIGRP, whatever, they are all just path finding algorithms; OSPF is quite literally Dijkstra. That's great when you want to find the shortest or fastest path to somewhere, but that's not how the Internet works: it's quite reasonable for an operator to want to take the cheapest path (in terms of money), or to take the path that avoids specific foreign countries. BGP is expressive enough to write routing policy like that. You don't need that in your homelab, unless you want to learn BGP, either because you need to for work or to further your career, or because you're curious about it.

Running a few Kubernetes nodes with a network plane like Cilium that supports using BGP to inform your router about which container IP is on which node is a simple-ish one.

Not really, unless you have thousands of routes to manage across large numbers of gateways. Otherwise, running BGP inside your homelab is just a learning tool.

*crickets*

DN42[1] provides a playground for routing technologies. I wouldn't recommend digging in if you don't want to dedicate a lot of time into it. As someone fairly well versed in networking, WAN routing is still confusing to me.

GNS3 is probably the easiest way to get hands on experience with any networking technologies.

[1]: https://wiki.dn42.us/home


My undergraduate networking course didn't touch BGP, my graduate networking course did touch BGP. We used a python package that acted as a simulator for different AS but I can't remember which one.

My undergrad networks course discussed a little BGP stuff but only on the blackboard.

To experiment with BGP you could use a network simulator like what the author of this blog did. In my class we used something called gini[1], which I think my prof's grad student wrote, but the author apparently used gns3, which seems to be a cisco specific ns3 version. I used ns3 once and found it had a steep learning curve. The gini simulator has a more basic user interface but is probably less powerful.

[1] https://citelab.github.io/gini5/ [2] https://docs.gns3.com/docs/


When has BGP not been implicated in causing issues though?

The first widespread incident I found was from 1997 [1], but I didn't look too hard.

I don't think there's really a satisfying way to play with BGP as a small network. Traffic engineering is where I think the fun would be, but you've got to have lots of traffic and many connections for it to be worthwhile. Then you'd be trying to use your announcements to coax the rest of the internet to push packets through the connections you want. As well as perhaps adjusting your outgoing packets to travel the connections you prefer when possible. Sadly, nobody lets me play with their setup.

One of the ways to get a sense of emergent routing behavior is if you have hosting in many places, you'll likely see a lot of differences in routes when you start going to far off countries. If you run traceroutes (or mtr) from your home internet and your cell phone and various hosting, and if you can trace back... you'll likely see a diversity of routes. Sometimes you'll see things like going from west coast US to Brazil, where one ISP will send your packets to florida, then Brazil, and one ISP will send your packets to Spain, then Brazil, with a lot more latency.

[1] https://en.m.wikipedia.org/wiki/AS_7007_incident


I still remember when Pakistan accidentally shut down YouTube in the entire world for about 2 hours in 2008: https://www.cnet.com/culture/how-pakistan-knocked-youtube-of...

You can play with BGP by joining https://dn42.eu/ - a fake internet with a few thousand participants who are mostly as clueless as you, and none of whom will lose millions of dollars per hour if it breaks (which is not infrequently).

I think you need to manage a real (and large) network that's connected to global internet traffic in order to "play" with BGP. Well, you can tinker with it at home, but only by using a network simulator.

It doesn't have to be that large. Many people have "personal" ASNs.

Check out this blog (not me, I just remember it from years back): https://blog.thelifeofkenneth.com/2017/11/creating-autonomou...


I worked an internship where I spent the summer setting up new equipment for a large corp that was replacing everything that AT&T had installed and managed with their own stuff. Nearly every office had their own ASN, everyone else got regular broadband or just a box of aircards depending on the number of users. I knew nothing about networks other than setting up my own consumer router at home so it was a pretty fun learning experience. I always got a smile on my face when I finally got vRouter to peer with our dummy AS in the office then we'd pack it all up and bring it out for installation over the weekend. I got offered a job to come back after I graduated but turned it down for something that paid better and was a lot more interesting. Honestly, I probably would be making more money as a network engineer now if I stuck with it.

The world is kinda screaming for experienced networking engineers, yeah

The author spoke about this story on my favourite podcast, On The Metal (of Oxide Computer Company)

https://onthemetal.transistor.fm/episodes/kenneth-finnegan


Really interesting post! Thanks for sharing

You can set up local BGP routers and peer them and play with it.

Another fun thing is to log into publicly available looking glass servers. Most ISPs (including very, very, very large ones) operate routers that have their full view of the BGP routing tables. They either run web interfaces that let you query those tables (more common) or make public ssh or telnet credentials to log in with roles that have very limited access to the available commands, but have read rights to those tables.



I've used BGP internally at my company for a decade, using AS65xxx range. At home I use BGP between the house, garage and shed, I much prefer it to OSPF.

Same! At a previous company I worked at, we used BGP for all internal/external routing about 15 years ago, despite all the poo-pooing of using BGP as an IGP. It was nice having no route redistribution and one command to monitor sessions.

BGP is chill and robust, OSPF is correct and fast. Both have their own place in a network.

Should we know what OSPF is too?

Depends how much you want to know about how networks work. Never ceases to amaze me how ignorant modern software developers are of the underlying technology, I guess that's because I'm from the pre-2010s when "Information Technology" was a general field.

> Never ceases to amaze me how ignorant modern software developers are of the underlying technology, I guess that's because I'm from the pre-2010s

Don't let my ignorance color your opinion of the youth of today.


I have trained people on network technologies, including the younger generation. It never ceases to amaze me how much they can get done without a clue about the underlying technologies. Sometimes it feels like they have some super power, because I can't operate without that knowledge.

I took some comp-sci and majored in "IT" in the 2000s. Lower level CS did not go over routing protocols, and the IT side never got into compilation, linking, state machines, or pointers.

In the 2000s my team had to deal with everything from compilation problems to hardware answering ARP requests with fake MAC addresses. The team consisted of a wide range of skills and abilities, and information obviously leaks. While the DBA didn't need to know anything about OSPF, just by being on the same team as the network person they pick up how things work.

Now it seems that teams seem to be far more specialised and there's less cross-specialist learning.


Missed pointers!? Surprised me. (Am old)

Open Shortest Path First (OSPF) is an "internal" routing protocol. Basically, it is a protocol for routers to share routes when all routers are managed by the same organization.

Border Gateway Protocol (BGP) has the primary purpose of sharing routes between routers managed by different organizations. It can be used within an organization too. It has a lot more control over how and which routes it sends and receives.


Depends if you do any routing on multipath networks. Most people don't so there's that.

ECMP? Can do that with static routes. As long as you have more than 1 router you could set up a routing protocol.

Or did you mean multipoint?


CCNA had OSPF and that was part of my college curriculum in 2012.

It depends on what you study.

I did more of a sysadmin track, you (probably?) did pure comp sci/dev and would not encounter OSPF in a dev job (probably).


Unless you're heavily into networking and the ISP space, there's basically no need for you to know about routing protocols.

You don't need a large network to participate in BGP. You just need a /24 (IPv4) or /48 (IPv6) allocation, an AS number, and a business-class Internet connection that can do BGP. Might be out of reach for most hobbyists but not impossible.

You don't even need a business class connection. You can do BGP over a tunnel to a VPS or colo.

On top of the already suggested local BGP routers you can also use https://dn42.us/ to test a bit more real-world like scenarios.

Not really, you can learn it just fine with simulators and a few routers. Designing and operating BGP in a large network is another thing though.

I learned (and later taught) BGP (and routing in general), albeit at a superficial level, in high school already. Then I actually got to work with it during labs in university.

So back when I did WISP stuff I'd set up simulated networks between multiple machines with real and virtual networks. VyOS, which was similar to the UBNT equipment we were using, is lightweight and supports multiple protocols.

In my opinion, containerlab is one of the easier tools to setup a lab environment for networking. You define a network with yaml which consists of nodes and links between them and it creates these using docker. They also have a BGP peering example lab: https://containerlab.dev/lab-examples/peering-lab/

It's hard to get real hands-on BGP experience.

A lab won't ever reflect the complexity of a carrier environment.

That said, just bang a couple of Mikrotiks together if you want to play with it.


True indeed, true indeed.

Cisco offers some simulator tooling. It basically virtualizes a lot of networking devices and allows you to play LEGO/SimCity with them: Cisco Packet Tracer

https://www.netacad.com/learning-collections/cisco-packet-tr...

Now, we built toy networks from scratch while I was working toward my certification. Surely larger-scale simulation files could be loaded into Packet Tracer. And perhaps, vendors have simulators on a larger scale than the free downloads?

https://developer.cisco.com/modeling-labs/

When I worked at a regional ISP, my supervisor was the BGP wizard. He referred to exterior routing as "a black art". Even more, the telcos were deploying their own technologies like Frame Relay and SMDS, which are Layer 1/Layer 2 protocols beyond the standard "point-to-point" leased lines.

We once experienced a fiber cut on our T-3 backbone (construction workers didn't dial 811). So my supervisor arranged the BGP routes to send everything over a 56k line, IIRC. He gloated about it. The packet loss rate was absurd, but our customers had connectivity!


>When I worked at a regional ISP, my supervisor was the BGP wizard. He referred to exterior routing as "a black art".

Yep this seems like a very common experience. I tend to find most environments have one guy making BGP changes outside of project work.

>We once experienced a fiber cut on our T-3 backbone (construction workers didn't dial 811). So my supervisor arranged the BGP routes to send everything over a 56k line, IIRC. He gloated about it. The packet loss rate was absurd, but our customers had connectivity!

The modern version of this: At a small national ISP, we had our intercarrier lines cut. Megaport has this billing model where you only pay for the capacity you use, so our backup intercapital was a 1MB megaport service. Intercapital goes down, everyone kicks over to the megaport and we just log on to the megaport portal and raise the bandwidth to a few gig temporarily. Cost almost nothing to keep it sitting there ready for use. And yeah the engineer responsible was extremely and deservedly smug.

>And perhaps, vendors have simulators on a larger scale than the free downloads?

My experience is that you need both the exact hardware/firmware AND the exact config to perfectly simulate some of the weird and wonderful stuff, largely because so many of the protocol's issues, as the OP suggests, come down to individual vendor implementations of the protocol.

For instance, I used to consult for a small ISP that had a very unreliable peer. That peer would send them routes for everything, but occasionally their PE's routing plane would collapse and stop forwarding traffic to/from their other peers.

We still received enough packets to not trip any failover, and routes were still being advertised. So until they realised and rebooted their hardware, we had to withdraw our routes.

This is the specific behaviour between (IIRC) Cisco IOS-XR on our end, their predominantly mikrotik environment, and their other peers who I believe were mostly juniper.

I can't imagine simulating that without the relevant hardware and configs.


Well, what did you study in the university? I did learn about BGP and routing in university since one of my subjects was information networks and protocols. But haven't really used it outside of some lab exercises since there's been no need at work nor at home.

CCNA has BGP now and that was part of my college curriculum.

It depends on what you study.

I did more of a sysadmin track, you (probably?) did comp sci/dev and would not encounter BGP in a dev job (probably).


It’s very much hidden, which I guess is a success of the design. You need not concern yourself with the web of ASNs when using IP.

BGP is like international shipping... it has to be there for the world to function, but most people don't need to interact with it.

One way to play with it is something like this: https://www.eve-ng.net/

The other is to make a couple of virtual machines with a couple of network interfaces, make some sort of network betweeen them and then use some bgp routing deamon, eg:

https://bird.network.cz/

https://www.nongnu.org/quagga/

etc.


if you're a linux person consider a routing on host setup with FRR with /32s. As every host is a /32 network you can focus more on the aspects of BGP rather than TCP/IP.

I remember Helsinki CS having quite a bit of BGP, TCP and both ipv4 and ipv6. No guarantees that every student aced those classes, but the teaching definitely was there

I mean the birthplace of Nokia would have it ofc ;)

/Live in Ericsson lands


TCP/IP affects every networked application and endpoint on the internet.

BGP runs the internet routing "in the background" and you only need to know it if you're an internet service provider or work in a large org managing the network. If you didn't learn network routing, you aren't going to learn BGP.

Put two or three VMs (OpenBSD has OpenBGPD daemon) onto a shared virtual switch and addresses in 172.31.255.0/24, connect the VMs. Also each of the VMs should have at least one other interface onto unique virtual switches with their own network (172.31.1.0/24, 172.31.2.0/24, etc).

Then set up BGP to redistribute connected routes.


Several vendors had this bug in the past https://www.kb.cert.org/vuls/id/347067

CVE-2023-4481 (Juniper), CVE-2023-38802 (FRR), CVE-2023-38283 (OpenBGPd), CVE-2023-40457 (EXOS)

Arista was not affected then.


Our IOS XR chassis have gotten some of these packets, corresponding with high BGP route advertisements. No idea what equipment upstream uses tbh.

Makes me wonder if the BGP protocol is properly fuzzed. Perhaps it's one of those things that everyone is scared to try to knock over given it's so important.

I suppose it would be easy to write a fuzzer for bgp but very hard to diagnose crashes?


(author of the post)

Yes, this is exactly what I did in the post I linked to: https://blog.benjojo.co.uk/post/bgp-path-attributes-grave-er...


Bravo!

This is great research!


Has there ever been anything so byzantine in scale and accidental complexity as internet plumbing?

Given the impact of such bugs, I'm surprised there isn't a consortium with an interoperability test suite. Or maybe there is, and this specific issue isn't in the test suite. In which case, I'm surprised test cases aren't generated with a fuzzer and/or machine-generated full exploration of possible packet errors. I mean, it's fine if the suite takes hours or even days to run.

I guess the author of the article here has written a fuzzer with some coverage, and has come across similar issues before. Astonishing that the vendors don't pick up on this work hungrily.


> At 7AM (UTC) on Wednesday May 20th 2025

May 20th was a Tuesday, just sayin'


Great catch, there were like 3 mini incidents this week that I was keeping track of, so wires got crossed, will correct the post in a moment

Although at 7AM UTC in some parts of the world it was a Monday :D

Yes, but in those parts of the world it also wasn't May 20 at that time

Well surprise, people cheat because an academic degree is yet another checkbox to be ticked in the ever more grueling list of tasks needed to be done to acquire a job that has even a remote chance of paying a living wage.

Fix that, so that only those actually interested in academia per se and not just because they need a checkbox to tick remain, and the problem with cheating in academia will collapse.


Does multicast over the Internet even work?

I thought BGP was only for private networks.


BGP is the routing protocol of the Internet. There effectively is no other choice of routing protocol between autonomous systems. A reasonable synonym for "Internet" is "the global BGP routing table".

BGP also doesn't use multicast, you may be thinking of OSPF on multiaccess networks. BGP uses tcp/179 unicast to the IP addresses of its configured peers.
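
(To make the tcp/179 unicast point concrete: a session really does start with an ordinary TCP connection and a fixed-format OPEN message. A sketch with Python's struct, with illustrative values:)

    import socket, struct

    def build_open(my_as, hold_time, router_id):
        """Minimal BGP OPEN message (RFC 4271), no optional parameters."""
        body = struct.pack(
            "!BHH4sB",
            4,                               # BGP version
            my_as,                           # 2-byte AS (AS_TRANS for 32-bit ASNs)
            hold_time,                       # seconds
            socket.inet_aton(router_id),     # BGP identifier
            0,                               # optional parameters length
        )
        header = b"\xff" * 16 + struct.pack("!HB", 19 + len(body), 1)  # type 1 = OPEN
        return header + body

    # e.g. connect to a configured peer and send it:
    #   s = socket.create_connection(("192.0.2.1", 179))
    #   s.sendall(build_open(64512, 90, "192.0.2.2"))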

That said, multicast works just fine over the Internet. It's not commonly used, certainly not by home users and not very often by enterprise users, and was phased out on Internet2 by 2021 (I think?), but there's absolutely nothing in principle that would make it not work.


> That said, multicast works just fine over the Internet. It's not commonly used, certainly not by home users and not very often by enterprise users, and was phased out on Internet2 by 2021 (I think?), but there's absolutely nothing in principle that would make it not work.

In principle, no. In practice, I don't think many ISPs have equipment configured to forward multicast, except for those using multicast for TV and those probably don't interconnect with others.


You can definitely route multicast over the Internet via some kind of tunnel, be it stunnel, VPN, GRE. The ICE exchange uses stunnel for dev/certification multicast.

Many years ago UK ISPs participated in an MBone with BBC, ITV etc providing live broadcast

https://www.bbc.co.uk/multicast/tv/channels.shtml

Brandon Butterworths note about "why"

https://support.bbc.co.uk/multicast/why.html

Shows the growth of the backbone and CDNs:

> The Olympic audience is expected to be around 50K streams, delivering 10Gbit+ is on the limit of sensible unicast delivery.

In 2020 the BBC's internal CDN was delivering 100 times that [0] for 250k users, and 5 years later I suspect it's another order of magnitude given that iplayer does 5 million concurrent live views quite frequently [1,2]

[0] https://medium.com/bbc-product-technology/bbc-online-2020-in...

[1] https://www.bbc.co.uk/mediacentre/2024/audiences-flock-to-bb...

[2] https://www.bbc.co.uk/mediacentre/2022/england-v-iran-bbc-li...

By 2035 and TV turnoff there's no reason to believe that the infrastructure won't have been able to scale another 100 fold and handle 500 million concurrent live streams. Makes no sense to multicast out 30 different formats, to people on phones and tablets and TVs hanging off wifi. It's a very different consumer experience than a PC wired into an ISP like it was in 2007.


FWIW, 100% that BGP itself doesn't *use* multicast, but it can *propagate* multicast routing information. It's certainly technically possible to support multicast on the Internet (..thus the invention of MBGP) but in practice has been a non-starter for a whole bunch of reasons.

In fact, IPv4 and IPv6 both have a reserved range of multicast addresses (formerly called "Class D"):

https://en.wikipedia.org/wiki/Multicast_address

There were more than a few people who spotted how disused this range had become after mbone experiments, and sometimes suggested reclaiming the range as IPv4 address space was being exhausted.

Interestingly, there are reserved multicast addresses (yes, addresses, not ports) not only for OSPF, but for many other interior routing protocols, as well as mDNS, LLMNR, and NTP. Conspicuously absent is any reservation for BGP.


Anycast is pretty useful on the Internet :)

Anycast is a very different beast, though. Anycast is just unicast but you announce the same IP space from multiple destinations, and the network figures out how to get a packet to the closest one. If one of those destinations fails, it just goes to the next closest one.

Unicasts, multicasts, and broadcasts all actually work differently underneath and require specific handling by network equipment. Anycast is just a special case of unicast and generally speaking network equipment is completely unaware of it.


It's sort of correct and sort of incorrect to say it's the only possible routing protocol between autonomous systems. Since many inter-AS connections are effectively made-to-order, they could use a different protocol, but in order to participate in the global BGP mesh, it would have to have semantics similar enough to BGP. Most notably, it would have to support the concept of AS-path. However I don't think there are any universal requirements beyond that. BGP isn't a single distributed algorithm like say OSPF - a completely separate instance of BGP runs between every pair of connected ASes and they share some data, indirectly forming a global system.

BGP is very much not only for private networks - it is what the internet is built on.

Maybe you are thinking of iBGP or something like OSPF?


All traffic over the Internet is routed by routes that have been propagated through BGP. It's how adjacent networks tell each other what IPs they originate, and what networks they are connected to.

No, BGP is not only for private networks. ASNs use it to exchange messages between each other.

Unsure about multicast...my gut would say no but I'd not trust that answer if I had to bet on it.

BGP is the only way I know of that autonomous systems can talk to each other and negotiate.


The "Internet" is a bunch of separate networks operated by different entities. BGP is what allows them to connect to each other by exchanging routing formation.

There was something called the MBone back in the day. Nowadays you can't really send random multicast, but it's very much in use by ISPs for IPTV.



