Traceroute Isn't Real | Gekk - Alan's Bookmarks

Traceroute Isn't Real | Gekk

There is no such thing as traceroute.

I used to deliver network training at work. It was freeform: I was given wide latitude to design it as I saw fit, so I focused on things that I had seen people struggling with - clearly explaining VLANs in a less abstract manner than most literature, for instance, as well as actually explaining how QoS queuing works, which very few people understand properly.

One of the "chapters" in my presentation was about traceroute, and it more or less said "Don't use it, because you don't know how, and almost nobody you'll talk to does either, so try your best to ignore them." This is not just my opinion, it's backed up by people much more experienced than me. For a good summary I highly recommend this presentation.

But as good as that deck is, I always felt it left out a crucial piece of information: Traceroute, as far as the industry is concerned, does not exist.

Look it up. There is no RFC. There are no ports for traceroute, no rules in firewalls to accommodate it, no best practices for network operators. Why is that?

Traceroute has no history
First off: Yes, there is a traceroute RFC. It's RFC1393, it's 31 years old, and to my knowledge nothing supports it. The RFCs are jam-packed with brilliant ideas nobody implemented. This is one of them. The traceroute we have is completely unrelated to this.

Unsurprisingly however, it's a good description of how a traceroute protocol should work. //

As the linked presentation explains, traceroute simply no longer works in the modern world, at least not "as designed" - and it no longer can work that way, for several reasons, not the least of which is that networks have been abstracted in ways it did not anticipate.

There are now things like MPLS, which operate by encapsulating IP - in other words, putting a bag over a packet's head, throwing it in the back of a van, driving it across town and letting it loose so it has no idea how far it's traveled. Without getting much further into how that works: It is completely impossible for it to satisfy the expectations of traceroute.
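To make the bag-over-the-head picture concrete, here is a toy model in Python. It is a sketch, not a description of any real router: it assumes the common traceroute trick of sending probes with increasing TTLs and noting which router reports the expiry, and it assumes a tunnel whose interior routers never touch the IP TTL. The `Node` class, the `decrements_ip_ttl` flag, and the node names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    # False models a router inside a tunnel that never touches the IP TTL
    # (an assumption of this toy model, not a statement about every MPLS setup).
    decrements_ip_ttl: bool = True


def probe(path, ttl):
    """Return the name of the node that would answer a probe sent with this TTL."""
    remaining = ttl
    for node in path[:-1]:
        if node.decrements_ip_ttl:
            remaining -= 1
            if remaining == 0:
                return node.name  # TTL expired here, so this router reports back
    return path[-1].name  # the probe survived every hop and reached the destination


def trace(path, max_ttl=30):
    """Send probes with TTL 1, 2, 3... and collect whoever answers each one."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder = probe(path, ttl)
        hops.append(responder)
        if responder == path[-1].name:
            break
    return hops


# Hypothetical six-node path: two routers sit inside the tunnel.
example_path = [
    Node("customer-router"),
    Node("tunnel-entry"),
    Node("core-1", decrements_ip_ttl=False),
    Node("core-2", decrements_ip_ttl=False),
    Node("tunnel-exit"),
    Node("destination"),
]

print(trace(example_path))
```

In this model the trace lists four hops for a six-node path. The two core routers carried the packet, but nothing about the probe's TTL records that they did.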

This "tool" works purely at layer 3, so it's impossible for it to adapt to the sort of "layer 12-dimensional-chess" type shenanigans that MPLS pulls. There are other problems too, but they're all getting ahead of reality, since traceroute never even worked correctly as intended, and there's no reason it would.

Traceroute, you see, is "clever," which is an engineering term that means "fragile." When programmers discover something "clever," any ability they may have had to assess its sustainability or purpose-fit often goes out the window, because it's far more important to embrace the "cleverness" than to solve a problem reliably. //

I can't count how many times this happened, but I do remember after about four years of doing this, I had come up with a method for getting more accurate latency stats: just ping -i .1. Absolutely hammer the thing with pings while you have the customer test their usual business processes, and it'll be easier to see latency spikes if something is eating up too much bandwidth.
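Eyeballing a fast ping for spikes gets tedious, so here is a rough Python sketch of the same idea applied to captured output. It assumes the Linux-style `time=1.23 ms` reply format, and the "more than three times the median" threshold is an arbitrary choice of mine, not anything from the original method. The function names and the sample output (using the documentation address 192.0.2.1) are hypothetical.

```python
import re
from statistics import median

# Matches the round-trip time in Linux-style ping replies, e.g. "time=1.23 ms".
RTT_PATTERN = re.compile(r"time=([\d.]+) ms")


def parse_rtts(ping_output):
    """Pull every round-trip time (in milliseconds) out of captured ping output."""
    return [float(value) for value in RTT_PATTERN.findall(ping_output)]


def find_spikes(rtts, factor=3.0):
    """Return (index, rtt) for replies slower than `factor` times the median RTT."""
    if not rtts:
        return []
    baseline = median(rtts)
    return [(i, rtt) for i, rtt in enumerate(rtts) if rtt > factor * baseline]


# Made-up sample of what a capture might look like.
sample_output = """\
64 bytes from 192.0.2.1: icmp_seq=1 ttl=64 time=1.0 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=64 time=1.2 ms
64 bytes from 192.0.2.1: icmp_seq=3 ttl=64 time=1.1 ms
64 bytes from 192.0.2.1: icmp_seq=4 ttl=64 time=25.0 ms
64 bytes from 192.0.2.1: icmp_seq=5 ttl=64 time=1.3 ms
"""

print(find_spikes(parse_rtts(sample_output)))
```

The median is used as the baseline rather than the mean so that one large spike does not drag the baseline up and hide itself.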

What I discovered is that running two of these in parallel would produce exactly 50% packet loss, with total reliability. I then tested and found that if I just fired up three or four normal pings, at the default interval, it would do the same thing: 30% or 40% packet loss.

There is no telling how many issues we prolonged because everyone was running their own pings simultaneously and the kernel was getting overloaded and throwing some of them out. This is a snapshot of every network support center, everywhere. It is a bad scene.

internet · network

December 16, 2024 at 2:34:25 PM UTC * · permalink

https://gekk.info/articles/traceroute.htm#