You're having problems reaching a particular host or network, and ping confirms there is a problem, but there are several routers between you and the problem, so you need to narrow it down further. How do you do this?
Use traceroute, tcptraceroute, or mtr.
traceroute is an old standby that works well on your local network. Here is a two-hop traceroute on a small LAN with at least two subnets:
$ traceroute mailserver1
traceroute to mailserver1.alrac.net (192.168.2.76), 30 hops max, 40 byte packets
1 pyramid.alrac.net (192.168.1.45) 3.605 ms 6.902 ms 9.165 ms
2 mailserver1.alrac.net (192.168.2.76) 3.010 ms 0.070 ms 0.068 ms
This shows you that it passes through a single router, pyramid. If you run traceroute on a single subnet, it should show only one hop, as no routing is involved:
$ traceroute uberpc
traceroute to uberpc.alrac.net (192.168.1.77), 30 hops max, 40 byte packets
1 uberpc (192.168.1.77) 5.722 ms 0.075 ms 0.068 ms
traceroute may not work over the Internet because a lot of routers are programmed to ignore its UDP datagrams. If you see a lot of timeouts, try the -I option, which sends ICMP ECHO requests instead.
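To judge whether a trace really has "a lot of timeouts," it can help to count the hops where every probe went unanswered. The following is a minimal sketch, not part of the original recipe: the function name count_timeouts and the host example.com are hypothetical, and it assumes traceroute's usual output, where a fully timed-out hop prints as a hop number followed only by asterisks.

```shell
# Hypothetical helper: count hops where every probe timed out ("N  * * *").
# Reads traceroute output on standard input and prints the count.
count_timeouts() {
    grep -cE '^ *[0-9]+( +\*)+ *$'
}

# Usage (example.com is a placeholder host):
#   traceroute example.com | count_timeouts
# If the count is high, retry with ICMP ECHO requests:
#   traceroute -I example.com
```

On many systems the -I option needs root privileges, so you may have to run it with sudo.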
You could also try tcptraceroute, which sends TCP packets and is therefore much harder for routers and firewalls to ignore:
$ tcptraceroute bratgrrl.com
Selected device eth0, address 192.168.1.10, port 49422 for outgoing packets
Tracing the path to bratgrrl.com (67.43.0.135) on TCP port 80 (www), 30 hops max
1 192.168.1.50 6.498 ms 0.345 ms 0.334 ms
2 gateway.foo.net (12.169.163.1) 23.381 ms 22.002 ms 23.047 ms
3 router.foo.net (12.169.174.1) 23.285 ms 23.434 ms 22.804 ms
4 12.100.100.201 54.091 ms 48.301 ms *
5 12.101.6.101 101.154 ms 100.027 ms 110.753 ms
6 tbr2.cgcil.ip.att.net (12.122.10.61) 104.155 ms 101.934 ms 101.387 ms
7 tbr2.dtrmi.ip.att.net (12.122.10.133) 108.611 ms 105.148 ms 108.538 ms
8 gar3.dtrmi.ip.att.net (12.123.139.141) 108.815 ms 116.832 ms 97.934 ms
9 * * *
10 lw-core1-ge2.rtr.liquidweb.com (209.59.157.30) 116.363 ms 115.567 ms 149.428 ms
11 lw-dc1-dist1-ge1.rtr.liquidweb.com (209.59.157.2) 129.055 ms 137.067 ms *
12 host6.miwebdns6.com (67.43.0.135) [open] 130.926 ms 122.942 ms 125.739 ms
An excellent utility that combines ping and traceroute is mtr (My Traceroute). Use it to capture latency and packet loss statistics for every router along the path, which makes problem routers easy to spot. Here is an example that sends 100 probe cycles (-c100), formats the results as a report (-r), and appends them to a text file:
$ mtr -r -c100 oreilly.com >> mtr.txt
The file looks like this:
HOST: xena                        Loss%   Snt   Last    Avg   Best   Wrst  StDev
 1. pyramid.alrac.net              0.0%   100    0.4    0.5    0.3    6.8    0.7
 2. gateway.foo.net                0.0%   100   23.5   23.1   21.6   29.8    1.0
 3. router.foo.net                 0.0%   100   23.4   24.4   21.9   78.9    5.9
 4. 12.222.222.201                 1.0%   100   52.8   57.9   44.5  127.3   10.3
 5. 12.222.222.50                  4.0%   100   61.9   62.4   50.1  102.9    9.8
 6. gbr1.st6wa.ip.att.net          1.0%   100   61.4   76.2   46.2  307.8   48.8
 7. br1-a350s5.attga.ip.att.net    3.0%   100   57.2   60.0   44.4  107.1   11.6
 8. so0-3-0-2488M.scr1.SFO1.gblx   1.0%   100   73.9   83.4   64.0  265.9   27.6
 9. sonic-gw.customer.gblx.net     2.0%   100   72.6   79.9   69.3  119.5    7.5
10. 0.ge-0-1-0.gw.sr.sonic.net     2.0%   100   71.5   78.2   67.6  142.2    9.3
11. gig50.dist1-1.sr.sonic.net     0.0%   100   81.1   84.3   73.1  169.1   12.1
12. ora-demarc.customer.sonic.ne   5.0%   100   69.1   82.9   69.1  144.6   10.2
13. www.oreillynet.com             4.0%   100   75.4   81.0   69.8  119.1    7.0
This shows a reasonably clean run with low packet loss and low latency. When you're having problems, create a cron job to run mtr at regular intervals by using a command like this (using your own domain and filenames, of course):
$ mtr -r -c100 oreillynet.com >> mtr.txt && date >> mtr.txt
This stores the results of every mtr run in a single file, with the date and time at the end of each entry.
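As a sketch of such a cron job, the command can be wrapped in a small script for cron to call. The function name log_mtr, the script path /usr/local/bin/mtr-log.sh, and the 30-minute schedule are all assumptions for illustration; use whatever fits your system.

```shell
# Hypothetical wrapper: append one mtr report, then a timestamp, to a log file.
# Usage: log_mtr TARGET LOGFILE
log_mtr() {
    target=$1
    logfile=$2
    # -r prints a report and exits; -c100 sends 100 probe cycles.
    mtr -r -c100 "$target" >> "$logfile" && date >> "$logfile"
}

# A script installed as, say, /usr/local/bin/mtr-log.sh would end with:
#   log_mtr oreillynet.com "$HOME/mtr.txt"
# and a crontab entry (crontab -e) to run it every 30 minutes might read:
#   */30 * * * * /usr/local/bin/mtr-log.sh
```

cron usually runs jobs with a minimal PATH, so the script may need the full path to mtr.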
You can watch mtr in real time like this:
$ mtr -c100 oreillynet.com
You can skip DNS lookups with the -n switch.
If any of these consistently get hung up at the same router, or if mtr consistently shows greater than 5 percent packet loss and long transit times on the same router, then it's safe to say that particular router has a problem. If it's a router that you control, then for gosh sakes fix it. If it isn't, use dig or whois to find out who it belongs to, and nicely report the trouble to them.
Save your records so they can see the numbers with their own eyes.
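When the log file grows, picking out the hops above the 5 percent mark by eye gets tedious. Here is a minimal sketch of a filter for that, assuming report lines shaped like the mtr -r sample in this recipe (hop number, host, then Loss%). The function name lossy_hops is hypothetical, and the threshold defaults to 5 but can be passed as an argument.

```shell
# Hypothetical helper: print hops from an "mtr -r" report whose Loss%
# exceeds a threshold (default 5). Reads the report on standard input.
lossy_hops() {
    awk -v max="${1:-5}" '
        # Hop lines start with a hop number and a dot, e.g. "4."
        $1 ~ /^[0-9]+\./ {
            loss = $3
            sub(/%/, "", loss)      # strip the percent sign: "6.0%" -> "6.0"
            if (loss + 0 > max + 0) print $1, $2, $3
        }'
}

# Usage:
#   lossy_hops < mtr.txt        # hops with more than 5% loss
#   lossy_hops 2 < mtr.txt      # hops with more than 2% loss
```

A hop that shows up in run after run is the one worth reporting.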
There are a lot of web sites that let you run various network tools, such as ping and traceroute, from their sites. This is a good way to get some additional information for comparison.
mtr can generate a lot of network traffic, so don't run it all the time.
tcptraceroute sends TCP SYN packets instead of UDP or ICMP ECHO packets. These are more likely to get through firewalls, and are not going to be ignored by routers. When the host responds, tcptraceroute sends TCP RST to close the connection, so the TCP three-way handshake is never completed. This is the same as the half-open (-sS) scan used by Nmap.