Chapter 17. TROUBLESHOOTING

When your network is working properly, it's all but invisible; you can send and receive files through the network between any pair of computers or other connected devices. But when a connection fails, or one of your users can't find a network node, or any of a truly amazing number of other possible problems occurs, it's your job as the local network expert to fix it. Network problems always have a specific cause (or combination of causes), even if that cause is not obvious.

Too often, a network error message will say something like "ask your network manager for assistance." But when the network manager is you, that message doesn't tell you how to solve the problem. This chapter offers some tools and methods that will help you identify and solve most network problems.

The key to successful troubleshooting is to follow a logical problem-solving process, rather than simply trying things at random until you stumble upon the correct solution to your problem. Most people who spend a lot of their time fixing things use a system like this without a formal plan, but if you're new to repairing computers and networks, consider using the techniques in this chapter as a guide.

Many of these suggestions are common-sense answers, rather than complex technical procedures. Don't overlook them; otherwise you can spend hours tracing a circuit or trying to find a bad connection just because somebody has unplugged a cable.

Remember that a problem that appears in your network might really be located on one of the computers or other devices connected to the network. In many cases, you will want to look for problems in the Windows, Macintosh, or Linux/Unix operating system as well as on the network itself.

The first step in solving a problem should be to identify the symptoms. Remember that computers and networks don't break down completely at random. Every piece of information you can find about a problem can help you isolate and solve it. Is the problem a failure to connect to a particular computer through the network, or an error message, or a file transfer that takes longer than usual? Is it limited to a single computer, or does it appear all over the network? Have any of the lights on your network router, switch, or modem changed color or gone dark? Does the problem occur when you are using a particular program or only when a certain desk lamp (or vacuum cleaner or any other electrical device) is turned on? As you identify symptoms, make a list—either on paper or in your mind.

If you see an error message, copy the exact text onto a piece of paper. You might have to restart the computer or go to another computer to search for information, and you will need the specific wording of the message. Don't ignore the cryptic code numbers or other apparently unintelligible information. Even if the message means nothing to you, it could be the key to finding the help you need.

Sometimes you can identify a pattern in the symptoms. When more than one user reports the same problem, ask yourself what those users have in common: Are they all trying to use the printer or connect to the Internet at the same time? Are they connected to the network by Ethernet cables or Wi-Fi? Does the problem happen at the same time every day?
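
The search for a pattern described above can be sketched in a few lines of Python. This is only an illustration: the report fields (user, link, hour, task) and the sample reports are hypothetical, and a real set of reports would rarely agree this neatly.

```python
from collections import Counter

# Hypothetical trouble reports; the field names are assumptions for this sketch.
reports = [
    {"user": "ann", "link": "wifi", "hour": 9, "task": "printing"},
    {"user": "bob", "link": "wifi", "hour": 9, "task": "web"},
    {"user": "cy", "link": "wifi", "hour": 14, "task": "web"},
]

def common_factors(reports, fields=("link", "hour", "task")):
    """Return {field: value} for every field on which all reports agree."""
    if not reports:
        return {}
    shared = {}
    for field in fields:
        counts = Counter(report[field] for report in reports)
        value, count = counts.most_common(1)[0]
        if count == len(reports):  # every report has the same value
            shared[field] = value
    return shared

print(common_factors(reports))
```

With these sample reports, the only factor all three users share is the Wi-Fi link, which would be the first place to look.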

If you're lucky, defining the problem can tell you enough to fix it. For example, if the power LED on your modem is off, that's a good indication that the power cable is unplugged, either at the wall outlet or on the modem itself. If everybody has trouble connecting to the Internet during a rainstorm, maybe water is leaking into the telephone cable that carries your Internet connection from the utility pole to your house (that happened to me; the repair guy told me that the cable had been there since about 1927).

More often, your list of symptoms will be a starting point that you can use to search for more information. As you analyze the problem, ask yourself these questions:

Look for easy solutions before you start to tear apart hardware or run complex software diagnostic routines. Nothing is more aggravating than spending several hours running detailed troubleshooting procedures, only to discover that restarting a computer or flipping a switch is all that was needed to fix the problem.

If a single computer can't connect to the network, confirm that the cables providing its connection are plugged in. Be sure to check both ends of each cable. If the whole network can't find the Internet, check the cables connected to the modem. If possible, examine each cable along its length to make sure it hasn't been cut someplace in the middle.

Almost all routers, switches, modems, and network adapters have LED indicators that light when they detect a live connection. If one or more of these LEDs has gone dark, check the connection.

Most data plugs and sockets maintain solid connections, but it's possible that a plug might have come loose without separating itself from the socket, or a wire inside the plug might have a bad contact. If you suspect a loose connection, try wiggling the cable while you watch the LED indicator that corresponds to that socket. If the LED lights and goes dark as you shake the cable, try a different cable.

If you can't connect through a newly installed wall outlet, make sure the wires inside the outlet are connected to the correct terminals at both ends of the cable inside the wall (at the outlet and at the data center).

To quickly confirm that data is passing through the network to and from each computer, use the tools supplied with the computer's operating system to display network activity. In Windows, use the Networking tab in the Task Manager; in Linux, use the ethtool command (ethtool interfacename | grep Link). If the computer reports that no link is available, a cable is disconnected or the network adapter or hub has a problem.
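
If you need to check the link on several Linux computers, the ethtool test above can be wrapped in a short script. The following is a minimal sketch, not a finished tool: it assumes ethtool is installed, the function names are made up for this example, and the interface name you pass in (eth0 is only a common example) must match your own system. It looks for the "Link detected" line that ethtool prints.

```python
import subprocess

def link_detected(ethtool_output):
    """Return True or False from the 'Link detected' line, or None if absent."""
    for line in ethtool_output.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "Link detected":
            return value.strip() == "yes"
    return None

def check_interface(name):
    """Run ethtool on one interface (Linux only; may need root privileges)."""
    result = subprocess.run(["ethtool", name], capture_output=True, text=True)
    return link_detected(result.stdout)

# Sample text in the shape ethtool prints, used here instead of a live call.
sample = "Settings for eth0:\n\tSpeed: 1000Mb/s\n\tLink detected: yes\n"
print(link_detected(sample))
```

A result of None means the output had no link line at all, which usually points to a wrong interface name or insufficient privileges rather than a disconnected cable.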

If your search for simple solutions to a network problem or failure doesn't produce an answer, the next step is to identify the physical location where the problem is occurring. Although it's easy (and often appropriate) to think about a network as an amorphous cloud that exists everywhere at the same time, when you're looking for a specific point of failure, you must replace that cloud with a detailed map that shows every component and connection. If you don't already have a network diagram in your files, consider drawing one now.

Most problems offer some kind of hint about their location: If just one computer's connection to the network has failed, but all the others work properly, the problem is probably in that computer or its network link. But if nobody on the network can connect to any other computer or to the Internet, the problem is probably in a server, router, or other central device. Start searching for the source of a problem in the most logical device.
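
The reasoning above can be written down as a small decision function. This is a rough sketch under stated assumptions: the function name and the wording of its answers are invented for this example, the input is a table you fill in yourself (computer name mapped to whether it can currently reach the network), and the "several but not all" case is an extrapolation from the advice to look for what the affected users have in common.

```python
def likely_location(results):
    """Guess where a fault lies from {computer name: can it connect?}."""
    failed = [name for name, ok in results.items() if not ok]
    if not failed:
        return "no failure observed"
    # With only one computer in the table, this case cannot tell a local
    # fault from a central one, so test at least two computers.
    if len(failed) == len(results):
        return "central device (server, router, or switch)"
    if len(failed) == 1:
        return f"{failed[0]} or its network link"
    return "something the failing computers share"

print(likely_location({"office": True, "kitchen": False, "den": True}))
```

Treat the answer as a starting point for the search, not a diagnosis.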

If you have a hardware problem, it's often effective to isolate it by replacing individual components and cables one at a time until the problem goes away. If the problem disappears when you install a replacement, that's a good indication that the original part was the source of the problem. If the faulty part is a relatively expensive item like a router or a printer, you might want to send it back to the manufacturer for replacement or repair, especially if it's under warranty. But if the faulty part is a cheap one like a cable or a network interface card, it's often easier to just throw it away and buy a new one.
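
The one-at-a-time replacement described above follows a simple loop, sketched here in Python. Everything in it is hypothetical: the part names are examples, and works_with_replacement stands in for the manual step of swapping in a known-good part and testing the connection.

```python
def find_faulty_part(parts, works_with_replacement):
    """Swap parts one at a time; return the first whose replacement cures the fault."""
    for part in parts:
        if works_with_replacement(part):
            return part
        # Otherwise put the original part back before moving on, so that
        # only one thing changes between tests.
    return None

# Simulated fault: in this example the network adapter is the bad part.
parts = ["patch cable", "switch port", "network adapter"]
print(find_faulty_part(parts, lambda part: part == "network adapter"))
```

The important design point is the comment inside the loop: if you leave several replacements in place at once, you can no longer tell which one fixed the problem.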

Similar techniques can work with software. If a computer connection fails, try shutting down the programs running on that computer one at a time, testing the connection after each one. If you recently installed a new program, driver, or update, try uninstalling the new software and test the connection again. If the connection works, the conflict is between the new software and your network connection or device driver. In Windows, try restarting the computer in Safe Mode with Networking (plain Safe Mode does not load the network drivers) and reestablishing the connection; if it works there, the core Windows operating system is probably not the source of the problem, and a driver or program that Safe Mode leaves out is the more likely culprit.

As you try to identify and solve a problem, keep a record of what you have done. Describe each problem you encounter and what you did to fix it in a simple log or notebook. Note configuration settings, websites that provide useful information, and the exact location of any options or control programs that caused the problem or helped solve it. Keep this on paper, rather than in a text file stored on the computer, so you will be able to access it if the computer breaks down again.

If the same problem appears again, your log will tell you exactly what you did to fix it the first time; rather than stepping through all the same unproductive troubleshooting techniques again, you can go directly to the correct solution.

One excellent approach is to keep a network notebook in a loose-leaf binder. Among other things, your notebook should include the following: