A seven-step approach to troubleshooting

The following seven-step approach should help you zero in on networking problems:

Identifying the problem: This may seem obvious, but we often do not know the full extent of a problem at the outset. In such cases, we are better off gathering information, identifying symptoms, and, where appropriate, questioning users. If there is more than one problem, recognize it as such so that you can approach each problem individually. When questioning users, try to elicit information without making them feel defensive (for example, “What did you do to break the network?” is generally not a useful line of questioning). Ask the user how the network normally functions and compare that with how it functions now. Recreate the problem if needed, and isolate it as far as you can.

Formulating a theory of probable cause: A single problem can have several possible causes, and it may not be easy to narrow it down to one cause. However, if you exercised due diligence in gathering information in the previous step, you can often eliminate at least some of the causes you enumerated at first. In many cases, the most obvious cause is the correct one, and starting with the most obvious possible cause is a reasonable approach. At the same time, your initial theory may turn out to be incorrect, and you may have to consider other theories as well. You may even have to revisit step 1 and resume gathering information.
Testing the theory: Once you have established a theory of probable cause, the next logical step is to attempt to confirm the theory. If you are able to confirm your theory, you can move on to the next step; otherwise, you will have to formulate an alternate theory.
Establishing a plan: We have formulated a theory of probable cause, and the theory seems to be correct. However, we still need to establish a plan of action to restore the network to full functionality. Establishing a plan is more important in enterprise-level environments, where formal procedures are in place for IT maintenance. Implementing your solution may involve taking systems offline. If so, you have to determine when they will be taken offline and for how long, and follow whatever formal or informal procedures the organization has for doing so. At a minimum, this usually involves scheduling a time (often during nonworking hours) when the work will be done.
Implementing the solution: Now that you have gotten this far, you can make the necessary corrective change to the network, but your work is not done. You cannot assume that the solution will work without testing it, so you need to do that, and you also should take into account that early results might be deceptive. You may need to test your network several times to be sure that the solution you implemented works.
Verifying system functionality: This is a corollary to step 5. A solution that fixes one problem sometimes creates another, so you need to verify full system functionality. Only when you have done this will you know that the solution was truly a success.
Documenting the problem and its solution: Now that you have successfully solved the problem, you need to document the problem/solution. This involves keeping a record of all steps taken when solving the problem. Documenting not only your successes but also your failures can save you (and others) time later on. In large organizations, keeping a record of the person who implemented the solution can help; this way, if someone in the organization has a question (and the person in question is still working for the company), it will be easy to find them.
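
The loop between formulating a theory, testing it, and documenting the outcome (steps 2, 3, and 7) can be sketched in a few lines of Python. This is only an illustrative sketch, not a prescribed tool: the names Theory, TroubleshootingRecord, and run_theories are hypothetical, and the lambda tests are stand-ins for whatever real diagnostic checks you would perform.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Theory:
    """A candidate cause plus a test that returns True if the cause is confirmed."""
    description: str
    test: Callable[[], bool]


@dataclass
class TroubleshootingRecord:
    """Written record of the problem, every theory tried, and who did the work."""
    problem: str
    technician: str
    attempts: List[str] = field(default_factory=list)
    confirmed_cause: Optional[str] = None


def run_theories(record: TroubleshootingRecord,
                 theories: List[Theory]) -> Optional[Theory]:
    """Try theories in the given order (most obvious first) and log each result."""
    for theory in theories:
        confirmed = theory.test()
        outcome = "confirmed" if confirmed else "ruled out"
        # Failures are logged too, so nobody has to retest them later.
        record.attempts.append(f"{theory.description}: {outcome}")
        if confirmed:
            record.confirmed_cause = theory.description
            return theory
    # No theory was confirmed: return to step 1 and gather more information.
    return None


# Hypothetical example; each lambda stands in for a real diagnostic check.
record = TroubleshootingRecord(
    problem="Users cannot reach the file server",
    technician="jsmith",
)
theories = [
    Theory("Patch cable unplugged", lambda: False),
    Theory("Switch port assigned to the wrong VLAN", lambda: True),
    Theory("Firewall rule blocking file sharing", lambda: False),
]
cause = run_theories(record, theories)
for line in record.attempts:
    print(line)
```

Because the loop stops at the first confirmed theory, the third theory in the example is never tested. The record still keeps the theory that was ruled out, which is the kind of failure worth documenting.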

The problem could have been reported by a member of the IT staff, but it may also have been reported by a user. If it was reported by a user, consider providing feedback to this user. If you do, it could encourage them to report problems in the future. It might also provide them with some insight into how to avoid the problem.

We have already introduced the seven-layer OSI model and the four-layer TCP/IP model for networks, and there is no need to rehash them in great detail here. Nonetheless, they do provide an effective way of conceptualizing problems. Determining the layer on which the problem resides can help us diagnose and solve it:

Physical layer/Data link layer (OSI) or Link layer (TCP/IP): The physical layer of the OSI model covers problems such as damaged or dirty cabling or terminations, as well as high levels of signal attenuation or insufficient signal bandwidth. It can also cover problems with wireless networks, such as interference or malfunctioning access points. The data link layer of the OSI model covers problems such as MAC address and VLAN misconfigurations, suboptimal VLAN performance, and improper L2TP configuration. The physical and data link layers of the OSI model are merged into the Link layer of the TCP/IP model.
Network layer: This covers problems such as damaged or defective networking devices, misconfigured devices, improper routing configuration (OSPF, for example), lack of adequate network bandwidth, and so on.
Transport layer: This covers problems with the TCP and UDP transport protocols.
Session/Presentation/Application layers (OSI) or Application layer (TCP/IP): This covers problems related to applications and their protocols.
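
One way to apply the layered view is to check the layers from the bottom up and stop at the first one that fails. The following Python sketch illustrates the idea under stated assumptions: LAYERS, first_failing_layer, and tcp_port_open are hypothetical names, the lambda checks in the example are stand-ins for real diagnostics, and the four-layer TCP/IP grouping is used for brevity. Only tcp_port_open touches the network, using the standard library's socket.create_connection.

```python
import socket
from typing import Callable, Dict, Optional

# Layers ordered from lowest to highest, using the TCP/IP grouping.
LAYERS = ["link", "network", "transport", "application"]


def first_failing_layer(checks: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run one check per layer from the bottom up; return the first layer that fails."""
    for layer in LAYERS:
        check = checks.get(layer)
        if check is None:
            continue  # no check was supplied for this layer
        if not check():
            return layer
    return None  # every supplied check passed


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Possible transport-layer check: can a TCP connection be opened at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical example; each lambda stands in for a real check at that layer.
example_checks = {
    "link": lambda: True,         # e.g. the interface reports a link
    "network": lambda: False,     # e.g. the default gateway does not answer
    "transport": lambda: True,
    "application": lambda: True,
}
print(first_failing_layer(example_checks))
```

In the example, the network-layer check fails, so the higher-layer checks are never run. Starting at the bottom is a design choice rather than a rule: a fault at a lower layer usually explains symptoms at every layer above it, so ruling out the lower layers first tends to avoid chasing secondary symptoms.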