Remote commands

The script media type is quite powerful, and it could even be used to execute a command in response to an event. For the command to be executed on the monitored host, though, it would require some mechanism to connect, authorize, and so on, which might be somewhat too complicated. Zabbix provides another mechanism to respond to events—remote commands. Remote commands can be used in a variety of cases, some of which might be initiating a configuration backup when a configuration change is detected, or starting a service that has died. We will set up the latter scenario:

Navigate to Configuration | Actions and click on Create action. In the Name field, enter Restart Apache.
Go to Conditions and, in the New condition block, choose Host in the first drop-down. Then, select equals and, in the selection box, start typing another.
In the drop-down that appears, click on Another host.
Click on Add to add the condition (but do not click on the global Add button yet).

Let's create another condition. In the New condition block, in the first drop-down, choose Trigger name. Leave the second drop-down at the default value. In the input field next to this, enter Web service is down, and then click on Add control. The end result should look as follows:

Now, switch to the Operations tab. In the operations block, click on New. In the Operation details block that just appeared, choose Remote command in the Operation type field. Zabbix offers five different types of remote command:

Custom script
IPMI
SSH
Telnet
Global script

We will discuss SSH and telnet items in Chapter 10, Advanced Item Monitoring. We will discuss IPMI functionality in Chapter 14, Monitoring IPMI Devices. Global scripts will be covered later in this chapter but for now, let's look at the custom script functionality.

For custom scripts, you may choose where they run: on the Zabbix agent, on the Zabbix server, or on the Zabbix server (proxy). Running on the agent will allow us to gather information, control services, and do other tasks on the system where problem conditions were encountered. Running on the server will allow us to probe the system from the Zabbix server's perspective, or maybe access the Zabbix API and take further decisions based on that information. With the last option, the script will be executed by the Zabbix server or the Zabbix proxy, depending on which of them monitors the host.

If you want to run remote commands on agents, don't forget to enable the EnableRemoteCommands option in the agent configuration file on each of them.

For now, we will create an action that will try to restart the Apache web server if it is down. Normally, that has to be done on the host that had the problem. In the Target list section, click on the New link. The drop-down there will have Current host selected, which is exactly what we wanted, so click on the Add control just below it.

In the Commands textbox, enter the following:

sudo /usr/bin/systemctl restart httpd

The service name is distribution-specific: it is httpd on CentOS and apache2 on Ubuntu. Most Linux distributions today, including Ubuntu and CentOS, use systemd; on systems that do not, you may have to use an init script instead.

We restart Apache rather than just start it, in case it has stopped responding instead of simply dying. You can also enter several commands to be performed, but we won't do that now, so just click on the Add control at the bottom of the Operation details block. To save our new action, click on the Add button at the bottom.

When running remote commands, the Zabbix agent accepts the command and immediately returns 1—there is no way for the server to know how long the command took, or even whether it was run at all. Note that the remote commands on the agent are run without a timeout.
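Because the server gets no feedback, one option is to have the command leave its own trace on the host. The following is a minimal sketch, not something Zabbix provides: run_logged is a hypothetical helper, and the log path in the example call is an assumption (it must be writable by the user the agent runs as).

```shell
# Hypothetical helper: run a command, then append a timestamp, its exit
# status, and the command line to a log file.
run_logged() {
    log_file=$1
    shift
    "$@"
    status=$?
    printf '%s exit=%s cmd=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" "$*" >> "$log_file"
    return "$status"
}

# Example call (assumed log path):
# run_logged /tmp/zabbix_remote.log sudo /usr/bin/systemctl restart httpd
```

Checking the log afterwards shows whether the command ran and how it exited, which the server itself cannot tell us.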

Our remote command is almost ready to run, but there is still some work to do on the agent side. Open zabbix_agentd.conf as root and look for the EnableRemoteCommands parameter. Uncomment it and set it to 1, save the config file, and then restart zabbix_agentd.
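A quick way to confirm the setting before restarting the agent is to search the file for an active line. This is a minimal sketch: remote_commands_enabled is a hypothetical helper, the config path in the example call is an assumption that depends on how the agent was installed, and the check is a simple pattern match rather than a full parse of the file.

```shell
# Hypothetical helper: succeed only if the file contains an active
# (uncommented) EnableRemoteCommands=1 line.
remote_commands_enabled() {
    grep -Eq '^[[:space:]]*EnableRemoteCommands[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}

# Example call (assumed config path):
# remote_commands_enabled /etc/zabbix/zabbix_agentd.conf && echo "remote commands enabled"
```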

That's still not all. As remote commands are passed to the Zabbix agent daemon, which is running as the zabbix user, we also have to allow this user to actually restart Apache. As evidenced by the remote command, we will use sudo for this, so edit /etc/sudoers.d/zabbix on Another host as root and add the following line:

zabbix  ALL=NOPASSWD: /usr/bin/systemctl restart httpd

To be safe, edit the file with the visudo command (for example, visudo -f /etc/sudoers.d/zabbix), which also checks your changes for syntax validity. On some systems, sudo is configured to be used only interactively; there, you might have to comment out the requiretty option in /etc/sudoers.

Again, change the service name if you need a different one. This allows the zabbix user to use sudo to restart the Apache web server, and only to restart it: the rule does not permit stopping it or performing any other operation.
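If the service name or the systemctl path differs on your system, the rule can be generated rather than typed by hand. The following minimal sketch prints a rule limited to restarting one named service; sudoers_restart_rule and the scratch file path are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: print a sudoers rule that lets the zabbix user
# restart exactly one service. The systemctl path defaults to
# /usr/bin/systemctl and can be overridden with a second argument.
sudoers_restart_rule() {
    service=$1
    systemctl_path=${2:-/usr/bin/systemctl}
    printf 'zabbix  ALL=NOPASSWD: %s restart %s\n' "$systemctl_path" "$service"
}

# Example: write the rule to a scratch file and let visudo check its
# syntax before copying it into place.
# sudoers_restart_rule apache2 > /tmp/zabbix.sudoers
# visudo -cf /tmp/zabbix.sudoers
```

Because sudo matches the command and its arguments against the rule, the command entered in the action has to use the same path and service name.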

Make sure that the SMTP server is running on Another host; otherwise, the web service trigger will not fire, because we made it depend on the SMTP trigger. Alternatively, remove that dependency.

Now, we are ready for the show. Stop the web server on Another host. Wait for the trigger to update its state and check the web server's status. It should start again automatically.

By default, all actions get two conditions. One of them limits the action to fire only when the trigger goes into the PROBLEM state, but not when it comes back to the OK state. For this action, it is a very helpful setting; otherwise, the web server would be restarted once when it was found to be down, and then restarted again when it was found to be up. Such a configuration mistake would not be obvious, so it might stay undetected for a while. You should also avoid enabling recovery messages for an action that restarts a service.

Note that remote commands on agents only work with passive agents; they will not work in active mode. This does not mean that you cannot use active items on such a host. You can, but remote commands will always be attempted in passive mode by the server, which connects directly to that agent. If all items are active, a configuration change that breaks server-to-agent connections might go unnoticed until the remote command fails to work. If you have all items active and want to use remote commands, it might be worth having a single passive item to check whether that type of item still works.
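The same kind of passive check can also be run by hand from the Zabbix server. This is a minimal sketch, assuming the zabbix_get utility is installed there; agent_reachable and the ZABBIX_GET override variable are hypothetical names, 10050 is the default agent port, and the address in the example call is a placeholder.

```shell
# Hypothetical helper: succeed if the agent answers a passive agent.ping.
# Set ZABBIX_GET to use a zabbix_get binary that is not on the PATH.
agent_reachable() {
    host=$1
    port=${2:-10050}
    # agent.ping returns 1 when the agent answers a passive check.
    result=$("${ZABBIX_GET:-zabbix_get}" -s "$host" -p "$port" -k agent.ping 2>/dev/null) || return 1
    [ "$result" = "1" ]
}

# Example call (placeholder address):
# agent_reachable 192.0.2.10 || echo "passive checks to the agent are failing"
```

If this fails while active items keep reporting, remote commands sent to that agent will most likely fail as well.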

While the need to restart services like this indicates a problem that would be best fixed for the service itself, sometimes it can work as an emergency solution, or in the case of an unresponsive proprietary software vendor.