Handling unreliable environments

So far in this chapter, we have focused on gracefully handling errors and on changing Ansible's default behavior with respect to changes and failures. This is all well and good for tasks, but what if you are running Ansible in an unreliable environment? For example, poor or transient connectivity might be used to reach the managed hosts, or hosts might be down on a regular basis. The latter case might be a dynamically scaled environment that is scaled up in times of high load and scaled back when demand is low to save on resources.

Luckily, a playbook keyword, ignore_unreachable, was introduced in Ansible 2.7 to handle exactly these cases. It ensures that all tasks are attempted on the hosts in our inventory, even on hosts that get marked as unreachable during the execution of a task. This is best explained by means of an example, so let's reuse the error.yaml playbook to create such a case:

---
- name: error handling
  hosts: all
  gather_facts: false

  tasks:
  - name: delete branch bad
    command: git branch -D badfeature
    args:
      chdir: /srv/app
  - name: important task
    debug:
      msg: It is important we attempt this task!
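
The inventory for this example is not shown. As a minimal sketch, assuming a YAML-format inventory and two made-up host names (the reserved .invalid domain should never resolve, so both hosts are expected to be unreachable), it might look like this:

```yaml
# Hypothetical inventory for illustration only; the host names are assumptions.
all:
  hosts:
    app01.example.invalid:
    app02.example.invalid:
```

Saved under an assumed name such as inventory.yaml, it could be passed to ansible-playbook with the -i option.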

We are going to try to delete the badfeature branch from a Git repository on two remote hosts defined in our inventory. These hosts are fictitious, so we know they will be marked as unreachable as soon as the first task is attempted. In spite of this, there is a second task that absolutely must be attempted if at all possible. Let's run the playbook as it is and see what happens:

Note that important task was never attempted: the play was aborted after the first task because the hosts were unreachable. However, let's use the ignore_unreachable keyword to change this behavior. Change the code so that it looks like the following:

---
- name: error handling
  hosts: all
  gather_facts: false

  tasks:
  - name: delete branch bad
    command: git branch -D badfeature
    args:
      chdir: /srv/app
    ignore_unreachable: true
  - name: important task
    debug:
      msg: It is important we attempt this task!

This time, note that even though the hosts were unreachable on the first attempt, our second task is still executed:

This is useful if the second task, like the debug task here, can run locally, or if it is vital and should be attempted even though connectivity was down on the first attempt. So far in this chapter, you have learned about the tools Ansible provides to handle a variety of error conditions with grace. Next, we will look at controlling the flow of tasks using loops, an especially important tool for making code concise and preventing repetition.
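
One further note on ignore_unreachable before moving on: the keyword is not limited to individual tasks. It can also be set for a whole play and then overridden on specific tasks. The following is a sketch of that variation rather than part of the original example; the third task and its git status command are assumptions added purely to show the override:

```yaml
---
- name: error handling (play-level sketch)
  hosts: all
  gather_facts: false
  # Set once at play level so that every task inherits it.
  ignore_unreachable: true

  tasks:
  - name: delete branch bad
    command: git branch -D badfeature
    args:
      chdir: /srv/app
  - name: important task
    debug:
      msg: It is important we attempt this task!
  # Hypothetical extra task, added only to illustrate the override.
  - name: task that should stop for unreachable hosts
    command: git status
    args:
      chdir: /srv/app
    # Override the play-level setting for this task only.
    ignore_unreachable: false
```

With this layout, a host that is unreachable during the first two tasks stays in the play, while an unreachable result on the last task removes it as usual.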