Chapter 11. Human Intervention in Chaos Experiment Automation

So far you’ve created automated chaos engineering experiments that run from beginning to end. You kick off the experiment with chaos run, and when it is completed, you look at the output to see if there has been any deviation from the steady-state hypothesis that might indicate evidence of a possible system weakness.

For many automated chaos experiments, this execution strategy is perfect. Especially when you consider running chaos experiments continuously as tests (see Chapter 12), the capability of running experiments with no human intervention is essential.

Sometimes, though, more operational control is needed over the execution of a chaos experiment. Perhaps you have a probe activity in the steady-state hypothesis that you’d like to occasionally skip, or maybe there’s an action activity in the experiment’s method that you’d like to choose, case by case, whether to execute or skip. You might also want to introduce the ability to hit a metaphorical “Big Red Stop Button” so that you can choose to abort the experiment entirely if things start to go awry.

Each of these cases requires human intervention in the execution of your chaos experiments. The experiments will still be automated, but you want to allow someone to make choices about how an experiment is executed as it is running. All of these scenarios can be accomplished using the Chaos Toolkit’s control feature.

Let’s see how to do this now by creating two new Chaos Toolkit controls from scratch that introduce support for the following types of human interaction in your automated chaos experiment execution:

  • A simple “Click to continue…”-style interaction

  • An “Abort now!” user interaction that asks, at every step, whether you want to abort the whole experiment—otherwise known as the “Big Red Stop Button”

Creating a New Chaos Toolkit Extension for Your Controls

Just like Chaos Toolkit drivers, Chaos Toolkit controls are written in Python and usually placed within a Chaos Toolkit extension module. You could simply add your new controls to an existing Chaos Toolkit extension project—perhaps the one you created earlier for your custom Chaos Toolkit driver (see “Creating Your Own Custom Chaos Driver in Python”)—but to keep things clean and simple, you’re going to create a new extension to contain your new controls.

As you did earlier, create a new Chaos Toolkit extension module using Cookiecutter:

(chaostk) $ cookiecutter \
            https://github.com/dastergon/cookiecutter-chaostoolkit.git
full_name [chaostoolkit Team]: your_name
email [contact@chaostoolkit.org]: your_email
project_name [chaostoolkit-boilerplate]: chaostoolkit-hci
project_slug [chaostoolkit_hci]: chaoshci
project_short_description [Chaos Toolkit Extension for X.]: Chaos Toolkit Extension \
that adds a collection of human interaction controls for automated \
chaos experiments.
version [0.1.0]: 0.1.0

Now you should have a new Chaos Toolkit extension within a chaostoolkit-hci directory:

(chaostk) $ tree chaostoolkit-hci
chaostoolkit-hci
├── CHANGELOG.md
├── LICENSE
├── README.md
├── chaoshci
│   └── __init__.py
├── ci.bash
├── pytest.ini
├── requirements-dev.txt
├── requirements.txt
├── setup.cfg
├── setup.py
└── tests
    └── __init__.py

Adding Your (Very) Simple Human Interaction Control

The first control you’re going to add to your chaostoolkit-hci extension project is a simple “Click to continue…”-style interaction. The control will need to listen to the execution of any activity within the running experiment, pause and prompt the user with “Click to continue…” before each activity, and then continue when any key is pressed. Simple enough! In the chaostoolkit-hci/chaoshci directory, create a new file called simplehci.py with the following contents:

# -*- coding: utf-8 -*-
from chaoslib.types import Activity  # (1)
import click

__all__ = ["before_activity_control"]


def before_activity_control(context: Activity, **kwargs):  # (2)
    """
    Prompt to press a key before an activity is executed.
    """
    click.pause()  # (3)

Here’s what’s happening in this code:

(1) The Chaos Toolkit provides a convenience type for any activity called Activity.

(2) You declare the single control callback function that you need; the Chaos Toolkit calls it whenever any activity is about to be executed.

(3) Using the click Python library’s pause function, you wait with the appropriate prompt for the user to press any key.
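To picture what “listening” means here, the following is a deliberately simplified model, not the Chaos Toolkit’s actual implementation. It assumes that a control hook is found by its function name and called with the activity passed as context. The names call_hook, fake_control, and seen are hypothetical and exist only for this illustration.

```python
from types import SimpleNamespace


def call_hook(control, hook_name, context, **kwargs):
    """Call control.<hook_name>(context=..., **kwargs) if it exists.

    Returns True when a hook was found and called, False otherwise.
    """
    hook = getattr(control, hook_name, None)
    if not callable(hook):
        return False
    hook(context=context, **kwargs)
    return True


# Stand-in for a control module: it records the activities it is shown
# instead of pausing for a key press.
seen = []
fake_control = SimpleNamespace(
    before_activity_control=lambda context, **kwargs: seen.append(context["name"])
)

activity = {"type": "action", "name": "method-activity"}
called_before = call_hook(fake_control, "before_activity_control", activity)
# The stand-in defines no hook with this name, so nothing is called.
called_after = call_hook(fake_control, "after_activity_control", activity)
```

In this model, a control only needs to define the hooks it cares about, which is why simplehci.py can get away with a single function.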

Now let’s give this new control a spin. First you will need to install the click library, because it was not included by default when you generated the Chaos Toolkit extension project using Cookiecutter. To add the new dependency so that it is available at runtime, edit the chaostoolkit-hci/requirements.txt file so that it contains the following:

chaostoolkit-lib>=1.0.0
logzero
click

You can now install these dependencies by executing the following command in the chaostoolkit-hci directory:

(chaostk) $ pip install -r requirements.txt -r requirements-dev.txt

Before you can take your new control for a spin, you need to install the project as a Python module that can be reached by your Chaos Toolkit installation. You can use pip to do this by executing:

(chaostk) $ pip install -e .
...
Installing collected packages: chaostoolkit-hci
  Running setup.py develop for chaostoolkit-hci
Successfully installed chaostoolkit-hci

You can check that your new chaostoolkit-hci development code is ready and available by executing a pip freeze:

(chaostk) $ pip freeze
...
# Editable install with no version control (chaostoolkit-hci==0.1.0)
-e /Users/russellmiles/temp/chaostoolkit-hci
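If you would rather make the same check from Python, a minimal sketch follows. It assumes Python 3.8 or later (for importlib.metadata); installed_version and can_import are hypothetical helper names, not part of the Chaos Toolkit.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def can_import(module_name):
    """Return True if Python can locate the module, False otherwise."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises if a parent package of a dotted name is missing.
        return False


if __name__ == "__main__":
    # Names taken from the Cookiecutter prompts and the file created earlier.
    print("chaostoolkit-hci version:", installed_version("chaostoolkit-hci"))
    print("chaoshci.simplehci importable:", can_import("chaoshci.simplehci"))
```

The second check is the more telling one for controls, because an experiment refers to a control by its module path rather than by the project name.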

You’re all set to give this control its first run! To do this, you need to have an experiment handy, so create a new file called chaostoolkit-hci/samples/simple-interactive-experiment.json and add this code to the file:

{
    "version": "1.0.0",
    "title": "A simple sample experiment that can be executed to show controls",
    "description": "Does nothing at all other than be executable by the Chaos Toolkit",
    "tags": [],
    "steady-state-hypothesis": {
        "title": "Assess the Steady State ... (not really in this case)",
        "probes": [
            {
                "type": "probe",
                "name": "hypothesis-activity",
                "tolerance": 0,
                "provider": {
                    "type": "process",
                    "path": "echo",
                    "arguments": "'updated'"
                }
            }
        ]
    },
    "method": [
        {
            "type": "action",
            "name": "method-activity",
            "provider": {
                "type": "process",
                "path": "echo",
                "arguments": "'updated'"
            }
        }
    ],
    "rollbacks": [
        {
            "type": "action",
            "name": "rollback-activity",
            "provider": {
                "type": "process",
                "path": "echo",
                "arguments": "'updated'"
            }
        }
    ]
}

This experiment doesn’t actually do much other than contain activities in all of its major sections. The steady-state-hypothesis, method, and rollbacks sections each contain a single activity that simply echoes out some text. The main point here is to have an experiment in which you can showcase your new control.
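As a quick way to see that structure, the following sketch walks an experiment and lists the activities in each section. It is only an illustration: list_activities is a hypothetical helper, and the inlined SAMPLE is a trimmed copy of the experiment rather than the full file.

```python
import json


def list_activities(experiment):
    """Return (section, type, name) tuples for every activity in an experiment."""
    found = []
    hypothesis = experiment.get("steady-state-hypothesis", {})
    for probe in hypothesis.get("probes", []):
        found.append(("steady-state-hypothesis", probe["type"], probe["name"]))
    for section in ("method", "rollbacks"):
        for activity in experiment.get(section, []):
            found.append((section, activity["type"], activity["name"]))
    return found


# A trimmed-down copy of the sample experiment, inlined so the sketch runs alone.
SAMPLE = json.loads("""
{
  "steady-state-hypothesis": {
    "probes": [{"type": "probe", "name": "hypothesis-activity"}]
  },
  "method": [{"type": "action", "name": "method-activity"}],
  "rollbacks": [{"type": "action", "name": "rollback-activity"}]
}
""")

for section, kind, name in list_activities(SAMPLE):
    print(f"{section}: {kind} {name}")
```

These three activities are the points at which a before-activity control gets a chance to step in.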

Even though your new control is available, controls are disabled by default, so if you run this experiment now you won’t see any of the interaction that you want the control to add:

(chaostk) $ chaos run samples/simple-interactive-experiment.json
[2019-04-25 12:02:58 INFO] Validating the experiment's syntax
[2019-04-25 12:02:58 INFO] Experiment looks valid
[2019-04-25 12:02:58 INFO] Running experiment: A simple sample experiment \
                           that can be executed to show controls
[2019-04-25 12:02:58 INFO] Steady state hypothesis: Assess the \
                           Steady State ... (not really in this case)
[2019-04-25 12:02:58 INFO] Probe: hypothesis-activity
[2019-04-25 12:02:58 INFO] Steady state hypothesis is met!
[2019-04-25 12:02:58 INFO] Action: method-activity
[2019-04-25 12:02:58 INFO] Steady state hypothesis: Assess the \
                           Steady State ... (not really in this case)
[2019-04-25 12:02:58 INFO] Probe: hypothesis-activity
[2019-04-25 12:02:58 INFO] Steady state hypothesis is met!
[2019-04-25 12:02:58 INFO] Let's rollback...
[2019-04-25 12:02:58 INFO] Rollback: rollback-activity
[2019-04-25 12:02:58 INFO] Action: rollback-activity
[2019-04-25 12:02:58 INFO] Experiment ended with status: completed

The experiment is running fine, but now it’s time to enable your new human interaction control so that you can literally seize control of your experiment while it is running. You have a few different ways to enable a Chaos Toolkit control (see “Enabling Controls”), but first you need to add the control to each of the activities that you want to enable human interaction with. Edit your simple-interactive-experiment.json file so that it matches what’s shown here:

{
    "version": "1.0.0",
    "title": "A simple sample experiment that can be executed to show controls",
    "description": "Does nothing at all other than be executable by the Chaos Toolkit",
    "tags": [],
    "steady-state-hypothesis": {
        "title": "Assess the Steady State ... (not really in this case)",
        "probes": [
            {
                "type": "probe",
                "name": "hypothesis-activity",
                "tolerance": 0,
                "provider": {
                    "type": "process",
                    "path": "echo",
                    "arguments": "'updated'"
                },
                "controls": [
                    {
                        "name": "prompt",
                        "provider": {
                            "type": "python",
                            "module": "chaoshci.simplehci"
                        }
                    }
                ]
            }
        ]
    },
    "method": [
        {
            "type": "action",
            "name": "method-activity",
            "provider": {
                "type": "process",
                "path": "echo",
                "arguments": "'updated'"
            },
            "controls": [
                {
                    "name": "prompt",
                    "provider": {
                        "type": "python",
                        "module": "chaoshci.simplehci"
                    }
                }
            ]
        }
    ],
    "rollbacks": [
        {
            "type": "action",
            "name": "rollback-activity",
            "provider": {
                "type": "process",
                "path": "echo",
                "arguments": "'updated'"
            },
            "controls": [
                {
                    "name": "prompt",
                    "provider": {
                        "type": "python",
                        "module": "chaoshci.simplehci"
                    }
                }
            ]
        }
    ]
}

You’ve added an explicit controls block to each activity in your experiment. This should mean that those activities will now prompt the user to press any key to continue before they are executed. Test this now by executing your experiment:

(chaostk) $ chaos run samples/simple-interactive-experiment.json
[2019-04-29 11:44:05 INFO] Validating the experiment's syntax
[2019-04-29 11:44:05 INFO] Experiment looks valid
[2019-04-29 11:44:05 INFO] Running experiment: A simple sample experiment \
                           that can be executed to show controls
[2019-04-29 11:44:05 INFO] Steady state hypothesis: Assess the \
                           Steady State ... (not really in this case)
Press any key to continue ...
[2019-04-29 11:44:08 INFO] Probe: hypothesis-activity
[2019-04-29 11:44:08 INFO] Steady state hypothesis is met!
Press any key to continue ...
[2019-04-29 11:44:09 INFO] Action: method-activity
[2019-04-29 11:44:09 INFO] Steady state hypothesis: Assess the \
                           Steady State ... (not really in this case)
Press any key to continue ...
[2019-04-29 11:44:10 INFO] Probe: hypothesis-activity
[2019-04-29 11:44:10 INFO] Steady state hypothesis is met!
[2019-04-29 11:44:10 INFO] Let's rollback...
[2019-04-29 11:44:10 INFO] Rollback: rollback-activity
Press any key to continue ...
[2019-04-29 11:44:11 INFO] Action: rollback-activity
[2019-04-29 11:44:11 INFO] Experiment ended with status: completed

For each activity, including the second execution of the steady-state hypothesis’s activity, you are prompted to continue. Success! However, in this simple case, it might be nicer not to have to specify the controls block for each activity. If every activity in an experiment will be subjected to a control, you can DRY things up a bit by moving the controls block to the top level in the experiment, as shown in the following code:

{
    "version": "1.0.0",
    "title": "A simple sample experiment that can be executed to show controls",
    "description": "Does nothing at all other than be executable by the Chaos Toolkit",
    "tags": [],
    "controls": [
        {
            "name": "prompt",
            "provider": {
                "type": "python",
                "module": "chaoshci.simplehci"
            }
        }
    ],
    "steady-state-hypothesis": {
        "title": "Assess the Steady State ... (not really in this case)",
        "probes": [
            {
                "type": "probe",
                "name": "hypothesis-activity",
                "tolerance": 0,
                "provider": {
                    "type": "process",
                    "path": "echo",
                    "arguments": "'updated'"
                }
            }
        ]
    },
    "method": [
        {
            "type": "action",
            "name": "method-activity",
            "provider": {
                "type": "process",
                "path": "echo",
                "arguments": "'updated'"
            }
        }
    ],
    "rollbacks": [
        {
            "type": "action",
            "name": "rollback-activity",
            "provider": {
                "type": "process",
                "path": "echo",
                "arguments": "'updated'"
            }
        }
    ]
}

Now when you run the experiment again, you’ll see the same result as before but with far less code repetition.
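If you already have an experiment that repeats the same controls block on every activity, the move to the top level can be sketched in Python. This is a hedged illustration rather than a Chaos Toolkit feature: hoist_controls is a hypothetical helper, and it only acts when every activity declares exactly the same controls.

```python
import copy


def hoist_controls(experiment):
    """Return a copy with identical per-activity controls moved to the top level.

    The copy is left as-is unless every activity declares exactly the same,
    non-empty controls list and no top-level controls block exists yet.
    """
    result = copy.deepcopy(experiment)
    activities = (
        result.get("steady-state-hypothesis", {}).get("probes", [])
        + result.get("method", [])
        + result.get("rollbacks", [])
    )
    declared = [activity.get("controls") for activity in activities]
    if "controls" in result or not declared or not declared[0]:
        return result
    if any(controls != declared[0] for controls in declared):
        return result
    result["controls"] = declared[0]
    for activity in activities:
        del activity["controls"]
    return result


prompt = {
    "name": "prompt",
    "provider": {"type": "python", "module": "chaoshci.simplehci"},
}
repetitive = {
    "steady-state-hypothesis": {
        "probes": [{"name": "hypothesis-activity", "controls": [prompt]}]
    },
    "method": [{"name": "method-activity", "controls": [prompt]}],
    "rollbacks": [{"name": "rollback-activity", "controls": [prompt]}],
}
tidy = hoist_controls(repetitive)
```

Working on a deep copy keeps the original experiment untouched, so you can compare the two before writing anything back to disk.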

That’s your first control completed! Now let’s go a bit further with another control that allows you to skip or execute individual activities in an experiment.

Skipping or Executing an Experiment’s Activity

The next Chaos Toolkit control you’re going to implement will provide a “Yes/No”-style interaction that lets you either continue with an activity or abort the experiment at that point. You’ve seen just how simple it is to create a control; this will be just a small refinement! Add a new file to the chaoshci directory called aborthci.py and add the following code to it:

# -*- coding: utf-8 -*-
from chaoslib.types import Activity
from chaoslib.exceptions import InterruptExecution
import click
from logzero import logger

__all__ = ["before_activity_control"]


def before_activity_control(context: Activity, **kwargs):
    """
    Prompt to continue or abort the current activity.
    """
    logger.info("About to execute activity: " + context.get("name"))  # (1)
    if click.confirm('Do you want to continue the experiment?'):  # (2)
        logger.info("Continuing: " + context.get("name"))  # (3)
    else:
        raise InterruptExecution("Experiment manually aborted!")  # (4)

Here’s what’s happening in this code:

(1) Here you add a logging message to indicate that an activity is about to be executed. This makes it clear which activity was pending if the user then chooses not to continue.

(2) You’re using the click library’s confirm function, which asks a question and returns True if the user answers yes, or False otherwise.

(3) If the user indicates that the activity should be executed, then that is logged. Note that you don’t have to do anything else; execution simply carries on as normal.

(4) If the user indicates that they do not want to continue, you raise the Chaos Toolkit’s InterruptExecution exception. This interrupts the experiment at that point: the remaining activities in the method are not executed, although the rollbacks are still attempted.
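Because this control blocks on a terminal prompt, it can be awkward to exercise without a person at the keyboard. One possible approach, sketched here under assumptions, is to pass the yes/no question in as a parameter so that the answer can be scripted. The make_control factory and the local InterruptExecution class are hypothetical stand-ins that let the sketch run without click or chaoslib installed; in the real control you would pass click.confirm and import the exception from chaoslib.exceptions.

```python
class InterruptExecution(Exception):
    """Local stand-in for chaoslib.exceptions.InterruptExecution."""


def make_control(confirm):
    """Build a before-activity hook that asks `confirm` whether to go on."""

    def before_activity_control(context, **kwargs):
        if not confirm("Do you want to continue the experiment?"):
            raise InterruptExecution("Experiment manually aborted!")

    return before_activity_control


# Scripted answers stand in for a human at the keyboard.
always_yes = make_control(lambda question: True)
always_no = make_control(lambda question: False)

always_yes({"name": "method-activity"})  # returns quietly

try:
    always_no({"name": "method-activity"})
    aborted = False
except InterruptExecution:
    aborted = True
```

With this shape, the control module would expose the hook with a module-level assignment such as before_activity_control = make_control(click.confirm).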

Copy your previous experiment in samples/simple-interactive-experiment.json into a new experiment file called samples/abortable-interactive-experiment.json and enable your new aborthci control in place of the existing simplehci control in the top-level controls block:

{
    "version": "1.0.0",
    "title": "A simple sample experiment that can be executed to show controls",
    "description": "Does nothing at all other than be executable by the Chaos Toolkit",
    "tags": [],
    "controls": [
        {
            "name": "prompt",
            "provider": {
                "type": "python",
                // Here's where you switch to your new control
                "module": "chaoshci.aborthci"
            }
        }
    ],

    ... The remainder of the experiment stays the same.

Now execute your new experiment and control, electing to abort the experiment when asked the question for the method’s activity:

(chaostk) $ chaos run samples/abortable-interactive-experiment.json
[2019-04-25 13:06:01 INFO] Validating the experiment's syntax
[2019-04-25 13:06:01 INFO] Experiment looks valid
[2019-04-25 13:06:01 INFO] Running experiment: A simple sample \
                           experiment that can be executed to show controls
[2019-04-25 13:06:01 INFO] Steady state hypothesis: Assess the \
                           Steady State ... (not really in this case)
[2019-04-25 13:06:01 INFO] About to execute activity: hypothesis-activity
Do you want to continue the experiment? [y/N]: y
[2019-04-25 13:06:03 INFO] Continuing: hypothesis-activity
[2019-04-25 13:06:03 INFO] Probe: hypothesis-activity
[2019-04-25 13:06:03 INFO] Steady state hypothesis is met!
[2019-04-25 13:06:03 INFO] About to execute activity: method-activity
Do you want to continue the experiment? [y/N]: N
[2019-04-25 13:06:04 CRITICAL] Experiment ran into an unexpected fatal error, \
                               aborting now.
    Traceback (most recent call last):
      ...
    chaoslib.exceptions.InterruptExecution: Experiment manually aborted!
[2019-04-25 13:06:04 INFO] Let's rollback...
[2019-04-25 13:06:04 INFO] Rollback: rollback-activity
[2019-04-25 13:06:04 INFO] About to execute activity: rollback-activity
Do you want to continue the experiment? [y/N]: y
[2019-04-25 13:06:06 INFO] Continuing: rollback-activity
[2019-04-25 13:06:06 INFO] Action: rollback-activity
[2019-04-25 13:06:06 INFO] Experiment ended with status: aborted

The experiment is aborted; your new control works as expected!

Aborting Does Not Cancel Execution of Rollback Activities

Raising the InterruptExecution exception does the trick of aborting the experiment, but you might have noticed that the activities in the rollbacks section are still attempted. Activities in the rollbacks section will always be executed, even if an experiment is aborted.
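The shape of that behavior can be pictured with a toy runner. This is a simplified model for illustration, not the Chaos Toolkit’s own code: run and abort_on_second are hypothetical names, InterruptExecution is a local stand-in for the class in chaoslib.exceptions, and, unlike the real toolkit, the toy does not call the control before rollback activities.

```python
class InterruptExecution(Exception):
    """Local stand-in for chaoslib.exceptions.InterruptExecution."""


def run(experiment, before_activity):
    """Toy runner: call the hook before each method activity, always roll back."""
    log = []
    status = "completed"
    try:
        for activity in experiment.get("method", []):
            before_activity(activity)
            log.append("ran " + activity["name"])
    except InterruptExecution:
        status = "aborted"
    finally:
        # Rollbacks are attempted whether or not the method was interrupted.
        for activity in experiment.get("rollbacks", []):
            log.append("rolled back " + activity["name"])
    return status, log


def abort_on_second(activity):
    """Scripted stand-in for a human answering 'N' at the second step."""
    if activity["name"] == "step-2":
        raise InterruptExecution("Experiment manually aborted!")


experiment = {
    "method": [{"name": "step-1"}, {"name": "step-2"}, {"name": "step-3"}],
    "rollbacks": [{"name": "cleanup"}],
}
status, log = run(experiment, abort_on_second)
print(status, log)
```

In the toy, the abort at step-2 means step-2 and step-3 never run, yet cleanup still does, which mirrors the run shown earlier where rollback-activity executed after the abort.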