Chapter 8. Creating Custom Chaos Drivers

No two systems are created the same, and there are as many failure conditions out there as there are system implementation choices. For example, you might choose any of the following options:

Run on a virtual machine
Run on dedicated hardware in your own data center
Use a virtual network
Use a hardwired network
Use AWS, Azure, Google Cloud…(pick your own cloud infrastructure as a service provider!)

The list is almost endless, and that’s just at the infrastructure level! When you consider the many options available at other levels, such as the platform and application levels, you face a combinatorial explosion of options that would likely stymie any thoughts of a common tool for automated chaos experiments.

The key here is that none of these choices are wrong; they are just different, and your unique context will be different, too. Not everyone is running Docker containers in Kubernetes, or serverless event-driven functions—and even if they were, there’s still sufficient variation among the leading cloud providers to make adapting to each environment a challenge.

Then there are the choices that are special to your context. Even though many infrastructure, platform, and even application implementation decisions are being standardized and commoditized, it’s still likely you have something different from others in the industry. Maybe you have a legacy COBOL application, or perhaps you’ve forked and amended an open source framework just enough for it to be quite different from anything else out there. Whatever your situation, you just might have something special that also needs to be interacted with when you run your chaos experiments.

Fortunately, the Chaos Toolkit has the capability for customization built into it.

In Chapter 7 you learned that there are two main ways to extend the Chaos Toolkit to meet your needs:¹

Custom drivers that include your own implementations of probes and actions to support your experiments
Custom controls that provide ways to integrate your toolkit with operational concerns such as observability,² or even human intervention³

A large and still-growing collection of open source extensions available in the Chaos Toolkit and the Chaos Toolkit Incubator implement some or all of these different extension points. You should always check those resources before creating your own custom extension, but it is quite common to extend your toolkit for your own context’s needs.

Let’s look at how you can create a custom driver whose actions and probes can then be called from the steady-state-hypothesis, method, or rollbacks section of your experiment.

Creating Your Own Custom Driver with No Custom Code

You can integrate your experiments with your own systems without any additional code by using either of the following options from your experiment’s probes and actions:

Making HTTP calls: You can call an HTTP endpoint from your experiment’s probes and actions.
Calling a local process: You can call a local process from your experiment’s probes and actions.

Implementing Probes and Actions with HTTP Calls

You’ve seen how to call an HTTP endpoint from an experiment’s probe before (see “Exploring and Discovering Evidence of Weaknesses”). In a basic case in which you simply want to invoke an HTTP endpoint using a GET request, you first specify that the type of your probe’s provider is http:

{
    "type": "probe",
    "name": "application-must-respond-normally",
    "tolerance": 200,
    "provider": {
        "type": "http"
    }
}

Then you provide the URL that is to be invoked:

{
    "type": "probe",
    "name": "simple-http-call",
    "provider": {
        "type": "http",
        "url": "http://somehost"
    }
}

But what if you want to do more than a simple GET HTTP request? Not a problem—the http provider allows you to specify the following:

method: The HTTP method to use, such as GET (the default), POST, or DELETE
headers: A list of keys and values to be passed as headers in the HTTP request
arguments: A list of keys and values to be passed as the request’s payload

There’s a problem here, though. By default, an HTTP probe like the one you’ve specified here will hang indefinitely if there is a problem reaching or serving from the specified URL endpoint. To take control over this, you can specify a timeout, in seconds, on your http provider:

{
    "type": "probe",
    "name": "simple-http-call",
    "provider": {
        "type": "http",
        "url": "http://somehost",
        "timeout": 3
    }
}
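
To see how these options fit together, here is a small Python sketch that assembles a probe using each of the http provider options mentioned so far and prints it as JSON, ready to paste into an experiment. The probe name, URL path, header, and payload values are made up for illustration; only the key names (method, headers, arguments, and timeout) come from the text.

```python
import json

# Hypothetical probe: the name, URL, header, and payload are placeholders.
probe = {
    "type": "probe",
    "name": "create-order-must-succeed",
    "provider": {
        "type": "http",
        "url": "http://somehost/orders",
        "method": "POST",
        "headers": {"Accept": "application/json"},
        "arguments": {"item": "book", "quantity": 1},
        "timeout": 3,
    },
}

# Serialize the dictionary into the JSON form used by experiment files.
print(json.dumps(probe, indent=4))
```

Generating the fragment this way is optional; writing the JSON by hand gives exactly the same experiment.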

The return value from an HTTP probe is a collection of the HTTP status, the response headers, and the contents of the HTTP response. If your probe is declared inside your experiment’s steady-state hypothesis block, you can use the probe’s tolerance to check that the status code is the expected value, or one of a collection of acceptable values:

"steady-state-hypothesis": {
    "title": "Services are all available and healthy",
    "probes": [
        {
            "type": "probe",
            "name": "application-must-respond-normally",
            "tolerance": 200,
            "provider": {
                "type": "http",
                "url": "http://192.168.99.100:32638/invokeConsumedService",
                "timeout": 3
            }
        }
    ]
}
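
A tolerance of 200 means the hypothesis holds only when the probe’s status is exactly 200, and a collection widens that to any of the listed codes. The following Python sketch shows the comparison being described. It is a simplified illustration, not the Chaos Toolkit’s own tolerance code, and the function name is hypothetical.

```python
from typing import Sequence, Union


def status_within_tolerance(status: int,
                            tolerance: Union[int, Sequence[int]]) -> bool:
    """Return True when an HTTP status satisfies the given tolerance."""
    if isinstance(tolerance, int):
        # A single value must match exactly.
        return status == tolerance
    # A collection is satisfied by any one of its members.
    return status in tolerance


print(status_within_tolerance(200, 200))
print(status_within_tolerance(201, [200, 201]))
print(status_within_tolerance(503, [200, 201]))
```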

It is possible that you might want to examine a lot more than just the HTTP response status when using an http provider–implemented probe in your steady-state hypothesis. Maybe you’d like to examine the contents of the response’s body for specific information, or even examine the returning headers for something important to your judgment that the system is in a recognizably normal state. Unfortunately, the http provider doesn’t allow such advanced introspection; if you need such power, it’s worth switching to a process provider such as a bash script,⁴ as described next. Alternatively, if much more programmatic control is needed, then you might consider implementing your own custom driver in Python (see “Creating Your Own Custom Chaos Driver in Python”).
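
As a taste of what that extra control looks like, here is a minimal Python sketch of a probe that looks inside the response body. It assumes a hypothetical health endpoint that returns JSON such as {"status": "UP"}; the function names, the URL you pass in, and the body format are all assumptions for illustration, and the sketch uses only the Python standard library rather than anything provided by the Chaos Toolkit.

```python
import json
from urllib.request import urlopen


def body_reports_up(body: str) -> bool:
    """Return True when a JSON body contains "status": "UP"."""
    try:
        payload = json.loads(body)
    except ValueError:
        # A body that is not valid JSON cannot report a healthy status.
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"


def service_reports_up(url: str, timeout: float = 3) -> bool:
    """Fetch the URL and judge the body, not just the status code.

    urlopen raises an exception for error statuses and timeouts, so
    reaching the return statement implies a successful response.
    """
    with urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return body_reports_up(body)
```

Keeping the body check in its own function means you can test it without a running service.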

Implementing Probes and Actions Through Process Calls

Similar to the http provider, there is a process provider in the Chaos Toolkit that allows your experiment to call any arbitrary local process as part of the experiment’s execution. This time you’ll create an action activity that uses the process provider:

{
    "type": "action",
    "name": "rollout-application-update",
    "provider": {
        "type": "process"
    }
}

You specify the local process you want to invoke using path, and you can specify any arguments to be passed to the process by populating arguments:

{
    "type": "action",
    "name": "rollout-application-update",
    "provider": {
        "type": "process",
        "path": "echo",
        "arguments": "'updated'"
    }
}

The process provider is a powerful way of integrating your experiments with anything that can be called locally. It is especially useful when you already have an existing set of tools that probe and act on your system, and you simply want to add the higher-level abstraction of the experiment itself on top of their functionality.
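
Conceptually, a process activity boils down to running a command and reporting what happened. The sketch below is a rough Python approximation of that idea, not the Chaos Toolkit’s actual implementation; the function name and the shape of the returned dictionary are assumptions for illustration.

```python
import shlex
import subprocess
import sys
from typing import Any, Dict, List, Optional, Union


def run_process(path: str,
                arguments: Union[str, List[str], None] = None,
                timeout: Optional[float] = None) -> Dict[str, Any]:
    """Run a local command and report its exit status and output."""
    if isinstance(arguments, str):
        # A single string is split the way a shell would split it.
        args = shlex.split(arguments)
    else:
        args = list(arguments or [])

    completed = subprocess.run([path] + args,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               universal_newlines=True,
                               timeout=timeout)
    return {"status": completed.returncode,
            "stdout": completed.stdout,
            "stderr": completed.stderr}


# Use the current Python interpreter as a portable stand-in for "echo".
result = run_process(sys.executable, ["-c", "print('updated')"])
print(result["status"], result["stdout"].strip())
```

The exit status and captured output are the kind of evidence you would want a probe or action to leave behind, whichever tool you end up wrapping.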

Creating Your Own Custom Chaos Driver in Python

Sometimes you just need the power of a full programming language and supporting ecosystem of libraries when creating your own custom probes and actions. This is when it’s time to consider implementing your own Python extension to the Chaos Toolkit.

The first step is to make sure you are in the Python virtual environment (or global environment, if that’s your preference) where your instance of the Chaos Toolkit is installed:

(chaostk) $ chaos --help
...

Here, you’re using the chaos command provided by the chaostk virtual environment you created earlier (see Chapter 4).

For our current purposes, you’re going to create a custom Python driver that begins to integrate with the Chaos Monkey for Spring Boot, another chaos engineering tool that allows you to inject all sorts of turbulent conditions into running Spring Boot applications.⁵

Drivers Are Perfect for Integrating with Existing Chaos Tools

Chaos drivers in the Chaos Toolkit were designed for the specific purpose of being able to integrate not only with target systems, but also with third-party chaos engineering tools. This is why there are now drivers for systems such as Toxiproxy and Gremlin. Both are very capable chaos-inducing systems in their own right, and the Chaos Toolkit makes it possible for you to layer experiment definitions on top of each tool, and then to pick the best of each for your own chaos engineering needs.

The requirement for your new Chaos Monkey for Spring Boot driver is to provide a probe that reports whether the Chaos Monkey for Spring Boot is enabled on a particular Spring Boot application instance.⁶

Creating a New Python Module for Your Chaos Toolkit Extension Project

The first step is to create a new Python module project to house your new extension code. You could do this manually, but there’s also a Python Cookiecutter template available that can get you set up with the necessary boilerplate code for your new module.

To install the Cookiecutter tool, enter the following, taking care to confirm that you have activated your Python virtual environment for the Chaos Toolkit:

(chaostk) $ pip install cookiecutter

Now you can create a new project called chaostoolkit-chaosmonkeylite, containing a Python module named chaosmlite, with the following command, filling in the information for the various prompts as you go:

(chaostk) $ cookiecutter https://github.com/dastergon/cookiecutter-chaostoolkit.git
full_name [chaostoolkit Team]: your_name
email [contact@chaostoolkit.org]: your_email
project_name [chaostoolkit-boilerplate]: chaostoolkit-chaosmonkeylite
project_slug [chaostoolkit_chaosmonkeylite]: chaosmlite
project_short_description [Chaos Toolkit Extension for X.]: Chaos Toolkit Extension for the Chaos Monkey for Spring Boot
version [0.1.0]: 0.1.0

If everything has gone well you can now list the contents of your current directory and see the contents of your new chaostoolkit-chaosmonkeylite project:

(chaostk) $ tree chaostoolkit-chaosmonkeylite
chaostoolkit-chaosmonkeylite
├── CHANGELOG.md
├── LICENSE
├── README.md
├── chaosmlite
│   └── __init__.py
├── ci.bash
├── pytest.ini
├── requirements-dev.txt
├── requirements.txt
├── setup.cfg
├── setup.py
└── tests
    └── __init__.py

All good so far! Now you should be able to change into the chaostoolkit-chaosmonkeylite directory and install your new, empty extension module, which will be ready for development and testing:

(chaostk) $ cd chaostoolkit-chaosmonkeylite
(chaostk) $ pip install -r requirements-dev.txt -r requirements.txt
...

(chaostk) $ pip install -e .
...

(chaostk) $ pytest
Test session starts (platform: darwin, Python 3.6.4, pytest 3.3.0, pytest-sugar 0.9.0)
cachedir: .cache
rootdir: /Users/russellmiles/chaostoolkit-chaosmonkeylite, inifile: pytest.ini
plugins: sugar-0.9.0, cov-2.5.1
Coverage.py warning: No data was collected. (no-data-collected)

 generated xml file: /Users/russellmiles/chaostoolkit-chaosmonkeylite/junit-test-results.xml

---------- coverage: platform darwin, python 3.6.4-final-0 -----------
Name                     Stmts   Miss  Cover   Missing
------------------------------------------------------
chaosmlite/__init__.py       2      2     0%   3-5
Coverage XML written to file coverage.xml


Results (0.02s):

Watch Out for Naming Conflicts

If there is already a directory called chaostoolkit-chaosmonkeylite wherever you run the cookiecutter command, you will get a conflict, because the Cookiecutter tool expects to create a new directory and populate it from scratch.

Adding the Probe

Now that you have a Python module project all set up and reachable from your installation of the Chaos Toolkit, it’s time to add some features. The first feature required is a probe that reports whether the Chaos Monkey for Spring Boot is enabled on a particular Spring Boot application instance.

Practicing test-driven development, you can create the following test for this new probe in your tests module, in a file named test_probes.py:

# -*- coding: utf-8 -*-
from unittest import mock
from unittest.mock import MagicMock

import pytest
from requests import Response, codes

from chaosmlite.probes import chaosmonkey_enabled  # (1)


def test_chaosmonkey_is_enabled():
    mock_response = MagicMock(Response, status_code=codes.ok)  # (2)
    actuator_endpoint = "http://localhost:8080/actuator"

    with mock.patch('chaosmlite.api.call_api',
                    return_value=mock_response) as mock_call_api:
        enabled = chaosmonkey_enabled(base_url=actuator_endpoint)  # (3)

    assert enabled  # (4)
    mock_call_api.assert_called_once_with(base_url=actuator_endpoint,
                                          api_endpoint="chaosmonkey/status",
                                          headers=None,
                                          timeout=None,
                                          configuration=None,
                                          secrets=None)

(1) Imports the chaosmonkey_enabled function from the chaosmlite.probes module.
(2) Mocks out the expected response from the Chaos Monkey for Spring Boot API.
(3) Calls the chaosmonkey_enabled probe function while the call to chaosmlite.api.call_api is mocked out to return the response expected when the Chaos Monkey for Spring Boot is enabled.
(4) Asserts that the response correctly identifies that the Chaos Monkey for Spring Boot is enabled.

Now if you run the pytest command, you should see the following failure:

(chaostk) $ pytest
...
E   ModuleNotFoundError: No module named 'chaosmlite.probes'
...

That seems fair—the test is trying to use probes that you haven’t even written yet! To do that now, add the following code into a probes.py file in the chaosmlite module directory:

# -*- coding: utf-8 -*-
from typing import Any, Dict

from chaoslib.exceptions import FailedActivity
from chaoslib.types import Configuration, Secrets
from requests.status_codes import codes

from chaosmlite import api

__all__ = ["chaosmonkey_enabled"]


def chaosmonkey_enabled(base_url: str,
                        headers: Dict[str, Any] = None,
                        timeout: float = None,
                        configuration: Configuration = None,
                        secrets: Secrets = None) -> bool:  # (1)
    """
    Enquire whether Chaos Monkey is enabled on the
    specified service.
    """

    response = api.call_api(base_url=base_url,
                            api_endpoint="chaosmonkey/status",
                            headers=headers,
                            timeout=timeout,
                            configuration=configuration,
                            secrets=secrets)  # (2)

    if response.status_code == codes.ok:
        return True  # (3)
    elif response.status_code == codes.service_unavailable:
        return False  # (4)
    else:
        raise FailedActivity(  # (5)
            "ChaosMonkey status enquiry failed: {m}".format(m=response.text))

(1) Declares the new probe function. The probe returns a Boolean. You can also see how secrets and configuration are made available to the probe function.
(2) Calls an underlying function that is responsible for constructing and actually calling the Chaos Monkey for Spring Boot API.
(3) Returns True (i.e., the Chaos Monkey for Spring Boot is enabled) if the call responds with an ok response status code.
(4) Returns False (i.e., the Chaos Monkey for Spring Boot is not enabled) if the call responds with a service_unavailable response status code.
(5) Raises a Chaos Toolkit core FailedActivity exception if there is an unexpected response code. A FailedActivity does not abort the experiment; instead, it is added as a note to the experiment’s findings captured in the journal.json file.

To close the loop on the implementation, let’s take a quick look at the underlying api module that is actually responsible for invoking the Chaos Monkey for Spring Boot API. The following code should be added to an api.py file in the chaosmlite module:

import json
from typing import Dict, Any

import requests
from chaoslib.types import Configuration, Secrets
from requests import Response


def call_api(base_url: str,
             api_endpoint: str,
             method: str = "GET",
             assaults_configuration: Dict[str, Any] = None,
             headers: Dict[str, Any] = None,
             timeout: float = None,
             configuration: Configuration = None,
             secrets: Secrets = None) -> Response:
    """ Common HTTP API call to Chaos Monkey for Spring Boot. Both actions and
    probes call the Chaos Monkey for Spring Boot REST API by using this function.
    :param base_url: Base URL of target application
    :param api_endpoint: Chaos Monkey for Spring Boot actuator endpoint
    :param method: HTTP method, default is 'GET'
    :param headers: Additional headers when calling the Chaos Monkey for
            Spring Boot REST API
    :param assaults_configuration: Assaults configuration used to change
            the Chaos Monkey for Spring Boot settings
    :param timeout: The waiting time before connection timeout
    :param configuration: Provides runtime values to actions and probes in
            key/value format. May contain platform-specific API parameters
    :param secrets: Declares values that need to be passed on to actions or
            probes in a secure manner; for example, the auth token
    :return: Return requests.Response
    """

    url = "{base_url}/{api_endpoint}".format(
        base_url=base_url, api_endpoint=api_endpoint)

    headers = headers or {}
    headers.setdefault("Accept", "application/json")

    data = None
    if assaults_configuration:
        data = json.dumps(assaults_configuration)
        headers.update({"Content-Type": "application/json"})

    # Pass timeout to requests itself (None means wait indefinitely) rather
    # than sending it to the server as a query-string parameter.
    return requests.request(method=method,
                            url=url,
                            data=data,
                            headers=headers,
                            timeout=timeout)

Couldn’t I Have Just Put the Code for the API Directly into My Probe?

You could have put the code from the call_api function in api.py directly into your chaosmonkey_enabled function in probes.py, but that would have made it more difficult to mock out the different concerns being implemented, and it would have mixed low-level API manipulation with the function contract that your probe represents.

If this extension were to grow in complexity, you would likely see that the code contained in the call_api function was repeated across many of your probes and actions. Thus, it often makes sense for a Chaos Toolkit extension to implement this separation of concerns so that low-level, reusable API calls are prepared in an api.py module, while probes.py contains the actual probe functions that will be called from the experiments.

If you run pytest now, you should see the following:

(chaostk) $ pytest
Test session starts (platform: darwin, Python 3.6.4, pytest 3.3.0, pytest-sugar 0.9.0)
cachedir: .cache
rootdir: /Users/russellmiles/chaostoolkit-chaosmonkeylite, inifile: pytest.ini
plugins: sugar-0.9.0, cov-2.5.1

 tests/test_probes.py::test_chaosmonkey_is_enabled ✓
 100% ██████████
 generated xml file: /Users/russellmiles/chaostoolkit-chaosmonkeylite/junit-test-results.xml

---------- coverage: platform darwin, python 3.6.4-final-0 -----------
Name                   Stmts   Miss  Cover   Missing
----------------------------------------------------
chaosmlite/api.py         17     11    35%   35-50
chaosmlite/probes.py      13      3    77%   32-35
----------------------------------------------------
TOTAL                     32     14    56%

1 file skipped due to complete coverage.
Coverage XML written to file coverage.xml


Results (0.17s):
       1 passed
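
The coverage report shows that lines 32-35 of probes.py were never exercised: those are the branches that handle a disabled Chaos Monkey and an unexpected status code. The sketch below shows how tests for those branches might look. So that it runs on its own, it uses a stand-in FailedActivity exception and a stand-in function that repeats the status handling from chaosmonkey_enabled; both stand-ins are assumptions for illustration. In your project you would instead patch chaosmlite.api.call_api, exactly as in test_chaosmonkey_is_enabled.

```python
from unittest.mock import MagicMock


class FailedActivity(Exception):
    """Stand-in for chaoslib.exceptions.FailedActivity."""


def enabled_from_response(response) -> bool:
    """Stand-in repeating the status handling of chaosmonkey_enabled."""
    if response.status_code == 200:    # requests' codes.ok
        return True
    elif response.status_code == 503:  # codes.service_unavailable
        return False
    raise FailedActivity(
        "ChaosMonkey status enquiry failed: {m}".format(m=response.text))


def test_chaosmonkey_is_disabled():
    response = MagicMock(status_code=503)
    assert enabled_from_response(response) is False


def test_unexpected_status_fails_the_activity():
    response = MagicMock(status_code=500, text="boom")
    try:
        enabled_from_response(response)
    except FailedActivity as error:
        # The response text should be carried into the failure message.
        assert "boom" in str(error)
    else:
        raise AssertionError("expected FailedActivity to be raised")
```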

You can now call your new Python probe function from your experiments:

{
    "type": "probe",
    "name": "is_chaos_enabled_in_the_monkey",
    "provider": {
        "type": "python",
        "module": "chaosmlite.probes",
        "func": "chaosmonkey_enabled",
        "arguments": {
            "base_url": "http://localhost:8080/actuator"
        }
    }
}

module: Specifies that you want to use your new extension module’s probes.
func: The name of your probe function to call.
arguments: Any arguments required by your new probe.
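
It can help to picture what happens when the toolkit meets a python provider: the module is imported, the function is looked up by name, and the arguments are passed as keyword arguments. The following is a simplified sketch of that idea, not the Chaos Toolkit’s actual code. The name call_python_provider is hypothetical, and json.dumps stands in for your probe function so that the example runs without the extension installed.

```python
import importlib
from typing import Any, Dict


def call_python_provider(provider: Dict[str, Any]) -> Any:
    """Import the named module, look up the function, and call it."""
    module = importlib.import_module(provider["module"])
    func = getattr(module, provider["func"])
    # Each key under "arguments" becomes a keyword argument.
    return func(**provider.get("arguments", {}))


# json.dumps is only a stand-in for a real probe function here.
provider = {
    "type": "python",
    "module": "json",
    "func": "dumps",
    "arguments": {"obj": {"enabled": True}},
}
result = call_python_provider(provider)
print(result)
```

This is why the keys under arguments need to match the parameter names of your probe function, such as base_url.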

And that’s it: you have a complete Chaos Toolkit extension module ready to be used in your experiments. Now that you’ve created this Chaos Toolkit extension module, you could simply share the code for your new module with your colleagues, and they could use it and improve on it by following the same steps to set it up for development usage. Alternatively, you could build and distribute the module using a system such as PyPI so that others can install it with a single pip command.

Adding a Module to the Chaos Toolkit Incubator Project

You might also consider whether the world would benefit from your new module! If so, the Chaos Toolkit community would love to hear your proposal to add your module to the Chaos Toolkit Incubator project. There, it would join a host of others that are being worked on by teams all around the globe.⁷

Summary

In this chapter we’ve taken a deep dive into how to extend the Chaos Toolkit to support integrations with other systems through the concept of a driver. You’ve seen how to write probes and actions using simple HTTP calls, or calls to local processes, and how to implement more complex interactions using a custom Python Chaos Toolkit module.

In the next chapter, you’ll take the jump into operational concerns that your automated chaos engineering experiments need to participate in.

¹ There is a third extension point called a plug-in. Plug-ins extend the functionality of the Chaos Toolkit’s CLI itself, often adding new sub-commands or overriding existing sub-commands of the chaos command. Creating your own plug-in is beyond the scope of this book, but if you are interested in taking advantage of this customization option, then a good example is the reporting plug-in (see “Creating and Sharing Human-Readable Chaos Experiment Reports”).

² See Chapter 10.

³ See Chapter 11.

⁴ Perhaps calling something as simple and powerful as your own shell script, or even something more complicated, such as your own compiled program!

⁵ This is why the Chaos Monkey for Spring Boot is referred to as an application-level chaos tool. It is implemented at the application level as opposed to the infrastructure or platform level, to support turbulent condition injection there. Application-level chaos-inducing tools are particularly useful when it is difficult to inject turbulent conditions into the other levels, such as when working with a serverless platform like AWS Lambda.

⁶ For sharp-eyed readers, a complete Chaos Toolkit driver for the Chaos Monkey for Spring Boot is already available in the Chaos Toolkit incubator. The following sections show how you could create your own driver, and give you all the steps you need to create a driver for your own integrations.

⁷ Just don’t propose the Chaos Monkey for Spring Boot driver you’ve been working on in this chapter; that already exists!