No two systems are created the same, and there are as many failure conditions out there as there are system implementation choices. For example, you might choose any of the following options:
Run on a virtual machine
Run on dedicated hardware in your own data center
Use a virtual network
Use a hardwired network
Use AWS, Azure, Google Cloud…(pick your own cloud infrastructure as a service provider!)
The list is almost endless, and that’s just at the infrastructure level! When you consider the many options available at other levels, such as the platform and application levels, you face a combinatorial explosion of options that would likely stymie any thoughts of a common tool for automated chaos experiments.
The key here is that none of these choices are wrong; they are just different, and your unique context will be different, too. Not everyone is running Docker containers in Kubernetes, or serverless event-driven functions—and even if they were, there’s still sufficient variation among the leading cloud providers to make adapting to each environment a challenge.
Then there are the choices that are special to your context. Even though many infrastructure, platform, and even application implementation decisions are being standardized and commoditized, it’s still likely you have something different from others in the industry. Maybe you have a legacy COBOL application, or perhaps you’ve forked and amended an open source framework just enough for it to be quite different from anything else out there. Whatever your situation, you just might have something special that also needs to be interacted with when you run your chaos experiments.
Fortunately, the Chaos Toolkit has the capability for customization built into it.
In Chapter 7 you learned that there are two main ways to extend the Chaos Toolkit to meet your needs:1
Custom drivers that include your own implementations of probes and actions to support your experiments
Custom controls that provide ways to integrate your toolkit with operational concerns such as observability,2 or even human intervention3
A large and still-growing collection of open source extensions available in the Chaos Toolkit and the Chaos Toolkit Incubator implement some or all of these different extension points. You should always check those resources before creating your own custom extension, but it is quite common to extend your toolkit for your own context’s needs.
Let’s look at how you can create a custom driver to create custom actions and probes that can then be called from the steady-state-hypothesis, method, or rollbacks section of your experiment.
You can integrate your experiments with your own systems without writing any additional code by using either of the following options:
Call an HTTP endpoint from your experiment’s probes and actions.
Call a local process from your experiment’s probes and actions.
You’ve seen how to call an HTTP endpoint from an experiment’s probe before (see “Exploring and Discovering Evidence of Weaknesses”). In a basic case in which you simply want to invoke an HTTP endpoint using a GET request, you first specify that the type of your probe’s provider is http:
{
    "type": "probe",
    "name": "application-must-respond-normally",
    "tolerance": 200,
    "provider": {
        "type": "http"
    }
}
Then you provide the URL that is to be invoked:
{
    "type": "probe",
    "name": "simple-http-call",
    "provider": {
        "type": "http",
        "url": "http://somehost"
    }
}
But what if you want to do more than a simple GET HTTP request? That is not a problem. The http provider allows you to specify the following:
The HTTP method to use, such as GET (the default), POST, or DELETE
A list of keys and values to be passed as headers in the HTTP request
A list of keys and values to be passed as the request’s payload
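As a rough illustration of those options, the following Python sketch builds such a probe as a dictionary and prints it as JSON that could be pasted into an experiment. The helper name http_probe, the URL, and the header and payload values are hypothetical. The key names method, headers, and arguments are assumed to match the http provider's options and should be checked against the Chaos Toolkit reference before you rely on them.

```python
import json
from typing import Any, Dict, Optional


def http_probe(name: str, url: str, method: str = "GET",
               headers: Optional[Dict[str, str]] = None,
               arguments: Optional[Dict[str, Any]] = None,
               timeout: Optional[float] = None) -> Dict[str, Any]:
    """Build a probe declaration that uses the http provider (hypothetical helper)."""
    provider: Dict[str, Any] = {"type": "http", "url": url, "method": method}
    # Only include the optional keys that were actually supplied.
    if headers:
        provider["headers"] = headers
    if arguments:
        provider["arguments"] = arguments
    if timeout is not None:
        provider["timeout"] = timeout
    return {"type": "probe", "name": name, "provider": provider}


probe = http_probe(
    name="create-order",                      # hypothetical probe name
    url="http://somehost/orders",             # hypothetical endpoint
    method="POST",
    headers={"Accept": "application/json"},
    arguments={"item": "book", "quantity": 1},
    timeout=3,
)
print(json.dumps(probe, indent=4))
```

Generating the declaration from code like this is optional; the same keys can be written by hand directly in the experiment file.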
There’s a problem here, though. By default, an HTTP probe like the one you’ve specified here will hang indefinitely if there is a problem reaching or serving from the specified URL endpoint. To take control over this, you can specify a timeout, in seconds, on your http provider:
{
    "type": "probe",
    "name": "simple-http-call",
    "provider": {
        "type": "http",
        "url": "http://somehost",
        "timeout": 3
    }
}
The return value from an HTTP probe is a collection of the HTTP status, the response headers, and the contents of the HTTP response. If your probe was declared inside of your experiment’s steady-state hypothesis block, then you can use the probe’s tolerance to check the status code, either against a single expected value or against a collection of acceptable values:
"steady-state-hypothesis": {
    "title": "Services are all available and healthy",
    "probes": [
        {
            "type": "probe",
            "name": "application-must-respond-normally",
            "tolerance": 200,
            "provider": {
                "type": "http",
                "url": "http://192.168.99.100:32638/invokeConsumedService",
                "timeout": 3
            }
        }
    ]
}
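To make the tolerance check concrete, here is a simplified Python sketch of the comparison being described: the status from the probe's return value is accepted if it equals the tolerance, or if it is contained in the tolerance when that is a collection. The function name within_tolerance and the shape of the result dictionary are assumptions made for illustration; this is not the Chaos Toolkit's own implementation.

```python
from typing import Any, Dict, Iterable, Union

Tolerance = Union[int, Iterable[int]]


def within_tolerance(probe_result: Dict[str, Any], tolerance: Tolerance) -> bool:
    """Return True if the HTTP status in probe_result satisfies the tolerance."""
    status = probe_result["status"]
    if isinstance(tolerance, int):
        # A single expected status code, such as 200.
        return status == tolerance
    # A collection of acceptable status codes, such as [200, 204].
    return status in set(tolerance)


# Assumed shape of an HTTP probe's return value: status, headers, and body.
result = {"status": 200, "headers": {"Content-Type": "text/plain"}, "body": "OK"}

print(within_tolerance(result, 200))         # True
print(within_tolerance(result, [200, 204]))  # True
print(within_tolerance(result, [500, 503]))  # False
```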
It is possible that you might want to examine a lot more than just the HTTP response status when using an http provider–implemented probe in your steady-state hypothesis. Maybe you’d like to examine the contents of the response’s body for specific information, or even examine the returning headers for something important to your judgment that the system is in a recognizably normal state. Unfortunately, the http provider doesn’t allow such advanced introspection; if you need such power, it’s worth switching to a process provider such as a bash script,4 as described next. Alternatively, if much more programmatic control is needed, then you might consider implementing your own custom driver in Python (see “Creating Your Own Custom Chaos Driver in Python”).
Similar to the http provider, there is a process provider in the Chaos Toolkit that allows your experiment to call any arbitrary local process as part of the experiment’s execution. This time you’ll create an action activity that uses the process provider:
{
    "type": "action",
    "name": "rollout-application-update",
    "provider": {
        "type": "process"
    }
}
You specify the local process you want to invoke using path, and you can specify any arguments to be passed to the process by populating arguments:
{
    "type": "action",
    "name": "rollout-application-update",
    "provider": {
        "type": "process",
        "path": "echo",
        "arguments": "'updated'"
    }
}
The process provider is a powerful way of integrating your experiments with anything that can be called locally. It is especially useful when you already have an existing set of tools that probe and act on your system, and you simply want to add the higher-level abstraction of the experiment itself on top of their functionality.
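The following Python sketch approximates what happens when a process provider activity runs: the program named by path is started with the given arguments, and its exit status and output are collected. It is an illustration only, not the Chaos Toolkit's implementation; the function name run_process_activity and the status, stdout, and stderr keys of the returned dictionary are assumptions. To stay portable, the example invokes the Python interpreter rather than echo.

```python
import shlex
import subprocess
import sys
from typing import Any, Dict


def run_process_activity(provider: Dict[str, Any], timeout: float = 10.0) -> Dict[str, Any]:
    """Run the local process described by a process provider block."""
    arguments = provider.get("arguments", "")
    # Accept arguments either as one string or as a list of strings.
    if isinstance(arguments, str):
        arguments = shlex.split(arguments)
    completed = subprocess.run(
        [provider["path"], *arguments],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    return {
        "status": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


provider = {
    "type": "process",
    "path": sys.executable,                   # stand-in for your own tool
    "arguments": ["-c", "print('updated')"],
}
outcome = run_process_activity(provider)
print(outcome["status"], outcome["stdout"].strip())
```

Any existing script or compiled tool can take the place of the interpreter here, as long as it reports success or failure through its exit status and output.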
Sometimes you just need the power of a full programming language and supporting ecosystem of libraries when creating your own custom probes and actions. This is when it’s time to consider implementing your own Python extension to the Chaos Toolkit.
The first step is to make sure you are in the Python virtual environment (or global environment, if that’s your preference) where your instance of the Chaos Toolkit is installed:
(chaostk) $ chaos --help ...
Here, you’re using the chaos command provided by the chaostk virtual environment you created earlier (see Chapter 4).
For our current purposes, you’re going to create a custom Python driver that begins to integrate with the Chaos Monkey for Spring Boot, another chaos engineering tool that allows you to inject all sorts of turbulent conditions into running Spring Boot applications.5
Chaos drivers in the Chaos Toolkit were designed for the specific purpose of being able to integrate not only with target systems, but also with third-party chaos engineering tools. This is why there are now drivers for systems such as Toxiproxy and Gremlin. Both are very capable chaos-inducing systems in their own right, and the Chaos Toolkit makes it possible for you to add the experiment definitions to each tool, as well as to then pick the best of each for your own chaos engineering needs.
The requirement for your new Chaos Monkey for Spring Boot driver is to provide a probe that reports whether the Chaos Monkey for Spring Boot is enabled on a particular Spring Boot application instance.6
The first step is to create a new Python module project to house your new extension code. You could do this manually, but there’s also a Python Cookiecutter template available that can get you set up with the necessary boilerplate code for your new module.
To install the Cookiecutter tool, enter the following, taking care to confirm that you have activated your Python virtual environment for the Chaos Toolkit:
(chaostk) $ pip install cookiecutter
Now you can create a new module called chaosmonkeylite with the following command, filling in the information for the various prompts as you go:
(chaostk) $ cookiecutter https://github.com/dastergon/cookiecutter-chaostoolkit.git
full_name [chaostoolkit Team]: your_name
email [contact@chaostoolkit.org]: your_email
project_name [chaostoolkit-boilerplate]: chaostoolkit-chaosmonkeylite
project_slug [chaostoolkit_chaosmonkeylite]: chaosmlite
project_short_description [Chaos Toolkit Extension for X.]: Chaos Toolkit Extension for the Chaos Monkey for Spring Boot
version [0.1.0]: 0.1.0
If everything has gone well you can now list the contents of your current directory and see the contents of your new chaostoolkit-chaosmonkeylite project:
(chaostk) $ tree chaostoolkit-chaosmonkeylite
chaostoolkit-chaosmonkeylite
├── CHANGELOG.md
├── LICENSE
├── README.md
├── chaosmlite
│   └── __init__.py
├── ci.bash
├── pytest.ini
├── requirements-dev.txt
├── requirements.txt
├── setup.cfg
├── setup.py
└── tests
    └── __init__.py
All good so far! Now you should be able to change into the chaostoolkit-chaosmonkeylite directory and install your new, empty extension module, which will be ready for development and testing:
(chaostk) $ cd chaostoolkit-chaosmonkeylite
(chaostk) $ pip install -r requirements-dev.txt -r requirements.txt
...
(chaostk) $ pip install -e .
...
(chaostk) $ pytest
Test session starts (platform: darwin, Python 3.6.4, pytest 3.3.0, pytest-sugar 0.9.0)
cachedir: .cache
rootdir: /Users/russellmiles/chaostoolkit-chaosmonkeylite, inifile: pytest.ini
plugins: sugar-0.9.0, cov-2.5.1
Coverage.py warning: No data was collected. (no-data-collected)
generated xml file: /Users/russellmiles/chaostoolkit-chaosmonkeylite/junit-test-results.xml

---------- coverage: platform darwin, python 3.6.4-final-0 -----------
Name                     Stmts   Miss  Cover   Missing
------------------------------------------------------
chaosmlite/__init__.py       2      2     0%   3-5
Coverage XML written to file coverage.xml

Results (0.02s):
If there is already a directory called chaostoolkit-chaosmonkeylite wherever you ran the cookiecutter command, then you will get a conflict, as the Cookiecutter tool expects to create a new directory and populate it from scratch.
Now that you have a Python module project all set up and reachable from your installation of the Chaos Toolkit, it’s time to add some features. The first feature required is a probe that reports whether the Chaos Monkey for Spring Boot is enabled on a particular Spring Boot application instance.
Practicing test-driven development, you can create the following test for this new probe in your tests module, in a file named test_probes.py:
# -*- coding: utf-8 -*-
from unittest import mock
from unittest.mock import MagicMock

import pytest
from requests import Response, codes

from chaosmlite.probes import chaosmonkey_enabled


def test_chaosmonkey_is_enabled():
    mock_response = MagicMock(Response, status_code=codes.ok)
    actuator_endpoint = "http://localhost:8080/actuator"

    with mock.patch('chaosmlite.api.call_api',
                    return_value=mock_response) as mock_call_api:
        enabled = chaosmonkey_enabled(base_url=actuator_endpoint)

    assert enabled
    mock_call_api.assert_called_once_with(base_url=actuator_endpoint,
                                          api_endpoint="chaosmonkey/status",
                                          headers=None,
                                          timeout=None,
                                          configuration=None,
                                          secrets=None)
Imports the chaosmonkey_enabled function from the chaosmlite.probes module.
Mocks out the expected response from the Chaos Monkey for Spring Boot API.
Calls the chaosmonkey_enabled probe function while chaosmlite.api.call_api is mocked out to return the response expected when the Chaos Monkey for Spring Boot is enabled.
Asserts that the response correctly identifies that the Chaos Monkey for Spring Boot is enabled.
Now if you run the pytest command, you should see the following failure:
(chaostk) $ pytest
...
E   ModuleNotFoundError: No module named 'chaosmlite.probes'
...
That seems fair—the test is trying to use probes that you haven’t even written yet! To do that now, add the following code into a probes.py file in the chaosmlite module directory:
# -*- coding: utf-8 -*-
from typing import Any, Dict

from chaoslib.exceptions import FailedActivity
from chaoslib.types import Configuration, Secrets
from requests.status_codes import codes

from chaosmlite import api

__all__ = ["chaosmonkey_enabled"]


def chaosmonkey_enabled(base_url: str,
                        headers: Dict[str, Any] = None,
                        timeout: float = None,
                        configuration: Configuration = None,
                        secrets: Secrets = None) -> bool:
    """
    Enquire whether Chaos Monkey is enabled on the specified service.
    """

    response = api.call_api(base_url=base_url,
                            api_endpoint="chaosmonkey/status",
                            headers=headers,
                            timeout=timeout,
                            configuration=configuration,
                            secrets=secrets)

    if response.status_code == codes.ok:
        return True
    elif response.status_code == codes.service_unavailable:
        return False
    else:
        raise FailedActivity(
            "ChaosMonkey status enquiry failed: {m}".format(m=response.text))
Declares the new probe function. The probe returns a Boolean. You can also see how secrets and configuration are made available to the probe function.
Calls an underlying function that is responsible for constructing and actually calling the Chaos Monkey for Spring Boot API.
Returns True (i.e., the Chaos Monkey for Spring Boot is enabled) if the call responds with an ok response status code.
Returns False (i.e., the Chaos Monkey for Spring Boot is not enabled) if the call responds with a service_unavailable response status code.
Raises a Chaos Toolkit core FailedActivity exception if there is an unexpected response code. FailedActivity does not abort the experiment, but instead is added as a note to the experiment’s findings captured in the journal.json file.
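The test written earlier exercises only the enabled case. As a self-contained sketch of the remaining two branches, the following code uses a simplified copy of the probe's status handling, with call_api passed in as a parameter so that it can be replaced by a mock. The names status_to_enabled, fake_api, and StatusEnquiryFailed are hypothetical stand-ins (the last one for FailedActivity), and http.HTTPStatus replaces requests.codes so that nothing beyond the standard library is needed.

```python
from http import HTTPStatus
from unittest.mock import MagicMock


class StatusEnquiryFailed(Exception):
    """Hypothetical stand-in for chaoslib.exceptions.FailedActivity."""


def status_to_enabled(call_api, base_url: str) -> bool:
    """Simplified copy of the probe logic: map the HTTP status to a Boolean."""
    response = call_api(base_url=base_url, api_endpoint="chaosmonkey/status")
    if response.status_code == HTTPStatus.OK:
        return True
    if response.status_code == HTTPStatus.SERVICE_UNAVAILABLE:
        return False
    raise StatusEnquiryFailed(
        "ChaosMonkey status enquiry failed: {m}".format(m=response.text))


def fake_api(status_code: int, text: str = "") -> MagicMock:
    """Create a mock call_api that always returns the given status and body."""
    return MagicMock(return_value=MagicMock(status_code=status_code, text=text))


base_url = "http://localhost:8080/actuator"

print(status_to_enabled(fake_api(200), base_url))  # True
print(status_to_enabled(fake_api(503), base_url))  # False

try:
    status_to_enabled(fake_api(500, "boom"), base_url)
except StatusEnquiryFailed as error:
    print(error)  # ChaosMonkey status enquiry failed: boom
```

In the real module, the equivalent tests would patch chaosmlite.api.call_api as before and could use pytest.raises(FailedActivity) for the unexpected-status case.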
To close the loop on the implementation, let’s take a quick look at the underlying api module that is actually responsible for invoking the Chaos Monkey for Spring Boot API. The following code should be added to an api.py file in the chaosmlite module:
import json
from typing import Dict, Any

import requests
from chaoslib.types import Configuration, Secrets
from requests import Response


def call_api(base_url: str,
             api_endpoint: str,
             method: str = "GET",
             assaults_configuration: Dict[str, Any] = None,
             headers: Dict[str, Any] = None,
             timeout: float = None,
             configuration: Configuration = None,
             secrets: Secrets = None) -> Response:
    """
    Common HTTP API call to Chaos Monkey for Spring Boot. Both actions and
    probes call the Chaos Monkey for Spring Boot REST API by using this function.

    :param base_url: Base URL of target application
    :param api_endpoint: Chaos Monkey for Spring Boot actuator endpoint
    :param method: HTTP method, default is 'GET'
    :param headers: Additional headers when calling the Chaos Monkey for
        Spring Boot REST API
    :param assaults_configuration: Assaults configuration used to change the
        Chaos Monkey for Spring Boot settings
    :param timeout: The waiting time before connection timeout
    :param configuration: Provides runtime values to actions and probes in
        key/value format. May contain platform-specific API parameters
    :param secrets: Declares values that need to be passed on to actions or
        probes in a secure manner; for example, the auth token
    :return: Return requests.Response
    """
    url = "{base_url}/{api_endpoint}".format(
        base_url=base_url, api_endpoint=api_endpoint)

    headers = headers or {}
    headers.setdefault("Accept", "application/json")

    # As written, a supplied timeout is sent as a query-string parameter.
    params = {}
    if timeout:
        params["timeout"] = timeout

    # An assaults configuration, if given, is sent as a JSON request body.
    data = None
    if assaults_configuration:
        data = json.dumps(assaults_configuration)
        headers.update({"Content-Type": "application/json"})

    return requests.request(
        method=method, url=url, params=params, data=data, headers=headers)
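One detail worth noticing in call_api is that the URL is built by simple string formatting, so a base_url that ends with a slash, or an api_endpoint that begins with one, produces a doubled slash. If that matters for your target, a small normalizing helper is one option. The sketch below is not part of the module shown above; join_url is a hypothetical name.

```python
def join_url(base_url: str, api_endpoint: str) -> str:
    """Join a base URL and an endpoint path with exactly one slash between them."""
    return "{base}/{endpoint}".format(
        base=base_url.rstrip("/"),
        endpoint=api_endpoint.lstrip("/"),
    )


# Plain formatting, as used in call_api, keeps whatever slashes it is given.
plain = "{base_url}/{api_endpoint}".format(
    base_url="http://localhost:8080/actuator/",
    api_endpoint="chaosmonkey/status",
)
print(plain)  # http://localhost:8080/actuator//chaosmonkey/status

# The normalizing helper yields the same URL for both spellings.
print(join_url("http://localhost:8080/actuator/", "/chaosmonkey/status"))
print(join_url("http://localhost:8080/actuator", "chaosmonkey/status"))
```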
If you run pytest now, you should see the following:
(chaostk) $ pytest
Test session starts (platform: darwin, Python 3.6.4, pytest 3.3.0, pytest-sugar 0.9.0)
cachedir: .cache
rootdir: /Users/russellmiles/chaostoolkit-chaosmonkeylite, inifile: pytest.ini
plugins: sugar-0.9.0, cov-2.5.1

 tests/test_probes.py::test_chaosmonkey_is_enabled ✓        100% ██████████
generated xml file: /Users/russellmiles/chaostoolkit-chaosmonkeylite/junit-test-results.xml

---------- coverage: platform darwin, python 3.6.4-final-0 -----------
Name                   Stmts   Miss  Cover   Missing
----------------------------------------------------
chaosmlite/api.py         17     11    35%   35-50
chaosmlite/probes.py      13      3    77%   32-35
----------------------------------------------------
TOTAL                     32     14    56%

1 file skipped due to complete coverage.
Coverage XML written to file coverage.xml

Results (0.17s):
       1 passed
You can now call your new Python probe function from your experiments:
{
    "type": "probe",
    "name": "is_chaos_enabled_in_the_monkey",
    "provider": {
        "type": "python",
        "module": "chaosmlite.probes",
        "func": "chaosmonkey_enabled",
        "arguments": {
            "base_url": "http://localhost:8080/actuator"
        }
    }
}
Specifies that you want to use your new extension module’s probes.
The name of your probe function to call.
Any arguments required by your new probe.
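Conceptually, a python provider is resolved by importing the named module, looking up the named function, and calling it with the declared arguments as keyword arguments. The sketch below illustrates that idea with the standard library only; it is a simplification, not the Chaos Toolkit's own code, and call_python_provider is a hypothetical name. Because chaosmlite may not be installed where you try this, the example points the provider at json.dumps instead.

```python
import importlib
from typing import Any, Dict


def call_python_provider(provider: Dict[str, Any]) -> Any:
    """Import provider['module'], look up provider['func'], and call it."""
    if provider.get("type") != "python":
        raise ValueError("expected a provider of type 'python'")
    module = importlib.import_module(provider["module"])
    func = getattr(module, provider["func"])
    # The declared arguments are passed to the function as keyword arguments.
    return func(**provider.get("arguments", {}))


# Stand-in provider: json.dumps plays the role of chaosmonkey_enabled here.
provider = {
    "type": "python",
    "module": "json",
    "func": "dumps",
    "arguments": {"obj": {"enabled": True}},
}
print(call_python_provider(provider))  # {"enabled": true}
```

This is also why the argument names in the experiment must match the parameter names of your probe function, such as base_url.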
And that’s it: you have a complete Chaos Toolkit extension module ready to be used in your experiments. You could simply share the code for your new module with your colleagues, and they could use it and improve on it by following the same steps to set it up for development usage. Alternatively, you could build and distribute the module using a system such as PyPI so that others can install it with a single pip command.
You might also consider whether the world would benefit from your new module! If so, the Chaos Toolkit community would love to hear your proposal to add your module to the Chaos Toolkit Incubator project. There, it would join a host of others that are being worked on by teams all around the globe.7
In this chapter we’ve taken a deep dive into how to extend the Chaos Toolkit to support integrations with other systems through the concept of a driver. You’ve seen how to write probes and actions using simple HTTP calls, or calls to local processes, and how to implement more complex interactions using a custom Python Chaos Toolkit module.
In the next chapter, you’ll take the jump into operational concerns that your automated chaos engineering experiments need to participate in.
1 There is a third extension point called a plug-in. Plug-ins extend the functionality of the Chaos Toolkit’s CLI itself, often adding new sub-commands or overriding existing sub-commands of the chaos command. Creating your own plug-in is beyond the scope of this book, but if you are interested in taking advantage of this customization option, then a good example is the reporting plug-in (see “Creating and Sharing Human-Readable Chaos Experiment Reports”).
2 See Chapter 10.
3 See Chapter 11.
4 Perhaps calling something as simple and powerful as your own shell script, or even something more complicated, such as your own compiled program!
5 This is why the Chaos Monkey for Spring Boot is referred to as an application-level chaos tool. It is implemented at the application level as opposed to the infrastructure or platform level, to support turbulent condition injection there. Application-level chaos-inducing tools are particularly useful when it is difficult to inject turbulent conditions into the other levels, such as when working with a serverless platform like AWS Lambda.
6 For sharp-eyed readers, a complete Chaos Toolkit driver for the Chaos Monkey for Spring Boot is already available in the Chaos Toolkit incubator. The following sections show how you could create your own driver, and give you all the steps you need to create a driver for your own integrations.
7 Just don’t propose the Chaos Monkey for Spring Boot driver you’ve been working on in this chapter; that already exists!