Cryptography is a necessary component in many parts of secure architecture. However, just adding cryptography to the code does not make it more secure; care must be given to such topics as secrets generation, secrets storage, and plain-text management. Properly designing secure software is a complicated matter, more so when cryptography is involved.
Designing for security is beyond our scope here: this chapter only teaches the basic tools that Python has for cryptography, and how to use them.
8.1 Fernet
The cryptography module supports the Fernet cryptography standard. It is named after an Italian, not French, liqueur: the “t” is pronounced. A good approximation for the pronunciation is “fair-net.”
Fernet works for symmetric cryptography. It does not support partial or streaming decryption: it expects to read in the whole ciphertext and to return the whole plain text. This makes it suitable for names, text documents, or even pictures. However, videos and disk images are a poor fit for Fernet.
The cryptographic parameters of Fernet were chosen by domain experts, who researched available encryption methods, as well as the best known attacks against them. One advantage in using Fernet is that it avoids the need for you to become an expert yourself. However, for completeness, we note that the Fernet standard uses AES-128 in CBC mode with PKCS7 padding, and HMAC using SHA256 for authentication.
The key is a short string of bytes. Managing the key securely is important: cryptography is only as good as its keys. If it is kept in a file, for example, the file should have minimal permissions and ideally be hosted on an encrypted file system.
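As a minimal sketch of key generation and storage, the following creates a key and writes it to a file that only the owner can read. The file name and the temporary directory are hypothetical stand-ins, and the permission bits assume a POSIX system.

```python
import os
import tempfile

from cryptography.fernet import Fernet

# A Fernet key is 32 random bytes, encoded as URL-safe base64.
key = Fernet.generate_key()

# Hypothetical location for the key; a real deployment would choose its own.
key_path = os.path.join(tempfile.mkdtemp(), "fernet.key")

# Create the file readable and writable by the owner only (mode 0o600).
descriptor = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
with os.fdopen(descriptor, "wb") as key_file:
    key_file.write(key)

# Later, load the key and build the object used to encrypt and decrypt.
with open(key_path, "rb") as key_file:
    fernet = Fernet(key_file.read())
```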
Decryption is done in the same way as encryption. Fernet does contain a version marker, so if vulnerabilities in these algorithms are found, it is possible to move the standard to a different encryption and hashing system.
Fernet decryption also accepts a time-to-live, in seconds. With a time-to-live of five seconds, decryption will fail if the encrypted information (sometimes referred to as the “token”) is older than five seconds. This is useful to prevent replay attacks: attacks in which a previously captured encrypted token is replayed instead of a new valid token. For example, if the encrypted token has a list of usernames that are allowed some access, and is retrieved using a subvertible medium, a user who is no longer allowed in can substitute the older token.
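A short sketch of the round trip might look like this. The plain text is made up for illustration; the example also shows that a token modified in transit is rejected, and that the decoded token starts with the version marker.

```python
import base64

from cryptography.fernet import Fernet, InvalidToken

fernet = Fernet(Fernet.generate_key())

# Encryption takes bytes and returns an authenticated token (also bytes).
token = fernet.encrypt(b"x marks the spot")
plaintext = fernet.decrypt(token)

# The first byte of the decoded token is the version marker (0x80).
version = base64.urlsafe_b64decode(token)[0]

# Change one character in the middle of the token to simulate tampering.
replacement = b"A" if token[20:21] != b"A" else b"B"
tampered = token[:20] + replacement + token[21:]
try:
    fernet.decrypt(tampered)
except InvalidToken:
    tamper_detected = True
else:
    tamper_detected = False
```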
Ensuring token freshness would mean that no such list would be decoded, and everybody would be denied – which is no worse than if the medium was tampered with without having a token that was previously valid.
This can also be used to ensure good secret rotation hygiene. By refusing to decrypt anything older than, say, a week, we make sure that if the secret rotation infrastructure broke, we would fail loudly instead of succeeding silently, and thus fix it.
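The freshness check can be sketched as follows. To avoid waiting in the example, the stale token is produced with encrypt_at_time and a timestamp one minute in the past; that backdating is only a device for the illustration, and the user list is made up.

```python
import time

from cryptography.fernet import Fernet, InvalidToken

fernet = Fernet(Fernet.generate_key())

# A fresh token decrypts fine with a five-second time-to-live.
fresh_token = fernet.encrypt(b"alice,bob")
allowed = fernet.decrypt(fresh_token, ttl=5)

# Simulate a captured token from a minute ago (illustration only).
stale_token = fernet.encrypt_at_time(
    b"alice,bob,mallory", current_time=int(time.time()) - 60
)
try:
    fernet.decrypt(stale_token, ttl=5)
except InvalidToken:
    stale_rejected = True
else:
    stale_rejected = False

# Without a time-to-live, the stale token would still be accepted.
replayed = fernet.decrypt(stale_token)
```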
In order to support seamless key rotation, the Fernet module also has a MultiFernet class. MultiFernet takes a list of secrets. It encrypts with the first secret but will try decrypting with any secret.
This means that if we first add a new key to the end, it will not be used for encryption yet. After the addition is synchronized everywhere, we can remove the first key. Now all encryptions will be done with the second key, and even instances that have not synchronized the removal yet will have the decryption key available.
This two-step process is designed to have zero “invalid decryption” errors while still allowing key rotation, which is important as a precautionary measure – and a well-tested rotation procedure means that if keys are leaked, the rotation procedure can minimize the harm they do.
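The two-step rotation can be sketched like this. The three MultiFernet objects stand in for the configuration of a service at different points in time; the variable names are hypothetical.

```python
from cryptography.fernet import Fernet, MultiFernet

old = Fernet(Fernet.generate_key())
new = Fernet(Fernet.generate_key())

# Before rotation: every instance knows only the old key.
before = MultiFernet([old])

# Step 1: add the new key to the end. Encryption still uses the old key,
# so instances that have not picked up the change can still decrypt.
step_one = MultiFernet([old, new])
token_one = step_one.encrypt(b"payload")
seen_by_lagging_instance = before.decrypt(token_one)

# Step 2: remove the first key. Encryption now uses the new key, and
# instances still at step one already have it available for decryption.
step_two = MultiFernet([new])
token_two = step_two.encrypt(b"payload")
seen_by_step_one_instance = step_one.decrypt(token_two)
```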
8.2 PyNaCl
PyNaCl is a library wrapping the libsodium C library. libsodium is a fork of Daniel J. Bernstein’s libnacl, which is why PyNaCl is named that way. (NaCl, or Sodium Chloride, is the chemical formula for salt. The fork took the name of the first element.)
PyNaCl supports both symmetric and asymmetric encryption. However, since cryptography supports symmetric encryption with Fernet, the main use of PyNaCl is for asymmetric encryption.
The idea of asymmetric encryption is that there is a private and a public key. The public key can easily be calculated from the private key but not vice versa; that is the “asymmetry” the name refers to. The public key is published, while the private key must remain a secret.
There are, in general, two basic operations supported with public-key cryptography. We can encrypt with the public key, in a way that can only be decrypted with the private key. We can also sign with the private key, in a way that can be verified with the public key.
As we have discussed earlier, modern cryptographic practice places as much value on authentication as it does on secrecy. This is because if the medium the secret is transmitted on is vulnerable to eavesdropping, it is often vulnerable to modification. Secret modification attacks have had enough impact on the field that a cryptographic system is not considered complete if it does not guarantee both authenticity and secrecy.
Because of that, libsodium, and by extension PyNaCl, do not support encryption without signing, or decryption without signature verification.
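A private key can be encoded as a stream of bytes. We can generate a private key from the byte stream, and it will be identical. This means we can again keep the private key in a way we decide is secure enough: a secret manager, for example.
An encryption box, a PyNaCl Box built from the source’s private key and the target’s public key, signs using the source’s private key and encrypts using the target’s public key.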
The decryption box decrypts with target private key and verifies the signature using source’s public key. If the information has been tampered with, the decryption operation automatically fails. This means that it is impossible to access plain-text information that is not correctly signed.
Another piece of functionality that is useful inside of PyNaCl is cryptographic signing. It is sometimes useful to sign without encryption: for example, we can make sure to only use approved binary files by signing them. This allows the permissions for storing the binary file to be loose, as long as we trust that the permissions on keeping the signing key secure are strong enough.
Signing also involves asymmetric cryptography. The private key is used to sign, and the public key is used to verify the signatures. This means that we can, for example, check the public key into source control, and avoid needing any further configuration of the verification part.
The signature can also be kept apart from the message it signs. This is useful in case we want to save the signature in a separate place. For example, if the original is in an object storage, mutating it might be undesirable for various reasons. In those cases, we can keep the signatures “on the side.” Another reason is to maintain different signatures for different purposes, or to allow key rotation.
8.3 Passlib
Secure storage of passwords is a delicate matter. The biggest reason it is so subtle is that it has to deal with people who do not use password best practices. If all passwords were strong, and people never reused passwords from site to site, password storage would be straightforward.
However, people usually choose passwords with little entropy (123456 is still unreasonably popular, as well as password), they have a “standard password” that they use for all websites, and they are often vulnerable to phishing attacks and social engineering attacks where they divulge the password to an unauthorized third party.
Not all of these threats can be stopped by correctly storing passwords, but many of them can, at least, be mitigated and weakened.
The passlib library is written by people who are well versed in software security, and tries to, at least, eliminate the most obvious mistakes when saving passwords. Passwords are never saved in plain text – always hashed.
Note that hashing algorithms for passwords are optimized for different use cases than hashing algorithms used for other reasons: for example, one of the things they try to deny is brute-force source mapping attacks.
Passlib hashes passwords with the latest vetted algorithms optimized for password storage, and it is intended to avoid any possibility of side-channel attacks. In addition, “salt” is always used for hashing the passwords.
Although passlib can be used without understanding these things, it is worthwhile to understand them in order to avoid mistakes while using passlib.
Hashing means taking a user’s password and running it through a function that is reasonably easy to compute but hard to invert. This means that even if an attacker gets access to the password database, they cannot recover users’ passwords and pretend to be them.
One way that the attacker can try to get the original passwords is to try all combinations of passwords they can come up with, hash them, and see if any of the results equals a hash in the database. In order to avoid this, special algorithms are used that are computationally hard. This means that an attacker would have to use a lot of resources in order to try many passwords, so that even if, say, only a few million passwords are tried, it would take a long time to compare.
Lastly, attackers can use something called “rainbow tables” to precompute many hashes of common passwords, and compare them all at once against a password database. In order to avoid that, passwords are “salted” before they are hashed: a random prefix (the “salt”) is added, the password is hashed, and the salt is prefixed to the hash value. When the password is received from the user, the salt is retrieved from the beginning of the stored hash value and added to the received password, which is then hashed and compared.
Doing all of this from scratch is hard, and getting it right is even harder. Getting it “right” does not just mean having users log in, but being resilient to the password database being stolen. Since there is no feedback about that aspect, it is best to use a well-tested library.
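To make the salting scheme concrete, here is an illustrative sketch using only the standard library. It is not how passlib is implemented, and the iteration count, salt size, and function names are assumptions chosen for the example; real code should use passlib rather than this.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, for illustration only
SALT_SIZE = 16        # assumed salt length in bytes


def hash_password(password: str) -> bytes:
    """Return a random salt followed by the slow hash of the password."""
    salt = os.urandom(SALT_SIZE)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt + digest


def check_password(password: str, stored: bytes) -> bool:
    """Recover the salt from the stored value, rehash, and compare."""
    salt, digest = stored[:SALT_SIZE], stored[SALT_SIZE:]
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, digest)


first = hash_password("123456")
second = hash_password("123456")
```

Because each call draws a new salt, the same password produces different stored values, which is what defeats precomputed tables.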
The library is storage agnostic: it does not care where the passwords are being stored. However, it does care that it is possible to update the hashed passwords. This way, hashed passwords can get updated to newer hashing schemes as the need arises. While passlib does support various low-level interfaces, it is best to use the high-level interface of the CryptContext. The name is misleading, since it does no encryption; it is a reference to vaguely similar (and largely deprecated) functionality built into Unix.
It is possible to configure other details, such as the number of rounds. This is almost always unnecessary, as the defaults should be good enough.
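When saving the hash string, note that it does contain special characters, such as “$”; this might impact where it can be saved. If needed, it is always possible to convert it to base64.
When verifying a password, the verify_and_update method of CryptContext returns a pair: whether the password matched, and a replacement hash if the stored hash uses a deprecated scheme (None otherwise). In that case, we would need to store the second element in our password hash storage.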
8.4 TLS Certificates
Transport Layer Security (TLS) is a cryptographic way to protect data in transit. Since one potential attack is man-in-the-middle, it is important to be able to verify that the endpoints are correct. For this reason, the public keys are signed by Certificate Authorities. Sometimes, it is useful to have a local certificate authority.
One case where that can be useful is in microservice architectures, where verifying each service is the right one allows a more secure installation. Another case where that is useful is for putting together an internal test environment, where using real certificate authorities is sometimes not worth the effort; it is easy enough to install the local certificate authority as locally trusted and sign the relevant certificates with it.
Another place that this can be useful is in running tests. When running integration tests, we would like to set up a realistic integration environment. Ideally, some of these tests would check that TLS is, indeed, used rather than plain text. This is impossible to test if, for purposes of testing, we downgrade to plain-text communication. Indeed, the root cause of many production security breaches is that code inserted for testing, to enable plain-text communication, was accidentally enabled (or possible to maliciously enable) in production; furthermore, it was impossible to test that such bugs did not exist, because the testing environment did have plain-text communication.
For the same reason, allowing TLS connections without verification in the testing environment is dangerous. This means that the code has a non-verification flow, which can accidentally turn on, or maliciously be turned on, in production, and is impossible to prevent with testing.
Creating a certificate manually requires access to the hazmat layer in cryptography. It is so named because it is dangerous; we have to judiciously choose encryption algorithms and parameters, and the wrong choices can lead to insecure modes.
The certificate is built from the public key that corresponds to the private key. This is important, since the certificate only refers to the public key. Since the private key is never shared, it is not worthwhile, and actively dangerous, to make any assertions about it.
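A sketch of creating the key and a self-signed CA certificate follows. It assumes a recent version of cryptography, where the backend argument is optional. The RSA parameters, the common name, and the 30-day validity are assumptions chosen for the example, not recommendations.

```python
import datetime

from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# The private key never leaves our control.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
# Only the public key goes into the certificate.
public_key = private_key.public_key()

# Hypothetical name for the local certificate authority.
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "Simple Test CA")])
now = datetime.datetime.now(datetime.timezone.utc)

certificate = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)  # self-signed: the issuer is the subject
    .public_key(public_key)
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(days=30))
    # This extension is what makes the certificate claim to be a CA.
    .add_extension(
        x509.BasicConstraints(ca=True, path_length=None), critical=True
    )
    .sign(private_key, hashes.SHA256())
)
```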
This is it! We now have a private key, and a self-signed certificate that claims to be a CA. However, we will need to store them in files.
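Writing both to files might look like the following sketch. To keep the example self-contained, it first recreates a key and certificate in compact form. The name ca.pem for the private key and the temporary output directory are assumptions; the private key is written unencrypted, which is only reasonable for a throwaway test CA.

```python
import datetime
import os
import tempfile

from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# Compact stand-in for the CA key and self-signed certificate.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "Simple Test CA")])
now = datetime.datetime.now(datetime.timezone.utc)
certificate = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)
    .public_key(private_key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(days=30))
    .add_extension(x509.BasicConstraints(ca=True, path_length=None), critical=True)
    .sign(private_key, hashes.SHA256())
)

# Serialize the private key (unencrypted) and the certificate as PEM.
private_bytes = private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.TraditionalOpenSSL,
    encryption_algorithm=serialization.NoEncryption(),
)
public_bytes = certificate.public_bytes(encoding=serialization.Encoding.PEM)

out_dir = tempfile.mkdtemp()  # hypothetical output location
key_path = os.path.join(out_dir, "ca.pem")
cert_path = os.path.join(out_dir, "ca.crt")
with open(key_path, "wb") as key_file:
    key_file.write(private_bytes)
with open(cert_path, "wb") as cert_file:
    cert_file.write(public_bytes)
```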
This gives us the capability to now be a CA.
In general, for real certificate authorities, we need to generate a Certificate Signing Request (CSR) in order to prove that the owner of the private key actually wants that certificate. However, since we are the certificate authority, we can just create the certificate directly.
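Creating a service certificate directly, signed by the local CA, can be sketched as follows. The CA key and certificate are recreated in compact form so that the example runs on its own; in practice they would be loaded from the files saved earlier. The key parameters and validity period are assumptions, and the resulting bytes are what would be written to service.pem.

```python
import datetime

from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

now = datetime.datetime.now(datetime.timezone.utc)
later = now + datetime.timedelta(days=30)

# Compact stand-in for the CA key and certificate created earlier.
ca_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
ca_name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "Simple Test CA")])
ca_cert = (
    x509.CertificateBuilder()
    .subject_name(ca_name).issuer_name(ca_name)
    .public_key(ca_key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now).not_valid_after(later)
    .add_extension(x509.BasicConstraints(ca=True, path_length=None), critical=True)
    .sign(ca_key, hashes.SHA256())
)

# The service gets its own key; the CA only signs the public part.
hostname = "service.test.local"
service_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
service_cert = (
    x509.CertificateBuilder()
    .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, hostname)]))
    .issuer_name(ca_cert.subject)  # issued by the CA
    .public_key(service_key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now).not_valid_after(later)
    # Clients match the host name against the subject alternative name.
    .add_extension(
        x509.SubjectAlternativeName([x509.DNSName(hostname)]), critical=False
    )
    .sign(ca_key, hashes.SHA256())  # signed with the CA's private key
)

# Private key followed by certificate, both PEM: contents of service.pem.
service_pem = service_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.TraditionalOpenSSL,
    encryption_algorithm=serialization.NoEncryption(),
) + service_cert.public_bytes(encoding=serialization.Encoding.PEM)
```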
The service.pem file is in a format that can be used by most popular web servers: Apache, Nginx, HAProxy, and many more. It can also be used directly by the Twisted web server, by using the txsni extension.
If we add the ca.crt file to the trust root, and run, say, an Nginx server, on an IP that our client would resolve from service.test.local, then when we connect clients to https://service.test.local , they will verify that the certificate is indeed valid.
8.5 Summary
Cryptography is a powerful tool but one that is easy to misuse. By using well-understood high-level functions, we reduce many of the risks in using cryptography. While this does not substitute for proper risk analysis and modeling, it does make this exercise somewhat easier.
Python has several third-party libraries with well-vetted code, and it is a good idea to use them.