Cryptography

Cryptography is a necessary component in many parts of secure architecture. However, just adding cryptography to the code does not make it more secure; care must be given to such topics as secrets generation, secrets storage, and plain-text management. Properly designing secure software is a complicated matter, more so when cryptography is involved.

Designing for security is beyond our scope here: this chapter only teaches the basic tools that Python has for cryptography, and how to use them.

8.1 Fernet

The cryptography module supports the Fernet cryptography standard. It is named after an Italian, not French, liqueur: the “t” is pronounced. A good approximation of the pronunciation is “fair-net.”

Fernet works for symmetric cryptography. It does not support partial or streaming decryption: it expects to read in the whole ciphertext and to return the whole plain text. This makes it suitable for names, text documents, or even pictures. However, videos and disk images are a poor fit for Fernet.

The cryptographic parameters of Fernet were chosen by domain experts, who researched the available encryption methods as well as the best known attacks against them. One advantage of using Fernet is that it avoids the need for you to become an expert yourself. However, for completeness, we note that the Fernet standard uses AES-128 in CBC mode with PKCS7 padding, and HMAC using SHA256 for authentication.

The Fernet standard is also supported by Go, Ruby, and Erlang and so is sometimes suitable for data exchange with other languages. It was especially designed so that using it insecurely is harder than using it correctly.

>>> from cryptography import fernet

>>> k = fernet.Fernet.generate_key()

>>> type(k)

<class 'bytes'>

The key is a short string of bytes. Managing the key securely is important: cryptography is only as good as its keys. If it is kept in a file, for example, the file should have minimal permissions and ideally be hosted on an encrypted file system.
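As a minimal sketch of the file-based option, the following writes a key to a file that only the owning user can read or write. It assumes a POSIX system. The helper name write_secret_file, the file name fernet.key, and the temporary directory are illustrative choices, and the key is a stand-in built with the standard library so that the example runs without any third-party package.

```python
import base64
import os
import stat
import tempfile


def write_secret_file(path, secret):
    """Create `path` with owner-only permissions, then write `secret` to it."""
    # O_EXCL refuses to reuse an existing file, whose permissions might be looser.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fout:
        fout.write(secret)


# Stand-in for a Fernet key: 32 random bytes, URL-safe base64-encoded.
key = base64.urlsafe_b64encode(os.urandom(32))

# Hypothetical location; a real deployment would use a dedicated directory.
key_dir = tempfile.mkdtemp()
key_path = os.path.join(key_dir, "fernet.key")
write_secret_file(key_path, key)

# Permission bits of the file we just created.
mode = stat.S_IMODE(os.stat(key_path).st_mode)
```

Creating the file with the restrictive mode from the start avoids a window in which the key exists on disk with looser permissions.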

The generate_key class method takes care to generate the key securely, using an operating-system level source of random bytes. However, it is still vulnerable to operating-system level flaws: for example, when cloning virtual machines, care must be taken that when starting the clone, it refreshes the source of randomness. This is admittedly an esoteric case, and whatever virtualization system is being used should have documentation on how to refresh the randomness source in its virtual machines.

>>> frn = fernet.Fernet(k)

The Fernet class is initialized with a key. It checks that the key is valid, and raises an exception if it is not.

>>> encrypted = frn.encrypt(b"x marks the spot")

>>> encrypted[:9]

b'gAAAAABb1'

Encryption is simple. It takes a string of bytes and returns an encrypted string. Note that the encrypted string is longer than the source string: among other things, it is also signed with the secret key. Because of the signature, tampering with the encrypted string is detectable, and the Fernet API handles that by refusing to decrypt the string. The value returned from decryption is therefore trustworthy: it was indeed encrypted by someone who had access to the secret key.
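The tamper detection described above can be sketched as follows. The example decodes a token, flips a single bit, and re-encodes it; the byte offset 20 is an arbitrary choice for illustration. Decrypting the modified token is expected to raise InvalidToken rather than return altered plain text.

```python
import base64

from cryptography.fernet import Fernet, InvalidToken

frn = Fernet(Fernet.generate_key())
token = frn.encrypt(b"x marks the spot")

# Decode the token, flip one bit somewhere in the middle, and re-encode it.
raw = bytearray(base64.urlsafe_b64decode(token))
raw[20] ^= 0x01
tampered = base64.urlsafe_b64encode(bytes(raw))

try:
    frn.decrypt(tampered)
    tamper_detected = False
except InvalidToken:
    # The signature no longer matches, so Fernet refuses to decrypt.
    tamper_detected = True

# The untouched token still decrypts normally.
original = frn.decrypt(token)
```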

>>> frn.decrypt(encrypted)

b'x marks the spot'

Decryption is just as simple as encryption. The Fernet token contains a version marker, so if vulnerabilities are found in the current encryption or hashing algorithms, it is possible to move the standard to different ones.

Fernet encryption always adds the current time, as a timestamp, to the signed, encrypted information. Because of this, it is possible to limit the age of a message when decrypting.

>>> frn.decrypt(encrypted, ttl=5)

This will fail if the encrypted information (sometimes referred to as the “token”) is older than five seconds. This is useful to prevent replay attacks, in which a previously captured encrypted token is replayed instead of a new, valid token. For example, if the encrypted token has a list of usernames that are allowed some access, and is retrieved using a subvertible medium, a user who is no longer allowed in can substitute the older token.

Ensuring token freshness would mean that no such list would be decoded, and everybody would be denied – which is no worse than if the medium was tampered with without having a token that was previously valid.

This can also be used to ensure good secret rotation hygiene. By refusing to decrypt anything older than, say, a week, we make sure that if the secret rotation infrastructure broke, we would fail loudly instead of succeeding silently, and thus fix it.
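To see the age limit in action without waiting, we can sketch it with explicit timestamps. This assumes a version of cryptography recent enough to provide encrypt_at_time and decrypt_at_time; the message contents and the 60-second age are made up for the example.

```python
import time

from cryptography.fernet import Fernet, InvalidToken

frn = Fernet(Fernet.generate_key())
now = int(time.time())

# Pretend this token was created one minute ago.
old_token = frn.encrypt_at_time(b"allowed: alice, bob", current_time=now - 60)

try:
    frn.decrypt_at_time(old_token, ttl=5, current_time=now)
    expired = False
except InvalidToken:
    # Older than five seconds: rejected, exactly like a tampered token.
    expired = True

# With a more generous limit, the same token is accepted.
still_valid = frn.decrypt_at_time(old_token, ttl=120, current_time=now)
```

Note that an expired token and a tampered token fail in the same way, so the calling code needs only one error path.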

In order to support seamless key rotation, the fernet module also has a MultiFernet class. MultiFernet takes a list of Fernet instances, one per secret. It encrypts with the first secret but will try decrypting with any of them.

This means that if we first add a new key to the end of the list, it will not be used for encryption yet. After that addition is synchronized everywhere, we can remove the first key. Now all encryptions will be done with the second key, and even those instances where the removal is not synchronized yet will have the decryption key available.

This two-step process is designed to have zero “invalid decryption” errors while still allowing key rotation, which is important as a precautionary measure – and a well-tested rotation procedure means that if keys are leaked, the rotation procedure can minimize the harm they do.
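The two steps can be sketched in a single process, with one MultiFernet standing in for each stage of the rollout. The variable names are illustrative; in a real system the two objects would live on different instances at different times.

```python
from cryptography.fernet import Fernet, InvalidToken, MultiFernet

old_key = Fernet(Fernet.generate_key())
new_key = Fernet(Fernet.generate_key())

# Step 1: the new key is appended. Encryption still uses old_key.
step_one = MultiFernet([old_key, new_key])
# Step 2: once step 1 is deployed everywhere, the old key is dropped.
step_two = MultiFernet([new_key])

token_one = step_one.encrypt(b"x marks the spot")
token_two = step_two.encrypt(b"x marks the spot")

# An instance still at step 1 can read tokens produced at step 2.
lagging_read = step_one.decrypt(token_two)

# An instance at step 2 rejects tokens made with the retired key.
try:
    step_two.decrypt(token_one)
    old_token_rejected = False
except InvalidToken:
    old_token_rejected = True
```

The last part shows the cost of rotation: tokens encrypted with the retired key stop being readable, which is one reason to combine rotation with a limit on token age.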

8.2 PyNaCl

PyNaCl is a library wrapping the libsodium C library. libsodium is a fork of Daniel J. Bernstein’s libnacl, which is why PyNaCl is named that way. (NaCl, or Sodium Chloride, is the chemical formula for salt. The fork took the name of the first element.)

PyNaCl supports both symmetric and asymmetric encryption. However, since cryptography supports symmetric encryption with Fernet, the main use of PyNaCl is for asymmetric encryption.

The idea of asymmetric encryption is that there is a private and a public key. The public key can easily be calculated from the private key but not vice versa; this is the “asymmetry” the name refers to. The public key is published, while the private key must remain a secret.

There are, in general, two basic operations supported with public-key cryptography. We can encrypt with the public key, in a way that can only be decrypted with the private key. We can also sign with the private key, in a way that can be verified with the public key.

As we have discussed earlier, modern cryptographic practice places as much value on authentication as it does on secrecy. This is because if the medium the secret is transmitted on is vulnerable to eavesdropping, it is often vulnerable to modification as well. Secret modification attacks have had enough impact on the field that a cryptographic system is not considered complete if it does not guarantee both authenticity and secrecy.

Because of that, libsodium, and by extension PyNaCl, do not support encryption without signing, or decryption without signature verification.

In order to generate a private key, we just use the class method:

>>> from nacl.public import PrivateKey

>>> k = PrivateKey.generate()

The type of k is PrivateKey. However, at some point, we will usually want to persist the private key.

>>> type(k.encode())

<class 'bytes'>

The encode method encodes the secret key as a stream of bytes.

>>> kk = PrivateKey(k.encode())

>>> kk == k

True

We can generate a private key from the byte stream, and it will be identical. This means we can again keep the private key in a way we decide is secure enough: a secret manager, for example.

In order to encrypt, we need a public key. Public keys can be generated from private keys.

>>> from nacl.public import PublicKey

>>> target = PrivateKey.generate()

>>> public_key = target.public_key

Of course, in a more realistic scenario, public keys need to be stored somewhere: in a file, in a database, or just sent via the network. For that, we need to convert the public key into bytes.

>>> encoded = public_key.encode()

>>> encoded[:4]

b'\xb91>\x95'

When we get the bytes, we can regenerate the public key. It is identical to the original public key.

>>> public_key_2 = PublicKey(encoded)

>>> public_key_2 == public_key

True

A PyNaCl Box represents a pair of keys: the first private, the second public. The Box signs with the private key, then encrypts with the public key. Every message that we encrypt always gets signed.

>>> from nacl.public import PrivateKey, PublicKey, Box

>>> source = PrivateKey.generate()

>>> with open("target.pubkey", "rb") as fpin:

...     target_public_key = PublicKey(fpin.read())

>>> enc_box = Box(source, target_public_key)

>>> result = enc_box.encrypt(b"x marks the spot")

>>> result[:4]

b'\xe2\x1c0\xa4'

This one signs using the source private key and encrypts using the target’s public key.

When we decrypt, we need to build the inverse box. This happens on a different computer: one that has the target private key but only the source’s public key.

>>> from nacl.public import PrivateKey, PublicKey, Box

>>> with open("source.pubkey", "rb") as fpin:

...     source_public_key = PublicKey(fpin.read())

>>> with open("target.private_key", "rb") as fpin:

...     target = PrivateKey(fpin.read())

>>> dec_box = Box(target, source_public_key)

>>> dec_box.decrypt(result)

b'x marks the spot'

The decryption box decrypts with the target’s private key and verifies the signature using the source’s public key. If the information has been tampered with, the decryption operation automatically fails. This means that it is impossible to access plain-text information that is not correctly signed.

Another piece of functionality that is useful in PyNaCl is cryptographic signing. It is sometimes useful to sign without encryption: for example, we can make sure to only use approved binary files by signing them. This allows the permissions for storing the binary file to be loose, as long as we trust that the permissions keeping the signing key secure are strong enough.

Signing also involves asymmetric cryptography. The private key is used to sign, and the public key is used to verify the signatures. This means that we can, for example, check the public key into source control, and avoid needing any further configuration of the verification part.

We first have to generate the private signing key. This is similar to generating a key for decryption.

>>> from nacl.signing import SigningKey

>>> key = SigningKey.generate()

We will usually need to store this key (securely) somewhere for repeated use; for this, we can encode it as bytes. Again, it is worth remembering that anyone who can access the signing key can sign whatever data they want.

>>> encoded = key.encode()

>>> type(encoded)

<class 'bytes'>

The key can be reconstructed from the encoded version. That produces an identical key.

>>> key_2 = SigningKey(encoded)

>>> key_2 == key

True

For verification, we need to have the verification key. Since this is asymmetric cryptography, the verification key can be calculated from the signing key, but not vice versa.

>>> verify_key = key.verify_key

We will usually need to store the verification key somewhere, so we need to be able to encode it as bytes.

>>> verify_encoded = verify_key.encode()

>>> verify_encoded[:4]

b'\x08\xb1\x9e\xf4'

We can reconstruct the verification key. That gives an identical key. Like all ...Key classes, it supports a constructor that accepts an encoded key and returns a key object.

>>> from nacl.signing import VerifyKey

>>> verify_key_2 = VerifyKey(verify_encoded)

>>> verify_key == verify_key_2

True

When we sign a message, we get an interesting object back:

>>> message = b"The number you shall count is three"

>>> result = key.sign(message)

>>> result

b'\x1a\xd38[....'

It displays as bytes. But it is not bytes:

>>> type(result)

<class 'nacl.signing.SignedMessage'>

We can extract the message and the signature from it separately:

>>> result.message

b'The number you shall count is three'

>>> result.signature

b'\x1a\xd38[...'

This is useful in case we want to save the signature in a separate place. For example, if the original is in an object storage, mutating it might be undesirable for various reasons. In those cases, we can keep the signatures “on the side.” Another reason is to maintain different signatures for different purposes, or to allow key rotation.

If we do want to write the whole signed message, it is best to explicitly convert the result to bytes.

>>> encoded = bytes(result)

The verification returns the verified message. This is the best way to use signatures; this way, it is impossible for the code to handle an unverified message.

>>> verify_key.verify(encoded)

b'The number you shall count is three'

However, if it is necessary to read the object itself from somewhere else, and then pass it into the verifier, that is also easy to do.

>>> verify_key.verify(b'The number you shall count is three',

... result.signature)

b'The number you shall count is three'

Finally, we can just use the result object as is to verify.

>>> verify_key.verify(result)

b'The number you shall count is three'

8.3 Passlib

Secure storage of passwords is a delicate matter. The biggest reason it is so subtle is that it has to deal with people who do not use password best practices. If all passwords were strong, and people never reused passwords from site to site, password storage would be straightforward.

However, people usually choose passwords with little entropy (123456 is still unreasonably popular, as well as password), they have a “standard password” that they use for all websites, and they are often vulnerable to phishing attacks and social engineering attacks where they divulge the password to an unauthorized third party.

Not all of these threats can be stopped by correctly storing passwords, but many of them can, at least, be mitigated and weakened.

The passlib library is written by people who are well versed in software security, and tries to, at least, eliminate the most obvious mistakes when saving passwords. Passwords are never saved in plain text – always hashed.

Note that hashing algorithms for passwords are optimized for different use cases than hashing algorithms used for other purposes: for example, one of the things they try to defeat is brute-force attacks that map hashes back to their source passwords.

Passlib hashes passwords with the latest vetted algorithms optimized for password storage, and it is intended to avoid any possibility of side-channel attacks. In addition, a “salt” is always used when hashing the passwords.

Although passlib can be used without understanding these things, it is worthwhile to understand them in order to avoid mistakes while using passlib.

Hashing means taking a user’s password and running it through a function that is reasonably easy to compute but hard to invert. This means that even if an attacker gets access to the password database, they cannot recover users’ passwords and pretend to be them.

One way that the attacker can try to get the original passwords is to try all combinations of passwords they can come up with, hash them, and see whether any result equals a stored hash. To counter this, password hashing uses special algorithms that are deliberately expensive to compute. An attacker then needs a lot of resources to try many passwords: even if, say, only a few million passwords are tried, the comparison takes a long time.

Lastly, attackers can use something called “rainbow tables” to precompute the hashes of many common passwords, and compare them all at once against a password database. To prevent that, passwords are “salted” before they are hashed: a random prefix (the “salt”) is added to the password, the result is hashed, and the salt is prefixed to the hash value. When a password is received from the user, the salt is retrieved from the beginning of the stored hash value and added to the received password before hashing, so that the two hashes can be compared.
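The ideas above can be illustrated with the standard library alone. This is a sketch of the concept, not of how passlib stores hashes: the function names, the 16-byte salt, and the round count are assumptions made for the example, and PBKDF2 takes the salt as a separate input rather than literally prepending it to the password.

```python
import hashlib
import hmac
import os

SALT_SIZE = 16
ROUNDS = 100_000  # illustrative work factor; an assumption, not a recommendation


def hash_password(password, salt=None):
    """Return salt + digest, so the salt can be recovered when verifying."""
    if salt is None:
        salt = os.urandom(SALT_SIZE)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ROUNDS
    )
    return salt + digest


def verify_password(password, stored):
    """Re-hash `password` with the stored salt and compare the results."""
    salt = stored[:SALT_SIZE]
    candidate = hash_password(password, salt)
    # Constant-time comparison, to avoid leaking where the values differ.
    return hmac.compare_digest(candidate, stored)


stored = hash_password("good password")
```

Since each call picks a fresh salt, hashing the same password twice gives different stored values, which is what makes precomputed tables useless.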

Doing all of this from scratch is hard, and getting it right is even harder. Getting it “right” does not just mean having users log in, but being resilient to the password database being stolen. Since there is no feedback about that aspect, it is best to use a well-tested library.

The library is storage agnostic: it does not care where the passwords are being stored. However, it does care that it is possible to update the hashed passwords. This way, hashed passwords can get updated to newer hashing schemes as the need arises. While passlib does support various low-level interfaces, it is best to use the high-level interface of the CryptContext. The name is misleading, since it does no encryption; it is a reference to vaguely similar (and largely deprecated) functionality built into Unix.

The first thing to do is to decide on a list of supported hashes. Note that not all of them have to be good hashes; if we have supported bad hashes in the past, they still have to be in the list. In this example, we choose argon2 as our preferred hash but allow a few more options.

>>> hashes = ["argon2", "pbkdf2_sha256", "md5_crypt", "des_crypt"]

Note that md5 and des have serious vulnerabilities and are not suitable to use in real applications. We added them because there might be old hashes using them. In contrast, even though pbkdf2_sha256 is, probably, worse than argon2, there is no urgent need to update it. We want to mark md5 and des as deprecated.

>>> deprecated = ["md5_crypt", "des_crypt"]

Finally, after having made the decisions, we build the crypto context:

>>> from passlib.context import CryptContext

>>> ctx = CryptContext(schemes=hashes, deprecated=deprecated)

It is possible to configure other details, such as the number of rounds. This is almost always unnecessary, as the defaults should be good enough.

Sometimes we will want to keep this information in some configuration (for example, an environment variable or a file) and load it; this way, we can update the list of hashes without modifying the code.

>>> serialized = ctx.to_string()

>>> new_ctx = CryptContext.from_string(serialized)

When saving the string, note that it does contain newlines; this might impact where it can be saved. If needed, it is always possible to convert it to base64.

On user creation or change password, we need to hash the password before storing it. This is done via the hash method on the context.

>>> res = ctx.hash("good password")

When logging in, the first step is to retrieve the hash from storage. After retrieving the hash, and having the user’s password from the user interface, we need to check that they match, and possibly update the hash if it is using a deprecated protocol.

>>> ctx.verify_and_update("good password", res)

(True, None)

If the second element were not None, we would need to update the stored hash with it. In general, it is not a good idea to specify a specific hash algorithm, but to trust the context defaults. However, in order to showcase the update, we can force the context to hash with a weak algorithm.

>>> res = ctx.hash("good password", scheme="md5_crypt")

In that case, verify_and_update would let us know we should update the hash:

>>> ctx.verify_and_update("good password", res)

(True, '$argon2...')

In that case, we would need to store the second element in our password hash storage.

8.4 TLS Certificates

Transport Layer Security (TLS) is a cryptographic way to protect data in transit. Since one potential attack is man-in-the-middle, it is important to be able to verify that the endpoints are correct. For this reason, the public keys are signed by Certificate Authorities. Sometimes, it is useful to have a local certificate authority.

One case where that can be useful is in microservice architectures, where verifying each service is the right one allows a more secure installation. Another case where that is useful is for putting together an internal test environment, where using real certificate authorities is sometimes not worth the effort; it is easy enough to install the local certificate authority as locally trusted and sign the relevant certificates with it.

Another place that this can be useful is in running tests. When running integration tests, we would like to set up a realistic integration environment. Ideally, some of these tests would check that TLS is indeed used rather than plain text. This is impossible to test if, for purposes of testing, we downgrade to plain-text communication. Indeed, the root cause of many production security breaches is that code inserted for testing, to enable plain-text communication, was accidentally enabled (or possible to maliciously enable) in production; furthermore, it was impossible to test that such bugs did not exist, because the testing environment did have plain-text communication.

For the same reason, allowing TLS connections without verification in the testing environment is dangerous. This means that the code has a non-verification flow, which can accidentally turn on, or maliciously be turned on, in production, and is impossible to prevent with testing.

Creating a certificate manually requires access to the hazmat layer in cryptography. This is so named because this is dangerous; we have to judiciously choose encryption algorithms and parameters, and the wrong choices can lead to insecure modes.

In order to perform cryptography, we need a “back end.” This is because originally it was intended to support multiple back ends. This design is mostly deprecated, but we still need to create it and pass it around.

>>> from cryptography.hazmat.backends import default_backend

Finally, we are ready to generate our private key. For this example, we will use 2048 bits, which is considered “reasonably secure” as of 2019. A complete discussion of which sizes provide how much security is beyond the scope of this chapter.

>>> from cryptography.hazmat.primitives.asymmetric import rsa

>>> private_key = rsa.generate_private_key(

... public_exponent=65537,

... key_size=2048,

... backend=default_backend()

... )

As always in asymmetric cryptography, it is possible (and fast) to calculate the public key from the private key.

>>> public_key = private_key.public_key()

This is important, since the certificate only refers to the public key. Since the private key is never shared, it is not worthwhile, and actively dangerous, to make any assertions about it.

The next step is to create a certificate builder. The certificate builder will be used to add “assertions” about the public key. In this case, we are going to finish by self-signing the certificate, since CA certificates are self-signed.

>>> from cryptography import x509

>>> builder = x509.CertificateBuilder()

We then add names. Some names are required, though it is not important to have specific contents in them.

>>> from cryptography.x509.oid import NameOID

>>> builder = builder.subject_name(x509.Name([

... x509.NameAttribute(NameOID.COMMON_NAME, 'Simple Test CA'),

... ]))

>>> builder = builder.issuer_name(x509.Name([

... x509.NameAttribute(NameOID.COMMON_NAME, 'Simple Test CA'),

... ]))

We need to decide a validity range. For this, it is useful to be able to have a “day” interval for easy calculation.

>>> import datetime

>>> one_day = datetime.timedelta(days=1)

We want to make the validity range start “slightly before now.” This way, it will be valid for clocks with some amount of skew.

>>> today = datetime.datetime.today()

>>> yesterday = today - one_day

>>> builder = builder.not_valid_before(yesterday)

Since this certificate will be used for testing, we do not need to have it be valid for a long time. We will make it valid for 30 days.

>>> next_month = today + (30 * one_day)

>>> builder = builder.not_valid_after(next_month)

The serial number needs to uniquely identify the certificate. Since keeping enough information to remember which serial numbers we used is complicated, we choose a different path: choosing a random serial number. The probability of having the same serial number chosen twice is extremely low.

>>> builder = builder.serial_number(x509.random_serial_number())

We then add the public key that we generated. This certificate is made of assertions about this public key.

>>> builder = builder.public_key(public_key)

Since this is a CA certificate, we need to mark it as a CA certificate.

>>> builder = builder.add_extension(

... x509.BasicConstraints(ca=True, path_length=None),

... critical=True)

Finally, after we have added all the assertions into the builder, we need to generate the hash and sign it.

>>> from cryptography.hazmat.primitives import hashes

>>> certificate = builder.sign(

... private_key=private_key, algorithm=hashes.SHA256(),

... backend=default_backend()

... )

This is it! We now have a private key, and a self-signed certificate that claims to be a CA. However, we will need to store them in files.

The PEM file format is friendly to simple concatenation. Indeed, usually this is how certificates are stored: in the same file with the private key (since they are useless without it).

>>> from cryptography.hazmat.primitives import serialization

>>> private_bytes = private_key.private_bytes(

... encoding=serialization.Encoding.PEM,

... format=serialization.PrivateFormat.TraditionalOpenSSL,

... encryption_algorithm=serialization.NoEncryption())

>>> public_bytes = certificate.public_bytes(

... encoding=serialization.Encoding.PEM)

>>> with open("ca.pem", "wb") as fout:

... fout.write(private_bytes + public_bytes)

>>> with open("ca.crt", "wb") as fout:

... fout.write(public_bytes)

This gives us the capability to now be a CA.

In general, for real certificate authorities, we need to generate a Certificate Signing Request (CSR) in order to prove that the owner of the private key actually wants that certificate. However, since we are the certificate authority, we can just create the certificate directly.

There is no difference between creating a private key for a certificate authority and a private key for a service.

>>> service_private_key = rsa.generate_private_key(

... public_exponent=65537,

... key_size=2048,

... backend=default_backend()

... )

Since we need to sign the public key, we need to again calculate it from the private key:

>>> service_public_key = service_private_key.public_key()

We create a new builder for the service certificate:

>>> builder = x509.CertificateBuilder()

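For services, the COMMON_NAME is important; this is the name that clients have traditionally verified the domain name against. We also set the issuer name to the name of our CA; without an issuer name, the builder refuses to sign.

>>> builder = builder.subject_name(x509.Name([

... x509.NameAttribute(NameOID.COMMON_NAME, 'service.test.local')

... ]))

>>> builder = builder.issuer_name(x509.Name([

... x509.NameAttribute(NameOID.COMMON_NAME, 'Simple Test CA'),

... ]))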
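Many modern TLS clients compare the host name against the certificate’s subject alternative names rather than its common name. If the clients in the test environment behave that way, the name can be added as an extension, roughly as sketched below. Marking the extension as non-critical is a common choice when a subject name is also present, and is an assumption here.

```python
from cryptography import x509

# The same host name as in the common name, as a DNS alternative name.
san = x509.SubjectAlternativeName([
    x509.DNSName("service.test.local"),
])

# A fresh builder is used to keep the sketch self-contained; in practice
# the extension would be added to the service certificate's builder.
builder = x509.CertificateBuilder()
builder = builder.add_extension(san, critical=False)

dns_names = san.get_values_for_type(x509.DNSName)
```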

We assume that the service will be accessed as service.test.local, using some local test resolution. Once again, we limit our certificate validity to about a month, and we give the certificate its own random serial number.

>>> builder = builder.not_valid_before(yesterday)

>>> builder = builder.not_valid_after(next_month)

>>> builder = builder.serial_number(x509.random_serial_number())

This time, we sign the service public key:

>>> builder = builder.public_key(service_public_key)

However, we sign with the private key of the CA; we do not want this certificate to be self-signed.

>>> certificate = builder.sign(

... private_key=private_key, algorithm=hashes.SHA256(),

... backend=default_backend()

... )

Again, we write a PEM file with the key and the certificate:

>>> private_bytes = service_private_key.private_bytes(

... encoding=serialization.Encoding.PEM,

... format=serialization.PrivateFormat.TraditionalOpenSSL,

... encryption_algorithm=serialization.NoEncryption())

>>> public_bytes = certificate.public_bytes(

... encoding=serialization.Encoding.PEM)

>>> with open("service.pem", "wb") as fout:

... fout.write(private_bytes + public_bytes)

The service.pem file is in a format that can be used by most popular web servers: Apache, Nginx, HAProxy, and many more. It can also be used directly by the Twisted web server, by using the txsni extension.

If we add the ca.crt file to the trust root, and run, say, an Nginx server on an IP that our client would resolve from service.test.local, then when we connect clients to https://service.test.local, they will verify that the certificate is indeed valid.
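On the client side, a Python test client can trust the local CA without turning verification off. The following is a minimal sketch using the standard library; the helper name make_client_context is an assumption, and when no CA file is given it falls back to the system trust store.

```python
import ssl


def make_client_context(ca_file=None):
    """Build a client TLS context that verifies certificates and host names.

    Pass the path to ca.crt to trust the local test CA; with no argument,
    the system default trust roots are loaded instead.
    """
    context = ssl.create_default_context(cafile=ca_file)
    return context


context = make_client_context()
```

In the test environment, the call would be make_client_context("ca.crt"). Either way the context keeps certificate and host name verification enabled, so the tests exercise the same verification flow as production.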

8.5 Summary

Cryptography is a powerful tool but one that is easy to misuse. By using well-understood high-level functions, we reduce many of the risks in using cryptography. While this is no substitute for proper risk analysis and modeling, it does make this exercise somewhat easier.

Python has several third-party libraries with well-vetted code, and it is a good idea to use them.
