You want to use a hash function to process data incrementally, returning a result when the last of the data is finally available.
Most hash functions use a standard interface for operation, following these steps:
The user creates a "context" object to hold intermediate state.
The context object gets initialized.
The context is "updated" by passing in the data to be hashed.
Once all of the data has been passed in, "finalization" returns the output of the cryptographic hash function.
Hash functions are not secure by themselves—not for a password system, not for message authentication, not for anything! If you do need a hash function by itself, be sure to at least protect against length extension attacks, as described in Recipe 6.7 and Recipe 6.8.
Libraries with cryptographic hash functions tend to support incremental operation using a standard structure. In fact, this structure is standardized for cryptographic hardware APIs in PKCS (Public Key Cryptography Standard) #11. There are four steps:
Allocate a context object. The context object holds the internal state of the hash until data processing is complete. The type can be specific to the hash function, or it can be a single type that works for all hash functions in a library (such as the EVP_MD_CTX type in the OpenSSL library or HCRYPTHASH in Microsoft's CryptoAPI).
Initialize the context object, resetting internal parameters of the hash function. Generally, this function takes no arguments other than a pointer to the context object, unless you're using a generic API, in which case you will need to specify which hash algorithm to use.
"Update" the context object by passing in data to be hashed and the associated length of that input. The results of the hash will be dependent on the order of the data you pass, but you can pass in all the partial data you wish. That is, calling the update routine with the string "he" then "llo" would produce the same results as calling it once with the string "hello". The update function generally takes the context object, the data to process, and the associated length of that data as arguments.
"Finalize" the context object and produce the message digest. Most APIs take as arguments the context object and a buffer into which the message digest is placed.
The OpenSSL API has both a single generic interface to all its hash functions and a separate API for each hash function. Here's an example using the SHA1 API:
#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

int main(int argc, char *argv[]) {
  int           i;
  SHA_CTX       ctx;
  unsigned char result[SHA_DIGEST_LENGTH]; /* SHA1 has a 20-byte digest. */
  const char    *s1 = "Testing";
  const char    *s2 = "...1...2...3...";

  SHA1_Init(&ctx);
  SHA1_Update(&ctx, s1, strlen(s1));
  SHA1_Update(&ctx, s2, strlen(s2));
  /* Yes, the context object is last. */
  SHA1_Final(result, &ctx);

  printf("SHA1(\"%s%s\") = ", s1, s2);
  for (i = 0; i < SHA_DIGEST_LENGTH; i++) printf("%02x", result[i]);
  printf("\n");
  return 0;
}
Every hash function that OpenSSL supports has a similar API. In addition, every such function has an "all-in-one" API that allows you to combine the work of calls for initialization, updating, and finalization, obviating the need for a context object:
unsigned char *SHA1(unsigned char *in, unsigned long len, unsigned char *out);
This function returns a pointer to the out argument.
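To make the shape of both interfaces concrete without depending on any particular library, here is a minimal sketch. It assumes a stand-in algorithm, 32-bit FNV-1a, which is not a cryptographic hash and is used only because it fits in a few lines. The names toy_ctx, toy_init, toy_update, toy_final, and toy are hypothetical and are not part of OpenSSL. The argument order mirrors the SHA1 calls shown earlier, and the all-in-one function is simply the three incremental calls wrapped together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DIGEST_LENGTH 4

/* Context: holds the intermediate state between calls. */
typedef struct {
    uint32_t state;
} toy_ctx;

/* Initialize: reset the state to the FNV-1a 32-bit offset basis. */
void toy_init(toy_ctx *ctx) {
    assert(ctx != NULL);
    ctx->state = 2166136261u;
}

/* Update: fold in len more bytes; may be called repeatedly. */
void toy_update(toy_ctx *ctx, const unsigned char *data, size_t len) {
    size_t i;
    assert(ctx != NULL);
    for (i = 0; i < len; i++) {
        ctx->state ^= data[i];
        ctx->state *= 16777619u;   /* FNV 32-bit prime */
    }
}

/* Finalize: write the digest, most significant byte first.
 * As with SHA1_Final, the context is the last argument. */
void toy_final(unsigned char out[TOY_DIGEST_LENGTH], toy_ctx *ctx) {
    assert(out != NULL && ctx != NULL);
    out[0] = (unsigned char)(ctx->state >> 24);
    out[1] = (unsigned char)(ctx->state >> 16);
    out[2] = (unsigned char)(ctx->state >> 8);
    out[3] = (unsigned char)(ctx->state);
}

/* All-in-one: no context is visible to the caller; returns out. */
unsigned char *toy(const unsigned char *in, size_t len, unsigned char *out) {
    toy_ctx ctx;
    toy_init(&ctx);
    toy_update(&ctx, in, len);
    toy_final(out, &ctx);
    return out;
}
```

Calling toy_update with "he" and then "llo" leaves the context in the same state as a single call with "hello", so toy_final and toy write identical bytes for the same overall input.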
Both the incremental API and the all-in-one API are common well beyond OpenSSL. The reference implementations of most hash algorithms look very similar, and Microsoft's CryptoAPI for Windows provides a comparable interface. All of the Microsoft CSPs provide implementations of MD2, MD5, and SHA1. The following code is the CryptoAPI version of the OpenSSL code presented previously:
#include <windows.h>
#include <wincrypt.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
  BYTE       *pbData;
  DWORD      cbData = sizeof(DWORD), cbHashSize, i;
  HCRYPTHASH hSHA1;
  HCRYPTPROV hProvider;
  const char *s1 = "Testing";
  const char *s2 = "...1...2...3...";

  /* Acquire a provider, then create a SHA1 hash object from it. */
  CryptAcquireContext(&hProvider, 0, MS_DEF_PROV, PROV_RSA_FULL, 0);
  CryptCreateHash(hProvider, CALG_SHA1, 0, 0, &hSHA1);
  CryptHashData(hSHA1, (const BYTE *)s1, (DWORD)strlen(s1), 0);
  CryptHashData(hSHA1, (const BYTE *)s2, (DWORD)strlen(s2), 0);

  /* Query the digest size first, then fetch the digest itself. */
  CryptGetHashParam(hSHA1, HP_HASHSIZE, (BYTE *)&cbHashSize, &cbData, 0);
  pbData = (BYTE *)LocalAlloc(LMEM_FIXED, cbHashSize);
  CryptGetHashParam(hSHA1, HP_HASHVAL, pbData, &cbHashSize, 0);
  CryptDestroyHash(hSHA1);
  CryptReleaseContext(hProvider, 0);

  printf("SHA1(\"%s%s\") = ", s1, s2);
  for (i = 0; i < cbHashSize; i++) printf("%02x", pbData[i]);
  printf("\n");

  LocalFree(pbData);
  return 0;
}
The preferred API for accessing hash functions from OpenSSL, though, is the EVP API, which provides a generic API to all of the hash functions OpenSSL supports. The following code does the same thing as the first example with the EVP interface instead of the SHA1 interface:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <openssl/evp.h>

int main(int argc, char *argv[]) {
  unsigned int  i, ol;
  EVP_MD_CTX    *ctx;
  unsigned char result[EVP_MAX_MD_SIZE]; /* enough for any hash function */
  const char    *s1 = "Testing";
  const char    *s2 = "...1...2...3...";

  /* OpenSSL 1.1.0 and later make EVP_MD_CTX opaque, so allocate it on the heap. */
  if (!(ctx = EVP_MD_CTX_new())) abort();
  /* Note the extra parameter */
  EVP_DigestInit(ctx, EVP_sha1());
  EVP_DigestUpdate(ctx, s1, strlen(s1));
  EVP_DigestUpdate(ctx, s2, strlen(s2));
  /* Here, the context object is first. Notice the pointer to the output length */
  EVP_DigestFinal(ctx, result, &ol);
  EVP_MD_CTX_free(ctx);

  printf("SHA1(\"%s%s\") = ", s1, s2);
  for (i = 0; i < ol; i++) printf("%02x", result[i]);
  printf("\n");
  return 0;
}
Note particularly that EVP_DigestFinal() requires you to pass in a pointer to an integer, into which the output length is stored. You should use this value in your computations instead of hardcoding SHA1's digest size, under the assumption that you might someday have to replace crypto algorithms in a hurry, in which case the digest size may change. For that reason, allocate EVP_MAX_MD_SIZE bytes for any buffer into which you store a message digest, even if some of that space may go unused.
Alternatively, if you'd like to allocate a buffer of the correct size for output dynamically (which is a good idea if you're space-constrained, because if SHA-512 is ever added to OpenSSL, EVP_MAX_MD_SIZE will become 512 bits), you can use the function EVP_MD_CTX_size(), which takes a context object and returns the size of the digest. For example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <openssl/evp.h>

int main(int argc, char *argv[]) {
  unsigned int  i, ol;
  EVP_MD_CTX    *ctx;
  unsigned char *result;
  const char    *s1 = "Testing";
  const char    *s2 = "...1...2...3...";

  /* OpenSSL 1.1.0 and later make EVP_MD_CTX opaque, so allocate it on the heap. */
  if (!(ctx = EVP_MD_CTX_new())) abort();
  EVP_DigestInit(ctx, EVP_sha1());
  EVP_DigestUpdate(ctx, s1, strlen(s1));
  EVP_DigestUpdate(ctx, s2, strlen(s2));
  /* Size the buffer from the digest size, not the block size. */
  if (!(result = (unsigned char *)malloc(EVP_MD_CTX_size(ctx)))) abort();
  EVP_DigestFinal(ctx, result, &ol);
  EVP_MD_CTX_free(ctx);

  printf("SHA1(\"%s%s\") = ", s1, s2);
  for (i = 0; i < ol; i++) printf("%02x", result[i]);
  printf("\n");

  free(result);
  return 0;
}
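Because the generic interface reports the digest length only at run time, any helper that consumes a digest is safer if it takes that length as a parameter rather than assuming 20 bytes. The following sketch shows one way to turn a digest of arbitrary length into a hexadecimal string, in place of the printf loops used above. The function name digest_to_hex and its signature are hypothetical; it is not part of OpenSSL, and the len argument is assumed to be the value that EVP_DigestFinal() stored.

```c
#include <assert.h>
#include <stddef.h>

/* Write 2*len lowercase hex digits plus a terminating NUL into out.
 * Returns out on success, or NULL if out_size is too small. */
char *digest_to_hex(const unsigned char *md, unsigned int len,
                    char *out, size_t out_size) {
    static const char digits[] = "0123456789abcdef";
    unsigned int i;

    assert(md != NULL && out != NULL);
    if (out_size < (size_t)len * 2 + 1)
        return NULL;                      /* caller's buffer is too small */
    for (i = 0; i < len; i++) {
        out[2 * i]     = digits[md[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[md[i] & 0x0f];  /* low nibble */
    }
    out[2 * (size_t)len] = '\0';
    return out;
}
```

A caller that sizes its output buffer as 2 * EVP_MAX_MD_SIZE + 1 bytes can pass any digest the EVP interface produces, and a change of algorithm then requires no change to this helper.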
The OpenSSL library supports only two cryptographic hash functions that we recommend, SHA1 and RIPEMD-160. It also supports MD2, MD4, MD5, and MDC-2-DES. MDC-2-DES is reasonable, but it is slow and provides only 64 bits of resistance to birthday attacks, whereas we recommend a minimum baseline of 80 bits of security. As an alternative, you could initialize the hash function with a nonce, as discussed in Recipe 6.8.
Nonetheless, Table 6-3 contains a summary of the necessary information on each hash function to use both the EVP and hash-specific APIs with OpenSSL.
Table 6-3. OpenSSL-supported hash functions
Message digest function | EVP function to specify MD | Context type for MD-specific API | Prefix for MD-specific API calls (i.e., XXX_Init, ...) | Include file for MD-specific API |
---|---|---|---|---|
MD2 | | | | openssl/md2.h |
MD4 | | | | openssl/md4.h |
MD5 | | | | openssl/md5.h |
MDC-2-DES | | | | openssl/mdc2.h |
RIPEMD-160 | | | | openssl/ripemd.h |
SHA1 | | | | openssl/sha.h |
Of course, you may want to use an off-the-shelf hash function that isn't supported by either OpenSSL or CryptoAPI—for example, SHA-256, SHA-384, or SHA-512. Aaron Gifford has produced a good, free library with implementations of these functions and released it under a BSD-style license. It is available from http://www.aarongifford.com/computers/sha.html.
That library exports an API that should look very familiar:
SHA256_Init(SHA256_CTX *ctx);
SHA256_Update(SHA256_CTX *ctx, unsigned char *data, size_t inlen);
SHA256_Final(unsigned char out[SHA256_DIGEST_LENGTH], SHA256_CTX *ctx);

SHA384_Init(SHA384_CTX *ctx);
SHA384_Update(SHA384_CTX *ctx, unsigned char *data, size_t inlen);
SHA384_Final(unsigned char out[SHA384_DIGEST_LENGTH], SHA384_CTX *ctx);

SHA512_Init(SHA512_CTX *ctx);
SHA512_Update(SHA512_CTX *ctx, unsigned char *data, size_t inlen);
SHA512_Final(unsigned char out[SHA512_DIGEST_LENGTH], SHA512_CTX *ctx);
All of the previous functions are prototyped in the sha2.h header file.
Implementations of SHA-256 and SHA-512 from Aaron Gifford: http://www.aarongifford.com/computers/sha.html