6.5. Incrementally Hashing Data

Problem

You want to use a hash function to process data incrementally, returning a result when the last of the data is finally available.

Solution

Most hash functions use a standard interface for operation, following these steps:

1. The user creates a "context" object to hold intermediate state.
2. The context object gets initialized.
3. The context is "updated" by passing in the data to be hashed.
4. When all of the data has been passed in, "finalization" returns the output of the cryptographic hash function.

Discussion

Warning

Hash functions are not secure by themselves—not for a password system, not for message authentication, not for anything! If you do need a hash function by itself, be sure to at least protect against length extension attacks, as described in Recipe 6.7 and Recipe 6.8.

Libraries with cryptographic hash functions tend to support incremental operation using a standard structure. In fact, this structure is standardized for cryptographic hardware APIs in PKCS (Public Key Cryptography Standard) #11. There are four steps:

1. Allocate a context object. The context object holds the internal state of the hash until data processing is complete. The type can be specific to the hash function, or it can be a single type that works for all hash functions in a library (such as the EVP_MD_CTX type in the OpenSSL library or HCRYPTHASH in Microsoft's CryptoAPI).
2. Initialize the context object, resetting internal parameters of the hash function. Generally, this function takes no arguments other than a pointer to the context object, unless you're using a generic API, in which case you will need to specify which hash algorithm to use.
3. "Update" the context object by passing in data to be hashed and the associated length of that input. The result of the hash depends on the order of the data you pass, but you can split that data into as many pieces as you wish. That is, calling the update routine with the string "he" and then "llo" produces the same result as calling it once with the string "hello". The update function generally takes the context object, the data to process, and the associated length of that data as arguments.
4. "Finalize" the context object and produce the message digest. Most APIs take as arguments the context object and a buffer into which the message digest is placed.

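The four steps above can be sketched with a toy context type. This is only an illustration of the interface shape: the TOY_ names are hypothetical, and the underlying function is FNV-1a, which is not a cryptographic hash and must not be used as one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy API. FNV-1a is NOT cryptographic; it is used here only
 * because it is small enough to show the init/update/final structure. */
#define TOY_DIGEST_LENGTH 8

typedef struct {
  uint64_t state;  /* intermediate state carried between update calls */
} TOY_CTX;

void TOY_Init(TOY_CTX *ctx) {
  ctx->state = 0xcbf29ce484222325ULL;  /* FNV-1a 64-bit offset basis */
}

void TOY_Update(TOY_CTX *ctx, const void *data, size_t len) {
  const unsigned char *p = (const unsigned char *)data;
  size_t              i;

  for (i = 0;  i < len;  i++) {
    ctx->state ^= p[i];
    ctx->state *= 0x100000001b3ULL;  /* FNV-1a 64-bit prime */
  }
}

void TOY_Final(unsigned char out[TOY_DIGEST_LENGTH], TOY_CTX *ctx) {
  int i;

  /* Serialize the state big-endian, then clear the context. */
  for (i = 0;  i < TOY_DIGEST_LENGTH;  i++)
    out[i] = (unsigned char)(ctx->state >> (56 - 8 * i));
  ctx->state = 0;
}

/* Returns 1 if hashing "he" and then "llo" gives the same digest as
 * hashing "hello" in a single update call. */
int toy_split_matches_whole(void) {
  TOY_CTX       a, b;
  unsigned char da[TOY_DIGEST_LENGTH], db[TOY_DIGEST_LENGTH];

  TOY_Init(&a);
  TOY_Update(&a, "he", 2);
  TOY_Update(&a, "llo", 3);
  TOY_Final(da, &a);

  TOY_Init(&b);
  TOY_Update(&b, "hello", 5);
  TOY_Final(db, &b);

  return memcmp(da, db, TOY_DIGEST_LENGTH) == 0;
}
```

Calling toy_split_matches_whole() from a test program should return 1, because the context carries all of the state needed between update calls. A real hash function additionally buffers partial input blocks inside the context, but the calling pattern is the same.
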
The OpenSSL API has both a single generic interface to all its hash functions and a separate API for each hash function. Here's an example using the SHA1 API:

#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>
   
int main(int argc, char *argv[]) {
  int           i;
  SHA_CTX       ctx;
  unsigned char result[SHA_DIGEST_LENGTH]; /* SHA1 has a 20-byte digest. */
  const char    *s1 = "Testing";  /* plain char, so strlen() and %s apply directly */
  const char    *s2 = "...1...2...3...";
   
  SHA1_Init(&ctx);  
  SHA1_Update(&ctx, s1, strlen(s1));
  SHA1_Update(&ctx, s2, strlen(s2));
  /* Yes, the context object is last. */
  SHA1_Final(result, &ctx);
   
  printf("SHA1(\"%s%s\") = ", s1, s2);  
  for (i = 0;  i < SHA_DIGEST_LENGTH;  i++) printf("%02x", result[i]);
  printf("\n");
   
  return 0;
}

Every hash function that OpenSSL supports has a similar API. In addition, every such function has an "all-in-one" API that allows you to combine the work of calls for initialization, updating, and finalization, obviating the need for a context object:

unsigned char *SHA1(unsigned char *in, unsigned long len, unsigned char *out);

This function returns a pointer to the out argument.

Both the incremental API and the all-in-one API are common well beyond OpenSSL. The reference implementations of most hash algorithms look very similar, and Microsoft's CryptoAPI for Windows provides a comparable API. Each of the Microsoft CSPs provides implementations of MD2, MD5, and SHA1. The following code is the CryptoAPI version of the OpenSSL code presented previously:

#include <windows.h>
#include <wincrypt.h>
#include <stdio.h>
#include <string.h>
   
int main(int argc, char *argv[]) {
  BYTE          *pbData;
  DWORD         cbData = sizeof(DWORD), cbHashSize, i;
  HCRYPTHASH    hSHA1;
  HCRYPTPROV    hProvider;
  const char    *s1 = "Testing";
  const char    *s2 = "...1...2...3...";
   
  /* CRYPT_VERIFYCONTEXT: hashing needs no persistent key container. */
  CryptAcquireContext(&hProvider, 0, MS_DEF_PROV, PROV_RSA_FULL,
                      CRYPT_VERIFYCONTEXT);
  CryptCreateHash(hProvider, CALG_SHA1, 0, 0, &hSHA1);
  CryptHashData(hSHA1, (const BYTE *)s1, (DWORD)strlen(s1), 0);
  CryptHashData(hSHA1, (const BYTE *)s2, (DWORD)strlen(s2), 0);
  CryptGetHashParam(hSHA1, HP_HASHSIZE, (BYTE *)&cbHashSize, &cbData, 0);
  pbData = (BYTE *)LocalAlloc(LMEM_FIXED, cbHashSize);
  CryptGetHashParam(hSHA1, HP_HASHVAL, pbData, &cbHashSize, 0);
  CryptDestroyHash(hSHA1);
  CryptReleaseContext(hProvider, 0);
   
  printf("SHA1(\"%s%s\") = ", s1, s2);
  for (i = 0;  i < cbHashSize;  i++) printf("%02x", pbData[i]);
  printf("\n");
   
  LocalFree(pbData);
  return 0;
}

The preferred API for accessing hash functions from OpenSSL, though, is the EVP API, which provides a generic API to all of the hash functions OpenSSL supports. The following code does the same thing as the first example with the EVP interface instead of the SHA1 interface:

#include <stdio.h>
#include <string.h>
#include <openssl/evp.h>
   
int main(int argc, char *argv[]) {
  unsigned int  i, ol;  /* EVP_DigestFinal() stores the length as an unsigned int */
  EVP_MD_CTX    ctx;    /* OpenSSL 1.1.0 and later make this type opaque;
                           there, allocate it with EVP_MD_CTX_new() instead */
  unsigned char result[EVP_MAX_MD_SIZE]; /* enough for any hash function */
  const char    *s1 = "Testing";
  const char    *s2 = "...1...2...3...";
   
  /* Note the extra parameter */
  EVP_DigestInit(&ctx, EVP_sha1());
  EVP_DigestUpdate(&ctx, s1, strlen(s1));
  EVP_DigestUpdate(&ctx, s2, strlen(s2));
  /* Here, the context object is first. Notice the pointer to the output length */
  EVP_DigestFinal(&ctx, result, &ol);
   
  printf("SHA1(\"%s%s\") = ", s1, s2);
  for (i = 0;  i < ol;  i++) printf("%02x", result[i]);
  printf("\n");
   
  return 0;
}

Note particularly that EVP_DigestFinal() requires you to pass in a pointer to an unsigned integer, into which the output length is stored. You should use this value in your computations instead of hardcoding SHA1's digest size, under the assumption that you might someday have to replace crypto algorithms in a hurry, in which case the digest size may change. For that reason, allocate EVP_MAX_MD_SIZE bytes for any buffer into which you store a message digest, even if some of that space may go unused.
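As a small illustration of working from the reported length rather than a hardcoded digest size, here is a sketch of a hex-formatting helper. The name digest_to_hex is hypothetical and is not part of OpenSSL; it takes whatever length the finalization call reported and refuses to write past the caller's buffer.

```c
#include <stddef.h>

/* Hypothetical helper (not an OpenSSL function). Writes 2 * len lowercase
 * hex digits plus a terminating NUL into out. Returns 0 on success, or -1
 * if out is too small to hold the result. */
int digest_to_hex(const unsigned char *md, unsigned int len,
                  char *out, size_t outsz) {
  static const char hex[] = "0123456789abcdef";
  unsigned int      i;

  if (outsz < (size_t)len * 2 + 1) return -1;
  for (i = 0;  i < len;  i++) {
    out[2 * i]     = hex[md[i] >> 4];    /* high nibble */
    out[2 * i + 1] = hex[md[i] & 0x0f];  /* low nibble */
  }
  out[2 * len] = '\0';
  return 0;
}
```

With the example above, a caller could declare a buffer of 2 * EVP_MAX_MD_SIZE + 1 characters and pass result and ol directly, so nothing in the formatting code depends on which hash function was selected.
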

Alternatively, you can allocate an output buffer of exactly the right size dynamically. This is a good idea if you're space-constrained, because if SHA-512 is ever added to OpenSSL, EVP_MAX_MD_SIZE will become 512 bits (64 bytes). To do so, use the function EVP_MD_CTX_size(), which takes a context object and returns the size of the digest in bytes. For example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <openssl/evp.h>
   
int main(int argc, char *argv[]) {
  unsigned int  i, ol;
  EVP_MD_CTX    ctx;
  unsigned char *result;
  const char    *s1 = "Testing";
  const char    *s2 = "...1...2...3...";
   
  EVP_DigestInit(&ctx, EVP_sha1());
  EVP_DigestUpdate(&ctx, s1, strlen(s1));
  EVP_DigestUpdate(&ctx, s2, strlen(s2));
  /* Size the buffer by the digest size, not the hash's internal block size. */
  if (!(result = (unsigned char *)malloc(EVP_MD_CTX_size(&ctx)))) abort();
  EVP_DigestFinal(&ctx, result, &ol);
   
  printf("SHA1(\"%s%s\") = ", s1, s2);
  for (i = 0;  i < ol;  i++) printf("%02x", result[i]);
  printf("\n");
   
  free(result);
  return 0;
}

The OpenSSL library supports only two cryptographic hash functions that we recommend, SHA1 and RIPEMD-160. It also supports MD2, MD4, MD5, and MDC-2-DES. MDC-2-DES is reasonable, but it is slow and provides only 64 bits of resistance to birthday attacks, whereas we recommend a minimum baseline of 80 bits of security. As an alternative, you could initialize the hash function with a nonce, as discussed in Recipe 6.8.

Nonetheless, Table 6-3 contains a summary of the necessary information on each hash function to use both the EVP and hash-specific APIs with OpenSSL.

Table 6-3. OpenSSL-supported hash functions

Message digest function	EVP function to specify MD	Context type for MD-specific API	Prefix for MD-specific API calls (i.e., XXX_Init, ...)	Include file for MD-specific API
MD2	`EVP_md2()`	`MD2_CTX`	`MD2`	openssl/md2.h
MD4	`EVP_md4()`	`MD4_CTX`	`MD4`	openssl/md4.h
MD5	`EVP_md5()`	`MD5_CTX`	`MD5`	openssl/md5.h
MDC-2-DES	`EVP_mdc2()`	`MDC2_CTX`	`MDC2`	openssl/mdc2.h
RIPEMD-160	`EVP_ripemd160()`	`RIPEMD160_CTX`	`RIPEMD160`	openssl/ripemd.h
SHA1	`EVP_sha1()`	`SHA_CTX`	`SHA1`	openssl/sha.h

Of course, you may want to use an off-the-shelf hash function that isn't supported by either OpenSSL or CryptoAPI—for example, SHA-256, SHA-384, or SHA-512. Aaron Gifford has produced a good, free library with implementations of these functions and released it under a BSD-style license. It is available from http://www.aarongifford.com/computers/sha.html.

That library exports an API that should look very familiar:

void SHA256_Init(SHA256_CTX *ctx);
void SHA256_Update(SHA256_CTX *ctx, unsigned char *data, size_t inlen);
void SHA256_Final(unsigned char out[SHA256_DIGEST_LENGTH], SHA256_CTX *ctx);
void SHA384_Init(SHA384_CTX *ctx);
void SHA384_Update(SHA384_CTX *ctx, unsigned char *data, size_t inlen);
void SHA384_Final(unsigned char out[SHA384_DIGEST_LENGTH], SHA384_CTX *ctx);
void SHA512_Init(SHA512_CTX *ctx);
void SHA512_Update(SHA512_CTX *ctx, unsigned char *data, size_t inlen);
void SHA512_Final(unsigned char out[SHA512_DIGEST_LENGTH], SHA512_CTX *ctx);

All of the previous functions are prototyped in the sha2.h header file.
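Because these libraries all share the same update-call shape, code that feeds data in pieces can be written once. The following is a minimal sketch, not part of any of the libraries discussed: the update_fn type, hash_stream(), and count_bytes() are hypothetical names, and the caller is assumed to perform initialization before the call and finalization after it.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical callback type that wraps any XXX_Update-style function. */
typedef void (*update_fn)(void *ctx, const unsigned char *data, size_t len);

/* Reads fp to end-of-file in fixed-size chunks and passes each chunk to
 * update. Returns 0 on success or -1 on a read error. */
int hash_stream(FILE *fp, update_fn update, void *ctx) {
  unsigned char buf[4096];
  size_t        n;

  while ((n = fread(buf, 1, sizeof(buf), fp)) > 0)
    update(ctx, buf, n);
  return ferror(fp) ? -1 : 0;
}

/* Stand-in callback that only counts bytes, so hash_stream() can be tried
 * without linking a hash library. */
void count_bytes(void *ctx, const unsigned char *data, size_t len) {
  (void)data;
  *(size_t *)ctx += len;
}

/* An adapter for the SHA-256 functions listed above might look like this
 * (assumes sha2.h is available, so it is left as a comment):
 *
 *   static void sha256_adapter(void *ctx, const unsigned char *d, size_t n) {
 *     SHA256_Update((SHA256_CTX *)ctx, (unsigned char *)d, n);
 *   }
 */
```

Only a 4,096-byte buffer is ever held in memory, regardless of the size of the input, which is the main practical reason to use the incremental interface instead of an all-in-one call.
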
