A hash function is a function that maps an input of any size to an output of fixed size, and always returns the same output for a given input. The output of a hash function is commonly called a hash, but it can also be referred to as a message digest, digest, hash value, or hash code. If there will be no need to recover the original value after hashing, then hashing should be favored over encryption.
Some examples of hash functions include MD5, SHA-256, and SHA-512. For example, the following is the Secure Hash Algorithm 256 (SHA-256) hash of the string "This is a message":
a826c7e389ec9f379cafdc544d7e9a4395ff7bfb58917bbebee51b3d0b1c996a
In the case of SHA-256, no matter how long the input is, the hash will be a 256-bit (32-byte) hash value. This is useful because even if the input is very long (for example, the contents of a file), we know that the hash will be a fixed length. Unlike encryption, where the original value can be determined through decryption, hash functions are not reversible.
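The fixed output length can be checked with a few lines of Python. The following is a minimal sketch that assumes the standard library's hashlib module; the helper name sha256_hex and the sample inputs are illustrative and not part of the original text.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# A short input and a much longer one (about 1 MB of repeated bytes).
short_digest = sha256_hex(b"This is a message")
long_digest = sha256_hex(b"x" * 1_000_000)

# Both digests are 32 bytes, which is 64 hexadecimal characters.
print(len(short_digest), len(long_digest))
```

Each byte of the digest is written as two hexadecimal characters, which is why a 32-byte hash appears as a 64-character string.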
Hashes can be used for purposes such as comparing two files for equality without having to compare their contents byte by byte (each file is read once to compute its hash, and after that only the short hashes are compared), as a checksum for detecting errors during transmission of data, finding similar records or substrings, and in data structures such as a hash table or a Bloom filter.
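As an illustration of the file comparison use case, the sketch below hashes files in chunks so that a large file never has to be held in memory at once. It assumes Python's hashlib, os, and tempfile modules; the names file_sha256 and write_temp, the chunk size, and the sample contents are hypothetical choices made for this example.

```python
import hashlib
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file incrementally, reading chunk_size bytes at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_temp(data: bytes) -> str:
    """Write data to a new temporary file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


path_a = write_temp(b"same contents")
path_b = write_temp(b"same contents")
path_c = write_temp(b"other contents")

# Files with identical contents hash to the same value; different contents do not.
equal_ab = file_sha256(path_a) == file_sha256(path_b)
equal_ac = file_sha256(path_a) == file_sha256(path_c)
print(equal_ab, equal_ac)

# Clean up the temporary files created for the demonstration.
for path in (path_a, path_b, path_c):
    os.remove(path)
```

In practice the hash of a file is often computed once and stored, so later comparisons only need to compare two short strings.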
A cryptographic hash function is a type of hash function that guarantees certain properties, and it is the combination of these properties that makes it secure and suitable for cryptography. We can use cryptographic hash functions for things such as digital signatures and HTTPS certificates, and in protocols such as SSL/TLS and SSH. Non-cryptographic hash functions are faster but provide weaker guarantees. The following are the main properties of a cryptographic hash function:
- Quick: Cryptographic hash functions are quick to generate a hash value for a given message. If a hash function is not fast, the performance of the processes that use it may reach unacceptable levels for the given use cases.
- Deterministic: It is deterministic in that the same message will always produce the same hash. It is this property that allows us to compare two hashes in order to determine if they represent the same original value, without knowing the original value.
- One-way function: It is a one-way function in that it is infeasible to generate a message from a hash without trying out all possible messages (brute-force search). Please note that, by infeasible, we mean that although it is not impossible in principle, it is impractical with realistic amounts of time and computing power.
- Collision resistant: It is collision resistant in that it is infeasible to find two different messages with the same hash value. A collision takes place when two different inputs result in the same hash. Because there are more possible inputs than possible outputs, collisions must exist, but with a secure hash function it should be infeasible to find one. Practical collisions have been demonstrated for some hash functions, like MD5 and SHA-1, so these should not be used for cryptographic purposes.
- Small changes result in vastly different hashes: A small change to a message should yield a new hash that is significantly different from the old one, such that it is not possible to correlate the two hashes. For example, the first of the two hashes below is from the string "Hello World" while the second one is from the string "Hello Worlds". As you can see, even though the original strings are almost identical, the hashes are very different:
a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e
b0f3fe9cdc1beeb7944d90e9b2e77b416fd097b5cc2c58838f8741e8129a1a52
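Two of the properties above, determinism and sensitivity to small changes, can be observed directly. The following is a minimal sketch assuming Python's hashlib module; the helper names sha256_bytes and differing_bits are hypothetical and introduced only for this illustration.

```python
import hashlib


def sha256_bytes(text: str) -> bytes:
    """Return the raw 32-byte SHA-256 digest of a UTF-8 encoded string."""
    return hashlib.sha256(text.encode("utf-8")).digest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


first = sha256_bytes("Hello World")
second = sha256_bytes("Hello Worlds")

# Deterministic: hashing the same message again yields the same digest.
repeated = sha256_bytes("Hello World")
print(first == repeated)

# One extra character changes the digest in many bit positions.
print(first.hex())
print(second.hex())
print(differing_bits(first, second), "of", len(first) * 8, "bits differ")
```

For a well-behaved hash function, roughly half of the output bits are expected to differ between two such inputs, which is what makes the two digests look unrelated.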