Fuzzy hashing is a useful method for comparing files for similarity. ssdeep (http://ssdeep.sourceforge.net) generates the fuzzy hash for a sample and also reports the percentage similarity between samples. This technique is useful for comparing a suspect binary with the samples in a repository to find those that are similar, which can help in identifying samples that belong to the same malware family or the same actor group.
You can use ssdeep to calculate and compare fuzzy hashes. Installation of ssdeep on an Ubuntu Linux VM was covered in Chapter 1, Introduction to Malware Analysis. To determine the fuzzy hash of a sample, run the following command:
$ ssdeep veri.exe
ssdeep,1.1--blocksize:hash:hash,filename
49152:op398U/qCazcQ3iEZgcwwGF0iWC28pUtu6On2spPHlDB:op98USfcy8cwF2bC28pUtsRptDB,"/home/ubuntu/Desktop/veri.exe"
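Each output line follows the blocksize:hash:hash,filename layout named in the header line. As a minimal sketch, the fields can be split apart in Python as follows; the helper name parse_ssdeep_line is hypothetical and is not part of ssdeep, and the sketch assumes the signature itself contains no comma or colon:

```python
def parse_ssdeep_line(line):
    """Split one line of ssdeep output into its fields."""
    # The signature and the quoted file name are separated by the first comma.
    signature, _, filename = line.strip().partition(",")
    # The signature itself has the form blocksize:hash:hash.
    blocksize, hash1, hash2 = signature.split(":", 2)
    return {
        "blocksize": int(blocksize),
        "hash1": hash1,
        "hash2": hash2,
        "filename": filename.strip('"'),
    }


# The output line shown above for veri.exe.
line = ('49152:op398U/qCazcQ3iEZgcwwGF0iWC28pUtu6On2spPHlDB:'
        'op98USfcy8cwF2bC28pUtsRptDB,"/home/ubuntu/Desktop/veri.exe"')
fields = parse_ssdeep_line(line)
print(fields["blocksize"], fields["filename"])
```

Splitting the output this way can be handy when the hashes are stored in a database rather than in a flat text file.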
To demonstrate the use of fuzzy hashing, let's take an example of a directory consisting of three malware samples. In the following output, you can see that all three files have completely different MD5 hash values:
$ ls
aiggs.exe jnas.exe veri.exe
$ md5sum *
48c1d7c541b27757c16b9c2c8477182b aiggs.exe
92b91106c108ad2cc78a606a5970c0b0 jnas.exe
ce9ce9fc733792ec676164fc5b2622f2 veri.exe
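The reason the MD5 values share nothing in common is that a cryptographic hash changes completely when even one byte of the input changes. The following sketch illustrates this with Python's hashlib module; the two byte strings are made-up stand-ins for nearly identical binaries, not the samples listed above:

```python
import hashlib

# Two hypothetical byte strings that differ only in the final byte,
# standing in for two nearly identical binaries.
sample_a = b"MZ" + b"\x90" * 1024 + b"payload-A"
sample_b = b"MZ" + b"\x90" * 1024 + b"payload-B"

md5_a = hashlib.md5(sample_a).hexdigest()
md5_b = hashlib.md5(sample_b).hexdigest()

# Count the hex positions where the two digests happen to agree.
same_positions = sum(1 for x, y in zip(md5_a, md5_b) if x == y)

print(md5_a)
print(md5_b)
print("matching hex positions:", same_positions, "of", len(md5_a))
```

The two digests are unrelated even though the inputs are almost the same, which is why MD5 alone cannot reveal that two samples are variants of each other.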
The pretty matching mode (-p option) in ssdeep can be used to determine percentage similarity; the -b option used alongside it prints bare filenames without the directory path. From the following output, two of the three samples have 99% similarity, suggesting that these two samples probably belong to the same malware family:
$ ssdeep -pb *
aiggs.exe matches jnas.exe (99)
jnas.exe matches aiggs.exe (99)
As demonstrated in the preceding example, cryptographic hashes were not helpful in determining the relationship between the samples, whereas the fuzzy hashing technique identified the similarity between the samples.
You might have a directory containing many malware samples. In that case, you can run ssdeep on directories and subdirectories using the recursive mode (-r). In the following command, -l prints relative paths and -a displays all matches regardless of score, which is why pairs with a score of 0 also appear:
$ ssdeep -lrpa samples/
samples//aiggs.exe matches samples//crop.exe (0)
samples//aiggs.exe matches samples//jnas.exe (99)
samples//crop.exe matches samples//aiggs.exe (0)
samples//crop.exe matches samples//jnas.exe (0)
samples//jnas.exe matches samples//aiggs.exe (99)
samples//jnas.exe matches samples//crop.exe (0)
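With many samples, the matching output grows quickly and every pair is reported twice (A matches B, then B matches A). As a rough sketch, the output can be post-processed in Python to keep each pair once and drop low scores; the function name similar_pairs and the default threshold of 50 are assumptions chosen for illustration, not ssdeep settings:

```python
import re

# Matches lines such as: samples//aiggs.exe matches samples//jnas.exe (99)
MATCH_LINE = re.compile(r"^(?P<left>.+) matches (?P<right>.+) \((?P<score>\d+)\)$")


def similar_pairs(output, threshold=50):
    """Return unique (file_a, file_b, score) tuples with score >= threshold."""
    pairs = {}
    for line in output.splitlines():
        m = MATCH_LINE.match(line.strip())
        if not m:
            continue  # skip anything that is not a match line
        score = int(m.group("score"))
        if score < threshold:
            continue
        # Sort the two names so that A-B and B-A collapse into one key.
        key = tuple(sorted((m.group("left"), m.group("right"))))
        pairs[key] = score
    return [(a, b, s) for (a, b), s in sorted(pairs.items())]


# The output of the recursive run shown above.
output = """\
samples//aiggs.exe matches samples//crop.exe (0)
samples//aiggs.exe matches samples//jnas.exe (99)
samples//crop.exe matches samples//aiggs.exe (0)
samples//crop.exe matches samples//jnas.exe (0)
samples//jnas.exe matches samples//aiggs.exe (99)
samples//jnas.exe matches samples//crop.exe (0)
"""
print(similar_pairs(output))
```

For the output above, only the aiggs.exe and jnas.exe pair survives the filter.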
You can also match a suspect binary against a list of file hashes using the -m option. In the following example, the ssdeep hashes of all the binaries are redirected to a text file (all_hashes.txt), and then the suspect binary (blab.exe) is matched against all the hashes in the file. From the following output, it can be seen that the suspect binary (blab.exe) is a 100% match for jnas.exe and has 99% similarity with aiggs.exe. You can use this technique to compare any new file with the hashes of previously analyzed samples:
$ ssdeep * > all_hashes.txt
$ ssdeep -m all_hashes.txt blab.exe
/home/ubuntu/blab.exe matches all_hashes.txt:/home/ubuntu/aiggs.exe (99)
/home/ubuntu/blab.exe matches all_hashes.txt:/home/ubuntu/jnas.exe (100)
In Python, the fuzzy hash can be computed using python-ssdeep (https://pypi.python.org/pypi/ssdeep/3.2). The installation of the python-ssdeep module on an Ubuntu Linux VM was covered in Chapter 1, Introduction to Malware Analysis. To calculate and compare fuzzy hashes, the following calls can be used, shown here in an interactive Python 2.7 session:
$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
>>> import ssdeep
>>> hash1 = ssdeep.hash_from_file('jnas.exe')
>>> print hash1
384:l3gexUw/L+JrgUon5b9uSDMwE9Pfg6NgrWoBYi51mRvR6JZlbw8hqIusZzZXe:pIAKG91Dw1hPRpcnud
>>> hash2 = ssdeep.hash_from_file('aiggs.exe')
>>> print hash2
384:l3gexUw/L+JrgUon5b9uSDMwE9Pfg6NgrWoBYi51mRvR6JZlbw8hqIusZzZWe:pIAKG91Dw1hPRpcnu+
>>> ssdeep.compare(hash1, hash2)
99
>>>
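The same two calls, ssdeep.hash_from_file and ssdeep.compare, can be wrapped in a small script that ranks known samples against a suspect file. The following is only a sketch: the names rank_matches and main are hypothetical, the file names are reused from the examples in this section, and the ssdeep import is placed inside main so that the ranking logic can be exercised without the module installed:

```python
def rank_matches(suspect_hash, known_hashes, compare):
    """Score suspect_hash against each known hash, highest score first.

    known_hashes maps a sample name to its fuzzy hash; compare is any
    function that takes two hashes and returns a score, for example
    ssdeep.compare.
    """
    scores = [(name, compare(suspect_hash, h)) for name, h in known_hashes.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def main():
    # Imported here so rank_matches stays usable without python-ssdeep.
    import ssdeep

    # File names taken from the examples in this section.
    known = {name: ssdeep.hash_from_file(name)
             for name in ("aiggs.exe", "jnas.exe")}
    suspect = ssdeep.hash_from_file("blab.exe")
    for name, score in rank_matches(suspect, known, ssdeep.compare):
        print(name, score)
```

Calling main() from the directory that holds the samples prints one line per known sample, with the closest match first.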