ClamAV Virus Signatures

Before we learn how to write our own ClamAV signatures, let's take a look at the existing virus signatures.

One of the virus databases must be unpacked before its signatures can be displayed:

sigtool --unpack-current daily.cvd

This creates four files:

daily.db

Contains hexadecimal signatures in the format Virus=Hexadecimal; for example:

Darth Vader=2012cd2f268a1db81612cd2f

daily.ndb

Similar to daily.db, it contains hexadecimal signatures with other information (an offset and a file type); for example:

Exploit.WMF.Gen-1:0:0:010009000003521f0000????????????0000??00000026

daily.hdb

Contains MD5 signatures; for example:

87288c7a5e5bd354eea8095585164e75:98304:Flooder.MSNBM-1

daily.zmd

Contains signatures for compressed Portable Executable files.

The main.cvd database file contains another important file, main.fp, which excludes known false positives.

If a piece of malware is always embedded in exactly the same file, a simple MD5 signature can be used to detect it. This is very common for viruses that spread as email attachments.

[julien@asus clamav]$ sigtool --md5 clam.exe
aa15bcf478d165efd2065190eb473bcb:544:clam.exe

The format of the signature is MD5:file_size:signature_name. In this example, the size of clam.exe is 544 bytes.
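To make the three fields concrete, here is a minimal Python sketch that builds a line in the same MD5:file_size:signature_name layout. It is not how sigtool is implemented; the function name hdb_signature is hypothetical, and defaulting the signature name to the file name is an assumption based on the output shown above.

```python
import hashlib
import os


def hdb_signature(path, name=None):
    """Return an 'MD5:file_size:signature_name' line for the file at path."""
    with open(path, "rb") as handle:
        data = handle.read()
    digest = hashlib.md5(data).hexdigest()
    # Assumption: use the file name as the signature name when none is given,
    # mirroring the sigtool output shown in the text.
    label = name if name is not None else os.path.basename(path)
    return f"{digest}:{len(data)}:{label}"


if __name__ == "__main__":
    import sys

    # Print one signature line per file named on the command line.
    for target in sys.argv[1:]:
        print(hdb_signature(target))
```

Passing a second argument, such as hdb_signature("clam.exe", "Test.Clam"), gives the signature a more descriptive name than the file name.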

All the MD5 signatures should be added to an .hdb file:

[julien@asus clamav]$ sigtool --md5 clam.exe >> custom.hdb

This new file can be used with clamscan and clamd:

[julien@asus clamav]$ clamscan -d custom.hdb clam.exe
clam.exe: clam.exe FOUND

----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: 0.88.2
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.00 MB
Time: 0.000 sec (0 m 0 s)

Most malware, especially worms, mutates or copies itself into different files. An MD5 signature cannot be used in that case because each infected file differs. Such malware must be reverse engineered (see Chapter 23) to find a unique string (or set of strings) that identifies it. ClamAV can look for strings in hexadecimal format anywhere in a stream. For example, if a piece of malware always contains the string I am a virus, the corresponding signature is 4920616d2061207669727573:

[julien@asus ~]$ echo "I am a virus" | sigtool --hex-dump
4920616d20612076697275730a

The signature, without the trailing 0a (the newline appended by echo), can be added to a .db file:

[julien@asus ~]$ echo "Dumb.Virus=4920616d2061207669727573" >> custom.db
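The string-to-hexadecimal conversion can also be sketched in a few lines of Python, which makes it easy to see where a stray newline byte comes from. The helper name hex_signature is hypothetical, and the sketch assumes the string is plain ASCII.

```python
def hex_signature(text, encoding="ascii"):
    """Hex-encode text exactly as given, with no newline appended."""
    return text.encode(encoding).hex()


# The bare string gives the signature used in the text.
print(hex_signature("I am a virus"))    # 4920616d2061207669727573

# Adding a newline, as echo does, appends the extra 0a byte.
print(hex_signature("I am a virus\n"))  # 4920616d20612076697275730a
```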

ClamAV supports the wildcards listed in Table 16-1.

If we know that the virus actually contains I am a virus twice, with at least 10 bytes between the two strings, we can improve the signature to avoid false positives:

Dumb.Virus=4920616d2061207669727573{10-}4920616d2061207669727573
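To experiment with what such a signature accepts, the sketch below translates a small subset of the wildcard syntax into a Python regular expression over bytes. This is only an illustration and not ClamAV's matching engine: it assumes ?? means any one byte, * means any number of bytes, and {n}, {n-}, {-n}, and {n-m} mean byte gaps of the given sizes. Other wildcards are rejected. The function names are hypothetical.

```python
import re

# One token of the signature body: a literal byte, ??, *, or a {..} gap.
_TOKEN = re.compile(
    r"(?P<byte>[0-9a-fA-F]{2})"
    r"|(?P<any>\?\?)"
    r"|(?P<star>\*)"
    r"|\{(?P<lo>\d*)(?P<dash>-?)(?P<hi>\d*)\}"
)


def signature_to_regex(signature):
    """Translate a hexadecimal signature into a compiled bytes regex."""
    parts = []
    pos = 0
    while pos < len(signature):
        token = _TOKEN.match(signature, pos)
        if token is None:
            raise ValueError(f"unsupported token at position {pos}")
        if token.group("byte"):
            # Literal byte, escaped so that it is never read as a regex operator.
            parts.append(re.escape(bytes.fromhex(token.group("byte"))))
        elif token.group("any"):
            parts.append(b".")   # exactly one arbitrary byte
        elif token.group("star"):
            parts.append(b".*")  # any number of arbitrary bytes
        else:
            lo, dash, hi = token.group("lo", "dash", "hi")
            if dash and (lo or hi):
                bounds = f"{lo or 0},{hi}"  # {n-}, {-n}, or {n-m}
            elif lo and not dash:
                bounds = lo                 # {n}: exactly n bytes
            else:
                raise ValueError(f"malformed gap at position {pos}")
            parts.append(b".{" + bounds.encode("ascii") + b"}")
        pos = token.end()
    # DOTALL lets "." match every byte value, including newline.
    return re.compile(b"".join(parts), re.DOTALL)


def matches(signature, data):
    """Return True if the signature occurs anywhere in data."""
    return signature_to_regex(signature).search(data) is not None
```

With the improved Dumb.Virus signature, matches() returns True for two copies of the string separated by 10 filler bytes and False when only 5 bytes separate them.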

To find strings that look like a Social Security number, we need to look for:

The advanced format (used in .ndb files) adds a file type and an offset. The offset specifies where in the stream ClamAV should start matching the hexadecimal signature. The file type restricts the signature to one kind of file; the available types are listed in Table 16-2.

To apply Dumb.Virus to executables only, we can set the file type (second field) to 1 and use * as the offset (third field) to match anywhere in the file. The new signature is:

Dumb.Virus.Exec:1:*:4920616d2061207669727573
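The field layout can be made explicit with a short parser. This is a sketch that assumes exactly the four colon-separated fields seen in the examples of this section (name, file type, offset, hexadecimal signature); the class and function names are hypothetical.

```python
from typing import NamedTuple


class ExtendedSignature(NamedTuple):
    name: str
    target_type: int    # file type, for example 1 for executables
    offset: str         # kept as text because it may be "*" or a number
    hex_signature: str


def parse_extended_signature(line):
    """Split a 'Name:Type:Offset:Hex' line into its four fields."""
    # Assumption: exactly four fields, as in the examples in the text.
    name, target_type, offset, hex_signature = line.strip().split(":", 3)
    return ExtendedSignature(name, int(target_type), offset, hex_signature)


sig = parse_extended_signature("Dumb.Virus.Exec:1:*:4920616d2061207669727573")
print(sig.name, sig.target_type, sig.offset)  # Dumb.Virus.Exec 1 *
```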

ClamAV detects and normalizes HTML files. HTML is a very loose language; the same string can be written in many ways using HTML entities, JavaScript, encoded JavaScript, optional whitespace, optional single or double quotes, and so on. Without good normalization, it is virtually impossible to cover all the evasion techniques.

To see how ClamAV normalizes HTML files, run:

sigtool --html-normalise file.html

This command creates three files:

comment.html

This is normalized HTML with the original HTML comments, if any.

nocomment.html

This is the same as comment.html, but without the comments.

script.html

This is normalized and decoded JavaScript from the original HTML file.

Each signature that targets HTML files (file type 3) is run against the three files. For example, this signature looks for <title>I am a virus</title>:

HTML.Dumb.Virus:3:*:3c7469746c653e4920616d20612076697275733c2f7469746c653e
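When reading existing signatures, it helps to go the other way and turn the hexadecimal field back into text. The sketch below does this for signatures that contain no wildcards; it assumes the four-field layout used in this section and an ASCII payload, and the function name is hypothetical.

```python
def decode_literal_signature(line):
    """Return the text that a wildcard-free extended signature looks for."""
    # Assumption: four fields (Name:Type:Offset:Hex); keep only the last one.
    hex_body = line.strip().split(":", 3)[3]
    # bytes.fromhex raises ValueError if the field contains wildcards.
    return bytes.fromhex(hex_body).decode("ascii")


print(decode_literal_signature(
    "HTML.Dumb.Virus:3:*:3c7469746c653e4920616d20612076697275733c2f7469746c653e"
))  # <title>I am a virus</title>
```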