Before we learn how to write our own ClamAV signatures, let's take a look at the existing virus signatures.
One of the virus databases must be unpacked to display its signatures:
sigtool --unpack-current daily.cvd
This creates four files:
daily.db: Contains hexadecimal signatures in the format Virus=Hexadecimal; for example:
Darth Vader=2012cd2f268a1db81612cd2f
daily.ndb: Similar to daily.db, it contains hexadecimal signatures with additional information (a file type and an offset); for example:
Exploit.WMF.Gen-1:0:0:010009000003521f0000????????????0000??00000026
daily.hdb: Contains MD5 signatures; for example:
87288c7a5e5bd354eea8095585164e75:98304:Flooder.MSNBM-1
Contains signatures for compressed Portable Executable files.
The main.cvd database file contains another important file, main.fp, which excludes known false positives.
If a piece of malware is always embedded in the exact same file, a simple MD5 signature can detect it. This is very common for viruses that spread as email attachments.
[julien@asus clamav]$ sigtool --md5 clam.exe
aa15bcf478d165efd2065190eb473bcb:544:clam.exe
The format of the signature is MD5:file_size:signature_name. In this example, the size of clam.exe is 544 bytes.
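The same MD5:file_size:signature_name line can be produced without sigtool, which may help when generating many signatures from a script. The following is a minimal Python sketch, not part of ClamAV; the function name md5_signature is hypothetical, and it assumes the signature name defaults to the file name, as in the sigtool output above.

```python
import hashlib
from pathlib import Path


def md5_signature(path, name=None):
    """Return an MD5:file_size:signature_name line for the file at path.

    Hypothetical helper: it mimics the shape of the `sigtool --md5`
    output shown in the text, nothing more.
    """
    data = Path(path).read_bytes()
    digest = hashlib.md5(data).hexdigest()
    # Default the signature name to the file name, as sigtool does above.
    label = name if name is not None else Path(path).name
    return f"{digest}:{len(data)}:{label}"
```

Calling md5_signature("clam.exe") on the sample file should return a line of the same shape as the sigtool output, ready to be appended to an .hdb file.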
All the MD5 signatures should be added to an .hdb file:
[julien@asus clamav]$ sigtool --md5 clam.exe >> custom.hdb
This new file can be used with clamscan and clamd:
[julien@asus clamav]$ clamscan -d custom.hdb clam.exe
clam.exe: clam.exe FOUND
----------- SCAN SUMMARY -----------
Known viruses: 1
Engine version: 0.88.2
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.00 MB
Time: 0.000 sec (0 m 0 s)
Most malware, especially worms, mutates or copies itself to different files. An MD5 signature cannot be used in that case because each file varies. Such malware must be reverse engineered (see Chapter 23) to find a unique string (or a set of strings) that identifies it. ClamAV can look for strings in hexadecimal format anywhere in a stream. For example, if a piece of malware always contains the string I am a virus, the corresponding signature is 4920616d2061207669727573:
[julien@asus ~]$ echo "I am a virus" | sigtool --hex-dump
4920616d20612076697275730a
The last character (0a) is a newline character introduced by the echo command. The hexadecimal version of I am a virus is 4920616d2061207669727573. See http://www.lookuptables.com/ for the ASCII-to-hexadecimal conversion.
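The conversion can also be done in a few lines of Python, which avoids the trailing newline that echo adds. This is only an illustrative sketch; the helper name to_hex_signature is hypothetical, and it assumes the string is plain ASCII.

```python
def to_hex_signature(text, encoding="ascii"):
    """Return the hexadecimal body of a signature for a literal string."""
    return text.encode(encoding).hex()


print(to_hex_signature("I am a virus"))    # 4920616d2061207669727573
print(to_hex_signature("I am a virus\n"))  # 4920616d20612076697275730a
```

The second call shows where the extra 0a in the sigtool output comes from.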
The signature can be added to a .db file:
[julien@asus ~]$ echo "Dumb.Virus=4920616d2061207669727573" >> custom.db
ClamAV supports the wildcards listed in Table 16-1.
Table 16-1. Wildcards supported by ClamAV
Wildcard | Description |
---|---|
?? | Matches any byte. |
* | Matches any number of bytes (including zero). |
{n} | Matches n bytes. |
{-n} | Matches n bytes or fewer. |
{n-} | Matches at least n bytes. |
{n-m} | Matches between n and m bytes. |
(aa\|bb) | Matches aa or bb. |
If we know that the virus actually contains I am a virus twice, with at least 10 bytes between the two strings, we can improve the signature to avoid false positives:
Dumb.Virus=4920616d2061207669727573{10-}4920616d2061207669727573
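To experiment with wildcards before loading a signature into ClamAV, the hexadecimal body can be translated into a regular expression over bytes. The sketch below is a rough approximation written for illustration only: it is not how ClamAV matches internally, the function name signature_to_regex is hypothetical, and it assumes only the wildcards ??, *, {n}, {-n}, {n-}, {n-m}, and (aa|bb).

```python
import re

# One token of a signature body: a wildcard, a byte gap, an alternation,
# or a literal byte written as two hexadecimal digits.
_TOKEN = re.compile(
    r"(?P<any>\?\?)"
    r"|(?P<star>\*)"
    r"|(?P<gap>\{(?P<lo>\d*)(?P<dash>-?)(?P<hi>\d*)\})"
    r"|(?P<alt>\((?P<opts>[0-9a-fA-F]{2}(?:\|[0-9a-fA-F]{2})*)\))"
    r"|(?P<byte>[0-9a-fA-F]{2})"
)


def signature_to_regex(body):
    """Translate a hexadecimal signature body into a compiled bytes regex."""
    parts = []
    pos = 0
    while pos < len(body):
        m = _TOKEN.match(body, pos)
        if m is None:
            raise ValueError(f"cannot parse signature at position {pos}")
        if m.group("any"):
            parts.append(b".")                     # ?? : any single byte
        elif m.group("star"):
            parts.append(b".*?")                   # *  : any number of bytes
        elif m.group("gap"):
            lo, dash, hi = m.group("lo"), m.group("dash"), m.group("hi")
            if not dash:
                if not lo or hi:
                    raise ValueError("malformed {n} wildcard")
                parts.append(b".{%d}" % int(lo))   # {n}: exactly n bytes
            else:
                if not lo and not hi:
                    raise ValueError("malformed {n-m} wildcard")
                # {-n}, {n-}, {n-m}: a missing bound means 0 or unbounded.
                bounds = f"{lo or 0},{hi}"
                parts.append(b".{" + bounds.encode("ascii") + b"}")
        elif m.group("alt"):
            options = m.group("opts").split("|")
            escaped = [re.escape(bytes.fromhex(o)) for o in options]
            parts.append(b"(?:" + b"|".join(escaped) + b")")
        else:
            parts.append(re.escape(bytes.fromhex(m.group("byte"))))
        pos = m.end()
    # DOTALL so that "." also matches the newline byte (0a).
    return re.compile(b"".join(parts), re.DOTALL)


body = "4920616d2061207669727573{10-}4920616d2061207669727573"
pattern = signature_to_regex(body)
sample = b"I am a virus" + b"x" * 10 + b"I am a virus"
print(bool(pattern.search(sample)))  # True
```

With only nine filler bytes between the two strings, the same pattern no longer matches, which is the behavior the {10-} wildcard is meant to enforce.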
To find strings that look like a Social Security number (three digits, a hyphen, two digits, a hyphen, and four digits), we need to look for:
The hexadecimal code for the digits, which is 30 (0) to 39 (9).
The hexadecimal code for a hyphen (-), which is 2d.
The signature, written on a single line without spaces, is:
SSN=(30|31|32|33|34|35|36|37|38|39)(30|31|32|33|34|35|36|37|38|39)(30|31|32|33|34|35|36|37|38|39)2d(30|31|32|33|34|35|36|37|38|39)(30|31|32|33|34|35|36|37|38|39)2d(30|31|32|33|34|35|36|37|38|39)(30|31|32|33|34|35|36|37|38|39)(30|31|32|33|34|35|36|37|38|39)(30|31|32|33|34|35|36|37|38|39)
ClamAV version 0.88.2 has a bug with alternative matching (|). This example works with ClamAV version 0.88.3 and above.
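Because a signature built from alternatives is long and easy to mistype, it can be generated rather than written by hand. The following Python sketch assumes the usual ###-##-#### layout of a Social Security number; the names DIGIT, HYPHEN, and ssn_signature are hypothetical.

```python
# Alternation covering the ASCII digits 0-9, that is, bytes 30 to 39.
DIGIT = "(" + "|".join(f"{0x30 + d:02x}" for d in range(10)) + ")"
HYPHEN = "2d"


def ssn_signature(name="SSN"):
    """Build a basic-format signature for ###-##-#### (assumed layout)."""
    body = DIGIT * 3 + HYPHEN + DIGIT * 2 + HYPHEN + DIGIT * 4
    return f"{name}={body}"


print(ssn_signature())
```

The printed line contains no spaces or line breaks, so it can be appended to a .db file as is.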
The advanced format adds a file type and an offset. The offset specifies the point in the stream at which ClamAV should start matching the hexadecimal signature; * matches at any offset. The file type codes, which indicate the kind of file a signature applies to, are listed in Table 16-2.
Table 16-2. File type codes used in advanced hexadecimal signatures
Code | File type description |
---|---|
0 | Any file |
1 | Portable executable |
2 | Windows OLE2 component |
3 | Normalized HTML (see the next section) |
4 | Mail file |
5 | Graphic files (e.g., JPEG, etc.) |
To apply Dumb.Virus to executables only, we can change the file type (second field) to 1. The new signature is:
Dumb.Virus.Exec:1:*:4920616d2061207669727573
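The four colon-separated fields of an advanced signature can be split apart to check a line before adding it to an .ndb file. This is a small illustrative sketch; the names AdvancedSignature and parse_advanced are hypothetical, and it assumes exactly the four fields described here (name, file type, offset, hexadecimal body).

```python
from typing import NamedTuple

# File type codes as described in Table 16-2.
FILE_TYPES = {
    0: "Any file",
    1: "Portable executable",
    2: "Windows OLE2 component",
    3: "Normalized HTML",
    4: "Mail file",
    5: "Graphic files",
}


class AdvancedSignature(NamedTuple):
    name: str
    file_type: int
    offset: str      # kept as text because it may be "*" (any offset)
    hex_body: str


def parse_advanced(line):
    """Split a Name:FileType:Offset:Hex line into its four fields."""
    fields = line.strip().split(":")
    if len(fields) != 4:
        raise ValueError("expected Name:FileType:Offset:Hex")
    name, file_type, offset, hex_body = fields
    code = int(file_type)
    if code not in FILE_TYPES:
        raise ValueError(f"unknown file type code {code}")
    return AdvancedSignature(name, code, offset, hex_body)


sig = parse_advanced("Dumb.Virus.Exec:1:*:4920616d2061207669727573")
print(sig.name, FILE_TYPES[sig.file_type], sig.offset)
```

A line with a missing field or an unknown file type code raises ValueError instead of being silently accepted.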
ClamAV detects and normalizes HTML files. HTML is a very loose language; the same string can be written in many ways using HTML entities, JavaScript, encoded JavaScript, optional whitespace, optional single or double quotes, and so on. Without good normalization, it is virtually impossible to cover all the evasion techniques.
To see how ClamAV normalizes HTML files, run:
sigtool --html-normalize file.html
This command creates three files:
comment.html: This is the normalized HTML with the original HTML comments, if any.
nocomment.html: This is the same as comment.html, but without the comments.
script.html: This is the normalized and decoded JavaScript from the original HTML file.
Each signature that targets HTML files (file type 3) is run against the three files. For example, this signature looks for <title>I am a virus</title>:
HTML.Dumb.Virus:3:*:3c7469746c653e4920616d20612076697275733c2f7469746c653e