The WAL header

The 32-byte WAL header contains properties such as the page size, number of checkpoints, size of the WAL file, and indirectly, number of frames in the WAL file. The following table details the byte offset and description of the 8 big-endian 32-bit integers stored in the header:

Byte offset

Value

Description

0-3

File signature

This is either 0x377F0682 or 0x377F0683.

4-7

File version

This is the WAL format version, which is currently 3007000.

8-11

Database page size

This is the size of the page within the database, which is usually 1024 or 4096.

12-15

Checkpoint number

This is the number of commits that have occurred.

16-19

Salt-1

This is a random integer that is incremented by one with each commit.

20-23

Salt-2

This is a random integer that changes with each commit.

24-27

Checksum-1

This is the first part of the header checksum.

28-31

Checksum-2

This is the second part of header checksum.

 

The file signature should always be either 0x377F0682 or 0x377F0683. The database page size is a very important value as this allows us to calculate how many frames are present in the WAL file. For example, there are 5 frames in a 20,632 byte WAL file using 4,096 byte pages. To calculate the number of frames properly, we need to account for the 32 byte WAL header and the 24-byte WAL frame header in the following equation:

(WAL File Size - 32) / (WAL Page Size + 24) 
20,600 / 4,120 = 5 frames 

The checkpoint number indicates how many commits have been triggered, either automatically, or manually by executing PRAGMA wal_checkpoint. Now, let's focus on the Salt-1 value. When it comes to creating a timeline of database activity, this is the most important value in the header. The Salt-1 value increments with each commit. In addition to that, each frame stores the current salt values in its own header at the time of the commit. If a record was modified and recommitted, the newer record would have a larger Salt-1 value than the previous version of the record. Therefore, we might have multiple snapshots of a given record in time within the WAL file.

Let's pretend we have a database containing one table, storing data related to employee names, positions, salaries, and so on. Early on, we have an entry for Peter Parker, a 23-year old freelance photographer making $45,000. A few commits later, Parker's salary changes to $150,000 and within the same commit Parker's name is updated to Spiderman:

Frame

Salt-1

Row ID

Employee name

Position

Salary

0

-977652151

123

Spiderman?

Freelance

150,000

1

-977652151

123

Peter Parker

Freelance

150,000

2

-977652150

123

Peter Parker

Freelance

45,000

 

Because these entries share the same Row ID, we know that we're dealing with three different versions of record 123 in the main table. To identify the most recent version of this record, we need to examine the Salt-1 value. Based on our discussion earlier and the Salt-1 values of the records, we know that the records in Frame 0 and 1 are the most recent records and that there have been two commits since the record was first added to the database.

How do we know which of the records in Frames 0 and 1 is the most recent? Dealing with the scenario where we have two records in the same commit, the one in an earlier frame is regarded as the most recent. This is because the WAL file adds new frames to the beginning of the file rather than the end. Therefore, the record in Frame 0 is the most recent and the record in Frame 2 is the oldest.

Note that you can have more than one record per frame. Newer records are found at the beginning of the frame.

In the database, we'll only see the most recent version of the record, but in the WAL file, we can see previous versions. As long as the WAL file exists, we would still see this information, even if the record with Row ID of 123 is deleted from the main database.