Adding the ZFS cache to the pool (optional)

With Azure, every VM has a temporary resource drive. The performance of this temporary resource drive is considerably higher than that of the data disks attached to the VM. The drive is ephemeral, meaning its data is wiped once the VM is deallocated; this makes it well suited as a read cache drive, since there is no need to keep the cached data persistently across reboots.

Since the drive is wiped with every stop/deallocate/start cycle, we need to tweak the ZFS systemd unit files so that the disk is re-added to the pool on every reboot. The drive will always show up as /dev/sdb, and since there is no need to create a partition on it, we can simply tell ZFS to add it as a new disk each time the system boots.
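Before pointing anything at /dev/sdb, it is worth confirming that it really is the ephemeral resource disk. On images that ship the standard Azure udev rules, the /dev/disk/azure/resource symlink points at it; treat the following as a quick sanity check, as device names and the presence of the symlink can vary between images:

lsblk -o NAME,SIZE,MOUNTPOINT /dev/sdb
ls -l /dev/disk/azure/resource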

This can be achieved by editing the systemd unit for zfs-mount.service, which is located at /usr/lib/systemd/system/zfs-mount.service. The problem with this approach is that ZFS package updates will overwrite the changes made to that unit. One solution to this problem is to run sudo systemctl edit zfs-mount, which creates a drop-in override that survives updates, and add the following code:

[Service]
ExecStart=/sbin/zpool remove brick1 /dev/sdb
ExecStart=/sbin/zpool add brick1 cache /dev/sdb

To apply the changes, run the following command:

sudo systemctl daemon-reload
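You can then confirm that the override is in place; systemctl cat prints the unit file together with any drop-in files, so the two zpool lines should appear at the end of the output:

systemctl cat zfs-mount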

Now that we have ensured that the cache drive will be re-added after every reboot, we need to change an Azure-specific configuration of the Linux agent (waagent) that runs on Azure VMs. This agent is in charge of preparing the temporary resource drive, and since we'll be using it for another purpose, we need to tell the agent not to format the ephemeral disk. To achieve this, we need to edit the file located at /etc/waagent.conf and look for the following line:

ResourceDisk.Format=y

You will then need to change it to the following line:

ResourceDisk.Format=n
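The same change can be made non-interactively with sed, as in the following sketch; note that the stop/deallocate/start cycle described below is still required for it to take effect:

sudo sed -i 's/^ResourceDisk\.Format=y/ResourceDisk.Format=n/' /etc/waagent.conf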

After doing this, we can add the cache drive to the pool by running the following command:

zpool add -f brick1 cache /dev/sdb

The -f option is only needed the first time; it forces ZFS to use the disk even though it already contains the previously created filesystem. Note that a stop/deallocate/start cycle of the VM is required to stop the agent from formatting the resource disk, as it gets an ext4 filesystem by default.
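If the agent had already mounted the resource disk before the configuration change, it has to be unmounted before ZFS can take over the device. The mount point below is an assumption based on the default ResourceDisk.MountPoint value of /mnt/resource; some images mount it at /mnt instead, so check with lsblk first:

lsblk /dev/sdb
sudo umount /mnt/resource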

The previous process can also be applied to the newer Ls_v2 VMs, such as the L8s_v2, which use much faster NVMe drives; simply replace /dev/sdb with /dev/nvme0n1.
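For example, on an L8s_v2 the first-time command would look like the following, assuming the NVMe drive shows up as /dev/nvme0n1 (the same substitution applies to the two lines in the zfs-mount drop-in):

zpool add -f brick1 cache /dev/nvme0n1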

You can verify that the cache disk was added by checking the status of the pool; the disk should be listed under a separate cache section of the output:
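zpool status brick1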

As we'll be using a single RAID group, this cache will serve the entire brick, allowing better performance when reading the files of the GlusterFS volume.