To better understand how CephFS layers a POSIX-compatible filesystem on top of an object store, we can look more closely at how Ceph maps file inodes to objects.
First, let's look at a file called test, which is stored on a CephFS filesystem mounted under /mnt/tmp. The following command uses the familiar Unix ls command, but with some extra parameters to show more details, including the file inode number:
ls -lhi /mnt/tmp/test
The following screenshot is the output of the preceding command:
The output shows that the file is 1 GB in size and that the inode number is the long number at the far left.
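If you only need the inode number, the stat command can print it directly, which is handy if you want to script this lookup rather than read it from the ls output:
stat -c %i /mnt/tmp/test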
Next, by listing the objects stored in the CephFS data pool and grepping for that number, we can find the objects that hold the data for that file. Before we can proceed, however, we need to convert the inode number from decimal to hexadecimal, as CephFS uses the hex form of the inode number to name the underlying objects:
printf "%x\n" 1099511784612
The following screenshot is the output of the preceding command:
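For convenience, the conversion can also be captured in a shell variable so it can be reused in the commands that follow (the variable name INODE_HEX is just an arbitrary choice):
INODE_HEX=$(printf "%x" 1099511784612)
echo $INODE_HEX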
Now we can find the objects in the pool; note that this may take a long time on a CephFS pool with lots of data, as the command has to list every object in the pool:
rados -p cephfs_data ls | grep 100000264a4 | wc -l
Note that 256 objects were found. By default, CephFS breaks larger files up into 4 MB objects, and 256 of these 4 MB objects add up to exactly the size of the 1 GB file.
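To see the objects themselves rather than just a count, drop the wc -l and take the first few results (a quick sketch, reusing the pool and hex inode from above):
rados -p cephfs_data ls | grep 100000264a4 | sort | head -n 3
The names are of the form 100000264a4.00000000, 100000264a4.00000001, and so on: the hex inode number, a dot, and the hexadecimal index of each 4 MB chunk within the file.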
The actual objects store the exact same data as the files viewable in the CephFS filesystem. If a text file is saved on a CephFS filesystem, its contents could be read by matching the underlying object to the inode number and using the rados command to download the object.
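As a rough sketch of that process, assuming a small text file at the hypothetical path /mnt/tmp/notes.txt whose inode has already been converted to hex as shown earlier (the placeholder <hex-inode> stands in for that value), the first object can be downloaded and read directly with rados:
rados -p cephfs_data get <hex-inode>.00000000 /tmp/notes-object
cat /tmp/notes-object
For a file smaller than 4 MB there is only this single object, so its contents are the complete file.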
The cephfs_metadata pool stores all the metadata for the files stored on the CephFS filesystem; this includes values such as modification time, permissions, file names, and the files' locations in the directory tree. Without this metadata, the data objects stored in the data pool are literally just randomly named objects; the data still exists but is fairly meaningless to human operators. The loss of CephFS metadata therefore does not lead to actual data loss, but it still makes the data more or less unreadable. Care should therefore be taken to protect the metadata pool just like any other RADOS pool in your Ceph cluster. There are some advanced recovery steps that may assist in recovering from metadata loss; these are covered in Chapter 12, Disaster Recovery.
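As a quick sanity check (a sketch, assuming the metadata pool is named cephfs_metadata as above), the standard pool commands will confirm the replication level currently protecting the metadata:
ceph osd pool get cephfs_metadata size
ceph osd pool get cephfs_metadata min_size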