To use raw flash chips for mass storage, you have to use a filesystem that understands the peculiarities of the underlying technology. There are three such filesystems: JFFS2, YAFFS2, and UBIFS.
All of these use MTD as the common interface to flash memory.
The Journaling Flash File System had its beginnings in the software for the Axis 2100 network camera in 1999. For many years, it was the only flash filesystem for Linux and has been deployed on many thousands of different types of devices. Today, it is not the best choice, but I will cover it first because it shows the beginning of the evolutionary path.
JFFS2 is a log-structured filesystem that uses MTD to access flash memory. In a log-structured filesystem, changes are written sequentially as nodes to the flash memory. A node may contain changes to a directory, such as the names of files created and deleted, or it may contain changes to file data. After a while, a node may be superseded by information contained in subsequent nodes and becomes an obsolete node.
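Erase blocks are categorized into three types: free blocks, which contain no nodes at all; clean blocks, which contain only valid nodes; and dirty blocks, which contain at least one obsolete node.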
At any one time, there is one block receiving updates, which is called the open block. If power is lost or the system is reset, the only data that can be lost is the last write to the open block. In addition, nodes are compressed as they are written, which increases the effective storage capacity of the flash chip. This is important if you are using expensive NOR flash memory.
When the number of free blocks falls below a threshold, a garbage collector kernel thread is started, which scans for dirty blocks and copies the valid nodes into the open block, and then frees up the dirty block.
At the same time, the garbage collector provides a crude form of wear leveling because it cycles valid data from one block to another. The way that the open block is chosen means that each block is erased roughly the same number of times so long as it contains data that changes from time to time. Sometimes a clean block is chosen for garbage collection to make sure that blocks containing static data that is seldom written are also wear leveled.
JFFS2 filesystems have a write-through cache, meaning that writes are written to the flash memory synchronously, as if they had been mounted with the -o sync option. While this improves reliability, it increases the time to write data. There is a further problem with small writes: if the length of a write is comparable to the size of the node header (40 bytes), the overhead becomes high. A well-known corner case is log files, produced, for example, by syslogd.
There is one overriding disadvantage to JFFS2: since there is no on-chip index, the directory structure has to be deduced at mount-time by reading the log from start to finish. At the end of the scan, you have a complete picture of the directory structure of the valid nodes, but the time taken is proportional to the size of the partition. It is not uncommon to see mount times of the order of one second per megabyte, leading to total mount times of tens or hundreds of seconds.
To reduce the time to scan during mount, summary nodes became an option in Linux 2.6.15. A summary node is written at the end of the open erase block just before it is closed. The summary node contains all of the information needed for the mount-time scan, thereby reducing the amount of data to process during the scan. Summary nodes can reduce mount times by a factor of between two and five, at the expense of an overhead of about 5% of the storage space. They are enabled with the kernel configuration CONFIG_JFFS2_SUMMARY.
An erased block with all bits set to 1 is indistinguishable from a block that has been written with 1's, but the latter has not had its memory cells refreshed and cannot be programmed again until it is erased. JFFS2 uses a mechanism called clean markers to distinguish between these two situations. After a successful block erase, a clean marker is written, either to the beginning of the block, or to the OOB area of the first page of the block. If the clean marker exists then it must be a clean block.
Creating an empty JFFS2 filesystem at runtime is as simple as erasing an MTD partition with clean markers and then mounting it. There is no formatting step because a blank JFFS2 filesystem consists entirely of free blocks. For example, to format MTD partition 6, you would enter these commands on the device:
# flash_erase -j /dev/mtd6 0 0
# mount -t jffs2 mtd6 /mnt
The -j option to flash_erase adds the clean markers, and mounting with type jffs2 presents the partition as an empty filesystem. Note that the device to be mounted is given as mtd6, not /dev/mtd6. Alternatively, you can give the block device node /dev/mtdblock6. This is just a peculiarity of JFFS2. Once mounted, you can treat it like any filesystem and, when you next boot and mount it, all the files will still be there.
You can create a filesystem image directly from the staging area of your development system using mkfs.jffs2 to write out the files in JFFS2 format and sumtool to add the summary nodes. Both of these are part of the mtd-utils package.
As an example, to create an image of the files in rootfs for a NAND flash device with an erase block size of 128 KB (0x20000) and with summary nodes, you would use these two commands:
$ mkfs.jffs2 -n -e 0x20000 -p -d ~/rootfs -o ~/rootfs.jffs2
$ sumtool -n -e 0x20000 -p -i ~/rootfs.jffs2 -o ~/rootfs-sum.jffs2
The -p option adds padding at the end of the image file to make it a whole number of erase blocks. The -n option suppresses the creation of clean markers in the image, which is normal for NAND devices as the clean marker is in the OOB area. For NOR devices, you would leave out the -n option. You can use a device table with mkfs.jffs2 to set the permissions and the ownership of files by adding -D [device table]. Of course, Buildroot and the Yocto Project will do all this for you.
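The erase block size given with -e has to match the partition the image is destined for. On a running target, the kernel lists it in /proc/mtd. The following is a minimal sketch, assuming the usual /proc/mtd layout of one partition per line with the columns dev, size, erasesize, and name; mtd_erasesize is a hypothetical helper name, not part of mtd-utils.

```shell
# Print the erase block size (as 0x-prefixed hex) of one MTD partition.
# $1 = partition name such as mtd6
# $2 = file in /proc/mtd format (defaults to /proc/mtd itself)
# mtd_erasesize is a hypothetical helper, not part of mtd-utils.
mtd_erasesize() {
    # The first column is the device name followed by a colon;
    # the third column is the erase block size in hex.
    awk -v dev="$1:" '$1 == dev { print "0x" $3 }' "${2:-/proc/mtd}"
}
```

Run on the target, mtd_erasesize mtd6 prints the value to pass to mkfs.jffs2 and sumtool on the development system. The optional second argument lets you try the function on a saved copy of /proc/mtd.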
You can program the image into flash memory from your bootloader. For example, if you have loaded a filesystem image into RAM at address 0x82000000 and you want to write it to a flash partition that begins 0x163000 bytes from the start of the flash chip and is 0x7a9d000 bytes long, the U-Boot commands would be:
nand erase clean 163000 7a9d000
nand write 82000000 163000 7a9d000
You can do the same thing from Linux using the mtd driver like this:
# flash_erase -j /dev/mtd6 0 0
# nandwrite /dev/mtd6 rootfs-sum.jffs2
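Whichever route you take, the image must not be larger than the partition it is written to. The check can be sketched in a few lines of shell. The partition length is passed as hex without a 0x prefix, the way U-Boot writes it; image_fits is a hypothetical helper name, not a U-Boot or mtd-utils command.

```shell
# Succeed only if the image file is no larger than the partition.
# $1 = path to the image file
# $2 = partition length in hex without a 0x prefix (U-Boot style)
# image_fits is a hypothetical helper, not a U-Boot or mtd-utils command.
image_fits() {
    # wc -c prints the size of the file in bytes
    image_size=$(wc -c < "$1") || return 1
    [ "$((image_size))" -le "$((0x$2))" ]
}
```

For the partition used in the example, you could run image_fits ~/rootfs-sum.jffs2 7a9d000 && echo "image fits" before programming the flash.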
To boot with a JFFS2 root filesystem, you need to pass the mtdblock device for the partition on the kernel command line, together with rootfstype, because JFFS2 cannot be auto-detected:
root=/dev/mtdblock6 rootfstype=jffs2
The YAFFS filesystem was written by Charles Manning beginning in 2001, specifically to handle NAND flash chips at a time when JFFS2 did not. Subsequent changes to handle larger (2 KiB) page sizes resulted in YAFFS2. The website for YAFFS is http://www.yaffs.net.
YAFFS is also a log-structured filesystem following the same design principles as JFFS2. The different design decisions mean that it has a faster mount-time scan, simpler and faster garbage collection, and has no compression, which speeds up reads and writes at the expense of less efficient use of storage.
YAFFS is not limited to Linux; it has been ported to a wide range of operating systems. It has a dual license: GPLv2 to be compatible with Linux, and a commercial license for other operating systems. Unfortunately, the YAFFS code has never been merged into mainline Linux, so you will have to patch your kernel.
To get YAFFS2 and patch a kernel, you would run:
$ git clone git://www.aleph1.co.uk/yaffs2
$ cd yaffs2
$ ./patch-ker.sh c m <path to your linux source>
Then, configure the kernel with CONFIG_YAFFS_YAFFS2.
As with JFFS2, to create a YAFFS2 filesystem at runtime, you only need to erase the partition and mount it, but note that, in this case, you do not enable clean markers:
# flash_erase /dev/mtd6 0 0
# mount -t yaffs2 /dev/mtdblock6 /mnt
To create a filesystem image, the simplest thing to do is use the mkyaffs2 tool from https://code.google.com/p/yaffs2utils using the following command:
$ mkyaffs2 -c 2048 -s 64 rootfs rootfs.yaffs2
Here -c is the page size and -s the OOB size. There is a tool named mkyaffs2image that is part of the YAFFS code, but it has a couple of drawbacks. Firstly, the page and OOB size are hard-coded in the source: you will have to edit and recompile if you have memory that does not match the defaults of 2,048 and 64. Secondly, the OOB layout is incompatible with MTD, which uses the first two bytes as a bad block marker, whereas mkyaffs2image uses those bytes to store part of the YAFFS metadata.
To copy the image to the MTD partition from a Linux shell prompt, follow these steps:
# flash_erase /dev/mtd6 0 0
# nandwrite -a /dev/mtd6 rootfs.yaffs2
To boot with a YAFFS2 root filesystem, add the following to the kernel command line:
root=/dev/mtdblock6 rootfstype=yaffs2
The unsorted block image (UBI) driver is a volume manager for flash memory that takes care of bad block handling and wear leveling. It was implemented by Artem Bityutskiy and first appeared in Linux 2.6.22. In parallel with that, engineers at Nokia were working on a filesystem that would take advantage of the features of UBI, which they called UBIFS; it appeared in Linux 2.6.27. Splitting the flash translation layer in this way makes the code more modular and also allows other filesystems to take advantage of the UBI driver, as we shall see later on.
UBI provides an idealized, reliable view of a flash chip by mapping physical erase blocks (PEB) to logical erase blocks (LEB). Bad blocks are not mapped to LEBs and so are never used. If a block cannot be erased, it is marked as bad and dropped from the mapping. UBI keeps a count of the number of times each PEB has been erased in a header stored in that PEB and changes the mapping to ensure that each PEB is erased roughly the same number of times.
UBI accesses the flash memory through the MTD layer. As an extra feature, it can divide an MTD partition into a number of UBI volumes, which improves wear leveling in the following way. Imagine that you have two filesystems, one containing fairly static data, for example, a root filesystem, and the other containing data that is constantly changing. If they are stored in separate MTD partitions, the wear leveling only has an effect on the second one, whereas, if you choose to store them in two UBI volumes in a single MTD partition, the wear leveling takes place over both areas of the storage and the lifetime of the flash memory is increased. The following diagram illustrates this situation:
In this way, UBI fulfills two of the requirements of a flash translation layer: wear leveling and bad block handling.
To prepare an MTD partition for UBI, you don't use flash_erase as with JFFS2 and YAFFS2; instead, you use the ubiformat utility, which preserves the erase counts that are stored in the PEB headers. ubiformat needs to know the minimum unit of I/O which, for most NAND flash chips, is the page size, but some chips allow reading and writing in sub-pages that are a half or a quarter of the page size. Consult the chip data sheet for details and, if in doubt, use the page size. This example prepares mtd6 using a page size of 2,048 bytes:
# ubiformat /dev/mtd6 -s 2048
You use the ubiattach command to load the UBI driver on an MTD partition that has been prepared in this way:
# ubiattach -p /dev/mtd6 -O 2048
This creates the device node /dev/ubi0 through which you can access the UBI volumes. You can use ubiattach multiple times for other MTD partitions, in which case they can be accessed through /dev/ubi1, /dev/ubi2, and so on.
The PEB to LEB mapping is loaded into memory during the attach phase, a process that takes time proportional to the number of PEBs, typically a few seconds. A new feature called UBI fastmap was added in Linux 3.7; it checkpoints the mapping to flash from time to time and so reduces the attach time. The kernel configuration option is CONFIG_MTD_UBI_FASTMAP.
The first time you attach to an MTD partition after a ubiformat, there will be no volumes. You can create volumes using ubimkvol. For example, suppose you have a 128 MB MTD partition and you want to split it into two volumes of 32 MB and 96 MB using a chip with 128 KB erase blocks and 2 KB pages:
# ubimkvol /dev/ubi0 -N vol_1 -s 32MiB
# ubimkvol /dev/ubi0 -N vol_2 -s 96MiB
Now you have the device nodes /dev/ubi0_0 and /dev/ubi0_1. You can confirm the situation using ubinfo:
# ubinfo -a /dev/ubi0
ubi0
Volumes count:                           2
Logical eraseblock size:                 15360 bytes, 15.0 KiB
Total amount of logical eraseblocks:     8192 (125829120 bytes, 120.0 MiB)
Amount of available logical eraseblocks: 0 (0 bytes)
Maximum count of volumes                 89
Count of bad physical eraseblocks:       0
Count of reserved physical eraseblocks:  160
Current maximum erase counter value:     1
Minimum input/output unit size:          512 bytes
Character device major/minor:            250:0
Present volumes:                         0, 1

Volume ID:   0 (on ubi0)
Type:        dynamic
Alignment:   1
Size:        2185 LEBs (33561600 bytes, 32.0 MiB)
State:       OK
Name:        vol_1
Character device major/minor: 250:1
-----------------------------------
Volume ID:   1 (on ubi0)
Type:        dynamic
Alignment:   1
Size:        5843 LEBs (89748480 bytes, 85.6 MiB)
State:       OK
Name:        vol_2
Character device major/minor: 250:2
Note that, since each LEB has a header to contain the meta information used by UBI, the LEB is smaller than the PEB by one page. For example, a chip with a PEB size of 128 KB and 2 KB pages would have an LEB of 126 KB. This is important information that you will need when creating a UBIFS image.
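Rather than working out the LEB size by hand, you can read it from the ubinfo output on the target. The sketch below assumes the output format shown in the listing above; ubi_leb_size and lebs_needed are hypothetical helper names, not part of mtd-utils.

```shell
# Print the LEB size in bytes, taken from `ubinfo -a` output on stdin.
# ubi_leb_size is a hypothetical helper, not part of mtd-utils.
ubi_leb_size() {
    awk -F: '/^Logical eraseblock size/ { split($2, f, " "); print f[1]; exit }'
}

# Print the number of LEBs needed to hold $1 bytes when each LEB is
# $2 bytes, rounded up to a whole LEB.
lebs_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}
```

On the target, you might run leb=$(ubinfo -a /dev/ubi0 | ubi_leb_size) and then lebs_needed $((32 * 1024 * 1024)) "$leb". With the 15,360-byte LEB reported in the listing, this gives 2185, which is consistent with the size shown for vol_1.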
UBIFS uses a UBI volume to create a robust filesystem. It adds sub-allocation and garbage collection to create a complete flash translation layer. Unlike JFFS2 and YAFFS2, it stores index information on-chip and so mounting is fast, although don't forget that attaching the UBI volume beforehand may take a significant amount of time. It also allows for write-back caching like a normal disk filesystem, which means that writes are much faster, but with the usual problem of potential loss of data that has not been flushed from the cache to flash memory in the event of power down. You can resolve the problem by making careful use of the fsync(2) and fdatasync(2) functions to force a flush of file data at crucial points.
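Shell scripts cannot call fsync(2) directly, but dd can do it on their behalf. The following is a minimal sketch, assuming a dd that supports conv=fsync (GNU coreutils and BusyBox both do); save_durably is a hypothetical helper name. It writes the data to a temporary file, flushes it, and then renames it over the target.

```shell
# Write stdin to the file $1 and flush it to storage before replacing
# any previous version. save_durably is a hypothetical helper.
save_durably() {
    _tmpfile="$1.tmp"
    # conv=fsync makes dd call fsync(2) on the output file before exiting
    dd of="$_tmpfile" conv=fsync 2>/dev/null || return 1
    # rename(2) is atomic: readers see either the old or the new file
    mv "$_tmpfile" "$1"
}
```

For example, echo "boot_count=42" | save_durably /data/state. This sketch does not flush the directory entry after the rename, so a more careful version would also sync the containing directory.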
UBIFS has a journal for fast recovery in the event of power down. The journal takes up some space, typically 4 MiB or more, so UBIFS is not suitable for very small flash devices.
Once you have created the UBI volumes, you can mount them using the device node for the volume, /dev/ubi0_0, or by using the device node for the whole partition plus the volume name, as shown here:
# mount -t ubifs ubi0:vol_1 /mnt
Creating a filesystem image for UBIFS is a two-stage process: first you create a UBIFS image using mkfs.ubifs, and then embed it into a UBI volume using ubinize.
For the first stage, mkfs.ubifs needs to be informed of the page size with -m, the size of the UBI LEB with -e, remembering that the LEB is usually one page shorter than the PEB, and the maximum number of erase blocks in the volume with -c. If the first volume is 32 MiB and an erase block is 128 KiB, then the number of erase blocks is 256. So, to take the contents of the directory rootfs and create a UBIFS image named rootfs.ubi, you would type the following:
$ mkfs.ubifs -r rootfs -m 2048 -e 126KiB -c 256 -o rootfs.ubi
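The arithmetic behind the -e and -c values can be sketched in shell. This follows the rule of thumb from the text (the LEB is one page shorter than the PEB) and prints the command rather than running it; mkfs_ubifs_cmd is a hypothetical helper name, and the directory and image names are those used in the example.

```shell
# Print the mkfs.ubifs command line for a given chip geometry.
# $1 = PEB size in bytes, $2 = page size in bytes, $3 = volume size in bytes
# mkfs_ubifs_cmd is a hypothetical helper, not part of mtd-utils.
mkfs_ubifs_cmd() {
    # Rule of thumb from the text: one page per PEB holds UBI's headers
    leb_kib=$(( ($1 - $2) / 1024 ))
    # Maximum number of erase blocks in the volume
    max_lebs=$(( $3 / $1 ))
    echo "mkfs.ubifs -r rootfs -m $2 -e ${leb_kib}KiB -c $max_lebs -o rootfs.ubi"
}

# 128 KiB erase blocks, 2 KiB pages, 32 MiB volume
mkfs_ubifs_cmd $((128 * 1024)) 2048 $((32 * 1024 * 1024))
```

With these inputs, the function prints the same command as shown above. If the LEB size reported by ubinfo on your target differs from the rule of thumb, use the reported value instead.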
The second stage requires you to create a configuration file for ubinize which describes the characteristics of each volume in the image. The help page (ubinize -h) gives details of the format. This example creates two volumes, vol_1 and vol_2:
[ubifsi_vol_1]
mode=ubi
image=rootfs.ubi
vol_id=0
vol_name=vol_1
vol_size=32MiB
vol_type=dynamic

[ubifsi_vol_2]
mode=ubi
image=data.ubi
vol_id=1
vol_name=vol_2
vol_type=dynamic
vol_flags=autoresize
The second volume has an auto-resize flag and so will expand to fill the remaining space on the MTD partition. Only one volume can have this flag. From this information, ubinize will create an image file named by the -o parameter, with the PEB size -p, the page size -m, and the sub-page size -s:
$ ubinize -o ~/ubi.img -p 128KiB -m 2048 -s 512 ubinize.cfg
To install this image, you would enter these commands on the target:
# ubiformat /dev/mtd6 -s 2048
# nandwrite /dev/mtd6 /ubi.img
# ubiattach -p /dev/mtd6 -O 2048
If you want to boot with a UBIFS root filesystem, you would give these kernel command line parameters:
ubi.mtd=6 root=ubi0:vol_1 rootfstype=ubifs