Nonlinear Mappings: remap_file_pages()

File mappings created with mmap() are linear: there is a sequential, one-to-one correspondence between the pages of the mapped file and the pages of the memory region. For most applications, a linear mapping suffices. However, some applications need to create large numbers of nonlinear mappings—mappings where the pages of the file appear in a different order within contiguous memory. We show an example of a nonlinear mapping in Figure 49-5.

We described one way of creating nonlinear mappings in the previous section: using multiple calls to mmap() with the MAP_FIXED flag. However, this approach doesn’t scale well. The problem is that each of these mmap() calls creates a separate kernel virtual memory area (VMA) data structure. Each VMA takes time to set up and consumes some nonswappable kernel memory. Furthermore, the presence of a large number of VMAs can degrade the performance of the virtual memory manager; in particular, the time taken to process each page fault can significantly increase when there are tens of thousands of VMAs. (This was a problem for some large database management systems that maintain multiple different views in a database file.)

Note

Each line in the /proc/PID/maps file (see Location of Shared Memory in Virtual Memory) represents one VMA.
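The VMA cost of the MAP_FIXED approach can be observed directly. The following is a minimal sketch, not taken from the original text: it maps a three-page temporary file, reverses pages 0 and 2 with two MAP_FIXED calls, and counts the /proc/self/maps lines that refer to the file before and after. The function names countVmasFor() and fixedMappingVmaCounts() and the /tmp path template are hypothetical, chosen only for illustration, and the sketch assumes a Linux system with /proc mounted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the lines (VMAs) in /proc/self/maps whose text contains 'name'.
   Returns -1 if /proc/self/maps cannot be opened. */
int
countVmasFor(const char *name)
{
    char line[4096];
    int count = 0;
    FILE *fp = fopen("/proc/self/maps", "r");

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof(line), fp) != NULL)
        if (strstr(line, name) != NULL)
            count++;
    fclose(fp);
    return count;
}

/* Map a three-page temporary file linearly, then place file page 2 at
   region page 0 and file page 0 at region page 2 using MAP_FIXED.
   Stores the VMA count for the file before and after the rearrangement.
   Returns 0 on success, or -1 on error. */
int
fixedMappingVmaCounts(int *linearCount, int *nonlinearCount)
{
    char path[] = "/tmp/vma_demo_XXXXXX";   /* Hypothetical location */
    const int prot = PROT_READ | PROT_WRITE;
    long ps = sysconf(_SC_PAGESIZE);
    const char *name;
    char *addr;
    int fd, status = -1;

    assert(ps > 0);
    fd = mkstemp(path);
    if (fd == -1)
        return -1;
    name = strrchr(path, '/') + 1;          /* Match on the base name only */

    if (ftruncate(fd, 3 * ps) == -1)
        goto out;
    addr = mmap(NULL, 3 * ps, prot, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        goto out;
    *linearCount = countVmasFor(name);

    /* Each MAP_FIXED call replaces one page of the region and splits
       the original VMA, because the file offsets are no longer contiguous */
    if (mmap(addr, ps, prot, MAP_SHARED | MAP_FIXED, fd, 2 * ps) != MAP_FAILED &&
        mmap(addr + 2 * ps, ps, prot, MAP_SHARED | MAP_FIXED, fd, 0) != MAP_FAILED) {
        *nonlinearCount = countVmasFor(name);
        status = 0;
    }
    munmap(addr, 3 * ps);

out:
    close(fd);
    unlink(path);
    return status;
}
```

On a typical Linux system, one would expect the count to rise from one VMA for the linear mapping to three after the rearrangement, one per page whose file offset is no longer contiguous with its neighbor.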

From kernel 2.6 onward, Linux provides the remap_file_pages() system call to create nonlinear mappings without creating multiple VMAs. We do this as follows:

  1. Create a mapping with mmap().

  2. Use one or more calls to remap_file_pages() to rearrange the correspondence between the pages of memory and the pages of the file. (All that remap_file_pages() is doing is manipulating process page tables.)

#define _GNU_SOURCE
#include <sys/mman.h>

int remap_file_pages(void *addr, size_t size, int prot, size_t pgoff, int flags);

Returns 0 on success, or -1 on error

The pgoff and size arguments identify a file region whose position in memory is to be changed. The pgoff argument specifies the start of the file region in units of the system page size (as returned by sysconf(_SC_PAGESIZE)). The size argument specifies the length of the file region, in bytes. The addr argument serves two purposes:

  • It identifies the existing mapping whose pages we want to rearrange. In other words, addr must be an address that falls somewhere within a region that was previously mapped with mmap().

  • It specifies the memory address at which the file pages identified by pgoff and size are to be located.

Both addr and size should be specified as multiples of the system page size. If they are not, they are rounded down to the nearest multiple of the page size.
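The unit conventions described above (pgoff in pages, size in bytes, both addr and size rounded down to a page boundary) can be illustrated with two small helpers. This is only a sketch: roundDownToPage() and byteOffsetToPgoff() are hypothetical names, not part of any library, and the sketch assumes that the page size is a power of two, as it is on common Linux platforms.

```c
#include <assert.h>
#include <stddef.h>

/* Round 'value' down to the nearest multiple of 'pageSize'.
   Assumes that 'pageSize' is a nonzero power of two. */
size_t
roundDownToPage(size_t value, size_t pageSize)
{
    assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0);
    return value & ~(pageSize - 1);
}

/* Convert a page-aligned byte offset within the file into the
   page-sized units expected by the pgoff argument. */
size_t
byteOffsetToPgoff(size_t byteOffset, size_t pageSize)
{
    assert(pageSize != 0 && byteOffset % pageSize == 0);
    return byteOffset / pageSize;
}
```

For example, with a page size of 4096 bytes, a size of 4097 would be treated as 4096, and the file region starting at byte offset 8192 corresponds to a pgoff of 2. In a real program, the page size would be obtained from sysconf(_SC_PAGESIZE) rather than hard-coded.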

Suppose that we use the following call to mmap() to map three pages of the open file referred to by the descriptor fd, and that the call assigns the returned address 0x4001a000 to addr:

ps = sysconf(_SC_PAGESIZE);               /* Obtain system page size */
addr = mmap(0, 3 * ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

The following calls would then create the nonlinear mapping shown in Figure 49-5:

remap_file_pages(addr, ps, 0, 2, 0);
                            /* Maps page 2 of file into page 0 of region */
remap_file_pages(addr + 2 * ps, ps, 0, 0, 0);
                            /* Maps page 0 of file into page 2 of region */

Figure 49-5. A nonlinear file mapping
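The fragments above can be combined into a complete function that also verifies the result. The following is a minimal sketch under stated assumptions rather than a definitive implementation: remapDemo() is a hypothetical name, the three-page temporary file under /tmp is created only for the demonstration, and each file page is filled with a distinct letter so that the rearrangement is visible. According to the remap_file_pages(2) manual page, the call has been deprecated since Linux 3.16 and has been emulated inside the kernel since Linux 4.0, so it may fail or lose its performance benefit on a given system; the function therefore reports any failure instead of assuming success.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a three-page temporary file whose pages are filled with 'A',
   'B', and 'C', map it, exchange pages 0 and 2 of the mapping with
   remap_file_pages(), and store the first byte of each memory page in
   'seen' as a null-terminated string.
   Returns 0 on success, or -1 if any call fails. */
int
remapDemo(char seen[4])
{
    char path[] = "/tmp/remap_demo_XXXXXX";     /* Hypothetical location */
    long ps = sysconf(_SC_PAGESIZE);            /* Obtain system page size */
    char *addr = MAP_FAILED;
    int fd, j, status = -1;

    assert(ps > 0);
    fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);               /* File persists until fd and mapping go away */

    if (ftruncate(fd, 3 * ps) == -1)
        goto out;
    addr = mmap(NULL, 3 * ps, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        goto out;

    for (j = 0; j < 3; j++)     /* File page j now holds the letter 'A' + j */
        memset(addr + j * ps, 'A' + j, ps);

    /* File page 2 appears at region page 0, file page 0 at region page 2 */
    if (remap_file_pages(addr, ps, 0, 2, 0) == -1)
        goto out;
    if (remap_file_pages(addr + 2 * ps, ps, 0, 0, 0) == -1)
        goto out;

    for (j = 0; j < 3; j++)
        seen[j] = addr[j * ps];
    seen[3] = '\0';
    status = 0;

out:
    if (addr != MAP_FAILED)
        munmap(addr, 3 * ps);
    close(fd);
    return status;
}
```

When the call succeeds, the string stored in seen is "CBA": the memory region is still contiguous, but its first page shows the last page of the file and its last page shows the first.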

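There are two other arguments to remap_file_pages() that we haven’t yet described:

  • The prot argument must be specified as 0, as in the calls shown above.

  • The flags argument has the same meaning as for mmap(), but all flags other than MAP_NONBLOCK are ignored. The calls shown above specify it as 0.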

As currently implemented, remap_file_pages() can be applied only to shared (MAP_SHARED) mappings.

The remap_file_pages() system call is Linux-specific; it is not specified in SUSv3 and is not available on other UNIX implementations.