The madvise() system call is used to improve the performance of an application by informing the kernel about the calling process’s likely usage of the pages in the range starting at addr and continuing for length bytes. The kernel may use this information to improve the efficiency of I/O performed on the file mapping that underlies the pages. (See File Mappings for a discussion of file mappings.) On Linux, madvise() has been available since kernel 2.4.
#define _BSD_SOURCE
#include <sys/mman.h>
int madvise(void *addr, size_t length, int advice);
Returns 0 on success, or -1 on error
The value specified in addr must be page-aligned, and length is effectively rounded up to the next multiple of the system page size. The advice argument is one of the following:
MADV_NORMAL
This is the default behavior. Pages are transferred in clusters (a small multiple of the system page size). This results in some read-ahead and read-behind.
MADV_RANDOM
Pages in this region will be accessed randomly, so read-ahead will yield no benefit. Thus, the kernel should fetch the minimum amount of data on each read.
MADV_SEQUENTIAL
Pages in this range will be accessed once, sequentially. Thus, the kernel can aggressively read ahead, and pages can be quickly freed after they have been accessed.
MADV_WILLNEED
Read pages in this region ahead, in preparation for future access. The MADV_WILLNEED operation has an effect similar to the Linux-specific readahead() system call and the posix_fadvise() POSIX_FADV_WILLNEED operation.
MADV_DONTNEED
The calling process no longer requires the pages in this region to be memory-resident. The precise effect of this flag varies across UNIX implementations. We first note the behavior on Linux. For a MAP_PRIVATE region, the mapped pages are explicitly discarded, which means that modifications to the pages are lost. The virtual memory address range remains accessible, but the next access of each page will result in a page fault that reinitializes the page, either with the contents of the file from which it is mapped or with zeros in the case of an anonymous mapping. This can be used as a means of explicitly reinitializing the contents of a MAP_PRIVATE region. For a MAP_SHARED region, the kernel may discard modified pages in some circumstances, depending on the architecture (this behavior doesn’t occur on x86). Some other UNIX implementations behave in the same way as Linux. However, on some UNIX implementations, MADV_DONTNEED simply informs the kernel that the specified pages can be swapped out if necessary. Portable applications should not rely on Linux’s destructive semantics for MADV_DONTNEED.
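The Linux behavior of MADV_DONTNEED on a private anonymous mapping can be observed with a short experiment. The following is a minimal sketch, not a definitive test: the function name dontneed_zeroes_page() is hypothetical, and the code assumes Linux with a glibc recent enough to honor _DEFAULT_SOURCE (the replacement for _BSD_SOURCE in newer glibc versions). It dirties one page, advises the kernel that the page is no longer needed, and then checks whether the page reads back as zeros.

```c
#define _DEFAULT_SOURCE     /* Assumption: modern glibc; exposes madvise()
                               and MAP_ANONYMOUS */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library).
   Returns 1 if a dirtied private anonymous page reads back as zeros
   after MADV_DONTNEED, 0 if it does not, or -1 on error. */
int dontneed_zeroes_page(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    if (pagesz == -1)
        return -1;
    size_t len = (size_t) pagesz;

    /* mmap() returns a page-aligned address, as madvise() requires */
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, len);               /* Modify the page */

    if (madvise(p, len, MADV_DONTNEED) == -1) {
        munmap(p, len);
        return -1;
    }

    /* The next access faults in a fresh page; on Linux it is zero-filled */
    int zeroed = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            zeroed = 0;
            break;
        }
    }

    munmap(p, len);
    return zeroed;
}
```

On Linux, a call to dontneed_zeroes_page() is expected to return 1. On an implementation where MADV_DONTNEED merely permits the pages to be swapped out, the earlier contents would survive and the function would return 0, which is why portable code should not depend on the result.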
Linux 2.6.16 added three new nonstandard advice values: MADV_DONTFORK, MADV_DOFORK, and MADV_REMOVE. Linux 2.6.32 and 2.6.33 added another four nonstandard advice values: MADV_HWPOISON, MADV_SOFT_OFFLINE, MADV_MERGEABLE, and MADV_UNMERGEABLE. These values are used in special circumstances and are described in the madvise(2) manual page.
Most UNIX implementations provide a version of madvise(), typically allowing at least the advice constants described above. However, SUSv3 standardizes this API under a different name, posix_madvise(), and prefixes the corresponding advice constants with the string POSIX_. Thus, the constants are POSIX_MADV_NORMAL, POSIX_MADV_RANDOM, POSIX_MADV_SEQUENTIAL, POSIX_MADV_WILLNEED, and POSIX_MADV_DONTNEED. This alternative interface is implemented in glibc (version 2.2 and later) by calls to madvise(), but it is not available on all UNIX implementations.
SUSv3 says that posix_madvise() should not affect the semantics of a program. However, in glibc versions before 2.7, the POSIX_MADV_DONTNEED operation is implemented using madvise() MADV_DONTNEED, which does affect the semantics of a program, as described earlier. Since glibc 2.7, the posix_madvise() wrapper implements POSIX_MADV_DONTNEED to do nothing, so that it does not affect the semantics of a program.
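As an illustration of the standardized interface, the following sketch maps a file and uses posix_madvise() with POSIX_MADV_SEQUENTIAL before scanning the mapping from start to end. The function name sum_mapped_file() is hypothetical, chosen only for this example. Note that, unlike madvise(), posix_madvise() reports failure by returning a positive error number rather than by returning -1 and setting errno. Because the call is purely advisory, the sketch reports a failure but carries on.

```c
#define _POSIX_C_SOURCE 200112L     /* Expose posix_madvise() */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library).
   Returns the sum of all bytes in the file at 'path', or -1 on error.
   The sum may overflow for very large files; this is only a sketch. */
long sum_mapped_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat sb;
    if (fstat(fd, &sb) == -1) {
        close(fd);
        return -1;
    }
    if (sb.st_size == 0) {          /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t) sb.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* The mapping remains valid */
    if (p == MAP_FAILED)
        return -1;

    /* Advise the kernel that we will read the pages once, in order.
       The return value is 0 or a positive error number (errno unused). */
    int err = posix_madvise(p, len, POSIX_MADV_SEQUENTIAL);
    if (err != 0)
        fprintf(stderr, "posix_madvise: %s\n", strerror(err));

    long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];

    munmap(p, len);
    return sum;
}
```

Whether the advice produces a measurable speedup depends on the kernel, the file size, and the state of the page cache, so it is worth timing both variants before relying on it.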