Index
Understanding the Linux Kernel, 3rd Edition
Preface
The Audience for This Book
Organization of the Material
Level of Description
Overview of the Book
Background Information
Conventions in This Book
How to Contact Us
Safari® Enabled
Acknowledgments
1. Introduction
1.1. Linux Versus Other Unix-Like Kernels
1.2. Hardware Dependency
1.3. Linux Versions
1.4. Basic Operating System Concepts
1.4.1. Multiuser Systems
1.4.2. Users and Groups
1.4.3. Processes
1.4.4. Kernel Architecture
1.5. An Overview of the Unix Filesystem
1.5.1. Files
1.5.2. Hard and Soft Links
1.5.3. File Types
1.5.4. File Descriptor and Inode
1.5.5. Access Rights and File Mode
1.5.6. File-Handling System Calls
1.5.6.1. Opening a file
1.5.6.2. Accessing an opened file
1.5.6.3. Closing a file
1.5.6.4. Renaming and deleting a file
1.6. An Overview of Unix Kernels
1.6.1. The Process/Kernel Model
1.6.2. Process Implementation
1.6.3. Reentrant Kernels
1.6.4. Process Address Space
1.6.5. Synchronization and Critical Regions
1.6.5.1. Kernel preemption disabling
1.6.5.2. Interrupt disabling
1.6.5.3. Semaphores
1.6.5.4. Spin locks
1.6.5.5. Avoiding deadlocks
1.6.6. Signals and Interprocess Communication
1.6.7. Process Management
1.6.7.1. Zombie processes
1.6.7.2. Process groups and login sessions
1.6.8. Memory Management
1.6.8.1. Virtual memory
1.6.8.2. Random access memory usage
1.6.8.3. Kernel Memory Allocator
1.6.8.4. Process virtual address space handling
1.6.8.5. Caching
1.6.9. Device Drivers
2. Memory Addressing
2.1. Memory Addresses
2.2. Segmentation in Hardware
2.2.1. Segment Selectors and Segmentation Registers
2.2.2. Segment Descriptors
2.2.3. Fast Access to Segment Descriptors
2.2.4. Segmentation Unit
2.3. Segmentation in Linux
2.3.1. The Linux GDT
2.3.2. The Linux LDTs
2.4. Paging in Hardware
2.4.1. Regular Paging
2.4.2. Extended Paging
2.4.3. Hardware Protection Scheme
2.4.4. An Example of Regular Paging
2.4.5. The Physical Address Extension (PAE) Paging Mechanism
2.4.6. Paging for 64-bit Architectures
2.4.7. Hardware Cache
2.4.8. Translation Lookaside Buffers (TLB)
2.5. Paging in Linux
2.5.1. The Linear Address Fields
2.5.2. Page Table Handling
2.5.3. Physical Memory Layout
2.5.4. Process Page Tables
2.5.5. Kernel Page Tables
2.5.5.1. Provisional kernel Page Tables
2.5.5.2. Final kernel Page Table when RAM size is less than 896 MB
2.5.5.3. Final kernel Page Table when RAM size is between 896 MB and 4096 MB
2.5.5.4. Final kernel Page Table when RAM size is more than 4096 MB
2.5.6. Fix-Mapped Linear Addresses
2.5.7. Handling the Hardware Cache and the TLB
2.5.7.1. Handling the hardware cache
2.5.7.2. Handling the TLB
3. Processes
3.1. Processes, Lightweight Processes, and Threads
3.2. Process Descriptor
3.2.1. Process State
3.2.2. Identifying a Process
3.2.2.1. Process descriptors handling
3.2.2.2. Identifying the current process
3.2.2.3. Doubly linked lists
3.2.2.4. The process list
3.2.2.5. The lists of TASK_RUNNING processes
3.2.3. Relationships Among Processes
3.2.3.1. The pidhash table and chained lists
3.2.4. How Processes Are Organized
3.2.4.1. Wait queues
3.2.4.2. Handling wait queues
3.2.5. Process Resource Limits
3.3. Process Switch
3.3.1. Hardware Context
3.3.2. Task State Segment
3.3.2.1. The thread field
3.3.3. Performing the Process Switch
3.3.3.1. The switch_to macro
3.3.3.2. The __switch_to( ) function
3.3.4. Saving and Loading the FPU, MMX, and XMM Registers
3.3.4.1. Saving the FPU registers
3.3.4.2. Loading the FPU registers
3.3.4.3. Using the FPU, MMX, and SSE/SSE2 units in Kernel Mode
3.4. Creating Processes
3.4.1. The clone( ), fork( ), and vfork( ) System Calls
3.4.1.1. The do_fork( ) function
3.4.1.2. The copy_process( ) function
3.4.2. Kernel Threads
3.4.2.1. Creating a kernel thread
3.4.2.2. Process 0
3.4.2.3. Process 1
3.4.2.4. Other kernel threads
3.5. Destroying Processes
3.5.1. Process Termination
3.5.1.1. The do_group_exit( ) function
3.5.1.2. The do_exit( ) function
3.5.2. Process Removal
4. Interrupts and Exceptions
4.1. The Role of Interrupt Signals
4.2. Interrupts and Exceptions
4.2.1. IRQs and Interrupts
4.2.1.1. The Advanced Programmable Interrupt Controller (APIC)
4.2.2. Exceptions
4.2.3. Interrupt Descriptor Table
4.2.4. Hardware Handling of Interrupts and Exceptions
4.3. Nested Execution of Exception and Interrupt Handlers
4.4. Initializing the Interrupt Descriptor Table
4.4.1. Interrupt, Trap, and System Gates
4.4.2. Preliminary Initialization of the IDT
4.5. Exception Handling
4.5.1. Saving the Registers for the Exception Handler
4.5.2. Entering and Leaving the Exception Handler
4.6. Interrupt Handling
4.6.1. I/O Interrupt Handling
4.6.1.1. Interrupt vectors
4.6.1.2. IRQ data structures
4.6.1.3. IRQ distribution in multiprocessor systems
4.6.1.4. Multiple Kernel Mode stacks
4.6.1.5. Saving the registers for the interrupt handler
4.6.1.6. The do_IRQ( ) function
4.6.1.7. The __do_IRQ( ) function
4.6.1.8. Reviving a lost interrupt
4.6.1.9. Interrupt service routines
4.6.1.10. Dynamic allocation of IRQ lines
4.6.2. Interprocessor Interrupt Handling
4.7. Softirqs and Tasklets
4.7.1. Softirqs
4.7.1.1. Data structures used for softirqs
4.7.1.2. Handling softirqs
4.7.1.3. The do_softirq( ) function
4.7.1.4. The __do_softirq( ) function
4.7.1.5. The ksoftirqd kernel threads
4.7.2. Tasklets
4.8. Work Queues
4.8.1.
4.8.1.1. Work queue data structures
4.8.1.2. Work queue functions
4.8.1.3. The predefined work queue
4.9. Returning from Interrupts and Exceptions
4.9.1.
4.9.1.1. The entry points
4.9.1.2. Resuming a kernel control path
4.9.1.3. Checking for kernel preemption
4.9.1.4. Resuming a User Mode program
4.9.1.5. Checking for rescheduling
4.9.1.6. Handling pending signals, virtual-8086 mode, and single stepping
5. Kernel Synchronization
5.1. How the Kernel Services Requests
5.1.1. Kernel Preemption
5.1.2. When Synchronization Is Necessary
5.1.3. When Synchronization Is Not Necessary
5.2. Synchronization Primitives
5.2.1. Per-CPU Variables
5.2.2. Atomic Operations
5.2.3. Optimization and Memory Barriers
5.2.4. Spin Locks
5.2.4.1. The spin_lock macro with kernel preemption
5.2.4.2. The spin_lock macro without kernel preemption
5.2.4.3. The spin_unlock macro
5.2.5. Read/Write Spin Locks
5.2.5.1. Getting and releasing a lock for reading
5.2.5.2. Getting and releasing a lock for writing
5.2.6. Seqlocks
5.2.7. Read-Copy Update (RCU)
5.2.8. Semaphores
5.2.8.1. Getting and releasing semaphores
5.2.9. Read/Write Semaphores
5.2.10. Completions
5.2.11. Local Interrupt Disabling
5.2.12. Disabling and Enabling Deferrable Functions
5.3. Synchronizing Accesses to Kernel Data Structures
5.3.1. Choosing Among Spin Locks, Semaphores, and Interrupt Disabling
5.3.1.1. Protecting a data structure accessed by exceptions
5.3.1.2. Protecting a data structure accessed by interrupts
5.3.1.3. Protecting a data structure accessed by deferrable functions
5.3.1.4. Protecting a data structure accessed by exceptions and interrupts
5.3.1.5. Protecting a data structure accessed by exceptions and deferrable functions
5.3.1.6. Protecting a data structure accessed by interrupts and deferrable functions
5.3.1.7. Protecting a data structure accessed by exceptions, interrupts, and deferrable functions
5.4. Examples of Race Condition Prevention
5.4.1. Reference Counters
5.4.2. The Big Kernel Lock
5.4.3. Memory Descriptor Read/Write Semaphore
5.4.4. Slab Cache List Semaphore
5.4.5. Inode Semaphore
6. Timing Measurements
6.1. Clock and Timer Circuits
6.1.1. Real Time Clock (RTC)
6.1.2. Time Stamp Counter (TSC)
6.1.3. Programmable Interval Timer (PIT)
6.1.4. CPU Local Timer
6.1.5. High Precision Event Timer (HPET)
6.1.6. ACPI Power Management Timer
6.2. The Linux Timekeeping Architecture
6.2.1. Data Structures of the Timekeeping Architecture
6.2.1.1. The timer object
6.2.1.2. The jiffies variable
6.2.1.3. The xtime variable
6.2.2. Timekeeping Architecture in Uniprocessor Systems
6.2.2.1. Initialization phase
6.2.2.2. The timer interrupt handler
6.2.3. Timekeeping Architecture in Multiprocessor Systems
6.2.3.1. Initialization phase
6.2.3.2. The global timer interrupt handler
6.2.3.3. The local timer interrupt handler
6.3. Updating the Time and Date
6.4. Updating System Statistics
6.4.1. Updating Local CPU Statistics
6.4.2. Keeping Track of System Load
6.4.3. Profiling the Kernel Code
6.4.4. Checking the NMI Watchdogs
6.5. Software Timers and Delay Functions
6.5.1. Dynamic Timers
6.5.1.1. Dynamic timers and race conditions
6.5.1.2. Data structures for dynamic timers
6.5.1.3. Dynamic timer handling
6.5.2. An Application of Dynamic Timers: the nanosleep( ) System Call
6.5.3. Delay Functions
6.6. System Calls Related to Timing Measurements
6.6.1. The time( ) and gettimeofday( ) System Calls
6.6.2. The adjtimex( ) System Call
6.6.3. The setitimer( ) and alarm( ) System Calls
6.6.4. System Calls for POSIX Timers
7. Process Scheduling
7.1. Scheduling Policy
7.1.1. Process Preemption
7.1.2. How Long Must a Quantum Last?
7.2. The Scheduling Algorithm
7.2.1. Scheduling of Conventional Processes
7.2.1.1. Base time quantum
7.2.1.2. Dynamic priority and average sleep time
7.2.1.3. Active and expired processes
7.2.2. Scheduling of Real-Time Processes
7.3. Data Structures Used by the Scheduler
7.3.1. The runqueue Data Structure
7.3.2. Process Descriptor
7.4. Functions Used by the Scheduler
7.4.1. The scheduler_tick( ) Function
7.4.1.1. Updating the time slice of a real-time process
7.4.1.2. Updating the time slice of a conventional process
7.4.2. The try_to_wake_up( ) Function
7.4.3. The recalc_task_prio( ) Function
7.4.4. The schedule( ) Function
7.4.4.1. Direct invocation
7.4.4.2. Lazy invocation
7.4.4.3. Actions performed by schedule( ) before a process switch
7.4.4.4. Actions performed by schedule( ) to make the process switch
7.4.4.5. Actions performed by schedule( ) after a process switch
7.5. Runqueue Balancing in Multiprocessor Systems
7.5.1. Scheduling Domains
7.5.2. The rebalance_tick( ) Function
7.5.3. The load_balance( ) Function
7.5.4. The move_tasks( ) Function
7.6. System Calls Related to Scheduling
7.6.1. The nice( ) System Call
7.6.2. The getpriority( ) and setpriority( ) System Calls
7.6.3. The sched_getaffinity( ) and sched_setaffinity( ) System Calls
7.6.4. System Calls Related to Real-Time Processes
7.6.4.1. The sched_getscheduler( ) and sched_setscheduler( ) system calls
7.6.4.2. The sched_getparam( ) and sched_setparam( ) system calls
7.6.4.3. The sched_yield( ) system call
7.6.4.4. The sched_get_priority_min( ) and sched_get_priority_max( ) system calls
7.6.4.5. The sched_rr_get_interval( ) system call
8. Memory Management
8.1. Page Frame Management
8.1.1. Page Descriptors
8.1.2. Non-Uniform Memory Access (NUMA)
8.1.3. Memory Zones
8.1.4. The Pool of Reserved Page Frames
8.1.5. The Zoned Page Frame Allocator
8.1.5.1. Requesting and releasing page frames
8.1.6. Kernel Mappings of High-Memory Page Frames
8.1.6.1. Permanent kernel mappings
8.1.6.2. Temporary kernel mappings
8.1.7. The Buddy System Algorithm
8.1.7.1. Data structures
8.1.7.2. Allocating a block
8.1.7.3. Freeing a block
8.1.8. The Per-CPU Page Frame Cache
8.1.8.1. Allocating page frames through the per-CPU page frame caches
8.1.8.2. Releasing page frames to the per-CPU page frame caches
8.1.9. The Zone Allocator
8.1.9.1. Releasing a group of page frames
8.2. Memory Area Management
8.2.1. The Slab Allocator
8.2.2. Cache Descriptor
8.2.3. Slab Descriptor
8.2.4. General and Specific Caches
8.2.5. Interfacing the Slab Allocator with the Zoned Page Frame Allocator
8.2.6. Allocating a Slab to a Cache
8.2.7. Releasing a Slab from a Cache
8.2.8. Object Descriptor
8.2.9. Aligning Objects in Memory
8.2.10. Slab Coloring
8.2.11. Local Caches of Free Slab Objects
8.2.12. Allocating a Slab Object
8.2.13. Freeing a Slab Object
8.2.14. General Purpose Objects
8.2.15. Memory Pools
8.3. Noncontiguous Memory Area Management
8.3.1. Linear Addresses of Noncontiguous Memory Areas
8.3.2. Descriptors of Noncontiguous Memory Areas
8.3.3. Allocating a Noncontiguous Memory Area
8.3.4. Releasing a Noncontiguous Memory Area
9. Process Address Space
9.1. The Process's Address Space
9.2. The Memory Descriptor
9.2.1. Memory Descriptor of Kernel Threads
9.3. Memory Regions
9.3.1. Memory Region Data Structures
9.3.2. Memory Region Access Rights
9.3.3. Memory Region Handling
9.3.3.1. Finding the closest region to a given address: find_vma( )
9.3.3.2. Finding a region that overlaps a given interval: find_vma_intersection( )
9.3.3.3. Finding a free interval: get_unmapped_area( )
9.3.3.4. Inserting a region in the memory descriptor list: insert_vm_struct( )
9.3.4. Allocating a Linear Address Interval
9.3.5. Releasing a Linear Address Interval
9.3.5.1. The do_munmap( ) function
9.3.5.2. The split_vma( ) function
9.3.5.3. The unmap_region( ) function
9.4. Page Fault Exception Handler
9.4.1. Handling a Faulty Address Outside the Address Space
9.4.2. Handling a Faulty Address Inside the Address Space
9.4.3. Demand Paging
9.4.4. Copy On Write
9.4.5. Handling Noncontiguous Memory Area Accesses
9.5. Creating and Deleting a Process Address Space
9.5.1. Creating a Process Address Space
9.5.2. Deleting a Process Address Space
9.6. Managing the Heap
10. System Calls
10.1. POSIX APIs and System Calls
10.2. System Call Handler and Service Routines
10.3. Entering and Exiting a System Call
10.3.1. Issuing a System Call via the int $0x80 Instruction
10.3.1.1. The system_call( ) function
10.3.1.2. Exiting from the system call
10.3.2. Issuing a System Call via the sysenter Instruction
10.3.2.1. The sysenter instruction
10.3.2.2. The vsyscall page
10.3.2.3. Entering the system call
10.3.2.4. Exiting from the system call
10.3.2.5. The sysexit instruction
10.3.2.6. The SYSENTER_RETURN code
10.4. Parameter Passing
10.4.1. Verifying the Parameters
10.4.2. Accessing the Process Address Space
10.4.3. Dynamic Address Checking: The Fix-up Code
10.4.4. The Exception Tables
10.4.5. Generating the Exception Tables and the Fixup Code
10.5. Kernel Wrapper Routines
11. Signals
11.1. The Role of Signals
11.1.1. Actions Performed upon Delivering a Signal
11.1.2. POSIX Signals and Multithreaded Applications
11.1.3. Data Structures Associated with Signals
11.1.3.1. The signal descriptor and the signal handler descriptor
11.1.3.2. The sigaction data structure
11.1.3.3. The pending signal queues
11.1.4. Operations on Signal Data Structures
11.2. Generating a Signal
11.2.1. The specific_send_sig_info( ) Function
11.2.2. The send_signal( ) Function
11.2.3. The group_send_sig_info( ) Function
11.3. Delivering a Signal
11.3.1. Executing the Default Action for the Signal
11.3.2. Catching the Signal
11.3.2.1. Setting up the frame
11.3.2.2. Evaluating the signal flags
11.3.2.3. Starting the signal handler
11.3.2.4. Terminating the signal handler
11.3.3. Reexecution of System Calls
11.3.3.1. Restarting a system call interrupted by a non-caught signal
11.3.3.2. Restarting a system call for a caught signal
11.4. System Calls Related to Signal Handling
11.4.1. The kill( ) System Call
11.4.2. The tkill( ) and tgkill( ) System Calls
11.4.3. Changing a Signal Action
11.4.4. Examining the Pending Blocked Signals
11.4.5. Modifying the Set of Blocked Signals
11.4.6. Suspending the Process
11.4.7. System Calls for Real-Time Signals
12. The Virtual Filesystem
12.1. The Role of the Virtual Filesystem (VFS)
12.1.1. The Common File Model
12.1.2. System Calls Handled by the VFS
12.2. VFS Data Structures
12.2.1. Superblock Objects
12.2.2. Inode Objects
12.2.3. File Objects
12.2.4. dentry Objects
12.2.5. The dentry Cache
12.2.6. Files Associated with a Process
12.3. Filesystem Types
12.3.1. Special Filesystems
12.3.2. Filesystem Type Registration
12.4. Filesystem Handling
12.4.1. Namespaces
12.4.2. Filesystem Mounting
12.4.3. Mounting a Generic Filesystem
12.4.3.1. The do_kern_mount( ) function
12.4.3.2. Allocating a superblock object
12.4.4. Mounting the Root Filesystem
12.4.4.1. Phase 1: Mounting the rootfs filesystem
12.4.4.2. Phase 2: Mounting the real root filesystem
12.4.5. Unmounting a Filesystem
12.5. Pathname Lookup
12.5.1. Standard Pathname Lookup
12.5.2. Parent Pathname Lookup
12.5.3. Lookup of Symbolic Links
12.6. Implementations of VFS System Calls
12.6.1. The open( ) System Call
12.6.2. The read( ) and write( ) System Calls
12.6.3. The close( ) System Call
12.7. File Locking
12.7.1. Linux File Locking
12.7.2. File-Locking Data Structures
12.7.3. FL_FLOCK Locks
12.7.4. FL_POSIX Locks
13. I/O Architecture and Device Drivers
13.1. I/O Architecture
13.1.1. I/O Ports
13.1.1.1. Accessing I/O ports
13.1.2. I/O Interfaces
13.1.2.1. Custom I/O interfaces
13.1.2.2. General-purpose I/O interfaces
13.1.3. Device Controllers
13.2. The Device Driver Model
13.2.1. The sysfs Filesystem
13.2.2. Kobjects
13.2.2.1. Kobjects, ksets, and subsystems
13.2.2.2. Registering kobjects, ksets, and subsystems
13.2.3. Components of the Device Driver Model
13.2.3.1. Devices
13.2.3.2. Drivers
13.2.3.3. Buses
13.2.3.4. Classes
13.3. Device Files
13.3.1. User Mode Handling of Device Files
13.3.1.1. Dynamic device number assignment
13.3.1.2. Dynamic device file creation
13.3.2. VFS Handling of Device Files
13.4. Device Drivers
13.4.1. Device Driver Registration
13.4.2. Device Driver Initialization
13.4.3. Monitoring I/O Operations
13.4.3.1. Polling mode
13.4.3.2. Interrupt mode
13.4.4. Accessing the I/O Shared Memory
13.4.5. Direct Memory Access (DMA)
13.4.5.1. Synchronous and asynchronous DMA
13.4.5.2. Helper functions for DMA transfers
13.4.5.3. Bus addresses
13.4.5.4. Cache coherency
13.4.5.5. Helper functions for coherent DMA mappings
13.4.5.6. Helper functions for streaming DMA mappings
13.4.6. Levels of Kernel Support
13.5. Character Device Drivers
13.5.1. Assigning Device Numbers
13.5.1.1. The register_chrdev_region( ) and alloc_chrdev_region( ) functions
13.5.1.2. The register_chrdev( ) function
13.5.2. Accessing a Character Device Driver
13.5.3. Buffering Strategies for Character Devices
14. Block Device Drivers
14.1. Block Devices Handling
14.1.1. Sectors
14.1.2. Blocks
14.1.3. Segments
14.2. The Generic Block Layer
14.2.1. The Bio Structure
14.2.2. Representing Disks and Disk Partitions
14.2.3. Submitting a Request
14.3. The I/O Scheduler
14.3.1. Request Queue Descriptors
14.3.2. Request Descriptors
14.3.2.1. Managing the allocation of request descriptors
14.3.2.2. Avoiding request queue congestion
14.3.3. Activating the Block Device Driver
14.3.4. I/O Scheduling Algorithms
14.3.4.1. The "Noop" elevator
14.3.4.2. The "CFQ" elevator
14.3.4.3. The "Deadline" elevator
14.3.4.4. The "Anticipatory" elevator
14.3.5. Issuing a Request to the I/O Scheduler
14.3.5.1. The blk_queue_bounce( ) function
14.4. Block Device Drivers
14.4.1. Block Devices
14.4.1.1. Accessing a block device
14.4.2. Device Driver Registration and Initialization
14.4.2.1. Defining a custom driver descriptor
14.4.2.2. Initializing the custom descriptor
14.4.2.3. Initializing the gendisk descriptor
14.4.2.4. Initializing the table of block device methods
14.4.2.5. Allocating and initializing a request queue
14.4.2.6. Setting up the interrupt handler
14.4.2.7. Registering the disk
14.4.3. The Strategy Routine
14.4.4. The Interrupt Handler
14.5. Opening a Block Device File
15. The Page Cache
15.1. The Page Cache
15.1.1. The address_space Object
15.1.2. The Radix Tree
15.1.3. Page Cache Handling Functions
15.1.3.1. Finding a page
15.1.3.2. Adding a page
15.1.3.3. Removing a page
15.1.3.4. Updating a page
15.1.4. The Tags of the Radix Tree
15.2. Storing Blocks in the Page Cache
15.2.1. Block Buffers and Buffer Heads
15.2.2. Managing the Buffer Heads
15.2.3. Buffer Pages
15.2.4. Allocating Block Device Buffer Pages
15.2.5. Releasing Block Device Buffer Pages
15.2.6. Searching Blocks in the Page Cache
15.2.6.1. The __find_get_block( ) function
15.2.6.2. The __getblk( ) function
15.2.6.3. The __bread( ) function
15.2.7. Submitting Buffer Heads to the Generic Block Layer
15.2.7.1. The submit_bh( ) function
15.2.7.2. The ll_rw_block( ) function
15.3. Writing Dirty Pages to Disk
15.3.1. The pdflush Kernel Threads
15.3.2. Looking for Dirty Pages To Be Flushed
15.3.3. Retrieving Old Dirty Pages
15.4. The sync( ), fsync( ), and fdatasync( ) System Calls
15.4.1. The sync( ) System Call
15.4.2. The fsync( ) and fdatasync( ) System Calls
16. Accessing Files
16.1. Reading and Writing a File
16.1.1. Reading from a File
16.1.1.1. The readpage method for regular files
16.1.1.2. The readpage method for block device files
16.1.2. Read-Ahead of Files
16.1.2.1. The page_cache_readahead( ) function
16.1.2.2. The handle_ra_miss( ) function
16.1.3. Writing to a File
16.1.3.1. The prepare_write and commit_write methods for regular files
16.1.3.2. The prepare_write and commit_write methods for block device files
16.1.4. Writing Dirty Pages to Disk
16.2. Memory Mapping
16.2.1. Memory Mapping Data Structures
16.2.2. Creating a Memory Mapping
16.2.3. Destroying a Memory Mapping
16.2.4. Demand Paging for Memory Mapping
16.2.5. Flushing Dirty Memory Mapping Pages to Disk
16.2.6. Non-Linear Memory Mappings
16.3. Direct I/O Transfers
16.4. Asynchronous I/O
16.4.1. Asynchronous I/O in Linux 2.6
16.4.1.1. The asynchronous I/O context
16.4.1.2. Submitting the asynchronous I/O operations
17. Page Frame Reclaiming
17.1. The Page Frame Reclaiming Algorithm
17.1.1. Selecting a Target Page
17.1.2. Design of the PFRA
17.2. Reverse Mapping
17.2.1. Reverse Mapping for Anonymous Pages
17.2.1.1. The try_to_unmap_anon( ) function
17.2.1.2. The try_to_unmap_one( ) function
17.2.2. Reverse Mapping for Mapped Pages
17.2.2.1. The priority search tree
17.2.2.2. The try_to_unmap_file( ) function
17.3. Implementing the PFRA
17.3.1. The Least Recently Used (LRU) Lists
17.3.1.1. Moving pages across the LRU lists
17.3.1.2. The mark_page_accessed( ) function
17.3.1.3. The page_referenced( ) function
17.3.1.4. The refill_inactive_zone( ) function
17.3.2. Low On Memory Reclaiming
17.3.2.1. The free_more_memory( ) function
17.3.2.2. The try_to_free_pages( ) function
17.3.2.3. The shrink_caches( ) function
17.3.2.4. The shrink_zone( ) function
17.3.2.5. The shrink_cache( ) function
17.3.2.6. The shrink_list( ) function
17.3.2.7. The pageout( ) function
17.3.3. Reclaiming Pages of Shrinkable Disk Caches
17.3.3.1. Reclaiming page frames from the dentry cache
17.3.3.2. Reclaiming page frames from the inode cache
17.3.4. Periodic Reclaiming
17.3.4.1. The kswapd kernel threads
17.3.4.2. The cache_reap( ) function
17.3.5. The Out of Memory Killer
17.3.6. The Swap Token
17.4. Swapping
17.4.1. Swap Area
17.4.1.1. Creating and activating a swap area
17.4.1.2. How to distribute pages in the swap areas
17.4.2. Swap Area Descriptor
17.4.3. Swapped-Out Page Identifier
17.4.4. Activating and Deactivating a Swap Area
17.4.4.1. The sys_swapon( ) service routine
17.4.4.2. The sys_swapoff( ) service routine
17.4.4.3. The try_to_unuse( ) function
17.4.5. Allocating and Releasing a Page Slot
17.4.5.1. The scan_swap_map( ) function
17.4.5.2. The get_swap_page( ) function
17.4.5.3. The swap_free( ) function
17.4.6. The Swap Cache
17.4.6.1. Swap cache implementation
17.4.6.2. Swap cache helper functions
17.4.7. Swapping Out Pages
17.4.7.1. Inserting the page frame in the swap cache
17.4.7.2. Updating the Page Table entries
17.4.7.3. Writing the page into the swap area
17.4.7.4. Removing the page frame from the swap cache
17.4.8. Swapping in Pages
17.4.8.1. The do_swap_page( ) function
17.4.8.2. The read_swap_cache_async( ) function
18. The Ext2 and Ext3 Filesystems
18.1. General Characteristics of Ext2
18.2. Ext2 Disk Data Structures
18.2.1. Superblock
18.2.2. Group Descriptor and Bitmap
18.2.3. Inode Table
18.2.4. Extended Attributes of an Inode
18.2.5. Access Control Lists
18.2.6. How Various File Types Use Disk Blocks
18.2.6.1. Regular file
18.2.6.2. Directory
18.2.6.3. Symbolic link
18.2.6.4. Device file, pipe, and socket
18.3. Ext2 Memory Data Structures
18.3.1. The Ext2 Superblock Object
18.3.2. The Ext2 inode Object
18.4. Creating the Ext2 Filesystem
18.5. Ext2 Methods
18.5.1. Ext2 Superblock Operations
18.5.2. Ext2 inode Operations
18.5.3. Ext2 File Operations
18.6. Managing Ext2 Disk Space
18.6.1. Creating inodes
18.6.2. Deleting inodes
18.6.3. Data Blocks Addressing
18.6.4. File Holes
18.6.5. Allocating a Data Block
18.6.6. Releasing a Data Block
18.7. The Ext3 Filesystem
18.7.1. Journaling Filesystems
18.7.2. The Ext3 Journaling Filesystem
18.7.3. The Journaling Block Device Layer
18.7.3.1. Log records
18.7.3.2. Atomic operation handles
18.7.3.3. Transactions
18.7.4. How Journaling Works
19. Process Communication
19.1. Pipes
19.1.1. Using a Pipe
19.1.2. Pipe Data Structures
19.1.2.1. The pipefs special filesystem
19.1.3. Creating and Destroying a Pipe
19.1.4. Reading from a Pipe
19.1.5. Writing into a Pipe
19.2. FIFOs
19.2.1. Creating and Opening a FIFO
19.3. System V IPC
19.3.1. Using an IPC Resource
19.3.2. The ipc( ) System Call
19.3.3. IPC Semaphores
19.3.3.1. Undoable semaphore operations
19.3.3.2. The queue of pending requests
19.3.4. IPC Messages
19.3.5. IPC Shared Memory
19.3.5.1. Swapping out pages of IPC shared memory regions
19.3.5.2. Demand paging for IPC shared memory regions
19.4. POSIX Message Queues
20. Program Execution
20.1. Executable Files
20.1.1. Process Credentials and Capabilities
20.1.1.1. Process capabilities
20.1.1.2. The Linux Security Modules framework
20.1.2. Command-Line Arguments and Shell Environment
20.1.3. Libraries
20.1.4. Program Segments and Process Memory Regions
20.1.4.1. Flexible memory region layout
20.1.5. Execution Tracing
20.2. Executable Formats
20.3. Execution Domains
20.4. The exec Functions
A. System Startup
A.1. Prehistoric Age: the BIOS
A.2. Ancient Age: the Boot Loader
A.2.1. Booting Linux from a Disk
A.3. Middle Ages: the setup( ) Function
A.4. Renaissance: the startup_32( ) Functions
A.5. Modern Age: the start_kernel( ) Function
B. Modules
B.1. To Be (a Module) or Not to Be?
B.1.1. Module Licenses
B.2. Module Implementation
B.2.1. Module Usage Counters
B.2.2. Exporting Symbols
B.2.3. Module Dependency
B.3. Linking and Unlinking Modules
B.4. Linking Modules on Demand
B.4.1. The modprobe Program
B.4.2. The request_module( ) Function
C. Bibliography
Books on Unix Kernels
Books on the Linux Kernel
Books on PC Architecture and Technical Manuals on Intel Microprocessors
Other Online Documentation Sources
Research Papers Related to Linux Development
About the Authors
Colophon
Copyright