The system is a set of items that works collaboratively as part of a larger construction. Computer science utilizes many systems that manage router packets, operating functions, distributions, and other tasks. Each module of a system involves complex processes that require optimized data structures.
Objective 16.1 This chapter addresses applications of data structures to various system modules. It first covers queue-spilling algorithms for maintaining packet queues in routers and switches. Subsequent sections explain use of data structures in schedulers, distributed caching, and file system applications.
Iyer et al. [177] demonstrated that router and switch efficiency can be increased by using expensive SRAM, economical DRAM and queue-spilling algorithms. Queue-caching algorithms were proposed to manage unconsumed data that exceeded queue space and had to be spilled into secondary memory to be retrieved later. The huge amounts of data processed by the Internet make solution of the heuristic queue-spilling problem essential. Motwani et al. [178] proposed a model to manage queue spilling.
These algorithms and models work in a variety of data streaming systems. The proposed systems allow efficient management of queues in memory by using buffers for secondary storage. Caching queues in memory buffers are triggered whenever an certain number of data items (tuple) enters a queue. Only one tuple is consumed by the header because queues have first-in-first-out (FIFO) properties. Tuples are assumed to be of the same size M and n represents he number of queues available in the buffer. The process assumes that some unbounded tuples may need to be transmitted to main memory.
The algorithm can read the tuple from the secondary memory whenever required. The queue spilling algorithm should be able to decide in an on-line manner about the tuple state (read or write). The half algorithm is proposed for caching queues systems because it keeps the two active ends of the queue buffered in memory. The algorithm divides an unconsumed tuple into a head, spilled portion, and tail. The head contains the oldest tuple. The tail portion contains the most recent tuple. The tail and head reside in the main memory and the spilled portion resides on a disk.
Newly arrived tuples reside at the head segment; the tail is empty. The head portion expands as additional tuples arrive. The write operation on tuples continues until the queue is full, at which time the algorithm writes out
The proposed algorithm will ensure the following invariants invariants during read and write operations:
•Invariant 1: The tuples in the head are always older than the tuples in the spilled portion and the tail.
•Invariant 2: If the spilled portion is empty, the tail will also be empty.
•Invariant 3: The maximum size of the spilled portion can reach
The steps required to maintain these invariants [178] are as follows:
[Write − Out]: Once the tuples reach the
[Read − In]: If the head is emptied before the tail reaches
16.2 Completely Fair Schedulers in Kernels
A completely fair scheduler (CFS) maintains fairness among processes during running states [175]. Fairness provides execution time to processes in danger of becoming out of balance. A CFS maintains virtual run time (time needed for a specific task) and sleeper fairness to ensure that waiting after resumption of running state gets a fair share of processor operation.
A CFS uses time-ordered red-black trees rather than queues to maintain running states of processes.
Figure 16.1 illustrates a red-black tree that has specific properties: (1) it is self balance; no path in the tree will be used more than twice; and (2) insertion and deletion tasks require only O(log n) time.
Figure 16.1: Red-back tree stucture used with completely fair scheduler
Figure 16.1 represents the task as sched_entity objects stored in a red-black tree. Tasks with fewer processor needs are stored on the right side of the tree; tasks with more needs are stored on the left side. The scheduler selects the left-most nodes of the tree first, then the remaining left nodes.
A task in a link is handled by a task_struct function that represents all associated attributes (description, current state, stack order, static and dynamic priorities, and process flag).
Most of the task related structure can be found in a header file at./linux/include/linux/sched.h. No tasks related to the CFS will be run.
The root of the red-black tree in Figure 16.2 uses the rbroot element from the cfsrq structure in ./kernel/sched.c). The internode (rbnode) represents a runnable task. The leaves mean “no information.” Internal node(rbnode) resides within the schedentity structure. It includes the rbnode reference, load weight, and a variety of statistical data. The schedentity contains the vruntime (64-bit field), it indicates the amount of time occupied by the processor and index [175,176].
Figure 16.2: Relations of red-black tree structures
A distributed system is a collection of independent computers connected by some medium. Data caching provides solutions to many serious problems in distributed environments.
The hash-based bloom filter is used extensively to manage data caching in distributed environments.
Feng et al. [179] proposed a system using bloom filters to distribute data cache information. A summary cache system generates a question to determine whether another station’s cache holds desired data if a local cache misses a step. Communication and time costs for accessing data from the original station are reduced. To reduce message traffic, stations use periodic broadcasts of a bloom filter that represents a cache instead of transferring the entire content of the cache.
Each station verifies bloom filter of other stations for any data availability. False positivity and false negativity will trigger delays due to hash collisions caused by the bloom filter’s limited buffer size. Distributed caching has proven useful in Google’s BigTable, Google Maps, Google Earth, Web indexing and other distributed storage systems for structured data. These applications utilize bloom filters to reduce disk lookups. Summary cache systems are used extensively in cloud computing, mapping, and reduction paradigms. Bloom filters optimize reduction operations. Summary caches divide applications into small chunks to achieve parallel efficiency [180].
16.4 Data Structures for Building File Systems
Disk file systems use bitmaps to track free blocks and handle queries related to specific disk blocks. Disk files need good data structures to store directories and efficiently handle queries and fast lookups. Microsoft’s early FAT32 system used arrays for file allocations. The ext and ext2 systems use link lists. The XFS and NTFS systems use B+ trees for directory and security-related metadata indexing. The ext3 and ext4 file systems use modified B+ trees (also known as H trees) for file indexing [181].
Project 16.1 — Android Task Monitoring. Modern life demands completion of multiple tasks every day. Attending meetings, taking medications, paying bills, planning trips, going to classes and other activities represent daily challenges. The human brain is not designed to handle multiple tasks at the same time. Your project is to design a task alert system to broadcast a reminder whenever an important task must be done. The project should include a type of artificial intelligence assistant that will scan your android phone and create a schedule. The assistant can use interactive techniques and must ensure that you complete all tasks managed by the alert system.
Project 16.2 — Android Home Automation. Create a home automation project using an android phone. The major task of the system is to control home devices (lights, heat, alarms, air conditioning, appliances). You can choose any data structure to build your system.
Project 16.3 — File System. Create a file system using binary search trees and Linux. [Hint: Start by copying an existing file system to see how it works.]
Project 16.4 — Wearable Health Monitoring System. Create a health monitoring system using android phones. The objective is to maintain health records, coordinate appointments of multiple doctors, and schedule medicine intakes. The system should include update capabilities.
Project 16.5 — Firewall System. Create a signature-based firewall security system that will prevent unauthorized access to or from a private network. [Hint use the bloom filter to detect signature.]