Objective 7.1 — Temporal Data Structures. In traditional algorithmic settings, an update modifies the data or its underlying pointers in place, so information about previous states is lost. Temporal data structures preserve information from previous states so that it remains available for access.
In this chapter we discuss temporal data structures, which allow us to view and/or modify past and present versions of a data structure. The specific behaviors of these data structures are defined below.
There are two primary models of temporal data structures. The first, called persistence, is based on the branching-universe model of time travel: going back in time and making changes creates a new branch of the data structure that differs from the original branch. The second, called retroactivity, works on the idea of round-trip time travel: a time traveler goes back in time, makes a change, and then returns to observe the effects of that change. This model gives us a linear timeline with no branching.
We have vaguely referred to persistence as the ability to answer queries about the past states of the structure. Here we give several definitions of persistence.
In the partial persistence model we may query any previous version of the data structure, but we may only update the latest version. The relevant operations are read(var, version) and newversion = write(var, val).
Can we use time travel models to review past data structures? ▪
A persistent data structure is one that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable: their operations do not (visibly) update the structure in place but instead always yield a new, updated structure. The term was introduced in 1986 by Driscoll et al. [86].
A data structure is partially persistent if all previous versions can be accessed for viewing (read-only mode) but only the current version can be updated. The data structure is fully persistent if both viewing and modification are possible in all versions. If there is additionally a special combination operation that can create a new version from two previous versions, the data structure is confluently persistent. Data structures that are not persistent are called ephemeral.
Persistent data structures are widely used in functional programming. Moreover, in purely functional programs all data is immutable, so all data structures are automatically fully persistent. Persistent data structures can also be created using in-place updates, and such implementations may, in general, use less time or storage than their purely functional counterparts. Purely functional data structures are persistent in that they avoid mutable state, yet they can still achieve attractive amortized time bounds.
Definition 7.1.1 — Purely functional language. A purely functional language is one that does not allow data to be overwritten or deleted in place; an update instead invokes a function call that returns newly computed data, leaving the old data intact.
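To make the definition concrete, here is a minimal Python sketch of the purely functional update style (the Account type and deposit function are illustrative examples, not from the text): an "update" never overwrites anything, but is a function call that returns newly computed data.

from typing import NamedTuple

class Account(NamedTuple):          # an immutable record (hypothetical example)
    owner: str
    balance: int

def deposit(account: Account, amount: int) -> Account:
    # No in-place overwrite: return a newly computed Account instead.
    return account._replace(balance=account.balance + amount)

old = Account("alice", 100)
new = deposit(old, 50)
assert old.balance == 100           # the previous version remains intact
assert new.balance == 150

Because the old value survives every update, such data is automatically fully persistent, as noted above.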
While persistence can be achieved by simple copying, this is inefficient in CPU and RAM usage, because most operations make only small changes to the structure. A better method is to exploit the similarity between the new and old versions and share structure between them, for example by reusing an unchanged subtree across a number of tree versions. If determining which previous versions share parts of the structure is not feasible and the user wishes to discard old versions, a garbage collection feature may be necessary. However, a sophisticated system such as the ZFS copy-on-write model can perform these tasks by tracking storage allocation directly.
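As a concrete illustration of structure sharing, consider this hedged Python sketch of a persistent singly linked list: prepending allocates one new cell, and every older version shares the entire tail rather than copying it.

class Cell:
    def __init__(self, head, tail):
        self.head = head            # value stored in this cell
        self.tail = tail            # shared reference to the rest of the list

def prepend(lst, value):
    return Cell(value, lst)         # O(1): the old list is shared, not copied

v1 = prepend(None, 3)               # version 1: [3]
v2 = prepend(v1, 2)                 # version 2: [2, 3]
v3 = prepend(v1, 7)                 # version 3: [7, 3], branching off v1
assert v2.tail is v1 and v3.tail is v1   # both new versions share v1's cells

Garbage collection (built into Python here) reclaims cells only when no version references them, which is exactly the bookkeeping discussed above.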
7.1.1 Partial persistence (see Figure 7.1)
In the partial persistence model (Figure 7.1), we may query any previous version of the data structure, but we may only update the latest version. This implies a linear ordering among the versions.
Figure 7.1: Partial persistence
7.1.1.1 Fat node method
This method records all changes made to node fields without erasing the old values of the fields. It requires that we allow nodes to become arbitrarily "fat". In other words, each fat node contains the same information and pointer fields as an ephemeral node, along with space for an arbitrary number of extra field values. Each extra field value has an associated field name and a version stamp indicating the version in which the named field was changed to the specified value. In addition, each fat node has its own version stamp, indicating the version in which the node was created. The only purpose of the version stamps on nodes is to ensure that each node contains only one value per field name per version. So that we can navigate through the structure, each original field value in a node has a version stamp of zero.
Complexity 7.1.1 — Fat node method. The fat node method requires O(1) space for every modification: we just store the new data. Each modification takes O(1) additional amortized time to store the modification at the end of the modification history, assuming we store the history in a growable array. For access, we must find the right version at each node as we traverse the structure. If we have made m modifications, then each access operation suffers an O(log m) slowdown, the cost of finding the nearest modification in the array.
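A hedged Python sketch of a single fat field, assuming version stamps arrive in increasing order (FatField is an illustrative name, not from the text): each write appends a (stamp, value) pair, and each read binary-searches for the newest value whose stamp does not exceed the queried version, which is the source of the O(log m) slowdown.

import bisect

class FatField:
    """One fat field: keeps every (version stamp, value) ever written."""
    def __init__(self, version0, value0):
        self.stamps = [version0]    # version stamps, kept in sorted order
        self.values = [value0]

    def write(self, version, value):
        # O(1) amortized: modifications arrive in increasing version order.
        self.stamps.append(version)
        self.values.append(value)

    def read(self, version):
        # O(log m): find the latest write with stamp <= version.
        i = bisect.bisect_right(self.stamps, version) - 1
        return self.values[i]

f = FatField(0, "a")
f.write(3, "b")
f.write(7, "c")
assert f.read(2) == "a" and f.read(5) == "b" and f.read(9) == "c"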
7.1.1.2 Path copying
Path copying duplicates all nodes on the path to the node about to be inserted or deleted. The change must then be cascaded back through the data structure: every node that pointed to the old node is modified to point to the new node instead. These modifications cause further cascading changes, and so on, until the root is reached. We maintain an array of roots indexed by timestamp; the data structure pointed to by time t's root is exactly time t's data structure.
Complexity 7.1.2 — Path copying method. With m modifications, lookups cost O(log m) (to find the correct root). Modification time and space are bounded by the size of the structure, since a single modification may cause the entire structure to be copied: O(m) in the worst case for one update, and thus O(m²) preprocessing time over all m updates.
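The following Python sketch shows path copying on an unbalanced binary search tree (a simplification; a balanced tree would bound the copied path by O(log n)): an insert copies only the root-to-leaf path, and an array of roots indexed by version gives access to every version.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    # Returns the root of a new version; untouched subtrees are shared.
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    return Node(root.key, root.left, insert(root.right, key))

roots = [None]                      # roots[t] is the tree at time t
for k in [5, 3, 8]:
    roots.append(insert(roots[-1], k))
assert roots[2].left.key == 3       # version 2 contains 5 and 3
assert roots[1].left is None        # version 1 is unchanged: just 5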
7.1.1.3 A combination of fat node and path copying
Sleator and Tarjan [86] devised a way to combine the advantages of fat nodes and path copying, achieving O(1) access slowdown and O(1) modification space and time.
In each node, we store one modification box. This box can hold one modification to the node (to a pointer, a key, or another piece of node-specific data) along with a timestamp for the modification. Initially, every node's modification box is empty.
Whenever we access a node, we check the modification box, and compare its timestamp against the access time. (The access time specifies the version of the data structure that we care about.) If the modification box is empty, or the access time is before the modification time, then we ignore the modification box and just deal with the normal part of the node. On the other hand, if the access time is after the modification time, then we use the value in the modification box, overriding that value in the node. (Say the modification box has a new left pointer. Then we’ll use it instead of the normal left pointer, but we’ll still use the normal right pointer.)
Modifying a node works like this. We assume that each modification touches one pointer or similar field. If the node's modification box is empty, then we fill it with the modification. Otherwise, the modification box is full. We make a copy of the node, but using only the latest values. That is, we overwrite one of the node's fields with the value that was stored in the modification box. Then we perform the modification directly on the new node, without using the modification box: we overwrite one of the new node's fields, and its modification box stays empty. Finally, we cascade this change to the node's parent, just as in path copying. This may involve filling the parent's modification box, or making a copy of the parent recursively. If the node has no parent (it is the root), we add the new root to a sorted array of roots.
With this algorithm, given any time t, at most one modification box exists in the data structure with time t. Thus, a modification at time t splits the tree into three parts: one part contains the data from before time t, one part contains the data from after time t, and one part was unaffected by the modification.
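The Python sketch below illustrates the access and modification rules for a partially persistent binary tree with one modification box per node. It is deliberately simplified (pointer fields only, parent pointers maintained for the live version, and a plain roots list in place of the sorted root array), so it is a sketch of the technique rather than a full implementation.

class BoxedNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.mod = None             # at most one boxed change: (time, field, value)
        self.parent = None          # parent in the live (latest) version

    def get(self, field, time):
        # Access rule: use the boxed value only if it was set at or before `time`.
        if self.mod and self.mod[1] == field and self.mod[0] <= time:
            return self.mod[2]
        return getattr(self, field)

roots = []                          # one root per version, in creation order

def set_field(node, field, child, time):
    """Make node.field point at `child` in the version created at `time`."""
    if node.mod is None:            # box empty: record the change, O(1)
        node.mod = (time, field, child)
        if child is not None:
            child.parent = node
        return
    # Box full: copy the node with its latest values; the copy's box is empty.
    copy = BoxedNode(node.key, node.get("left", time), node.get("right", time))
    setattr(copy, field, child)
    for c in (copy.left, copy.right):
        if c is not None:
            c.parent = copy
    if node.parent is None:         # we copied the root: record a new root
        roots.append(copy)
    else:                           # cascade to the parent, as in path copying
        side = "left" if node.parent.get("left", time) is node else "right"
        set_field(node.parent, side, copy, time)

a = BoxedNode("a")
roots.append(a)                     # version 0
set_field(a, "left", BoxedNode("b"), time=1)    # fills a's box
set_field(a, "right", BoxedNode("c"), time=2)   # a's box is full: a is copied
assert roots[0].get("left", 0) is None          # the old version still reads correctly
assert roots[-1].get("right", 2).key == "c"

Each modification either fills one empty box in O(1) or copies one node and recurses upward; the analysis in the next subsection shows why the cascade costs only O(1) amortized.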
7.1.1.4 Complexity of combination
Time and space bounds for modifications require amortized analysis. A modification takes O(1) amortized space and O(1) amortized time. To see why, use a potential function Φ, where Φ(T) is the number of full live nodes in T. The live nodes of T are the nodes reachable from the current root at the current time (that is, after the last modification); the full live nodes are the live nodes whose modification boxes are full.
Each modification involves some number of copies, say k, followed by one change to a modification box. (We might also add a new root, but that would not change the argument.) Consider each of the k copies. Each costs O(1) space and time but decreases the potential function by one. First, the node we copy must be full and live, so it contributes to the potential function. The potential function will only drop, however, if the old node is not reachable in the new tree; and indeed it is not, because the next step of the algorithm modifies the node's parent to point at the copy. Finally, we know the copy's modification box is empty. Thus we have replaced a full live node with an empty live node, and Φ goes down by one. The final step fills a modification box, which costs O(1) time and increases Φ by one.
Putting it all together, the change in Φ is ΔΦ = 1 − k. Thus we have paid O(k + ΔΦ) = O(1) space and O(k + ΔΦ + 1) = O(1) time: explicitly, the amortized cost is the actual cost plus the potential change, (k + 1) + (1 − k) = 2 = O(1).
7.1.2 Full persistence (see Figure 7.2)
In this model (Figure 7.2), both updates and queries are allowed on any version of the data structure. The relevant operations are read(var, version) and newversion = write(var, version, val).
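For the variable-level interface just described, here is a naive Python sketch: versions form a tree, write() hangs a new version off any existing one, and read() walks toward the root for the nearest enclosing write. This costs O(version-tree depth) per read, far from the best known bounds; it only illustrates the interface.

class Version:
    def __init__(self, parent=None, bindings=None):
        self.parent = parent             # the version this one was written from
        self.bindings = bindings or {}   # writes made when creating this version

def write(var, version, val):
    return Version(parent=version, bindings={var: val})   # newversion

def read(var, version):
    v = version
    while v is not None:            # walk up the version tree
        if var in v.bindings:
            return v.bindings[var]
        v = v.parent
    raise KeyError(var)

v0 = Version(bindings={"x": 1})
v1 = write("x", v0, 2)              # one branch: x = 2
v2 = write("x", v0, 3)              # a second branch off v0: x = 3
assert read("x", v1) == 2 and read("x", v2) == 3 and read("x", v0) == 1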
7.1.3 Confluent persistence (see Figure 7.3)
In this model (Figure 7.3), in addition to the previous operations, combination operations merge the inputs of two previous versions into a single output version. The operations are read(var, version), newversion = write(var, version, val), and newversion = combine(var, val, version1, version2). Rather than a branching tree, combinations of versions induce a DAG (directed acyclic graph) structure on the version graph.
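Extending the same naive style to confluence shows how combine() turns the version graph into a DAG: a combined version simply has two parents. Note that the lookup policy on a merge (here, depth-first with the first parent winning) is a design choice the model leaves open.

class Version:
    def __init__(self, parents=(), bindings=None):
        self.parents = list(parents)     # two parents after a combine: a DAG
        self.bindings = bindings or {}

def write(var, version, val):
    return Version([version], {var: val})

def combine(var, val, version1, version2):
    return Version([version1, version2], {var: val})

def read(var, version):
    if var in version.bindings:
        return version.bindings[var]
    for p in version.parents:            # depth-first; first parent has priority
        try:
            return read(var, p)
        except KeyError:
            pass
    raise KeyError(var)

v0 = Version(bindings={"x": 1, "y": 10})
v1 = write("y", v0, 20)
v2 = write("x", v0, 2)
v3 = combine("z", 99, v1, v2)            # v3 has two parents
assert read("z", v3) == 99 and read("y", v3) == 20
assert read("x", v3) == 1                # v1's chain reaches v0 before v2 is
                                         # tried: merge semantics matter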
Figure 7.2: Full persistence
Figure 7.3: Confluent persistence
7.1.4 Functional persistence (see Figure 7.4)
This model takes its name from functional programming, where objects are immutable (Figure 7.4). The nodes in this model are likewise immutable: a revision does not alter existing nodes in the data structure but creates new ones instead. What distinguishes functional persistence from the other types is the need to keep all structures related to previous versions intact: the only allowed internal operation is adding new nodes. The three previous types of persistence are more flexible, since any implementation of the interface is acceptable. Each succeeding level of persistence is stronger than the preceding ones: functional implies confluent, confluent implies full, and full implies partial. Functional implies confluent because functional persistence simply restricts how persistence may be implemented. Confluent persistence becomes full persistence if we never use combine, and full persistence becomes partial persistence if we limit writes to the latest version.
Figure 7.4: Different versions (Part A, Part B, Part C) of data structure in functional persistence
7.2 Retroactivity
A retroactive data structure (Figure 7.5) supports operations such as Insert(t, update), Delete(t), and Query(t, query). An uppercase Insert indicates an operation on the retroactive structure itself; a lowercase update denotes an operation on the underlying current data structure. It is convenient to think of the time t as an integer, but a better approach is to use an order-maintenance data structure, which avoids resorting to non-integer times when an operation must be inserted between times t and t + 1.
Figure 7.5: Retroactivity
7.2.1 Decomposable search problem
This is similar to a simple search problem, but it requires that queries satisfy the following equation for any disjoint sets A and B: query(x, A ∪ B) = f(query(x, A), query(x, B)), for some function f that can be computed quickly (for example, ∨, min, or max).
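As a quick sanity check of this property in Python, take successor as the query and min as the combining function f:

INF = float("inf")

def successor(k, s):
    # min{k' in s | k' >= k}, or infinity if no such key exists
    return min((x for x in s if x >= k), default=INF)

A, B = {1, 4, 9}, {2, 7}
assert successor(3, A | B) == min(successor(3, A), successor(3, B))   # f = min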
▪ Example 7.1 — Priority queue. Priority queues are data structures in which retroactive operations potentially create chain reactions but can still be handled efficiently. The main operations are insert and delete-min, which we would like to support retroactively as Insert and Delete. It is possible to implement a partially retroactive priority queue with only O(log n) overhead per partially retroactive operation. Because of the presence of delete-min, the set of operations on priority queues is non-commutative: the order of updates clearly matters, and retroactively inserting a delete-min can cause a chain reaction that changes all subsequent results. ▪
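A hedged Python sketch of this chain reaction, using the naive rollback method (keep all updates sorted by time and replay them after every retroactive change); every operation costs O(r) replay time for r updates, whereas the O(log n) structure mentioned above avoids the replay entirely.

import bisect, heapq

class RollbackRetroPQ:
    """Partially retroactive priority queue by full replay (illustrative)."""
    def __init__(self):
        self.ops = []               # (time, op, key) triples, sorted by time

    def Insert(self, t, op, key=0):      # retroactive Insert(t, update)
        bisect.insort(self.ops, (t, op, key))

    def Delete(self, t):                 # retroactive Delete(t)
        self.ops = [o for o in self.ops if o[0] != t]

    def contents_now(self):
        heap = []                   # replay the whole timeline, O(r log r)
        for _, op, key in self.ops:
            if op == "insert":
                heapq.heappush(heap, key)
            elif heap:              # delete-min on an empty queue: no-op here
                heapq.heappop(heap)
        return sorted(heap)

pq = RollbackRetroPQ()
pq.Insert(10, "insert", 5)
pq.Insert(30, "delete-min")
assert pq.contents_now() == []      # the 5 was deleted at t = 30
pq.Insert(20, "insert", 3)          # time travel: insert 3 at t = 20
assert pq.contents_now() == [5]     # chain reaction: 3 is deleted instead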
Summary 7.1 — Temporal data structures. Temporal data structures maintain timestamped versions of data and are useful in applications that must view or modify past states, in the spirit of time travel.
In this chapter we have discussed the basics of temporal data structures. Interested readers are referred to Okasaki's book on functional data structures [88].
Exercise 7.1 Given an ordered universe U of keys, develop and analyze a fully retroactive data structure that maintains S ⊆ U and supports the following operations:
• insert(k): Insert k ∈ U into S
• delete(k): Remove k ∈ U from S
• successor(k): Return min{k′ ∈ S | k′ ≥ k}
under the constraint that all insert operations must occur at time −∞. All operations should run in time O(log m), where m is the total number of updates performed in the structure (retroactive or not). Observe that such a structure is sufficient to answer the “rightward ray shot” queries needed for the nonoblivious retroactive priority queue. ▪
Exercise 7.2 Create persistent versions of the following data structures:
1. Array
2. Linked list
3. Binary search tree
4. Skip list
5. Trie ▪