The yarp (short for Yet Another Registry Parser) library can be used to obtain keys and values from registry hives. Python provides a built-in registry module named _winreg (renamed winreg in Python 3); however, this module works only on Windows machines and interacts only with the registry of the system it is running on. It does not support opening external registry hive files.
The yarp library allows us to interact with supplied registry hives and can be run on non-Windows machines. The yarp library can be downloaded from https://github.com/msuhanov/yarp. On the project's GitHub page, click on the releases section to see a list of all stable versions and download the desired version. For this chapter, we use version 1.0.25. Once the archive is downloaded and extracted, we can run the included setup.py file to install the module. In a Command Prompt, execute the following command from the module's top-level directory:
python setup.py install
This should install the yarp library successfully on your machine. We can confirm by opening the Python interactive prompt and typing import yarp. We will receive an error if the module was not installed successfully. With yarp installed, let's begin learning how we can leverage this module for our needs.
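The same check can also be scripted rather than typed at the prompt. The following is a minimal sketch that uses the standard library's importlib to test whether a top-level module can be located; the helper name is_installed is our own and is not part of yarp.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == '__main__':
    # struct ships with Python; yarp is present only if the install succeeded.
    for name in ('struct', 'yarp'):
        status = 'installed' if is_installed(name) else 'missing'
        print('{}: {}'.format(name, status))
```

Because find_spec() only locates the module, this check does not execute any of the module's code.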
First, we need to import the Registry module from the yarp package. Then, we create a RegistryHive object by passing it an open file object for the hive we want to query. In this example, we have copied the NTUSER.DAT registry file to our current working directory, which allows us to supply just the filename and not the full path. Next, we use the find_key method to navigate to our key of interest. In this case, we are interested in the RecentDocs registry key. This key records recently opened files, grouped by file extension:
>>> from yarp import Registry
>>> reg_file = open('NTUSER.DAT', 'rb')
>>> reg = Registry.RegistryHive(reg_file)
>>> recent_docs = reg.find_key(r'SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs')
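Registry key paths are separated by backslashes, which Python also treats as the escape character in ordinary string literals. The following sketch shows two safe ways to write such a path: a raw string, and joining the individual components. The variable names are illustrative only.

```python
# A raw string (r'...') leaves every backslash untouched.
raw_path = r'SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs'

# Joining the components avoids typing the separators by hand.
parts = ['SOFTWARE', 'Microsoft', 'Windows', 'CurrentVersion',
         'Explorer', 'RecentDocs']
joined_path = '\\'.join(parts)

# Both spellings produce the same string.
print(raw_path == joined_path)

# In an ordinary literal, a sequence such as \t becomes a single tab
# character, so the two strings below differ in length.
print(len('a\tb'), len(r'a\tb'))
```

A path component that happens to begin with a letter such as t or n would be silently altered in an ordinary literal, which is why the raw form is the safer habit.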
If we print the recent_docs variable, we can see that it contains 151 values with 75 subkeys, which may contain additional values and subkeys. In addition, we can use the last_written_timestamp() method to see the last written time of the registry key:
>>> print(recent_docs)
RegistryKey, name: RecentDocs, subkeys: 75, values: 151
>>> recent_docs.last_written_timestamp()  # Last Written Time
datetime.datetime(2018, 11, 20, 3, 14, 40, 286516)
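For context, Windows records timestamps such as a key's last written time in the FILETIME format: the number of 100-nanosecond intervals since January 1, 1601 (UTC). yarp already hands us a datetime object, so the conversion below is only a sketch of the underlying idea. The helper name filetime_to_datetime is our own and is not a yarp function.

```python
from datetime import datetime, timedelta

# FILETIME counts 100-nanosecond intervals from this epoch (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1)


def filetime_to_datetime(filetime):
    """Convert a raw FILETIME integer to a naive UTC datetime."""
    # Ten 100-ns intervals make one microsecond; sub-microsecond
    # precision is dropped because datetime cannot represent it.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


if __name__ == '__main__':
    # 116444736000000000 is the FILETIME value of the Unix epoch.
    print(filetime_to_datetime(116444736000000000))
```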
We can iterate over the subkeys of the recent_docs key using the subkeys() function in a for loop. For each subkey, we can access the name(), value(), and values_count() methods, among others. When accessing a value (as opposed to a subkey), we can also access the value's raw data by using the raw_data() function. For our purposes, we use the raw_data() function when we want to work with the underlying binary data. We have the following code:
>>> for i, subkey in enumerate(recent_docs.subkeys()):
...     print('{}) {}: {}'.format(i, subkey.name(), subkey.values_count()))
...
0) .001: 2
1) .1: 2
2) .7z: 2
3) .AAE: 2
...
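The loop above can be wrapped in a small reusable helper. The sketch below relies only on the three calls shown in this section (subkeys(), name(), and values_count()). Because a hive file is not available here, it is exercised with FakeKey, a hypothetical stand-in class of our own that mimics those calls; with yarp installed, a real key object such as recent_docs would be passed instead.

```python
def summarize_subkeys(key):
    """Return (name, values_count) pairs for each direct subkey of key."""
    return [(sub.name(), sub.values_count()) for sub in key.subkeys()]


class FakeKey:
    """Hypothetical stand-in that mimics the three calls used above."""

    def __init__(self, name, values_count=0, children=()):
        self._name = name
        self._values_count = values_count
        self._children = list(children)

    def name(self):
        return self._name

    def values_count(self):
        return self._values_count

    def subkeys(self):
        return iter(self._children)


# Made-up demo data, not taken from a real hive.
demo = FakeKey('RecentDocs', 151, [FakeKey('.7z', 2), FakeKey('.001', 2)])
for i, (name, count) in enumerate(sorted(summarize_subkeys(demo))):
    print('{}) {}: {}'.format(i, name, count))
```

Returning a list of tuples rather than printing inside the helper keeps the data available for sorting, filtering, or writing to a report.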
Another useful feature of the yarp module is its means of querying for a specific subkey or value, provided by the subkey(), value(), and find_key() functions. The subkey() function returns None when the requested subkey is not present:
>>> if recent_docs.subkey('.docx'):
...     print('Found docx subkey.')
...
Found docx subkey.
>>> if recent_docs.subkey('.1234abcd') is None:
...     print('Did not find 1234abcd subkey.')
...
Did not find 1234abcd subkey.
The find_key() function takes a path and can descend through multiple levels of subkeys to find the target. The subkey() and value() functions search only the immediate children of a key. We can use these functions to confirm that a key or value exists before trying to navigate to it. yarp has a number of other relevant features not covered here, including recovering deleted registry keys and values, carving registry keys and values, and supporting transaction log files.
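To make the distinction concrete, the sketch below follows a multi-level path using only child lookups, one subkey() call per level, and stops as soon as a component is missing. It assumes only that subkey() returns None for an absent child, as described above. StubKey is a hypothetical stand-in of our own so that the example runs without a hive file, and walk_subkeys is likewise our own helper name.

```python
def walk_subkeys(key, path):
    """Follow a backslash-separated path one subkey() call at a time.

    Returns the key at the end of the path, or None as soon as any
    component is missing.
    """
    current = key
    for part in path.split('\\'):
        current = current.subkey(part)
        if current is None:
            return None
    return current


class StubKey:
    """Hypothetical stand-in exposing only a subkey() lookup."""

    def __init__(self, children=None):
        self._children = children or {}

    def subkey(self, name):
        # Mirror the behavior described in the text: None when absent.
        return self._children.get(name)


# Made-up demo tree, not taken from a real hive.
root = StubKey({'Explorer': StubKey({'RecentDocs': StubKey()})})
found = walk_subkeys(root, r'Explorer\RecentDocs')
missing = walk_subkeys(root, r'Explorer\NoSuchKey\Deeper')
print(found is not None, missing is None)
```

Checking for None at every level avoids an AttributeError that would otherwise occur when calling subkey() on a missing intermediate key.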
With the yarp module, finding keys and their values is straightforward. However, when the values are not strings and are instead binary data, we have to rely on another module to make sense of the mess. For all binary needs, the struct module is an excellent candidate.