Chapter 7. Multiple Directories

In the previous chapters we were focused on using a single directory server. But in a networked environment, you may need to configure multiple directory servers to interoperate. In this chapter we will be looking at different ways of getting directory servers to interoperate over a network.

While the focus of this book is OpenLDAP, many of the strategies presented here can be adopted to integrate OpenLDAP with other LDAP directory servers, such as The Apache Directory Server, Fedora DS, Microsoft's Active Directory, and the Novell Directory Server (NDS).

The two main processes we will look at are replication (creating a mirror of a directory information tree on another directory server) and proxying (allowing one directory server to act as an intermediary between an LDAP client and another directory server). In this chapter we will cover:

The basics of synchronizing and replicating directories
Directory replication with SyncRepl
Proxying with the ldap backend
Adding caching with the Proxy Cache overlay
Using the transparency overlay to create a hybrid cache

In this chapter we will be working with two servers—one that will host the authoritative copy of the directory, and another that will synchronize itself over the network with the authoritative copy.

Replication: An Overview

Sometimes it is desirable to have multiple identical copies of a directory server. This can be particularly effective in cases where LDAP servers sustain large volumes of traffic, where fail-over protection is required, or in cases where LDAP clients are geographically dispersed, and having local copies of a directory would greatly expedite service. These are cases where LDAP replication can provide a solution.

Replication is the process of configuring two or more directories to contain the same directory information tree (or portion of the directory tree), and to keep the multiple copies of the directory data synchronized over time. This has been a central feature of the OpenLDAP suite since its inception. In fact its predecessor, the University of Michigan LDAP Server, implemented replication early on and, because of this, replication has long been considered a standard task for an LDAP server.

In the standard LDAP model, replication is done in a hierarchy. One server is considered the master server (or the master DSA (Directory Server Agent), sometimes called the provider). This server is responsible for maintaining the canonical version of the directory information tree.

Beneath the master server are one or more shadow servers (sometimes called consumer, replica, or slave servers ). A shadow server holds a replica of the master server's directory information tree, and clients can connect to the shadow server and perform searches of the directory information tree (DIT). Let's have a look at the following figure:

For all practical purposes, shadow servers have read-only features. While the shadow servers can handle many LDAP operations, shadow servers are not allowed to alter the records in the replicated directory information tree. When add, modify, or delete operations are received, for instance, the shadow server will return a referral to the client, directing it to contact the master server instead. A referral is a special type of response that directs the client to contact another server to perform that operation. Configuring a referral to point from a shadow server to a master server is a simple matter of adding a referral directive to the slapd.conf file.

When a client receives a referral it has the information it needs to re-try the operation on the correct server.

Why not allow writing to the slaves? Allowing multiple servers to accept all the modifications, additions, and deletions makes it possible for the directory information tree to taken on inconsistent states. What happens if two directory servers change an attribute at the same time? Or if one modifies a record that another is simultaneously deleting? By allowing write operations only on the master server, it is much easier to keep the many replicas consistent.

Note

In the 2.4 release of OpenLDAP, it will be possible to configure multimaster which will allow multiple servers to act as masters. As with all multimaster configurations, there will be risks that certain inconsistencies arise, but these risks should be minimized.

In OpenLDAP there are two different ways to implement replication. The first is by configuring the master server to keep the shadow servers updated. This is called the push method. The second is to configure the slaves to periodically check the master for changes, and update itself accordingly; this is called the pull method.

Until OpenLDAP 2.2, the first model was the only model supported in OpenLDAP, and it was done through a stand-alone server called SLURPD. But SLURPD suffered from a number of problems and inefficiencies, and is now deprecated. It will be removed from OpenLDAP 2.4. If you are interested in using it to retain backward compatibility see the OpenLDAP Administrator's Guide at http://openldap.org.

As SLURPD aged the OpenLDAP developers began working on a better, more robust way of replicating directories. The result was the new Syncrhonization-Replication (SyncRepl) model, which uses the LDAP synchronization protocol to keep shadow directories synchronized with a master server.

SyncRepl

In OpenLDAP 2.2, the developers released a new, experimental form of replication called LDAP Synchronization-Replication, or SyncRepl for short. This method was both more reliable and more configurable, and it was further refined and designated stable when OpenLDAP 2.3 was released. It is now the preferred way of handling replication for OpenLDAP servers.

Unlike the SLURPD replication process, SyncRepl does not require a second daemon process. The SLAPD server implements the shadow server portion of the code, and the provider services (for the master server) are provided in an overlay. SyncRepl can use either a shadow-from-master pull or a hybrid pull/push method.

In the pull scenario (called the refresh-only operation), the shadow server periodically connects to the master server and requests all changes since the last time it checked. The master then sends the shadow all changed records (or, in the case of deletions, the DN of the deleted record).

SynRepl's second method (called refresh and persist) is a hybrid of the push features exemplified in the SLURPD model and some of the pull features discussed above (therefore it is not a true push method).

In this scenario, the shadow server makes an initial connection to the master and pulls some initial updates. But it leaves the connection open. When the master modifies its copy of the directory information tree, it pushes information to the shadow server using that open connection. If the shadow server gets disconnected, the master server does nothing. The next time the slave server connects though, it requests all new changes (like in the pull method), and the master sends them.

Note

For more detailed information about LDAP content synchronization and the SyncRepl implementation included in OpenLDAP, see RFC 4533 (http://www.rfc-editor.org/rfc/rfc4533.txt) and the OpenLDAP Administrator's Guide at http://openldap.org.

The SyncRepl model has some distinct advantages over SLURPD:

Since the shadow server initiates connections and handles updates, a network outage does not cause problems with the reliability of the directory information tree. The next time the slave can get back on line it will retrieve all of its updates.
There is rarely any need to interrupt service on the master server. When a new shadow server first connects to the master it downloads the entire directory information tree, so there is no need to dump the data from the master server and send it to the shadow (though that method is still supported, and it might be expedient in cases where there is a large directory information tree and a slow network connection).
The flexibility in choosing between the refresh-only and refresh-and-persist operations gives you the ability to choose a model that will best match your needs.

Each mode of replication has its advantages. In highly distributed networks, the refresh-only replication tends to work better as it doesn't require keeping a constant connection open across a large and unpredictable network. But since the shadow server only checks the master periodically, there can be a lag between when the master is updated and when the shadow picks up the changes. Most times this does not cause any problems.

On a reliable LAN, the refresh-and-persist (refreshAndPersist) replication may be a better choice—especially if it is important that changes get from the master to the shadow in a minimal amount of time. As soon as the master is changed, it will send the updates to the shadow server. This means that there is less waiting time.

Note

Even in the refresh-and-persist (refreshAndPersist) operation, a network outage is not catastrophic. The shadow server will simply attempt to re-connect to the master server, retrieving updates as soon as it successfully connects.

These comments are intended to serve as general guidelines. Since it is fairly easy to try both, you may want to experiment to see what works best for you. Generally, on a LAN, refresh-and-persist is the best choice, while on slower links, refresh-only is better. In the next section we will cover the process of configuring SyncRepl between a master and a shadow copy.