15    Browsing

Web browsing is a common experience of computing. We’ll consider the mechanics of a browser displaying the Google home page. When you’re reading this, there might not be a service called Google any more or its home page might have changed radically. But the core of the Google home page has been stable for many years, so it seems like a good bet that it will still be recognizable in the future. We’ll make two important simplifications in our description:

  1. We’ll just look at the behavior that’s common to pretty much every browser, so we won’t try to talk about special features or unusual behavior that some particular browser might have.
  2. We’ll pretend that all of Google is built using just a single computer. This description would have been accurate in the very early days of the web, when browsers were simple and Google was small. We’ll circle back in chapter 20 to fill in a little more of what we are omitting in this initial sketch.

The simple user perspective of browsing the web is that you point your browser at Google and see the home page. Here’s a step-by-step version:

  1. The browser sends a home-page request to Google
  2. Google sends a home-page reply back to the browser
  3. The browser uses the home-page information received to present a visual representation of the Google home page

You might think of the web as apps or pages. Those are actually higher-level assemblies of the underlying “stuff” of the web. If we understand something about how that works, we’ll have better insight into any failures that might occur. We’ll also have a better idea of how we can (or can’t) modify existing web services to achieve some new goal.

If we want to understand what really makes up the web at a lower level, there are two key ideas: resources and servers.

•  A resource is a pretty flexible concept: it can be a picture, a movie, some music, some text, or pretty much anything else that can be stored. Importantly, a resource can also be some kind of program that executes: executing that program produces some output result like a picture, a movie, some music, some text, and so forth.

•  We encountered servers before when we considered virtualization (chapter 12), but a server on the web is a little different. On the web, a server is just some collection of resources—it’s the place where you find some resource(s) of interest.

So if we’re asked “what is the web, really?” an accurate—if unhelpful—answer is “it’s a collection of resources on servers.”

Resources are not simply stored on servers like piles of bricks: instead, each resource acts as though it were a kind of machine itself, with some functions or buttons that we can use to get it to do something. The commands that can be applied to a resource are called methods. Depending on the particular method and the particular resource, there may be some kind of result from carrying out the method on the resource; such a result is called an entity.

Using the elements that we now know are behind the scenes, our previous apparently simple example of browsing to the Google home page can be divided into the following steps:

  1. Find the server for the Google home-page resource
  2. Send a “fetching” method to that home-page resource
  3. Execute the method on the resource to produce an entity
  4. Return the entity as the result of the request
  5. Use the information in that entity to present a visual representation of the Google home page
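These five steps can be sketched as a toy program. This is only an illustration under simplifying assumptions: the names Resource, SERVERS, and browse are invented for this sketch, the “server” is just a table in memory, and the entity is a placeholder string rather than a real home page.

```python
# Toy model: a "server" is a collection of resources, and each resource
# responds to a method by producing an entity. All names are hypothetical.

class Resource:
    """A resource whose entity is produced by running a small function."""

    def __init__(self, producer):
        # producer is a zero-argument function that returns the entity text
        self._producer = producer

    def apply(self, method):
        # Only a "fetching" method is modeled in this sketch.
        if method == "GET":
            return self._producer()
        raise ValueError(f"unsupported method: {method}")


# A table of known servers; each server maps resource names to resources.
SERVERS = {
    "www.google.com": {
        "/": Resource(lambda: "<html>logo, search box, two buttons</html>"),
    },
}


def browse(server_name, resource_name):
    server = SERVERS[server_name]        # step 1: find the server
    resource = server[resource_name]     # locate the resource on that server
    entity = resource.apply("GET")       # steps 2-3: send and execute the method
    return entity                        # step 4: return the entity


entity = browse("www.google.com", "/")
print(entity)                            # step 5: stand-in for rendering
```

Because the resource holds a function rather than fixed text, the same sketch covers both stored resources and resources that are programs producing their output on demand.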

That sure seems like a lot of steps for just displaying a home page, when our first (simpler) list seemed like a perfectly fine solution. Why should there be all of this complexity? The extra parts and pieces don’t make much difference for a simple case like getting the Google home page, but those options are a part of what makes the web so flexible. That flexibility is important for supporting billions of users and trillions of items of interest. There are a lot of stages in the browser/server interaction process at which it’s possible to substitute a sophisticated choice or computation, where this simple example just does something straightforward.

This example helps us start to see any browser in a different light. A browser essentially consists of two kinds of machinery stuck together. One kind of machinery knows how to display different kinds of entities and thus knows how to interpret and render the “Google home-page entity” as the familiar-looking logo and search box. The other kind of machinery knows how to interact with the networked computers of the internet so as to find resources, send them methods, and receive any resulting entities.

Accordingly, we can distinguish the user-facing “application-like” part of a browser from the network-facing “infrastructure-like” part of a browser (see figure 15.1).

Figure 15.1

The user-facing and network-facing parts of a browser.

Programs in Browsers

The “application-like” part of a browser even includes the ability to run programs, which is an important component in the way that many web-accessible services present themselves. Confusingly, the programs running in browsers are often called “scripts,” which can leave the impression that they are somehow different from real programs. In fact, a browser functions like just another kind of step-taking machinery. So a browser is also a program for running programs, some of those programs coming from remote sites. As you might recall, “operating system” is the name we usually give to a program for running programs. Unlikely as it might seem, every modern browser has effectively stumbled into being a kind of operating system. Even stranger, the browser effectively participates in a weakly defined and evolving distributed operating system for programs that span servers and other browsers. This is not how most people think about the web—and yet that’s what it has become.

However, we’ll choose to focus here on the simple operations of browsing. Accordingly, we won’t consider these sophisticated distributed-programming aspects further. Instead, our subsequent discussion will focus primarily on the browser’s machinery for interacting with resources on the network.

Naming Resources

Returning to our simple example, let’s consider naming resources. A resource doesn’t really need a name: a resource can certainly exist on the web without having a name. But as we saw in our previous discussions about names (chapter 4) a name can be handy if we want to communicate the identity of something, or otherwise refer to it.

The most common way of referring to a resource is to use a Uniform Resource Locator, or URL. The URL we’re interested in when we think about browsing to the Google home page is written in full as

http://www.google.com/

We’ll start by assuming that a browser is just a means of fetching various things from servers, with the Google home page being one of those things.

(Naturally, focusing only on fetching the Google home page is a very limited perspective. The first thing a person would typically do with that home page after it’s displayed is to type one or more search terms into it. We’ll turn to the question of how that works a little later in this chapter. For now, it’s simpler to just look at fetching.)

To understand how the browser “engine” converts a given URL into a corresponding entity, it’s helpful to understand what the different parts of a URL mean. The URL consists of a scheme and a path. The scheme is the first part, “http:” in this case. The path is everything after the scheme. The letters “http” specify the most common web protocol, the Hypertext Transfer Protocol (HTTP). In networking, a protocol is a set of message formats and their associated meanings. The protocol is the shared vocabulary for how the browser and server will talk. We’ll encounter many more protocols when we look more at networking in chapters 18 and 19.

Even if you have been paying close attention to URLs, most likely you’ve only ever seen ones that start with “http:” or “https:” as the scheme. You may not even have noticed the single-character difference between those two. Historically, “https:” schemes were used on pages that involved information that must be kept secret, like passwords or credit card numbers, while “http:” schemes were used in other places where there wasn’t as much concern about hiding information. But after the Snowden revelations about activities of the U.S. National Security Agency, many websites shifted to using the more secure approach for more kinds of information. In addition to the protocol, the scheme also determines how the remainder of the URL (the path) is interpreted to identify a server and some resource on it.

We can separate the interpretation of the path into two pieces: how the path is used as a name for a resource, and how the server for that resource is contacted. As it happens, the “http:” and “https:” schemes have the same rules for how to interpret the paths, but they differ in how the browser contacts the server. The “https:” scheme uses a more secure (but also more costly) mechanism to send information across the network. We examine the issues of protecting information in chapter 24. For now, we’ll look only at ordinary “http:” URLs. We know from the scheme that we’ll eventually contact a server using HTTP, and we haven’t yet examined what that means; but before we learn what it means to “speak” HTTP, we have to figure out what we’re “speaking” to, and that requires understanding the path.

Recall that we’re considering the URL

http://www.google.com/

Now let’s look more closely at the path (that is, the part after the “http:” scheme):

//www.google.com/

The two slashes at the beginning mark the start of the name of a server. So this path effectively says “a server named ‘www.google.com’ then the resource named ‘/’.” As we’ll see in a while, the server name and the resource name have some interesting similarities and differences. We’ll start by looking at the resource name (“/”) since it’s smaller; we’ll use what we learn in that examination to help explain how server names work.
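As a small illustration, Python’s standard library can split this URL into the same pieces. One caution about vocabulary: the library calls the server name “netloc” and reserves “path” for just the resource name, whereas this chapter uses “path” for everything after the scheme.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.google.com/")

scheme = parts.scheme      # the scheme, without its trailing colon
server = parts.netloc      # the server name that follows the two slashes
resource = parts.path      # the resource name on that server

print(scheme, server, resource)
```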

Hierarchical Names

Our first observation is that “/” seems like a strange name for a resource. However, that name makes sense once you understand the naming system. There are levels that are “higher” or “bigger” and levels that are “lower” or “smaller.” The levels are separated by slashes. The bigger things are on the left, the smaller things are on the right:

/Biggest/Big/Medium/Small/Tiny

Computer scientists refer to this kind of naming as hierarchical. We often use hierarchical naming systems in everyday life without thinking about them in those terms. For example, if you were sending a letter to the U.S. President, you would need to write a postal address like this on the envelope:

1600 Pennsylvania Avenue, Washington, District of Columbia, United States

We could take that same mailing address and render it into the “path vocabulary” like this:

/UnitedStates/DistrictOfColumbia/Washington/PennsylvaniaAvenue/1600

Our original postal address had spaces separating the words. One of the quirks of the path vocabulary in URLs is that it doesn’t allow spaces, so our converted postal address leaves them out. That change makes familiar phrases look SortOfAwkward, but you soon get used to reading that style of writing.

Figure 15.2 shows this address, as well as some other things we could talk about in this example micro-world.

Figure 15.2

A hierarchy identifying some American locations.

In addition to the “DistrictOfColumbia” we also have “Ohio” and “California” as elements below “UnitedStates.” And within California we have the cities of “SanFrancisco” and “LosAngeles.”

Although this example uses street addresses, the slash-separated hierarchical names are applicable to many different settings. In principle, such a name can be as long as you want. Any time you want to make some finer distinctions, you can add another level of names—although in practice it starts to get tedious to use really long names with many different levels.

Shorter Names

One of the merits of hierarchical structuring is that we don’t have to always use the full version of a name. If we have some environment that defines a large, more general part of the name, we can identify specific items in that environment by only providing the other (small, more specific) part of the name.

For an example of one of these environments, we can shorten the hierarchical name we used before, leaving off the specific address of the White House. This form instead identifies all the addresses on Pennsylvania Avenue:

/UnitedStates/DistrictOfColumbia/Washington/PennsylvaniaAvenue/

or we can use something even shorter to talk about all the addresses in the District of Columbia:

/UnitedStates/DistrictOfColumbia/

Importantly, we can change our starting point for understanding names. We can think in terms of navigating the diagram. The very beginning of this addressing scheme is also the very top of the branching diagram, but we can start further down—as though we have already followed one of these shorter names from the top. For example, if we already know that we are concerned only with names of entities in Washington, DC, we can omit the first part of the path (/UnitedStates/DistrictOfColumbia/Washington). We can think of starting at the “Washington” spot in the earlier branching diagram, rather than starting at the very top. If we know that we’re starting at Washington, the address of the White House becomes

PennsylvaniaAvenue/1600

while a different address on the same street is

PennsylvaniaAvenue/1850

Notice that neither of these addresses starts with a slash.

The single slash on its own “/” is a name for the top starting point—what computer scientists would call the root of the hierarchy. In this particular example of postal addresses, “/” roughly corresponds to “Earth.”
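The difference between full names and shorter names can be sketched in a few lines. The function name resolve is invented for this illustration; it assumes that a name starting with a slash is a full name from the root, and that any other name is interpreted relative to a known starting point.

```python
def resolve(start, name):
    """Turn a possibly shortened name into a full name from the root."""
    if name.startswith("/"):
        # Already a full name: the starting point is irrelevant.
        return name
    # A shorter name: attach it below the starting point.
    return start.rstrip("/") + "/" + name


washington = "/UnitedStates/DistrictOfColumbia/Washington"

print(resolve(washington, "PennsylvaniaAvenue/1600"))
print(resolve(washington, "/UnitedStates/Ohio"))
```

The first call produces the full White House name used earlier; the second ignores the starting point because the name given already begins at the root.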

Editing URLs

You might have found a particular book called Widgets at a publisher’s website and seen that the URL was something like:

//example.com/books/Widgets

If you’re curious about other books published by this same publisher, it’s reasonable to try the shorter URL:

//example.com/books

It might not work, or it might produce something different from what you expect. But it works the way you expect often enough to be useful. The defining standards for URLs say that any URL should be opaque, handled as just a bunch of characters with no internal structure or meaning. You aren’t supposed to see or take advantage of the hierarchy that we just described. But I use this URL-shortening and editing all the time, and that behavior is widespread. When you navigate to some hierarchical URL and you find that you’re interested in more than that single item, it’s handy if you can shorten the URL and try to fetch something there.
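The shortening habit described here is easy to mechanize. The sketch below is only an illustration: the function name shorten is invented, and an “http:” scheme has been added to the example URL so that it is complete. Nothing guarantees that a server will offer anything useful at the shortened URL.

```python
import posixpath
from urllib.parse import urlsplit


def shorten(url):
    """Drop the last level of the resource name, keeping scheme and server."""
    parts = urlsplit(url)
    # dirname removes the final slash-separated element of the resource name;
    # if nothing is left, fall back to the root "/".
    parent = posixpath.dirname(parts.path.rstrip("/")) or "/"
    return parts._replace(path=parent).geturl()


print(shorten("http://example.com/books/Widgets"))
print(shorten("http://example.com/books"))
```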

Genuinely opaque URLs that just have some long unintelligible string of characters are much less helpful when you’re trying to find something. Technically, a URL should still operate in exactly the same way if it included characters mixed up from completely different character systems like Greek, Chinese, and Hindi—but no human user would find that convenient.

When the web was first invented, URLs were assumed to be opaque with the idea that searching would be used to find relevant URLs. With the benefit of years of experience, we can say that that theory is correct but incomplete. We do search to find relevant URLs, but we also sometimes use URL construction or modification.

Naming Servers

Now, let’s return to the double slash “//” and what immediately follows it: “www.google.com.” We’ve said that’s the name of the server; but what does it mean to be the name of a server, and how does this naming work? Let’s first look at the syntax of server names, and then we can examine the way that names lead us to servers.

Recall that in this context a server is a computer that’s prepared to answer a request for a resource. There are many different servers on the web, so we need to identify the one of interest—typically by using its name. The name may be ambiguous: with people, there might be more than one “John Smith,” and similarly for servers there might be more than one “www.google.com.” In spite of those possible ambiguities, a relatively short, readable-by-people textual name is still the handiest way to refer to someone or something with relatively little chance of confusion.

Why does the server name have dots, and why are the “www” and “com” parts there? In some ways those elements seem unnecessary. We can already tell that the main distinguishing part here is the “google” part, and indeed many people are fairly good at taking a company’s name and guessing that the likely corresponding website has “www.” at the beginning, “.com” at the end, and some version of the company’s name in between.

The structure of a server name comes from another kind of hierarchical naming, somewhat like what we described with the slash-separated part of the path. But in a server name, the elements of the hierarchy are separated with dots instead of slashes. And making it extra interesting, the order of the hierarchy is reversed. We previously saw that a resource name narrows left-to-right: the rightmost element is the smallest, and the leftmost element is the largest. In contrast, server names narrow right-to-left: the rightmost element (in this case, “com”) is the largest, and the leftmost element (in this case, “www”) is the smallest.

So just to be completely clear about this slightly goofy and illogical situation: We refer to resources in terms of URLs. Each URL consists of three main parts (scheme, server, resource) that have completely different naming systems; two of those systems (server and resource) use hierarchical structuring while one system (scheme) doesn’t. The two hierarchical naming systems use different special characters to separate the parts of the hierarchy (“.” for servers, “/” for resources) and the hierarchies go in opposite orders. What a mess!
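To make the opposite orderings concrete, here is a short sketch that lists the levels of each kind of name from largest to smallest. The function names are invented for this illustration.

```python
def server_levels(server_name):
    """Levels of a server name, largest first (dots, read right to left)."""
    return list(reversed(server_name.split(".")))


def resource_levels(resource_name):
    """Levels of a resource name, largest first (slashes, read left to right)."""
    return [level for level in resource_name.split("/") if level]


print(server_levels("www.google.com"))
print(resource_levels("/books/Widgets"))
```

The server name has to be reversed to put its largest level first, while the resource name is already in that order.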

Finding Servers

Fortunately, naming is not as confusing in practice as it might sound from this description. The structuring of the server name reflects an ingenious system that allows many different organizations to control how those names work locally. The underlying “trick” is that each level of the hierarchy logically identifies a directory that contains the meaning of the name immediately to the left. In figure 15.3, “com” is the name of a directory service that knows the identity of “google.com”—that is, there is an entry in the “com” directory for the name “google,” and that entry potentially represents another directory to be consulted.

Figure 15.3

The directory for com.

Similarly, the directory service identified by “google.com” has an entry for the name “www” and that server has the full name “www.google.com” (see figure 15.4).

Figure 15.4

The directory for google.com.

So now we know in principle how to find the server named “www.google.com”:

  1. Ask the server for “com” where to find “google.com”
  2. Ask the server for “google.com” where to find “www.google.com”

The quick-witted reader will recognize this as a recursive process, as previously described in chapter 5. This recursive approach to definition and lookup allows us to make names with any number of levels separated by dots.

The directory service for naming servers is called the Domain Name System or DNS. Each “level” of the long name is actually called a domain. Since each domain has total control of the meaning of its own names, a domain is both a kind of directory and a kind of kingdom. Accordingly, the name “domain” seems quite appropriate.

DNS allows the delegation of naming control to diverse organizations, which are then in turn allowed to delegate some or all of their naming powers to other organizations. This decentralized approach to naming was defined in the early 1980s, well before the invention of the web. It has survived the enormous growth of the internet without changes to its fundamental model. DNS is perhaps insufficiently appreciated by the average user, exactly because it is pervasive and reliable.

When we finally succeed in walking our way through the directory hierarchy to find a particular server, what do we actually get as a result? We get a number—sort of like a zip code or postal code for a single server. The number uniquely identifies the server of interest, and lets us establish communication with it. Even if DNS isn’t working, you can still browse to the Google home page if you know the right number for one of its servers—you can type one of those numbers into the URL box of the browser, taking the place of the server name “www.google.com.” For example, as I am writing this chapter, I can replace “www.google.com” with the number “74.125.226.16” in a browser bar and get the exact same page. (Your results may differ, for reasons that we explain later in the book.) In chapter 20 we’ll take a closer look at how the number actually allows us to communicate with a particular server. For now, we just assume that the right number has quasi-magical powers that allow us to communicate with the corresponding server.
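As a brief aside on the form of that number: the dotted notation is a way of writing a single 32-bit number as four smaller numbers, each between 0 and 255. The sketch below uses Python’s standard ipaddress module to show the correspondence, using the address quoted above purely as an example.

```python
import ipaddress

address = ipaddress.IPv4Address("74.125.226.16")

# The same address as one plain integer.
as_integer = int(address)

# Rebuilding that integer by hand from the four dotted parts.
by_hand = (74 << 24) + (125 << 16) + (226 << 8) + 16

# A URL that uses the number in place of the server name.
url = f"http://{address}/"

print(as_integer == by_hand, url)
```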

We explained the recursive lookup process: look in a parent directory to find a child, repeat as necessary. However, we haven’t yet explained how to find “com” to get started—we need a base case for this recursive lookup process, and it comes down to a kind of “hard-wired” system. Certain root servers are special. Their addresses are widely published and changed rarely, if ever. In addition, special efforts are made to protect the integrity of the data that they hold, since that information serves as the basis for the naming system. To find a directory service for a top-level domain like “com” or “edu” or “org,” you contact a root server. At this writing, there are thirteen different root servers around the world, each run by a different organization—some nonprofit, some government, some for-profit.
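The whole lookup, including its base case at the root, can be sketched as a recursive function over nested directories. This is a toy model under stated assumptions: each directory service is represented as an in-memory table, the names ROOT_DIRECTORY, lookup, and find_server are invented for the sketch, and the address is simply the example number quoted earlier. Real DNS involves network messages between machines, which this omits entirely.

```python
# Each directory maps a name to either another directory or a final address.
ROOT_DIRECTORY = {
    "com": {
        "google": {
            "www": "74.125.226.16",
        },
    },
}


def lookup(directory, labels):
    """Look up the rightmost label, then recurse on what remains to its left."""
    entry = directory[labels[-1]]        # the rightmost label is the largest
    if len(labels) == 1:
        return entry                     # nothing left to resolve
    return lookup(entry, labels[:-1])    # repeat in the child directory


def find_server(name):
    # Starting from the root directory is the "hard-wired" first step.
    return lookup(ROOT_DIRECTORY, name.split("."))


print(find_server("www.google.com"))
```

Because the function calls itself once per level, the same code handles names with any number of dot-separated levels.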

However, this arrangement is only a shared convention, not a law of nature. One of the interesting governance issues for the web is control of the root servers, which carries with it the power to determine all naming. Creating new top-level domains (rightmost names, like com) offers new opportunities for profit, since the new names in the domain can be sold to people and organizations that want those names. Placing top-level domains under control of national governments offers new opportunities for control: “com” (and, therefore, “www.google.com”) might mean something different in Iran or China from what it means in the United States, if those other countries each had their own version of “com.”

Caching

DNS is remarkably effective. It provides as many names as anyone needs, divided as finely as needed, with independent control of the generation and meaning of those names. It’s all quite elegant! However, as we’ve described it so far, this hierarchy is potentially very inefficient. We don’t really want to repeatedly consult various directory services across the internet just to figure out what some particular name means. Simply finding a server shouldn’t require lots of possibly expensive directory lookups.

Is there a way to avoid doing repeated directory lookups? Yes, by saving and reusing the results from previous lookups.

When we considered computation in chapter 4, we observed that although it’s technically correct to say that the value of (237 + 384) is (237 + 384), most people find it more useful to say that the value of (237 + 384) is 621. If we have an even longer expression, like

(26 + 89 + 9 + 20 + 49 + 38 + 83 + 22 + 10 + 3 + 77)

which has the value 426, it’s even more useful to keep around the final value rather than recomputing it every time it might be needed. In this particular case, a simple way to record that result is to write it on the page where we can read it later. However, if a similar situation arose that wasn’t an example in a book and we nevertheless had to remember the result of a computation, we’d want to write it down or otherwise save it somehow.

We can take these observations as simple examples of a general principle: rather than redo a computation, we can keep the result around and save ourselves the effort of redoing the computation. Naturally, this strategy becomes considerably more powerful as the computations become more complex and expensive, instead of this rather simple example of addition. Computer scientists refer to this strategy as caching. Correspondingly, they refer to a cache as the place where some previously computed results might be stored for reuse.
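Here is a minimal sketch of that principle applied to the addition example. The names cache, stats, and total are invented for this illustration; the counter exists only to show how often the real computation runs.

```python
cache = {}                  # previously computed results, keyed by the inputs
stats = {"computed": 0}     # how many times we actually did the addition


def total(numbers):
    key = tuple(numbers)
    if key not in cache:
        # Not seen before: do the work once and remember the answer.
        stats["computed"] += 1
        cache[key] = sum(numbers)
    return cache[key]


numbers = [26, 89, 9, 20, 49, 38, 83, 22, 10, 3, 77]
first = total(numbers)
second = total(numbers)     # answered from the cache, with no new addition

print(first, second, stats["computed"])
```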

When we have a lookup problem where the answers don’t change very often, and we can easily tell when an answer is wrong, that is a great setup for caching previous answers. For example, having once done the full-scale lookup for “www.google.com” to figure out the numeric address of the server, we can then keep a local copy of the result, perhaps in our own little directory. Figure 15.5 shows the full lookup process we’d go through to find “www.google.com”:

•  The first step consults the top directory for “com.”

•  The next step consults the middle directory for “google.com.”

•  And the last step actually contacts the server identified as www.google.com.

Figure 15.5

Looking up www.google.com.

The next time we need to know what that particular name means, we can simply assume that it’s the exact same server as before. Figure 15.6 includes the same elements as before to show how we would use a cache as a shortcut in the process.

Figure 15.6

Using a cache for faster lookup.

Instead of doing all the lookups again, we can use the previous result to just go straight to www.google.com. If we’re wrong, we can tell without any harm being done. Being wrong doesn’t happen very often, because the association between a particular name and a particular server is pretty stable. We need both those properties (no harm on error, and errors are uncommon) to use caching well. If either of those properties isn’t true, then caching is probably not going to work well. A cached copy is not guaranteed to be accurate, so we can’t use caching if an occasional error would be a serious problem. And the speedup that we get from caching only happens when there’s no error, so we won’t get any benefit if errors are common.

A great feature of this caching trick is that it can be applied at every level of the naming hierarchy. So in addition to the browser potentially caching the meaning of “www.google.com,” it can also cache the meanings of “google.com” and “com.” Why is that useful? Because we expect that even when some specific server is changed, it’s likely that much of the rest of the naming hierarchy around that change is still unchanged and reusable.

So if we’re contacting our “usual” servers, we don’t generally start from the root server and walk down the whole collection of directories. Instead, we go straight to the cached location of the server (www.google.com). If that “educated guess” is incorrect, we next make another similar guess to contact the parent google.com directory at its last known (and cached) location. Only for an entirely new server name (one with no elements overlapping previous server names) would we need to do a full-scale lookup starting from a root server.
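Caching at every level can be sketched by remembering each intermediate answer and resuming from the longest part of a name we already know. As before, this is a toy model with invented names (ROOT_DIRECTORY, cache, stats, find), and the “mail” entry with its address is a hypothetical addition for illustration. The sketch leaves out the step of detecting and recovering from a stale cache entry.

```python
ROOT_DIRECTORY = {
    "com": {
        "google": {
            "www": "74.125.226.16",
            "mail": "192.0.2.1",     # hypothetical entry, for illustration only
        },
    },
}

cache = {}                           # maps names like "google.com" to results
stats = {"consultations": 0}         # how many directories we had to consult


def consult(directory, label):
    stats["consultations"] += 1
    return directory[label]


def find(name):
    labels = name.split(".")         # e.g. ["www", "google", "com"]
    current, start = ROOT_DIRECTORY, len(labels)
    # Find the longest already-cached piece of the name, if there is one.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in cache:
            current, start = cache[suffix], i
            break
    # Resolve whatever levels remain, right to left, caching each answer.
    for i in range(start - 1, -1, -1):
        current = consult(current, labels[i])
        cache[".".join(labels[i:])] = current
    return current


find("www.google.com")               # full walk: three consultations
after_first = stats["consultations"]
find("www.google.com")               # answered entirely from the cache
after_repeat = stats["consultations"]
find("mail.google.com")              # reuses the cached "google.com" directory
after_sibling = stats["consultations"]

print(after_first, after_repeat, after_sibling)
```

The third lookup is for a name never seen before, yet it needs only one consultation because the directory for “google.com” is already in the cache.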

Talking to the Server

At this point, we know how to understand the server part of the URL. That knowledge lets us find www.google.com by consulting DNS. Further, we know roughly how the browser/server conversation will go: the browser will ask the server for a particular resource. How does that conversation with the server actually take place?

The conversation between the browser and the server is broken up into packets (chapter 14), each of which crosses the network independently. The packets aren’t just any old packets—instead, both browser and server agree to communicate according to the rules of yet another protocol. That protocol is called TCP, which stands for Transmission Control Protocol, but that name doesn’t illuminate much. Just as no one refers to IBM as “International Business Machines,” no one ever calls TCP anything except TCP.

TCP hides most of the messiness of packet-based networking, and we will examine it in some more detail in chapter 19. So by using TCP, rather than working with packets, the browser can simply write some text across the network to the server, and the server will get it just the way the browser sent it; likewise, the server can write some other text over the network to the browser, and the browser will get it just the way the server sent it. (There’s no choice or negotiation involved in the use of TCP here—you can’t do web browsing any other way.)

The “http:” scheme in the URL means that the browser will establish a TCP conversation with the server using a particular numeric indicator (80). That value signals to the server that the conversation will use HTTP, which is both the specific web-browsing vocabulary used on top of TCP and the first protocol we mentioned in this chapter. The browser has to include that indicator because the server might be handling lots of different kinds of TCP conversations—not every use of TCP is web browsing!

TCP involves some setup work—back-and-forth messages—between the two parties. Once that setup is complete, the two parties (browser and server) can each send text to the other. Then the browser “speaks” a particular command from the vocabulary defined by HTTP. Specifically, the browser sends “GET /” which means “send me the resource named ‘/’.” The server responds with a relatively long message that includes information about the layout of the Google home page.
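For the curious, this exchange can be sketched with Python’s standard socket module. The function names are invented for this illustration. The text of the request is slightly longer than the bare “GET /” described above, because HTTP version 1.1 also expects a version marker and a “Host” line naming the server; the “Connection: close” line simply asks the server to hang up when it has finished replying.

```python
import socket

HTTP_PORT = 80   # the numeric indicator that signals an HTTP conversation


def build_request(server_name, resource_name="/"):
    """Compose the text the browser sends once the TCP setup is complete."""
    lines = [
        f"GET {resource_name} HTTP/1.1",
        f"Host: {server_name}",
        "Connection: close",
        "",      # a blank line marks the end of the request
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(server_name, resource_name="/"):
    """Open a TCP conversation, send the request, and collect the reply."""
    with socket.create_connection((server_name, HTTP_PORT), timeout=10) as conn:
        conn.sendall(build_request(server_name, resource_name))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:             # the server has closed the conversation
                break
            chunks.append(data)
    return b"".join(chunks)


print(build_request("www.google.com").decode("ascii"))
```

Calling fetch("www.google.com") on a machine with a working network connection would return the server’s raw reply; what that reply contains depends on the server and may well differ from the simple picture given here.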

The browser then takes all of the material returned in the server’s response, and uses that data to render a visible version of it as the familiar Google home page.

Structure vs. Presentation

Even an apparently simple web page may include many different parts. The Google home page prominently includes the Google logo, a box for typing some words to guide the search, and two buttons for selecting the kind of search. (We’re choosing to ignore some of the additional features that Google sometimes adds—substitute versions of the logo including animation, or various other buttons and messages around the edge of the page.)

The reply received from the server specified all of this: the appearance of the logo, and the number and placement of elements on the page. The elements of the page and their relationships are described in a somewhat elastic way, with the goal of ensuring that the result still looks good and works correctly when presented on a wide variety of display sizes. Such elasticity is quite challenging: different people may well look at the exact same web page using a tiny mobile-phone screen or an enormous desktop display, and design choices that make sense for one setting may well look ridiculous or be completely unusable in the other.

In principle, web pages are organized so that there is a separation between structure and presentation. The structure consists of the aspects that are included regardless of how the page is presented—we could think of structure as the bones of the page. The presentation is how the page actually looks—the flesh on the bones, if you like. Pages are described in a language called HTML, the HyperText Markup Language. That language makes it fairly easy to produce an adequate if unimpressive result—what you might think of as a generic person with generic clothing. However, it requires a completely different level of effort to achieve something that is widely adaptable and aesthetically striking. Such high-end design usually requires considerable sophistication, experience, and testing.

Forms

Throughout all of this discussion about understanding the URL and contacting the server, we’ve focused only on displaying information that the browser is fetching from the server. Now we will briefly consider what happens when the flow of information is in the opposite direction, from user to server. For example, we can look at what happens when a user types some text into the Google search box and hits “return.” We’ll start with the simple version of how this mechanism worked at Google in the early days of the web, since that’s how many parts of the web still work; then we’ll describe a little about how Google works differently now.

The Google home page is one particular example of a more general web page structure called a form. A form describes a mixture of information to be displayed, empty fields to be filled in by the user, and buttons that can be pressed by the user. A single form may be very large and complex with many fields, and a single web page may include multiple forms. However, the “classic” Google home page that we are using as our example consists of only a single search box and two buttons. The search box is specified to accept text—whatever the user feels like typing in. In general, both text fields and buttons may have actions associated with them; and in the case of the Google home page, all of those elements—the search box and the two buttons—do indeed have actions associated with them.

How does the browser know what buttons to display, and what actions to perform? All the necessary information was included in the reply that came from the server. Part specified the look of the page, part specified the actions, and part specified the associations between parts of the page and the actions. Effectively, the server sent the browser a kind of program that the browser then runs to provide the relevant page to a user, with the resulting page having additional little programs embedded.

Whether the search is triggered by hitting “return” on the keyboard or by clicking on one of the buttons, the browser constructs a request to a server. In some ways this is just like the process that we already saw for fetching the Google home page. The main difference is that in the previous interaction we started with a fairly simple URL that was typed in by the user; in contrast, now the URL is constructed by the browser. How does the browser construct the URL? It follows rules that the server sent down as part of the form on the Google home page. In this particular case, those rules involve taking the text in the search box, processing it slightly, and embedding that processed text in the constructed URL. That URL is then used for another fetch like the one that brought down the information about how to present the Google home page.
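To make the construction step concrete, here is a minimal Python sketch of a browser following a form's rules. It is only an illustration under stated assumptions: the `Form` class, the `build_request_url` function, the server address, and the field name "q" are all stand-ins chosen for the example, and a real browser does far more than this.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class Form:
    """Hypothetical, stripped-down stand-in for the form the server sent."""
    action: str      # where the browser should send the request
    field_name: str  # the name attached to the single text box


def build_request_url(form: Form, typed_text: str) -> str:
    # urlencode does the "slight processing": it rewrites characters that
    # may not appear literally in this part of a URL.
    return form.action + "?" + urlencode({form.field_name: typed_text})


# Assumed values for illustration only.
search_form = Form(action="https://www.google.com/search", field_name="q")
print(build_request_url(search_form, "web browsing"))
# prints https://www.google.com/search?q=web+browsing
```

The user only supplied the words "web browsing"; everything else in the resulting URL came from the form description.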

Interacting with the web via a dynamically constructed collection of little programs, each of them downloaded on demand from a remote server—that sounds pretty exotic and futuristic! But that’s an accurate description of how even a simple form-based page works.

Returning now to the URL constructed by the search box, you can find the search terms in the URL following a special marker. Depending on how you typed the search terms into the browser, that marker will show up as “#q=” or “?q=”. The search terms that follow are themselves tweaked so that they obey the rules for what a URL may contain. For example, a space between words becomes “+” since a URL can’t contain spaces. So if you type “Barack Obama” in the search box, it becomes “Barack+Obama” in the URL.
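Going the other way, we can sketch how a program might pull the search terms back out of such a URL. This Python example uses the standard `urllib.parse` module, which calls the text after “?” the query and the text after “#” the fragment. The function name `search_terms` and the sample addresses are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def search_terms(url: str) -> str:
    """Return the text that follows the "q=" marker, or "" if there is none."""
    parts = urlsplit(url)
    # Text after "?" lands in parts.query; text after "#" lands in
    # parts.fragment. Check both, since either marker may be used.
    for candidate in (parts.query, parts.fragment):
        values = parse_qs(candidate).get("q")
        if values:
            return values[0]  # parse_qs has already turned "+" back into a space
    return ""


print(search_terms("https://www.google.com/search?q=Barack+Obama"))  # Barack Obama
print(search_terms("https://www.google.com/#q=Barack+Obama"))        # Barack Obama
```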

We already know something about how a URL is interpreted and a server contacted. Once the request is received by the server, the actual search takes place. As far as the browser is concerned, the server just miraculously returns one or more entities as results; we won’t attempt to explain how the search is performed, or how the database is built.

Escaping

You might well wonder, if “+” means a space, how can you represent a “+” in this URL-encoding? An actual plus sign “+” that appears in the search box is instead transformed into “%2B.” That might seem arbitrary, but the “2B” is actually another way of writing the particular sequence of bits that is used to represent “+.” So if you write “3+4” in the search box, it shows up in the URL as “3%2B4.”

This substitution is an example of what computer scientists call escape codes or escaping: the “%” is used as a marker that “what comes afterward is special,” and various hard-to-represent symbols or characters are translated into a different representation. There are a number of places in computer systems in which data needs to be similarly “escaped” so that unconstrained data can be transferred through some mechanism that treats particular data elements as meaningful.
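We can watch this escaping happen with a few lines of Python. This is a small sketch using the standard `urllib.parse` module, which is one implementation of these rules and not necessarily the code any particular browser runs. The helper name `round_trip` is invented for the example.

```python
from urllib.parse import quote_plus, unquote_plus

# The number used to represent "+", written in base 16.
print(hex(ord("+")))       # 0x2b

print(quote_plus("3 4"))   # 3+4     a space becomes "+"
print(quote_plus("3+4"))   # 3%2B4   a real "+" must be escaped
print(quote_plus("100%"))  # 100%25  even the marker "%" must be escaped


def round_trip(text: str) -> str:
    """Escape the text, then undo the escaping."""
    return unquote_plus(quote_plus(text))


print(round_trip("3+4 = 7"))  # 3+4 = 7
```

The last line shows the point of the exercise: whatever the user typed comes back unchanged, even though it contained characters that have special meaning in transit.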

There is a comical version of this problem in a classic short Monty Python sketch, “Gestures to Indicate Pauses in Televised Talk.” The announcer is trying to distinguish pauses in his speech from the end of the announcement. To that end, he introduces both a gesture to indicate a normal pause that shouldn’t end the sketch, and a separate gesture he’ll use to indicate the end of the sketch. He then struggles to remember to use the “pause” gesture every time he pauses, as well as having difficulty with demonstrating the “end” gesture without inadvertently ending the sketch prematurely. It’s handy to realize that you can readily see a practical example of this need to mark things the right way: in your browser bar when you search at Google.

Searching Searches

The Google search page is actually a little more sophisticated than what we’ve described so far. Even when we narrow our attention to just the text box, there is also a “prompting” or “completion” feature that we haven’t yet described.

The search box has another action attached, in addition to the “search” action that we have described. This additional action takes place after every single character typed by the user. Originally, this kind of action was intended for validation of what the user types. On each keystroke, the triggered action can check whether the input character is acceptable, and reject or correct it if it is not.
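As a rough illustration of the original validating use, here is a minimal Python sketch of an action that runs on every keystroke. The digits-only field and the name `on_keystroke` are invented for the example, and browsers do not run Python; the sketch only shows the shape of the idea.

```python
def on_keystroke(current_text: str, typed_char: str) -> str:
    """Return the field contents after one keystroke.

    Hypothetical rule: this field accepts digits only, so any other
    character is rejected and the contents stay as they were.
    """
    if typed_char.isdigit():
        return current_text + typed_char
    return current_text


# Simulate a user typing "4a1 5" one character at a time.
text = ""
for ch in "4a1 5":
    text = on_keystroke(text, ch)
print(text)  # 415
```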

The Google search page ingeniously repurposes that input validation to produce quite a different effect. The per-character action takes the partial search string (including the newly typed character) and hands it to the server for a search. But the browser is asking for a different kind of search from the ones we previously described. Instead of searching the web for corresponding results, the server searches popular Google searches. Why is that helpful? Many other people may have previously searched for something similar to what you’ve typed so far. The set of all information on the web is inconceivably huge, but the set of currently popular phrases resembling what you’ve typed so far is much smaller. Accordingly, it’s possible to do a useful search of popular relevant phrases much more rapidly than is possible for searching the whole web.

The top few results of the search are supplied as possible choices that are displayed adjacent to the search box. Each such choice has an associated action to perform the corresponding search. Effectively, each character typed is triggering a program. That program contacts a Google server to run another program. That server program in turn sends back snippets of text, each associated with its own little program. The user experiences all this as the service making helpful suggestions. Meanwhile, the underlying mechanism is doing a surprising amount of surprisingly complicated work in response to every single keystroke. Once again we see the possibilities that arise from the sheer number of steps that modern computers can take during an interval that seems quite short in human terms.
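The per-keystroke lookup can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the list of popular phrases is invented, the function name `suggest` is ours, and matching by the start of the phrase in alphabetical order is just one simple technique. We are not claiming this is how Google actually does it.

```python
from bisect import bisect_left

# Invented sample data, kept in sorted order so that all phrases sharing
# a prefix sit next to each other.
POPULAR = sorted([
    "weather",
    "weather tomorrow",
    "web browser",
    "wikipedia",
    "world map",
])


def suggest(prefix: str, limit: int = 3) -> list[str]:
    """Return up to `limit` popular phrases that start with `prefix`."""
    prefix = prefix.lower()
    start = bisect_left(POPULAR, prefix)  # jump to the first possible match
    matches = []
    for phrase in POPULAR[start:]:
        if not phrase.startswith(prefix) or len(matches) == limit:
            break
        matches.append(phrase)
    return matches


# Simulate the user typing "wea", looking up suggestions after each character.
typed = ""
for ch in "wea":
    typed += ch
    print(typed, "->", suggest(typed))
```

Because the list of popular phrases is small and sorted, each lookup touches only a handful of entries, which is why it can plausibly keep up with someone typing.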