DAVID SHAW

The Quantitative Edge

In offices situated on the upper floors of a midtown Manhattan skyscraper, Shaw has assembled scores of the country’s most brilliant mathematicians, physicists, and computer scientists with one purpose in mind: to combine their quantitative skills to consistently extract profits from the world’s financial markets. Employing a myriad of interrelated, complex mathematical models, the firm, D. E. Shaw, trades thousands of stocks in more than ten countries, as well as financial instruments linked to these stock markets (warrants, options, and convertible bonds). The company seeks to profit strictly from pricing discrepancies among different securities, rigorously avoiding risks associated with directional moves in the stock market or other financial markets (currencies and interest rates).

Shaw’s secretiveness regarding his firm’s trading strategies is legendary. Employees sign nondisclosure agreements, and even within the firm, knowledge about the trading methodology is on a need-to-know basis. Thus, in my interview, I knew better than to even attempt to ask Shaw explicit questions about his company’s trading approach. Still, I tried what I thought were some less sensitive questions:

Even these circumspect questions were met with a polite refusal to answer. Although he did not use these exact words, the gist of Shaw’s responses to these various queries could be succinctly stated as: “I prefer not to answer on the grounds that it might provide some remote hint that my competitors could find useful.”

Shaw’s flagship trading program has been consistently profitable since it was launched in 1989. During its eleven-year life span, the program has generated a 22 percent average annual compounded return net of all fees while keeping risks under tight control. During this entire period, the program’s worst decline from an equity peak to a month-end low was a relatively moderate 11 percent—and even this loss was fully recovered in just over four months.

How has D. E. Shaw managed to extract consistent profits from the market for over a decade in both bullish as well as bearish periods? Clearly, Shaw is not talking—or at least not about the specifics of his company’s trading strategies. Nevertheless, based on what Shaw does acknowledge and reading between the lines, it may be possible to sketch a very rough description of his company’s trading methodology. The following explanation, which admittedly incorporates a good deal of guesswork, is intended to provide the reader with a flavor of Shaw’s trading approach.

We begin our overview with classic arbitrage. Although Shaw doesn’t use classic arbitrage, it provides a conceptual starting point. Classic arbitrage refers to the simultaneous purchase and sale of the same security (or commodity) at different prices, thereby locking in a risk-free profit. An example would be buying gold in New York at $290 an ounce and simultaneously selling the same quantity in London at $291. In our age of computerization and near-instantaneous communication, classic arbitrage opportunities are virtually nonexistent.

Statistical arbitrage expands the classic arbitrage concept of simultaneously buying and selling identical financial instruments for a locked-in profit to encompass buying and selling closely related financial instruments for a probable profit. In statistical arbitrage, each individual trade is no longer a sure thing, but the odds imply an edge. The trader engaged in statistical arbitrage will lose on a significant percentage of trades but will be profitable over the long run, assuming trade probabilities and transaction costs have been accurately estimated. An appropriate analogy would be roulette (viewed from the casino’s perspective): The casino’s odds of winning on any particular spin of the wheel are only modestly better than fifty-fifty, but its edge and the laws of probability assure that it wins over the long run.
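
To make the probabilistic-edge idea concrete, here is a minimal simulation; all of the numbers are illustrative assumptions, not anyone’s actual trade statistics. It shows a hypothetical book of trades that wins only slightly more often than it loses yet, like the casino at the roulette wheel, is nearly certain to come out ahead over enough trades.

```python
import random

# A hypothetical statistical-arbitrage book: each trade wins 53 percent of
# the time, gaining $1 on a win and losing $1 on a loss. These numbers are
# assumptions chosen for the sketch, not D. E. Shaw's actual odds.
WIN_PROB, WIN, LOSS = 0.53, 1.0, -1.0

def simulate_book(num_trades: int, seed: int) -> float:
    """Total P&L of one run of num_trades independent trades."""
    rng = random.Random(seed)
    return sum(WIN if rng.random() < WIN_PROB else LOSS for _ in range(num_trades))

# A single trade is close to a coin flip, but across thousands of trades the
# small edge dominates, just as the casino's edge does at the roulette wheel.
results = [simulate_book(10_000, seed) for seed in range(100)]
print("losing runs out of 100:", sum(r < 0 for r in results))
print("average P&L per 10,000-trade run:", sum(results) / len(results))
```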

There are many different types of statistical arbitrage. We will focus on one example: pairs trading. In addition to providing an easy-to-grasp illustration, pairs trading has the advantage of reportedly being one of the prime strategies used by the Morgan Stanley trading group, for which Shaw worked before he left to form his own firm.

Pairs trading involves a two-step process. First, past data are used to define pairs of stocks that tend to move together. Second, each of these pairs is monitored for performance divergences. Whenever there is a statistically meaningful performance divergence between two stocks in a defined pair, the stronger of the pair is sold and the weaker is bought. The basic assumption is that the performance of these closely related stocks will tend to converge. Insofar as this theory is correct, a pairs trading approach will provide an edge and profitability over the long run, even though there is a substantial chance that any individual trade will lose money.
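
As a rough illustration of this two-step process, the sketch below (my construction, with an arbitrary lookback window and z-score threshold) measures the divergence between two stocks that historically moved together and signals selling the stronger and buying the weaker when the divergence becomes statistically large.

```python
import numpy as np

def pairs_signal(prices_a: np.ndarray, prices_b: np.ndarray,
                 lookback: int = 60, entry_z: float = 2.0) -> int:
    """Return +1 to buy A / sell B, -1 to sell A / buy B, 0 to stand aside.

    The spread of log prices stands in for the 'performance divergence'
    between two stocks that past data showed moving together. The lookback
    window and z-score threshold are arbitrary illustrative choices.
    """
    spread = np.log(prices_a[-lookback:]) - np.log(prices_b[-lookback:])
    z = (spread[-1] - spread.mean()) / spread.std(ddof=1)
    if z > entry_z:       # A has outperformed B: sell A, buy B
        return -1
    if z < -entry_z:      # A has underperformed B: buy A, sell B
        return 1
    return 0

# Example with two synthetic, co-moving price series.
rng = np.random.default_rng(1)
common = np.cumsum(rng.normal(0, 0.01, 300))
a = 100 * np.exp(common + rng.normal(0, 0.004, 300))
b = 50 * np.exp(common)
print(pairs_signal(a, b))
```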

An excellent description of pairs trading and the testing of a specific strategy was contained in a 1999 research paper written by a group of Yale School of Management professors.* Using data for 1963–97, they found that the specific pairs trading strategy they tested yielded statistically significant profits with relatively low volatility. In fact, over the survey period as a whole, the pairs trading strategy had a higher return and much lower risk (volatility) than the S&P 500. The pairs trading strategy, however, showed signs of major deterioration in more recent years, with near-zero returns during the last four years of the survey period (1994–97). A reasonable hypothesis is that the increased use of pairs-based strategies by various trading firms (possibly including Shaw’s) drove down the profit opportunity of this tactic until it was virtually eliminated.

What does Shaw’s trading approach have to do with pairs trading? Similar to pairs trading, Shaw’s strategies are probably also based on a structure of identifying securities that are underpriced relative to other securities. However, that is where the similarity ends. The elements of complexity that differentiate Shaw’s trading methodology from a simple statistical arbitrage strategy, such as pairs trading, probably include some, and possibly all, of the following:

  • Trading signals are based on over twenty different predictive techniques, rather than a single method.
  • Each of these methodologies is probably far more sophisticated than pairs trading. Even if performance divergence between correlated securities is the core of one of these strategies, as it is for pairs trading, the mathematical structure would more likely be one that simultaneously analyzes the interrelationship of large numbers of securities, rather than one that analyzes two stocks at a time.
  • Strategies incorporate global equity markets, not just U.S. stocks.
  • Strategies incorporate equity related instruments—warrants, options, and convertible bonds—in addition to stocks.
  • In order to balance the portfolio so that it is relatively unaffected by the trend of the general market, position sizes are probably adjusted to account for factors such as the varying volatility of different securities and the correlations among stocks in the portfolio (a simplified sketch of this kind of balancing follows this list).
  • The portfolio is balanced not only to remove the influence of price moves in the broad stock market, but also to mitigate the influence of currency price swings and interest rate moves.
  • Entry and exit strategies are employed to minimize transaction costs.
  • All of these strategies and models are monitored simultaneously in real time. A change in any single element can impact any or all of the other elements. As but one example, a signal by one predictive technique to buy a set of securities and sell another set of securities requires the entire portfolio to be rebalanced.
  • The trading model is dynamic—that is, it changes over time to adjust for changing market conditions, which dictate dropping or revising some predictive techniques and introducing new ones.
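
As referenced in the position-sizing item above, the following toy sketch shows one simplified way such balancing could work. It is an assumption-laden illustration (my construction), not a description of D. E. Shaw’s actual models, which would involve many more risk factors and instruments.

```python
import numpy as np

def balance_positions(signals: np.ndarray, vols: np.ndarray,
                      betas: np.ndarray) -> np.ndarray:
    """Scale positions by volatility and hedge out net market (beta) exposure."""
    raw = signals / vols                                    # noisier names get smaller positions
    hedged = raw - betas * (raw @ betas) / (betas @ betas)  # zero net beta to the broad market
    return hedged / np.abs(hedged).sum()                    # normalize to unit gross exposure

betas = np.array([1.1, 0.9, 1.3, 0.7])
weights = balance_positions(np.array([0.8, -0.3, 0.5, -1.0]),      # hypothetical signals
                            np.array([0.02, 0.03, 0.015, 0.025]),  # daily volatilities
                            betas)
print(weights, "net beta:", round(float(weights @ betas), 12))
```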

I have no idea—and for that matter will never know—how close the foregoing description is to reality. I think, however, that it is probably valid insofar as it provides a sense of the type of trading done at D. E. Shaw.

Shaw’s entrepreneurial bent emerged at an early age. When he was twelve, he raised a hundred dollars from his friends to make a horror movie. Since he grew up in the L.A. area, he was able to get other kids’ parents to provide free help with tasks such as special effects and editing. The idea was to show the movie to other kids in the neighborhood for a 50-cent admission charge. But the plan went awry when the processing lab lost one of the rolls of film. When he was in high school, he formed a company that manufactured and sold psychedelic ties. He bought three sewing machines and hired high school students to manufacture the ties. The venture failed because he hadn’t given much thought to distribution, and going from store to store proved to be an inefficient way to market the ties.

His first serious business venture, however, was a success. While he was at graduate school at Stanford, he took two years off to start a computer company that developed compilers [computer code that translates programs written in user languages into machine language instructions]. Although this venture was very profitable, Shaw’s graduate school adviser convinced him that it was not realistic for him to earn his Ph.D. part-time while running a company. Shaw sold the company and completed his Ph.D. work at Stanford. He never considered the alternative of staying with his entrepreneurial success and abandoning his immediate goal of getting a Ph.D. “Finishing graduate school was extremely important to me at the time,” he says. “To be taken seriously in the computer research community, you pretty much had to be a faculty member at a top university or a Ph.D.-level scientist at a leading research lab.”

Shaw’s doctoral dissertation, “Knowledge-Based Retrieval on a Relational Database Machine,” provided the theoretical basis for building massively parallel computers. One of the pivotal theorems in Shaw’s dissertation proved that, for an important class of problems, the theoretical advantage of a multiple processor computer over a single processor computer would increase in proportion to the magnitude of the problem. The implications of this theorem for computer architecture were momentous: It demonstrated the inevitability of parallel processor design, rather than single processor design, as the approach for achieving major advances in supercomputer technology.

Shaw has had enough accomplishments to fulfill at least a half dozen extraordinarily successful careers. In addition to the core trading business, Shaw’s firm has also incubated and spun off a number of other companies. Perhaps the best-known of these is Juno Online Services, the world’s second-largest provider of dial-up Internet services (after America Online). Juno was launched as a public company in May 1999 and is traded on Nasdaq (symbol: JWEB). D. E. Shaw also developed DESoFT, a financial technology company, which was sold to Merrill Lynch, an acquisition that was pivotal to the brokerage firm’s rollout of an on-line trading service. FarSight, an on-line brokerage firm, and D. E. Shaw Financial Products, a market-making operation, were other businesses developed at D. E. Shaw and subsequently sold.

In addition to spawning a slew of successful companies, D. E. Shaw also has provided venture capital funding to Schrödinger Inc. (for which Shaw is the chairman of the board of directors) and Molecular Simulations Inc., two firms that are leaders in the development of computational chemistry software. These investments reflect Shaw’s strong belief that the design of new drugs, as well as new materials, will move increasingly from the laboratory to the computer. Shaw predicts that developments in computer hardware and software will make possible a dramatic acceleration in the timetable for developing new drugs, and he wants to play a role in turning this vision into reality.

By this time, you may be wondering how this man finds time to sleep. Well, the paradox deepens, because in addition to all these ventures, Shaw has somehow found time to pursue his political interests by serving on President Clinton’s Committee of Advisors on Science and Technology and chairing the Panel on Educational Technology.

The reception area at D. E. Shaw—a sparsely furnished, thirty-one-foot cubic space, with diverse rectangular shapes cut out of the walls and backlit by tinted sunlight reflected off of hidden color surfaces—looks very much like a giant exhibit at a modern art museum. This bold, spartan, and futuristic architectural design is, no doubt, intended to project the firm’s technological identity.

The interview was conducted in David Shaw’s office, a spacious, high-ceilinged room with two adjacent walls of windows opening to an expansive view to the south and west of midtown Manhattan. Shaw must be fond of cacti, which lined the windowsills and included a tree-size plant in the corner of the room. A large, irregular-polygon-shaped, brushed aluminum table, which served as a desk on one end and a conference area on the other, dominated the center of the room. We sat directly across from each other at the conference end.


You began your career designing supercomputers. Can you tell me about that experience?

From the time I was in college, I was fascinated by the question of what human thought was—what made it different from a computer. When I was a graduate student at Stanford, I started thinking about whether you could design a machine that was more like the brain, which has huge numbers of very slow processors—the neurons—working in parallel instead of a single very fast processor.

Were there any other people working to develop parallel supercomputers at that time?

Although there were already a substantial number of outstanding researchers working on parallel computation before I got started, most of them were looking at ways to connect, say, eight or sixteen processors. I was intrigued with the idea of how you could build a parallel computer with millions of processors, each next to a small chunk of memory. There was a trade-off, however. Although there were a lot more processors, they had to be much smaller and cheaper. Still, for certain types of problems, theoretically, you could get speeds that were a thousand times faster than the fastest supercomputer. To be fair, there were a few other researchers who were interested in these sorts of “fine-grained” parallel machines at the time—for example, certain scientists working in the field of computer vision—but it was definitely not the dominant theme within the field.

You said that you were trying to design a computer that worked more like the brain. Could you elaborate?

At the time, one of the main constraints on computer speed was a limitation often referred to as the “von Neumann bottleneck.” The traditional von Neumann machine, named after John von Neumann, has a single central processing unit (CPU) connected to a single memory unit. Originally, the two were well matched in speed and size. Over time, however, as processors became faster and memories got larger, the connection between the two—the time it takes for the CPU to get things out of memory, perform the computations, and place the results back into memory—became more and more of a bottleneck.

This type of bottleneck does not exist in the brain because memory storage goes on in millions of different units that are connected to each other through an enormous number of synapses. Although we understand it imperfectly, we do know that whatever computation is going on occurs in close proximity to the memory. In essence, the thinking and the remembering seem to be much more extensively intermingled than is the case in a traditional von Neumann machine. The basic idea that drove my research was that if you could build a computer that had a separate processor for each tiny chunk of memory, you might be able to get around the von Neumann bottleneck.

I assume that the necessary technology did not yet exist at that time.

It was just beginning to exist. I completed my Ph.D. in 1980. By the time I joined the faculty at Columbia University, it was possible to put multiple processors, but very small and simple ones, on a single chip. Our research project was the first one to build a chip containing a number of real, multibit computers. At the time, we were able to place eight 8-bit processors on a single chip. Nowadays, you could probably put 512 or 1,024 similar processors on a chip.

Cray was already building supercomputers at the time. How did your work differ from his?

Seymour Cray was probably the greatest single-processor supercomputer designer who ever lived. He was famous for pushing the technological envelope. With each new machine he built, he would use new types of semiconductors, cooling apparatus, and wiring schemes that had never been used before in an actual computer. He was also a first-rate computer architect, but a substantial part of his edge came from a combination of extraordinary engineering skills and sheer technological audacity. He had a lot more expertise in high-speed technology, whereas my own focus was more on the architecture—designing a fundamentally different type of computer.

You mentioned earlier that your involvement in computer design had its origins in your fascination with human thought. Do you believe it’s theoretically possible for computers to eventually think?

From a theoretical perspective, I see no intrinsic reason why they couldn’t.

So Hal in 2001 is not pure science fiction.

It’s hard to know for sure, but I personally see no compelling reason to believe that this couldn’t happen at some point. But even if it does prove feasible to build truly intelligent machines, I strongly suspect that this won’t happen for a very long time.

But you believe it’s theoretically possible in the sense that a computer could have a sense of self?

It’s not entirely clear to me what it would mean for a computer to have a sense of self, or for that matter, exactly what we mean when we say that about a human being. But I don’t see any intrinsic reason why cognition should be possible only in hydrocarbon-based systems like ourselves. There’s certainly a lot we don’t understand about how humans think, but at some level, we can be viewed as a very interesting collection of highly organized, interacting molecules. I haven’t yet seen any compelling evidence to suggest that the product of human evolution represents the only possible way these molecules can be organized in order to produce a phenomenon like thought.

Did you ever get to the point of applying your theoretical concepts to building an actual working model of a supercomputer?

Yes, at least on a small scale. After I finished my Ph.D., I was appointed to the faculty of the department of computer science at Columbia University. I was fortunate enough to receive a multi-million-dollar research contract from ARPA [the Advanced Research Projects Agency of the U.S. Department of Defense, which is best known for building the ARPAnet, the precursor of the Internet]. This funding allowed me to organize a team of thirty-five people to design customized integrated circuits and build a working prototype of this sort of massively parallel machine. It was a fairly small version, but it did allow us to test out our ideas and collect the data we needed to calculate the theoretically achievable speed of a full-scale supercomputer based on the same architectural principles.

Was any thought given to who would have ownership rights if your efforts to build a supercomputer were successful?

Not initially. Once we built a successful prototype, though, it became clear that it would take another $10 to $20 million to build a full-scale supercomputer, which was more than the government was realistically likely to provide in the form of basic research funding. At that point, we did start looking around for venture capital to form a company. Our motivation was not just to make money, but also to take our project to the next step from a scientific viewpoint.

At the time, had anyone else manufactured a supercomputer using parallel processor architecture?

A number of people had built multiprocessor machines incorporating a relatively small number of processors, but at the time we launched our research project, nobody had yet built a massively parallel supercomputer of the type we were proposing.

Were you able to raise any funding?

No, at least not after a couple months of trying, after which point my career took an unexpected turn. If it hadn’t, I don’t know for sure whether we would have ultimately found someone willing to risk a few tens of millions of dollars on what was admittedly a fairly risky business plan. But based on the early reactions we got from the venture capital community, I suspect we probably wouldn’t have. What happened, though, was that after word got out that I was exploring options in the private sector, I received a call from an executive search firm about the possibility of heading up a really interesting group at Morgan Stanley. At that point, I’d become fairly pessimistic about our prospects for raising all the money we’d need to start a serious supercomputer company. So when Morgan Stanley made what seemed to me to be a truly extraordinary offer, I made the leap to Wall Street.

Up to that point, had you given any thought to a career in the financial markets?

None whatsoever.

I had read that your stepfather was a financial economist who first introduced you to the efficient market hypothesis.* Did that bias you as to the feasibility of developing strategies that could beat the market? Also, given your own lengthy track record, does your stepfather still believe in the efficient market hypothesis?

Although it’s true that my stepfather was the first one to expose me to the idea that most, if not all, publicly available information about a given company is already reflected in its current market price, I’m not sure that he ever believed it was impossible to beat the market. The things I learned from him probably led me to be more skeptical than most people about the existence of a “free lunch” in the stock market, but he never claimed that the absence of evidence refuting the efficient market hypothesis proved that the markets are, in fact, efficient.

Actually, there is really no way to prove that is the case. All you can ever demonstrate is that the specific patterns being tested do not exist. You can never prove that there aren’t any patterns that could beat the market.

That’s exactly right. All that being said, I grew up with the idea that, if not impossible, it was certainly extremely difficult to beat the market. And even now, I find it remarkable how efficient the markets actually are. It would be nice if all you had to do in order to earn abnormally large returns was to identify some sort of standard pattern in the historical prices of a given stock. But most of the claims that are made by so-called technical analysts, involving constructs like support and resistance levels and head-and-shoulders patterns, have absolutely no grounding in methodologically sound empirical research.

But isn’t it possible that many of these patterns can’t be rigorously tested because they can’t be defined objectively? For example, you might define a head-and-shoulders pattern one way while I might define it quite differently. In fact, for many patterns, theoretically, there could be an infinite number of possible definitions.

Yes, that’s an excellent point. But the inability to precisely explicate the hypothesis being tested is one of the signposts of a pseudo-science. Even for those patterns where it’s been possible to come up with a reasonable consensus definition for the sorts of patterns traditionally described by people who refer to themselves as technical analysts, researchers have generally not found these patterns to have any predictive value. The interesting thing is that even some of the most highly respected Wall Street firms employ at least a few of these “prescientific” technical analysts, despite the fact that there’s little evidence they’re doing anything more useful than astrology.

But wait a minute. I’ve interviewed quite a number of traders who are purely technically oriented and have achieved return-to-risk results that were well beyond the realm of chance.

I think it depends on your definition of technical analysis. Historically, most of the people who have used that term have been members of the largely unscientific head-and-shoulders-support-and-resistance camp. These days, the people who do serious, scholarly work in the field generally refer to themselves as quantitative analysts, and some of them have indeed discovered real anomalies in the marketplace. The problem, of course, is that as soon as these anomalies are published, they tend to disappear because people exploit them. Andrew Lo at MIT is one of the foremost academic experts in the field. He is responsible for identifying some of these historical inefficiencies and publishing the results. If you talk to him about it, he will probably tell you two things: first, that they tend to go away over time; second, that he suspects that the elimination of these market anomalies can be attributed at least in part to firms like ours.

What is an example of a market anomaly that existed but now no longer works because it was publicized?

We don’t like to divulge that type of information. In our business, it’s as important to know what doesn’t work as what does. For that reason, once we’ve gone to the considerable expense that’s often involved in determining that an anomaly described in the open literature no longer exists, the last thing we want to do is to enable one of our competitors to take advantage of this information for free by drawing attention to the fact that the published results no longer hold and the approach in question thus represents a dead end.

Are the people who publish studies of market inefficiencies in the financial and economic journals strictly academics or are some of them involved in trading the markets?

Some of the researchers who actually trade the markets publish certain aspects of their work, especially in periodicals like the Journal of Portfolio Management, but overall, there’s a tendency for academics to be more open about their results than practitioners.

Why would anyone who trades the markets publish something that works?

That’s a very good question. For various reasons, the vast majority of the high-quality work that appears in the open literature can’t be used in practice to actually beat the market. Conversely, the vast majority of the research that really does work will probably never be published. But there are a few successful quantitative traders who from time to time publish useful information, even when it may not be in their own self-interest to do so. My favorite example is Ed Thorp, who was a real pioneer in the field. He was doing this stuff well before almost anyone else. Ed has been remarkably open about some of the money-making strategies he’s discovered over the years, both within and outside of the field of finance. After he figured out how to beat the casinos at blackjack, he published Beat the Dealer. Then when he figured out how to beat the market, he published Beat the Market, which explained with his usual professorial clarity exactly how to take advantage of certain demonstrable market inefficiencies that existed at the time. Of course, the publication of his book helped to eliminate those very inefficiencies.

In the case of blackjack, does eliminating the inefficiencies mean that the casinos went to the use of multiple decks?

I’m not an expert on blackjack, but it’s my understanding that the casinos not only adopted specific game-related countermeasures of this sort, but they also became more aware of “card counters” and became more effective at expelling them from the casinos.

I know that classic arbitrage opportunities are long gone. Did such sitting-duck trades, however, exist when you first started?

Even then, those sorts of true arbitrage opportunities were few and far between. Every once in a while, we were able to engage in a small set of transactions in closely related instruments that, taken together, locked in a risk-free or nearly risk-free profit. Occasionally, we’d even find it possible to execute each component of a given arbitrage trade with a different department of the same major financial institution—something that would have been impossible if the institution had been using technology to effectively manage all of its positions on an integrated firmwide basis. But those sorts of opportunities were very rare even in those days, and now you basically don’t see them at all.

Have the tremendous advances in computer technology, which greatly facilitate searching for market inefficiencies that provide a probabilistic edge, caused some previous inefficiencies to disappear and made new ones harder to find?

The game is largely over for most of the “easy” effects. Maybe someday, someone will discover a simple effect that has eluded all of us, but it’s been our experience that the most obvious and mathematically straightforward ideas you might think of have largely disappeared as potential trading opportunities. What you are left with is a number of relatively small inefficiencies that are often fairly complex and which you’re not likely to find by using a standard mathematical software package or the conventional analytical techniques you might learn in graduate school. Even if you were somehow able to find one of the remaining inefficiencies without going through an extremely expensive, long-term research effort of the sort we’ve conducted over the past eleven years, you’d probably find that one such inefficiency wouldn’t be enough to cover your transaction costs.

As a result, the current barriers to entry in this field are very high. A firm like ours that has identified a couple dozen market inefficiencies in a given set of financial instruments may be able to make money even in the presence of transaction costs. In contrast, a new entrant into the field who has identified only one or two market inefficiencies would typically have a much harder time doing so.

What gives you that edge?

It’s a subtle effect. A single inefficiency may not be sufficient to overcome transaction costs. When multiple inefficiencies happen to coincide, however, they may provide an opportunity to trade with a statistically expected profit that exceeds the associated transaction costs. Other things being equal, the more inefficiencies you can identify, the more trading opportunities you’re likely to have.


How could the use of multiple strategies, none of which independently yields a profit, be profitable? As a simple illustration, imagine that there are two strategies, each of which has an expected gain of $100 and a transaction cost of $110. Neither of these strategies could be applied profitably on its own. Further assume that the subset of trades in which both strategies provide signals in the same direction has an average profit of $180 and the same $110 transaction cost. Trading the subset could be highly profitable, even though each individual strategy is ineffective by itself. Of course, for Shaw’s company, which trades scores of strategies in many related markets, the effect of strategy interdependencies is tremendously more complex.
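
The arithmetic of this illustration is trivial but worth making explicit:

```python
# Numbers from the illustration above.
TRANSACTION_COST = 110
EXPECTED_GAIN_SINGLE = 100      # expected gross profit per trade, either strategy alone
EXPECTED_GAIN_COINCIDING = 180  # expected gross profit when both signals agree

print("single strategy net:", EXPECTED_GAIN_SINGLE - TRANSACTION_COST)         # -10, a losing proposition
print("coinciding signals net:", EXPECTED_GAIN_COINCIDING - TRANSACTION_COST)  # +70, well worth trading
```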


As the field matures, you need to be aware of more and more inefficiencies to identify trades, and it becomes increasingly harder for new entrants. When we started trading eleven years ago, you could have identified one or two inefficiencies and still beat transaction costs. That meant you could do a limited amount of research and begin trading profitably, which gave you a way to fund future research. Nowadays, things are a lot tougher. If we hadn’t gotten started when we did, I think it would have been prohibitively expensive for us to get where we are today.

Do you use only price data in your model, or do you also employ fundamental data?

It’s definitely not just price data. We look at balance sheets, income statements, volume information, and almost any other sort of data we can get our hands on in digital form. I can’t say much about the sorts of variables we find most useful in practice, but I can say that we use an extraordinary amount of data, and spend a lot of money not just acquiring it but also putting it into a form in which it’s useful to us.

Would it be fair to summarize the philosophy of your firm as follows? Markets can be predicted only to a very limited extent, and any single strategy cannot provide an attractive return-to-risk ratio. If you combine enough strategies, however, you can create a trading model that has a meaningful edge.

That’s a really good description. The one thing that I would add is that we try to hedge as many systematic risk factors as possible.

I assume you mean that you balance all long positions with correlated short positions, thereby removing directional moves in the market as a risk factor.

Hedging against overall market moves within the various markets we trade is one important element of our approach to risk management, but there are also a number of other risk factors with respect to which we try to control our exposure whenever we’re not specifically betting on them. For example, if you invest in IBM, you’re placing an implicit bet not only on the direction of the stock market as a whole and on the performance of the computer industry relative to the overall stock market, but also on a number of other risk factors.

Such as?

Examples would include the overall level of activity within the economy, any unhedged exchange rate exposure attributable to IBM’s export activities, the net effective interest rate exposure associated with the firm’s assets, liabilities, and commercial activities, and a number of other mathematically derived risk factors that would be more difficult to describe in intuitively meaningful terms. Although it’s neither possible nor cost-effective to hedge all forms of risk, we try to minimize our net exposure to those sources of risk that we aren’t able to predict while maintaining our exposure to those variables for which we do have some predictive ability, at least on a statistical basis.

Some of the strategies you were using in your early years are now completely obsolete. Could you talk about one of these just to provide an illustration of the type of market inefficiency that at least at one time offered a trading opportunity.

In general, I try not to say much about historical inefficiencies that have disappeared from the markets, since even that type of information could help competitors decide how to more effectively allocate scarce research resources, allowing them a “free ride” on our own negative findings, which would give them an unfair competitive advantage. One example I can give you, though, is undervalued options [options trading at prices below the levels implied by theoretical models]. Nowadays, if you find an option that appears to be mispriced, there is usually a reason. Years ago, that wasn’t necessarily the case.

When you find an apparent anomaly or pattern in the historical data, how do you know it represents something real as opposed to a chance occurrence?

The more variables you have, the greater the number of statistical artifacts that you’re likely to find, and the more difficult it will generally be to tell whether a pattern you uncover actually has any predictive value. We take great care to avoid the methodological pitfalls associated with “overfitting the data.”

Although we use a number of different mathematical techniques to establish the robustness and predictive value of our strategies, one of our most powerful tools is the straightforward application of the scientific method. Rather than blindly searching through the data for patterns—an approach whose methodological dangers are widely appreciated within, for example, the natural science and medical research communities—we typically start by formulating a hypothesis based on some sort of structural theory or qualitative understanding of the market, and then test that hypothesis to see whether it is supported by the data.

Unfortunately, the most common outcome is that the actual data fail to provide evidence that would allow us to reject the “null hypothesis” of market efficiency. Every once in a while, though, we do find a new market anomaly that passes all our tests, and which we wind up incorporating in an actual trading strategy.
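
As one illustration of the hypothesis-first discipline Shaw describes (my sketch, not his firm’s actual procedure), a single pre-specified strategy’s returns can be confronted once with a standard significance test against the null of zero mean return:

```python
import numpy as np
from scipy import stats

def test_hypothesis(strategy_returns: np.ndarray, alpha: float = 0.01):
    """Test one pre-specified strategy against the null of zero mean return.

    The discipline matters more than the formula: the hypothesis is stated
    first and confronted with the data once, rather than cycling through
    thousands of variants and keeping whichever happens to look best.
    """
    t_stat, p_value = stats.ttest_1samp(strategy_returns, popmean=0.0)
    return t_stat, p_value, bool(p_value < alpha and t_stat > 0)

# Purely synthetic daily returns, used only to show the mechanics.
rng = np.random.default_rng(0)
print(test_hypothesis(rng.normal(0.0002, 0.01, size=750)))
```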

I heard that your firm ran into major problems last year [1998], but when I look at your performance numbers, I see that your worst equity decline ever was only 11 percent—and even that loss was recovered in only a few months. I don’t understand how there could have been much of a problem. What happened?

The performance results you’re referring to are for our equity and equity-linked trading strategies, which have formed the core of our proprietary trading activities since our start over eleven years ago. For a few years, though, we also traded a fixed income strategy. That strategy was qualitatively different from the equity-related strategies we’d historically employed and exposed us to fundamentally different sorts of risks. Although we initially made a lot of money on our fixed income trading, we experienced significant losses during the global liquidity crisis in late 1998, as was the case for most fixed income arbitrage traders during that period. While our losses were much smaller, in both percentage and absolute dollar terms, than those suffered by, for example, Long Term Capital Management, they were significant enough that we’re no longer engaged in this sort of trading at all.


LTCM—a hedge fund headed by renowned former Salomon bond trader John Meriwether and whose principals included economics Nobel laureates Robert Merton and Myron Scholes—was on the brink of extinction during the second half of 1998. After registering an average annual gain of 34 percent in its first three years and expanding its assets under management to near $5 billion, LTCM lost a staggering 44 percent (roughly $2 billion) in August 1998 alone. These losses were due to a variety of factors, but their magnitude was primarily attributable to excessive leverage: the firm used borrowing to leverage its holdings by an estimated factor of over 40 to 1. The combination of large losses and large debt would have resulted in LTCM’s collapse. The firm, however, was saved by a Federal Reserve coordinated $3.5 billion bailout (financed by private financial institutions, not government money).


With all the ventures you have going, do you manage to take any time off?

I just took a week off—the first one in a long time.

So you don’t take much vacation?

Not much. When I take a vacation, I find I need a few hours of work each day just to keep myself sane.

You have a reputation for recruiting brilliant Ph.D.s in math and sciences. Do you hire people just for their raw intellectual capability, even if there is no specific job slot to fill?

Compared with most organizations, we tend to hire more on the basis of raw ability and less on the basis of experience. If we run across someone truly gifted, we try to make them an offer, even if we don’t have an immediate position in mind for that person. The most famous example is probably Jeff Bezos. One of my partners approached me and said, “I’ve just interviewed this terrific candidate named Jeff Bezos. We don’t really have a slot for him, but I think he’s going to make someone a lot of money someday, and I think you should at least spend some time with him.” I met with Jeff and was really impressed by his intellect, creativity, and entrepreneurial instincts. I told my partner that he was right and that even though we didn’t have a position for him, we should hire him anyway and figure something out.

Did Bezos leave your firm to start Amazon?

Yes. Jeff did a number of things during the course of his tenure at D.E. Shaw, but his last assignment was to work with me on the formulation of ideas for various technology-related new ventures. One of those ideas was to create what amounted to a universal electronic bookstore. When we discovered that there was an electronic catalog with millions of titles that could be ordered through Ingram’s [a major book distributor], Jeff and I did a few back-of-the-envelope calculations and realized that it ought to be possible to start such a venture without a prohibitively large initial investment. Although I don’t think either of us had any idea at the time how successful such a business could be, we both thought it had possibilities. One day, before things had progressed much further, Jeff asked to speak with me. We took a walk through Central Park, during which he told me that he’d “gotten the entrepreneurial bug” and asked how I’d feel about it if he decided he wanted to pursue this idea on his own.

What was your reaction?

I told him I’d be genuinely sorry to lose him, and made sure he knew how highly I thought of his work at D. E. Shaw, and how promising I thought his prospects were within the firm. But I also told him that, having made a similar decision myself at one point, I’d understand completely if he decided the time had come to strike out on his own and would not try to talk him out of it. I assured him that given the relatively short period of time we’d been talking about the electronic bookstore concept, I’d have no objections whatsoever if he decided that he wanted to pursue this idea on his own. I told him that we might or might not decide to compete with him at some point, and he said that seemed perfectly fair to him.

Jeff’s departure was completely amicable, and when he finished the alpha version of the first Amazon system, he invited me and others at D. E. Shaw to test it. It wasn’t until I used this alpha version to order my first book that I realized how powerful this concept could really be. Although we’d talked about the idea of an electronic bookstore while Jeff was still at D. E. Shaw, it’s the things Jeff has done since leaving that have made Amazon what it is today.


Shaw’s trading approach, which requires highly complex mathematical models, vast computer power, constant monitoring of worldwide markets by a staff of traders, and near-instantaneous, extremely low-cost trade executions, is clearly out of the reach of the ordinary investor. One concept that came up in this interview, however, that could have applicability to the individual investor is the idea that market patterns (“inefficiencies” in Shaw’s terminology) that are not profitable on their own might still provide the basis for a profitable strategy when combined with other patterns. Although Shaw disdains chart patterns and traditional technical indicators, an analogous idea would apply: It is theoretically possible that a combination of patterns (or indicators) could yield a useful trading model, even if the individual elements are worthless when used alone.

This synergistic effect would apply to fundamental inputs as well. For example, a researcher might test ten different fundamental factors and find that none are worthwhile as price indicators. Does this imply that these fundamental inputs should be dismissed as useless? Absolutely not. Even though no single factor provides a meaningful predictor, it is entirely possible that some combination of these inputs could yield a useful price indicator.

Another important principle that came up in this interview concerns the appropriate methodology for testing trading ideas. A trader trying to develop a systematic approach, or any approach that incorporates computer-detected patterns as signals, should guard against data mining—letting the computer cycle through the data, testing thousands or millions of input combinations in search of profitable patterns. Although the expense of computer time is usually no longer an issue, such computational profligacy has a more critical cost: it will tend to generate trading models (systems) that look great but have no predictive power—a combination that could lead to large trading losses.

Why? Because patterns can be found even in random data. For example, if you flipped one million coins ten times apiece, on average, about 977 of those coins would land on heads all ten times. Obviously, it would be foolish to assume that these coins are more likely to land on heads in the future. But this type of naive reasoning is precisely what some system developers do when they test huge numbers of input combinations on price data and then trade the combination that is most profitable. If you test enough variations of any trading system, some of them will be profitable by chance—just as some coins will land on heads on every toss if you flip enough coins. Shaw avoids this problem of data mining by requiring that a theoretical hypothesis precede each computer test and by using rigorous statistical measures to evaluate the significance of the results.
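
The coin-flip arithmetic is easy to verify:

```python
# The chance of ten heads in a row is (1/2)**10 = 1/1024, so out of
# 1,000,000 coins we expect roughly 1,000,000 / 1024 of them to show
# heads on every flip purely by chance.
coins, flips = 1_000_000, 10
print(coins * 0.5 ** flips)   # 976.5625, i.e., about 977 coins
```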


Update on David Shaw

The D. E. Shaw group’s equity and equity-linked strategies have continued to roar ahead in recent years, with performance actually improving despite a significant increase in assets managed in these strategies over this period. The strategies were up 58 percent in 2000, 23 percent in 2001, and an estimated 22 percent during the first nine months of 2002 (in net terms). As a result, the average annual compounded return of these strategies during their nearly fourteen-year history has now risen to over 24 percent, applying current fees. The lifetime Sharpe ratio (a return/risk measure) is now about 2.00, an extraordinarily high figure for such a long track record.
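
For reference, the Sharpe ratio cited here is conventionally computed as annualized excess return divided by annualized volatility. The sketch below shows the textbook calculation from monthly returns; the exact conventions behind the quoted figure of about 2.0 (such as the risk-free rate used) are not stated, and the returns in the example are purely synthetic.

```python
import numpy as np

def annualized_sharpe(monthly_returns: np.ndarray,
                      monthly_risk_free: float = 0.0) -> float:
    """Textbook annualized Sharpe ratio from monthly returns (sqrt-of-12 scaling)."""
    excess = np.asarray(monthly_returns) - monthly_risk_free
    return float(np.sqrt(12) * excess.mean() / excess.std(ddof=1))

# Purely synthetic monthly returns, not the fund's actual track record.
rng = np.random.default_rng(0)
print(round(annualized_sharpe(rng.normal(0.018, 0.03, size=168)), 2))
```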

 

Your performance alone during the past 3½ years would have led to a tripling of assets under management. How much are you currently managing in your equity and equity-linked strategies? Has the increase in assets led to capacity problems?

We’re currently managing around $4.3 billion, of which about $2.9 billion is in our equity and equity-linked strategies. At this point, the demand for our investment management services is sufficiently strong that we could easily raise more capital, but it’s important to us to avoid accepting more money than we believe we can invest effectively. Although the capacity of these strategies has increased over the past few years as a result of new research results and certain market-related factors, the amount we’re managing is still limited by capacity rather than the availability of capital.

Since you are often managing close to your estimated capacity in some strategies, what happens when your own positive performance causes assets under management to grow beyond this perceived capacity level?

We return profits to investors to an extent sufficient to bring our assets under management back to the desired level.

Have there been any significant changes to your methodology since we first spoke?

The basic methodology remains unchanged for the bulk of our strategies. We have, however, added several newly researched market effects to the couple of dozen we were already trading. We’ve also launched a new strategy focusing on the distressed securities markets.

Do the market effects or inefficiencies you trade have only limited life spans?

It depends. The market anomalies that are relatively easy to spot and exploit tend not to last. However, more subtle inefficiencies that require complex quantitative techniques to identify and extract tend to persist for a longer period of time. These are the types of inefficiencies we tend to focus on, rather than the much simpler effects that are not likely to last. Over the years, we’ve had to retire only a few effects out of the many that we trade.

Any thoughts about the corporate and accounting scandals we’ve seen recently?

I believe the regulatory and legislative scrutiny that the corporate world is experiencing right now is very healthy. A number of CEOs and CFOs were playing close to the edge—and in some cases, as we’re now seeing, over the edge—in “managing” their earnings and executing dubious complex financial maneuvers to obscure the true health of their companies. This type of activity undermines the effective functioning of the global capital markets and should be of serious concern to a nation that has historically been a leader in accounting transparency. It’s a very positive development that steps are being taken to ensure that investors and analysts have access to accurate, reliable information about the companies in which they invest.