6
They Call Them “Satellite Anomalies”
Space weather is working its way into the national consciousness as we see an increasing number of problems with parts of our technological infrastructure such as satellite failures and widespread electrical power brownouts and blackouts.
—National Space Weather Program, “Implementation Plan,” 1999
January 20, 1994, was a moderately active day for the Sun. There were no obvious solar flares in progress and no evidence of larger-than-normal X-ray emission, but a series of coronal holes had just rotated across the Sun between January 13 and 19. According to the National Oceanic and Atmospheric Administration’s Space Environment Center, the only sign of unrest near the Earth was the high-speed solar wind from these coronal holes, which had produced measurable geomagnetic storm conditions in its wake. NASA’s SAMPEX satellite, however, was beginning to tell another, more ominous, story. The Sun was quiet, but there were unmistakable signs that energetic electrons were being spawned near geosynchronous orbit, and their concentrations were climbing rapidly. These particles came from the passage of a disturbance out of the magnetotail region into the inner magnetic field regions around the Earth. Within minutes, the GOES-4 and GOES-5 weather satellites began to detect accumulating electrostatic charges on their outer surfaces. Unlike the discharge you feel after shuffling across a floor, there is no easy and quick way for satellites to unload the excess charge they accumulate, and so it continues to build up until their surfaces reach potentials of hundreds, or even thousands, of volts.
The Anik E1 and E2 satellites, owned by Telesat Canada, were a twin pair of GE Astro Space model 5000 satellites, weighing about seven thousand pounds and launched into space in 1991. From their orbital slots on the equator nine hundred miles southwest of Mexico City and fifteen hundred miles apart in space, they soon became the most powerful satellites in commercial use in all of North America. Virtually all of Canada’s television broadcast traffic passed through the E2 transponders at one time or another. The E2 satellite provided the business community with a variety of voice, data, and image services. Despite some technical difficulties with the deployment of the Anik E2 antenna, which dogged engineers for several months, the satellites soon became a reliable cornerstone for North American commerce and entertainment.
Canadians eagerly awaited the start of the Anik-E satellite service because major cities were few and far between across Canada, a territory bigger than the United States. With hundreds of small towns, and only a few dozen major cities with television stations, the satellites quickly became the information lifeline for many parts of Canada. Twenty-three hundred cable systems throughout the country and nearly one hundred thousand home satellite dish owners depended on these satellites to receive their programming. Newspapers relied on them to beam page images to distant printing presses serving far-flung Arctic communities. Most people thought the satellites would continue working until at least 2003, but on January 20, 1994, this optimism came to an end.
As the GOES satellites began to record the accumulating charges from the influx of energetic particles, the Intelsat-K satellite began to wobble and experienced a short outage of service. About two hours later, the Anik satellites took their turn in dealing with these changing space conditions and did not fare as well. The satellites experienced almost identical failures in their momentum wheel control systems, which help keep each satellite properly pointed. The first to go was Anik E1 at 12:40 P.M., when it began to roll end over end uncontrollably. The Canadian Press was unable to deliver news to over 100 newspapers and 450 radio stations for the rest of the day, though it was able to use the Internet as an emergency backup. Telephone users in forty northern Canadian communities were left without service. It took over seven hours for Telesat Canada’s engineers to correct Anik E1’s pointing problems using a backup momentum wheel system.
About seventy minutes after Anik E1 was brought back under control, at 9:10 P.M., the Anik E2 satellite’s momentum wheel system failed, and this time the backup system failed as well, so the satellite continued to spin slowly, rendering it useless. Now 3.6 million Canadians were affected as their only source of TV signals went out of service; in an instant, television sets became useless pieces of furniture. Popular programs such as MuchMusic, TSN, and the Weather Network were knocked off the air for three hours while engineers rerouted the services to Anik E1. For many months, Telesat Canada wrestled with the enormous problem of trying to reestablish control of Anik E2. They were not about to scrap a $300 million satellite without putting up a fight. After five months of hard work, they were at last able to regain control of Anik E2 on June 21, 1994. The bad news was that, instead of relying on the satellite’s disabled pointing system, they would have to send commands up to the satellite to fire its thrusters every minute or so to keep it properly pointed. This ground intervention would have to continue until the satellite ran out of thruster fuel, shortening its lifespan by several years. The good news was that Telesat Canada became the first satellite company to stabilize a satellite using “active ground loop control” without relying on the satellite’s onboard attitude control system. In the end, it would turn out to be something of a Pyrrhic victory, because on March 26, 1996, at 3:45 P.M., a critical diode on the Anik E1 solar panel shorted out, causing a permanent loss of half the satellite’s power. Investigators later concluded that this, too, was caused by an unlucky solar event.
The connection between the geomagnetic disturbance and the Anik satellite outages seemed entirely straightforward to the satellite owners at the time, and Telesat Canada publicly acknowledged the cause-and-effect relationship in press releases and news conferences following the outages. They also admitted that the space weather disturbance, which ultimately cost the company nearly $5 million to fix, was consistent with spacecraft-affecting events they had noticed in the past, and that very similar problems had bedeviled the Anik-B satellite fifteen years earlier. What also makes this story interesting is that the Intelsat-K and the two Anik satellites share the same basic design. The key difference is that the Intelsat Corporation specifically modifies its satellites to survive electrostatic disturbances, including solar storms and cosmic rays. This allowed the Intelsat-K satellite to recover quickly from the storms that disabled the unmodified Anik satellites. Clearly, it is possible, and desirable, to “harden” satellite systems so that they are more resistant to solar storm damage. This lesson in spacecraft design is not a new one learned since the Anik outage but a very old one that has been applied more or less conscientiously since the dawn of the Space Age, when these problems were first uncovered.
Although the USSR managed to surprise the United States by orbiting Sputnik 1, our entry into the Space Age came in 1958 with the launch of the Explorer 1 satellite. The main objective of the satellite was simply to counter the perception that we had fallen behind the USSR in a critical technological area. So the satellite, no bigger than a large beach ball, was put on the engineering fast track and equipped with a simple experiment devised by James Van Allen at the University of Iowa. Even before the first satellite entered the space environment, scientists had long suspected that there would be interesting things for instruments to measure once they got there, among them the elusive particles called cosmic rays. What they couldn’t imagine was that billions of dollars of satellite real estate would eventually fall victim to these same cosmic bullets.
More than ten years earlier, physicists working with photographic films on mountaintops had detected a rainstorm of “cosmic rays” streaming into the atmosphere, but their origins were unknown. Van Allen wanted to measure how intense this rain was before it was muffled by the Earth’s blanket of atmosphere, and perhaps even sniff out a clue about where these particles were coming from in the first place. His experiment was nothing more than a Geiger counter tucked inside the satellite, but no sooner was the satellite in space than the instrument began to register the clicks of incoming energetic particles. Space was indeed “radioactive.” Since then, the impact these particles have had on delicate satellite electronics has been well documented by both civilian and military scientists.
Satellites receive their operating power from large-area solar panels whose surfaces are covered with solar cells. When the Sun ejects clouds of high-energy protons, these particles can literally scour the surfaces of these cells. Direct collisions between the high-speed protons and the silicon atoms in the cells knock the atoms violently out of position. The displaced atoms produce crystal defects that increase the cells’ resistance to the very currents of electricity they are producing. Solar cell efficiency steadily decreases, and so does the power produced by the solar panels. Engineers have learned to compensate for this erosion of power by making solar panels oversized. This lets the satellite start out with extra capacity to cover the steady degradation of electrical output. But this degradation doesn’t happen smoothly over time. Like a sudden summertime hailstorm, the Sun produces unpredictable bursts of particles that do considerable damage in only a few hours.
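To see the arithmetic behind this oversizing, here is a rough sketch using assumed numbers chosen purely for illustration, not taken from any particular spacecraft. If a panel loses a fraction d of its output each year, its power after t years is

\[
P(t) = P_{\mathrm{BOL}}\,(1 - d)^{t},
\]

where the subscript BOL denotes the beginning-of-life power. With d = 0.03 and a ten-year design life, the panel ends up at roughly 74 percent of its original output, so it must be built about 35 percent larger than the end-of-life load requires. A single week of intense proton bombardment can eat several years’ worth of that margin at once.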
A series of powerful solar proton events during October 19–26, 1989, for example, caused many satellites to experience severe solar panel degradation in a few days. According to Joe Allen at NOAA, the power output from the solar panels of the GOES satellites could be followed carefully, and the October events collectively cost them five years of operating lifetime. This incident also provides an example of how hard it can be to track down accurate information about some space weather impacts: Aviation Week and Space Technology published an article eight years later in which a report claimed that the GOES-7 satellite itself suffered a five-year, 50 percent mission lifetime loss from this event.
High-energy particles also do considerable internal damage to spacecraft. At the atomic scale, to an incoming proton or electron, the walls of a satellite look more like a porous spaghetti colander than a solid, impenetrable wall of matter. High-energy protons can also collide with atoms in the walls of the satellite and produce sprays of secondary energetic electrons that penetrate even deeper into its interior. Engineers call this “internal dielectric charging.” As this batterylike charging of the satellite continues, eventually the electrical properties of some portion of the satellite break down and a discharge is produced. In a word, you end up with a miniature lightning bolt that causes a current to flow in some part of an electrical circuit where it is not supposed to. As anyone who has inserted new boards into their PC can tell you, just one static discharge can destroy the circuitry on a board. Energetic particles can also deliver their charges directly to the microscopic junctions in electronic circuitry and change information stored in a computer’s memory.
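A back-of-the-envelope estimate, with numbers assumed purely for illustration, shows why a few days of elevated electron flux is all it takes. Suppose a buried conductor inside the spacecraft has a capacitance of about 100 picofarads, collects a penetrating-electron current of roughly 1 picoampere, and sits behind insulation that breaks down near 1,000 volts. The time to reach breakdown is then

\[
t = \frac{CV}{I} = \frac{(100\times10^{-12}\ \mathrm{F})(1000\ \mathrm{V})}{10^{-12}\ \mathrm{A}} = 10^{5}\ \mathrm{s} \approx 28\ \mathrm{hours},
\]

about a day of storm conditions before the miniature lightning bolt fires.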
Microscopic current flows can flip a computer memory position from “1” to “0” or cause some component, or an entire spacecraft system, to switch on when it shouldn’t. When this happens, it is called a “single event upset,” or SEU, and, like water, they come in two flavors: hard and soft. A hard SEU does irreparable physical damage to a junction or part of a microcircuit. A soft SEU merely changes a binary value stored in a device’s memory, and this can be corrected by simply rebooting the device. Electrostatic discharges can also cause phantom command events. Engineers on the ground cannot watch the circuitry of a satellite as it undergoes an electrostatic discharge or SEU event, but they can monitor the functions of the satellite. When these change suddenly, and without any logical or human cause, they are called “satellite anomalies.” They happen a lot more often than you will ever read about in the news media. With hundreds of satellites operating for several decades, over nine thousand anomalies have been recorded by clients of NOAA’s National Geophysical Data Center.
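Soft SEUs are routinely handled in software by keeping redundant copies of critical values and periodically “scrubbing” memory so that a flipped bit gets outvoted. The short sketch below illustrates one such scheme, triple modular redundancy; it is a generic example of the idea, written in Python for readability, not the design of any actual flight computer.

```python
# Minimal sketch of a "soft" single event upset and one common defense:
# triple modular redundancy (store three copies, majority-vote on read).

def write_word(memory, addr, value):
    memory[addr] = [value, value, value]      # three independent copies

def read_word(memory, addr):
    a, b, c = memory[addr]
    # Bitwise majority vote: a bit flipped in one copy is outvoted by the other two.
    return (a & b) | (a & c) | (b & c)

memory = {}
write_word(memory, 0x10, 0b10110010)

# A particle strike flips bit 3 in one copy -- a soft SEU.
memory[0x10][1] ^= 0b00001000

assert read_word(memory, 0x10) == 0b10110010  # the vote restores the stored value
```

A hard SEU, by contrast, leaves the damaged cell permanently stuck, and no amount of voting or rebooting will bring it back.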
Gordon Wrenn is the retired section leader of the Space and Communications Department of DRA Farnborough in England. Some years ago, he looked into a rash of unexpected changes in an unnamed geosynchronous satellite’s pointing direction. The owners of the satellite let him examine their data on the condition that he not divulge its name or who owned it. When the anomalies were compared with the radiation sensor data from the GOES-7 and METEOSAT-3 satellites, it was clear that they correlated with increases in the number of energetic electrons detected by GOES-7. Insights like these cannot be uncovered without cooperation from the satellite owners; the specific way that energetic particles cause internal dielectric charging can only be ferreted out when owners provide investigators with their data. As Wrenn explains, “Prompt and open reporting offers the opportunity to learn from others’ mistakes. Sometimes the lesson can be fairly inexpensive; Telesat Canada were not so fortunate [with the loss of the Anik satellites].”
You can get information about satellite anomalies from government research and communication satellites because the information is, at least in principle, open to public scrutiny. The only problem is that you need to know who to talk to, or you have to be willing to comb through hundreds of technical reports, almost none of which are available on the Internet.
The first satellite in the NASA Tracking and Data Relay Satellite System (TDRSS-1) was launched in April 1983, and from that time onward the satellite has been continuously affected by soft SEUs. These anomalies affect the spacecraft’s attitude control system and, like mosquitoes on a warm day, they remain a constant problem today. The SEUs have been traced to changes in the computer’s RAM, and the most serious of them were considered mission-threatening. If left uncorrected, they could lead to the satellite tumbling out of control. Ground controllers have to keep constant watch on the satellite’s systems to make certain it keeps its antennas pointed in the right direction. This has become such an onerous task that one of the ground controllers, the late Don Vinson, once quipped, “If this [the repeated SEUs] keeps up, TDRS will have to be equipped with a joystick.”
The problems with TDRSS-1 quickly forced NASA to redesign the next satellites in the series, TDRSS-3 and 4 (TDRSS-2 was lost in the Challenger accident), and the solution was fortunately very simple. In engineering speak, “The Fairchild static bipolar 93L422 RAMs were swapped for a radiation-hardened RCA CMM5114 device based on a different semiconductor technology.” Radiation hardening is a complex process of redesigning microcircuits so that they are more resistant to the high-energy particles that pass through them. The result is that the two new TDRSS satellites have recorded very infrequent SEUs, while, over the same operating period, hundreds still cause TDRSS-1 to rock and roll, keeping the satellite’s human handlers steadily employed for the foreseeable future.
Additional examples of satellites that have suffered serious damage are harder to find, because commercial satellite companies do not want the causes of their satellite problems widely known, and the military considers this kind of satellite vulnerability information a sensitive issue. Although reports of military satellite impacts are inaccessible, it is possible to find, in news reports and a variety of trade journals, many examples of satellite problems caused by, or likely to have been caused by, solar storm events. As with all of the other problems we have seen so far, the biggest lightning rod for these events was the major solar storm of the last solar cycle in March 1989. Over eight thousand objects are tracked by the powerful radars used by the U.S. Air Force Space Command, but during this storm over thirteen hundred of them moved from the “identified” to the “unidentified” category as increased atmospheric drag altered their orbits. Later that same year, another powerful flare during August 15–16 led to a series of geomagnetic events on August 28–29 that caused half of the GOES-6 telemetry circuits to fail immediately. Meanwhile, back on Earth, the Toronto Stock Exchange closed unexpectedly when all three of its “fault tolerant” disk drives crashed at the same time. This latter incident may have been a coincidence; we just don’t know for certain.
A particularly common way for satellites to fail is for their attitude control systems to be damaged or compromised in some way. Why this happens has a lot to do with how a satellite recognizes its orientation in space. These systems contain a set of sensors to determine the direction the satellite is pointing, a set of thrusters or gyros to turn the satellite about its three axes, and a system for “dumping” angular momentum, usually through a mechanical component called a momentum wheel. The basic operating principle for many of these attitude systems is to use some type of sensor or “star tracker” to take frequent images of the sky and compare the locations of the detected stars with an internal catalog. A computer then computes the position differences and causes the satellite to reorient itself to point in the right direction. Energetic particles can strike sensitive electronic camera elements, specifically the so-called CCD chip, and produce false stars. This causes added wear and tear on the entire pointing system as the satellite uses up fuel and the momentum wheel system is needlessly exercised. Even the Hubble Space Telescope, whose mission is actually to observe stars, sees more of these than it is supposed to, because its attitude system and CCD cameras are also under steady attack every day. By the way, it also uses the Fairchild 93L422 RAM that was employed in TDRSS-1. On September 29, 1989, a strong proton flare caused power panel and star tracker upsets on NASA’s Magellan spacecraft en route to Venus. The storm was also detected near Earth by the GOES-7 satellite. The burst of high-energy protons from the distant Sun was the most powerful one recorded since February 1956.
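As a cartoon of the feedback loop just described, consider the single-axis sketch below, written in Python with made-up numbers; it is an illustration of the idea, not any real satellite’s flight software. It shows how the controller turns a measured star position into a correction, and how a radiation-induced false star can trick it into a needless, fuel-wasting slew.

```python
# One-axis cartoon of star-tracker pointing control (illustrative only).
CATALOG_ANGLE = 0.0   # radians: where the catalog says the guide star should appear
DEADBAND = 1e-4       # errors smaller than this are ignored

def pointing_error(measured_angle):
    """Difference between where the star is seen and where it belongs."""
    return measured_angle - CATALOG_ANGLE

def correction(error):
    """Command a corrective slew only when the error exceeds the deadband."""
    return 0.0 if abs(error) < DEADBAND else -error

# Normal drift of 0.002 radian: a small corrective slew is commanded.
print(correction(pointing_error(0.002)))   # -0.002

# A proton strike lights up the CCD; the centroiding software mistakes the
# bright pixel for the guide star at 0.05 radian, and the satellite burns
# fuel and exercises its wheels chasing a phantom.
print(correction(pointing_error(0.05)))    # -0.05
```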
Earlier generations of communications satellites that didn’t require star trackers for high-precision pointing used an even simpler position system. Because their very large transmission beams cover entire continents, these satellites used magnetometer sensors that detected the local magnetic field of the Earth. Onboard pointing systems compared the detected field orientation against an internal table of what it ought to be if the satellite were pointing correctly. Although the local magnetic field only gives pointing measurements good to a degree or so, this is often good enough for some types of satellites. According to reports collected by Joe Allen, the geomagnetic storm accompanying the March 13–14, 1989, solar storm that triggered the Quebec blackout caused many satellite problems. Geosynchronous satellites that used the Earth’s magnetic field to determine their orientation had to be manually controlled to keep them from trying to flip upside down as the orientation of the field became disturbed and changed polarity. Records show that some low-altitude, high-inclination, and polar-orbiting satellites experienced uncontrolled tumbling. Even today the satellites of the Iridium network, for example, use magnetometers as part of their pointing systems and so are, at least in principle, potential victims of geomagnetic disturbances.
When a satellite changes its pointing direction, it can do so either by using thrusters or by pushing against an internal mass of some kind. Thrusters are quite messy and are used only for gross maneuvers, so satellites use a momentum wheel to provide a countermass to push against as they are turned. A momentum wheel is a symmetric mass of material oriented so that its spin axis lies exactly along the major axis of the satellite. The laws of physics require that each push be matched by one in the opposite direction, so every time the satellite nudges its pointing direction, the momentum wheel spins up a little in the opposite sense. Eventually the rotational energy has to be unloaded or “dumped” so that the momentum wheel system doesn’t, literally, fly apart. According to Allen, during the October 19–26, 1989, solar storm sequence, an unnamed constellation of thirteen geosynchronous satellites reported 187 “glitches” in its attitude systems. Beyond the problem of the attitude control system is the issue of general component vulnerability.
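For readers who like to see the bookkeeping, the relation at work is ordinary conservation of angular momentum, written here in schematic, single-axis form:

\[
I_{\mathrm{sat}}\,\Delta\omega_{\mathrm{sat}} = -\,I_{\mathrm{wheel}}\,\Delta\omega_{\mathrm{wheel}},
\]

where I is a moment of inertia and ω a spin rate. Every small turn of the satellite body shows up as an opposite change in wheel spin, and the momentum fed into the spacecraft by outside disturbances accumulates in the wheel until it nears its maximum safe speed and must be dumped, usually by firing thrusters while the wheel is braked.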
The introduction of off-the-shelf components into the design of satellites has been one of the major revolutions pointed to in recent years by satellite manufacturers, and it has sent the cost of space access plummeting. It is increasingly touted as good news for consumers, because the cost per satellite becomes very low when items can be mass-produced rather than built one at a time. Based on its experience with the seventy-two-satellite Iridium series (now being deorbited), in 2000–2001 Motorola will begin the fourteen-month mass production of the 288 satellites for the Teledesic network in the fastest satellite construction project ever attempted. According to Chris Galvin, CEO of Motorola, their perception is that “satellites are not rocket science so much any more as much as [simply] assembly.” This attitude has come to revolutionize the way that satellite manufacturers view their products and estimate the risks of an enterprise.
But there is a downside to this exuberance and economic savings. Most of this revolution in thinking has happened during the mid-1990s, while solar activity has been low between the peaks of Solar Cycles 22 and 23. The fact that energetic particles can invade poorly shielded satellites and disrupt sensitive electronics in a variety of ways is not a recently discovered phenomenon that we have to reconfirm experimentally. It has been a fact of life for satellite engineers for over forty years. Data from government research and weather satellites convincingly show that showers of solar wind particles, cosmic rays, solar flare particles, energetic protons, and CMEs can all affect spacecraft electronics in a variety of ways. Some of these effects are inconsequential; others can be fatal. They do not constitute a mystery that we can only encounter by actually placing expensive satellites in harm’s way. For this reason, our current situation with respect to solar storms and satellite technology is very different from when previous technologies were developed and deployed for commercial use. It typically took decades for earlier technologies to begin to show signs of sensitivity.
Even more troubling than satellite electronics is that energetic neutrons and other secondary particles, produced when solar flare particles strike atoms in the Earth’s atmosphere, can travel all the way to the ground. There they affect aircraft avionics, causing temporary glitches in both civilian and military aircraft. About one in ten avionics errors is “unconfirmed,” which means that no obvious hardware or software problem was ever found to have caused it. In a related incident, an engineer working for American Airlines was curious about a spate of computer glitches that occurred during sales transactions on trans-Pacific flights. A follow-up investigation by Joe Allen at NOAA confirmed that the glitches matched the record of large magnetic storms or auroral conditions then in progress. This was an exciting result that seemed to confirm that high-flying jet airliners could be directly affected by invisible solar and geomagnetic events. Unfortunately, when the investigators tried to contact the engineer for more data, American Airlines announced that the engineer no longer worked for them.

One important source of information on these particles, believe it or not, is cardiac pacemakers. Millions of these are installed in people each year, many of whom take trips on jet planes. Pacemakers record any irregularities in the rate at which they trigger their pulses, and this information can be examined when their operation logs are downloaded by doctors for study. The glitches recorded among airline staff who wear pacemakers do correlate with solar activity levels.

There is also another “down-to-earth” problem with these solar storm particles. Some new studies suggest that when computers crash for no apparent reason, energetic particles from solar flares may be to blame. With more components crammed onto smaller chips, the sizes of these components have shrunk to the point where designers are now paying close attention to solar flares. The very popular Advanced Micro Devices K6 processor, for example, was designed using SEU modeling programs. Because these particles cannot be eliminated by shielding, they may prove the final, ultimate limit to just how small, and how fast, designers can make the next generations of computers.
Even though the tried-and-true approach to reducing radiation effects is to increase the amount of shielding in a satellite, this will not work for all types of radiation encountered in space. For example, the APEX satellite investigators concluded, “Conventional shielding is not an effective means to reduce SEUs in space systems that traverse the inner high energy proton belt.”
The reason for this is that the particles most effective in producing SEUs are energetic protons with energies above 40 million electron volts. When these enter spacecraft shielding, they collide with atoms in the shielding and spawn showers of still more particles. In fact, the thicker the shielding, the more secondary particles are produced to penetrate still deeper into the satellite. Low-energy particles, however, can be stopped by nothing more than a quarter inch of aluminum shielding. For ground-based circuit designers, shielding poses its own problems, because many shielding materials contain naturally occurring radioactive isotopes that produce their own energetic particles. Even the lead in the solder used to make electrical connections poses a severe problem.
For TDRSS-1, it was too late to do anything to make the satellite less susceptible to SEUs; however, subsequent satellites in the TDRSS series were equipped with radiation-hardened “chips,” which virtually eliminated further SEUs in those satellite systems. The pace of developing space-qualified electronics is sluggish at best. Commercial computer systems now operate with 500–1,000-megahertz processors and 10-gigabyte memories, but the Space Shuttle was only recently upgraded to an IBM 80386 system. The difference is that the shuttle’s “386” can withstand major bursts of radiation and still operate reliably. Intel Corporation and the Department of Defense announced in 1998 that Sandia National Laboratories would receive a license to use the $1 billion Pentium processor design to develop a custom-made, radiation-hardened version for U.S. space and defense purposes. The process of developing “rad-hard” versions of current high-performance microchips is complicated, because the tricks used to increase chip speed, such as thin wiring and close packing of components, often make a chip vulnerable to ionizing radiation. Etching the wiring larger, and depositing the oxide layers thinner, than commercial practice seem to be the keys to making chips hardier. The reason these efforts are expended is pretty simple, though expensive. Peter Winokur, a physicist at Sandia, noted that “when a satellite fails in space, it’s hard to send a repair crew to see what broke. You need to put in parts as reliable as possible from the beginning to prevent future problems.”
Telegraph, telephone, and radio communications were invented, and brought into commercial use, before it was fully understood that geomagnetic and solar storms could produce disruptions and interference. With satellite technology, we have understood in considerable detail the kind of environment into which we are inserting these machines, so that the resulting radiation effects can be minimized. Their implications for the reliability of satellite services have been fully anticipated. There are no great mysteries here that beg exploration by using multimillion-dollar satellites as high-tech “test particles.” Meanwhile, the design of both satellites in space and power systems here on the ground continues to be driven by considerations that have little to do with solar storms and mitigating their impacts.