4

Extended Reality Gets Real

XR Is More Than Technology

Convincing the mind and body that a virtual experience is real requires a complex mix of digital technologies. What’s more, all the various software and hardware components must work together seamlessly—and continuously. Any glitch, gap, or breakdown will produce an experience that ranges somewhere between unconvincing and completely unacceptable. For example, if graphics go astray or audio isn’t synced with the visual presentation, the entire experience can deteriorate into a chaotic mess. Any latency or stutter decouples the physics of XR from the physics of the natural world. In fact, research indicates that even a delay as tiny as 50 milliseconds is noticeable in a virtual environment.1
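
To see why such small delays matter, it helps to tally a rough motion-to-photon budget: the time from a head movement to the corresponding updated image reaching the eye. The sketch below uses purely illustrative stage timings (assumptions, not measurements) and checks the total against the 50-millisecond figure cited above.

```python
# Rough motion-to-photon latency budget (illustrative stage timings).
STAGES_MS = {
    "tracking": 2.0,      # reading and filtering head-pose sensors
    "simulation": 3.0,    # updating the scene for the new pose
    "rendering": 11.1,    # one frame at 90 Hz (1000 / 90 ms)
    "display": 5.0,       # panel scan-out and pixel switching
}
NOTICEABLE_MS = 50.0      # delay reported as perceptible in the text above

total = sum(STAGES_MS.values())
print(f"motion-to-photon latency: {total:.1f} ms")
print("within budget" if total < NOTICEABLE_MS else "noticeable lag")
```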

Within the realm of AR, things can be just as disconcerting. Hovering a phone over a menu and watching as a translator app converts French into English or German into Mandarin Chinese can lead to a few choice words if the app displays incorrect letters or words—or produces complete gibberish. Pointing a smartphone camera at a room to preview how a sofa or desk will look there can devolve into sheer comedy if the item floats around the room or is out of proportion. It’s not good enough to simply display the object; it has to anchor in the right place and appear reasonably close to what it would actually look like in the room. Size, scale, colors, and perspective are all critical.

Make no mistake, orchestrating and applying a mix of digital technologies are paramount. Yet assembling systems and apps that function correctly also requires attention to other areas, including human factors and physiology, psychology, sociology, and anthropology. A person wearing AR glasses or inside a virtual world must receive appropriate cues and signals—essentially metaphors for how to interact. Part of the challenge for app designers is how to incorporate situational awareness into a continuous feedback loop. Essentially, the system must recognize and understand what a human is doing at any given instant and the human must recognize and understand what the system is doing. If a failure occurs on either side of this process, the extended reality experience will not measure up.

A common belief is that it’s essential for virtual reality and other forms of XR to produce an experience as real as the physical world, but this isn’t necessarily true. In most cases, the objective is to create an environment that appears real enough to trigger desired responses in the mind and body. This includes the illusion of movement or the sensation of hearing or touching an object. It isn’t necessary, or even desirable, to duplicate the physical world in every way. In much the same way a person enjoys a motion picture while knowing it isn’t totally authentic, an individual can suspend disbelief enough to accept the virtual-reality or augmented-reality framework as real enough.

In order to achieve a balance point, those who design and build today’s XR frameworks—particularly virtual-reality software and systems—must make tradeoffs between fidelity and immediacy. Greater pixel resolution and better graphics increase the demands placed on the system to render images quickly and accurately. If the underlying computing platform can’t keep up with the processing demands—as was a common problem with early VR gaming platforms—the technology unravels, performance suffers, and the magic of XR ceases to exist.

Yet, even if engineers and developers could produce an ultrarealistic virtual world, they probably wouldn’t want to do so. The human mind and body—sensing a virtual situation is real—could wind up in a red zone. A person inside a virtual-reality environment who becomes overwhelmed might suffer a panic attack, extreme fear, motion sickness, or, in a worst-case scenario, a heart attack or other physical problem that could lead to severe illness or death.

The need to balance these factors—and embed them in the algorithms that run the software—is at the core of AR, VR, and MR. A system must avoid taking people beyond their physical and mental limits. What’s more, in many cases, the VR system must adapt and adjust dynamically, based on the user’s motion and responses. In the future, extended-reality systems will likely include an array of sensors and biofeedback mechanisms, possibly built into gloves and clothing, that indicate when a person is approaching overstimulation. By monitoring heart rate, perspiration, and even brain waves, systems designers can build a more realistic yet safe environment. They can offer cutoff or failsafe triggers that allow a person to enjoy the experience with minimal risk.
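
A cutoff trigger of this kind can be reduced to a simple rule that maps biofeedback readings to an action. The following sketch is a minimal illustration in Python, assuming hypothetical heart-rate and skin-conductance readings and illustrative thresholds rather than clinically validated values.

```python
# Minimal sketch of a biofeedback failsafe (illustrative thresholds only).
from dataclasses import dataclass

@dataclass
class BiofeedbackSample:
    heart_rate_bpm: float
    skin_conductance_us: float  # microsiemens, a rough proxy for perspiration

def comfort_action(sample: BiofeedbackSample,
                   max_bpm: float = 150.0,
                   max_conductance: float = 12.0) -> str:
    """Decide how the VR system should respond to the latest reading."""
    overstimulated = (sample.heart_rate_bpm > max_bpm,
                      sample.skin_conductance_us > max_conductance)
    if all(overstimulated):
        return "failsafe: pause the experience and fade to a neutral scene"
    if any(overstimulated):
        return "soften: reduce motion, intensity, and audio"
    return "continue"

print(comfort_action(BiofeedbackSample(heart_rate_bpm=162, skin_conductance_us=14)))
```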

Perception Is Everything

It’s not surprising that we take our body movements, along with the movements of the world around us, for granted. From the moment we’re born into this world we use our legs, feet, arms, hands, and sensory organs—including our eyes, ears, and mouth—to experience and navigate the physical spaces that surround us. As we grow from babies to toddlers and from children to adults, we learn the precise movements that allow us to walk, run, toss a ball, eat food, ride a bike, and drive a car. At any given moment, our brain combines these calculations with input from sight, sound, smell, touch, and taste and instructs our body what to do. We’re essentially performing complex trigonometry without knowing it.

Proprioception refers to how we sense an environment and move our trunk and limbs accordingly. This mental and physical framework—essentially a hardwired algorithm that is largely the same for all people—allows us to do all sorts of things: eat food without seeing our mouth or even looking at the utensil; apply the right pressure to a knife while slicing carrots; pick up a pen and sign our name while blindfolded; and drive a car while changing a radio station or reaching for an item in the back seat. This “position sense,” which involves the entire nervous system, allows us to handle an array of tasks “that would be otherwise impossible. The sense is so fundamental to our functioning that we take its existence for granted,” according to Sajid Surve, a doctor of osteopathy who specializes in musculoskeletal medicine.2

Of course, during any activity in the physical or virtual worlds, our brain continuously absorbs the stimuli presented to it. More than 100 billion neurons and trillions of synapses in the human brain continuously monitor the environment and process sensory data—collected by the eyes, ears, nose, skin, and mouth. Jason Jerald, cofounder and principal consultant at NextGen Interactions and a member of the adjunct faculty at Duke University, points out that we “don’t perceive with our eyes, ears, and hands—we perceive with our brains. Our perception of the real (or virtual) world is not reality or ‘the truth’—our perceptions are an interpretation of reality. ... The way to think of the situation is that objects don’t actually poke into our retinas,” he notes. “We instead perceive the photons bouncing off those objects into our eyes.”3

In fact, the intersection of psychology and physiology is a critical element in VR. Studies show that humans do not perceive the physical world accurately. Frank Steinicke, a professor of Human–Computer Interaction at the Department of Informatics at the University of Hamburg, has found that people do not necessarily follow the path they think they are following in the physical world.4 This becomes painfully apparent when people get lost and wander through forests and deserts—but hardly cover any ground.5 In fact, with a blindfold or in a virtual environment they believe they are walking in a straight line when they are actually curving and meandering. Without feedback, people drift and cannot retrace their path. They might also believe they are walking a great distance when they are not, and they may become confused by their surroundings, even when things look familiar.

App designers must take these types of misperceptions into account when they create virtual worlds—and even augmented systems. In some cases, these spaces require virtual warping and other techniques to create the appearance of reality. This also means that with the right design and the right equipment—such as an omnidirectional treadmill, a holodeck, or a device such as the Virtusphere—a virtual environment can occupy far less physical space than the virtual world it represents. The illusion is similar to a movie set with backdrops that appear to stretch the scene into the distance when the physical boundary actually stops only a short distance away.
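
One widely studied form of this warping is redirected walking: the system rotates the virtual scene slightly faster than the user actually turns, so a long, straight-feeling virtual path curls back inside a small physical room. The sketch below is a simplified Python illustration of a rotation gain; the gain value and turn sizes are assumptions chosen for demonstration, not parameters from any particular system.

```python
# Simplified redirected-walking sketch: virtual rotation is amplified
# slightly relative to real rotation, so a walk that feels straight in the
# virtual world bends the user's real-world path. Gain is illustrative only.
ROTATION_GAIN = 1.1   # virtual degrees applied per real degree of turning

real_heading = 0.0
virtual_heading = 0.0
for _ in range(10):                       # ten small 5-degree head turns
    turn = 5.0
    real_heading = (real_heading + turn) % 360
    virtual_heading = (virtual_heading + turn * ROTATION_GAIN) % 360

print(f"real heading: {real_heading:.1f} deg, virtual heading: {virtual_heading:.1f} deg")
# The small per-turn difference accumulates without the user noticing,
# which is what lets a compact room host a much larger virtual space.
```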

Of course, there are many other elements that designers and app builders must get right—for example, the way the sun shines and casts shadows, the way trees grow, or the way the horizon looks are keys to creating realism. They might be altered, but only to create a fantastical imaginary world. Yet there’s a risk: if the look of a virtual space interferes with the basic physics and realities of real-world perception, the space may produce disorientation, motion sickness, and cognitive distress. So, while designers can manipulate the human mind in some ways, they must also be respectful of where human limits exist. It may be possible to warp some angles and distort certain things, but taking too many liberties is a sure route to failure.

A Focus on Feelings

Ensuring that a virtual space operates correctly in the physical world is a daunting task. A great deal of complexity surrounds the way our bodies sense things—and ourselves. That’s why a person who suffers a stroke or other impairment often finds previously simple tasks extraordinarily difficult. This individual must use new neural connections to relearn the task. In order to achieve convincing results in the AR and VR spaces, it’s necessary for computers to process mountains of information and transform it into an interface and actions that seem natural. This requires software code—built into complex models—that allows the body to interact with the computer in real time using HMDs, haptics systems, and other technologies that create a virtual setting. Think of this as a digital form of proprioception.

Yet, despite enormous advances in VR, today’s systems remain fairly primitive. Most virtual-reality designers focus on creating compelling visuals, while other sensory inputs such as audio and touch remain an afterthought—or wind up completely ignored. The absence of a complete array of sensory signals, the very stimuli that underpin human proprioception, will likely lead to a mismatch between sensory input and output. This gap typically creates an unsatisfying, if not completely unpleasant, virtual experience.

Numerous perceptual psychology components apply to VR, Duke University’s Jerald points out.6 These encompass an array of factors, including objective vs. subjective reality; distal and proximal stimuli; sensation vs. perception; the subconscious and conscious mind; visceral, behavioral, reflective, and emotional processes; neurolinguistic programming; space, time, and motion; and attention patterns along with adaptation methods. In a practical sense, all of these factors intersect with augmented- and virtual-reality technologies through buttons, eye movements and directional gaze, gestures, speech, and other body motions, including the ability to move around inside a virtual space.

While the complexity of all this isn’t lost on anyone, the intersection of these various issues and components makes it more difficult to produce a compelling virtual space. An outstanding perceptual model isn’t enough. Any environment must change in order to accommodate a nearly infinite stream of actions, behaviors, and events that occurs within a virtual application. What’s more, people sometimes act and react in unpredictable ways. Translating all of this into an algorithm is vital. The task is further complicated by existing beliefs, memories, and experiences, which vary from one person to another and across cultures.

Finally, it’s critical to build powerful graphics processing and display capabilities into an HMD. The software must make sense of all the motion and activity—for both the computer-generated virtual world and the person using it in the physical world—and generate a convincing virtual space. What made the Oculus Rift a breakthrough, for instance, was the use of OLED panels for each eye—operating at a refresh rate of 90 Hz with ultralow persistence. This meant that any given image appeared for only about two milliseconds per frame. In practical terms, this translates into images that display minimal blur with extremely low latency. The end result is a more convincing and immersive environment.
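
The arithmetic behind low persistence is worth spelling out. At 90 Hz a new frame arrives roughly every 11 milliseconds, but the panel lights each frame for only about 2 milliseconds, which limits how far an image smears across the retina while the head is moving. The sketch below walks through that calculation; the head-turn speed is an assumed figure used purely for illustration.

```python
# Low-persistence arithmetic for a 90 Hz panel (illustrative numbers).
refresh_hz = 90
frame_time_ms = 1000 / refresh_hz        # ~11.1 ms between new frames
persistence_ms = 2.0                     # panel lit roughly 2 ms per frame

duty_cycle = persistence_ms / frame_time_ms
print(f"frame time: {frame_time_ms:.1f} ms, lit fraction: {duty_cycle:.0%}")

# Apparent smear scales with how long the image stays lit while the head moves.
head_speed_deg_per_s = 100               # a brisk head turn (assumption)
smear_full = head_speed_deg_per_s * frame_time_ms / 1000    # full persistence
smear_low = head_speed_deg_per_s * persistence_ms / 1000    # low persistence
print(f"apparent smear: {smear_full:.2f} deg vs {smear_low:.2f} deg per frame")
```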

Although augmented reality doesn’t present the same magnitude of technical or practical challenges, it isn’t exempt from the laws of physics and human behavior. There’s still a need to create a realistic experience that appeals to the senses—and many of the same psychological and physical components apply. What’s more, depending on the display technology, the specific application, and what it is attempting to do, there’s a need to factor in proprioception and how people perceive the augmented space. For example, visual clutter or a distracting interface may render the experience chaotic, confusing, or dangerous.

Interaction Matters

Understanding how people act and react in an augmented or virtual space is critical. There’s a need to discern interaction patterns at a detailed level. This data not only helps create a better experience; it also offers valuable information about how to design an interface and maximize usability. A virtual-reality system may allow people to select objects and things through physical controls using buttons and other input systems; hand, arm, and full-body gestures; voice and speech controls; and virtual selection systems, such as touching virtual objects inside the virtual space. This, in turn, may require tools, widgets, and virtual menus. Regardless of the specific approach, the computer must sense what the person is doing through sensors and feedback systems.

Interface challenges extend beyond user controls. It’s also necessary to deliver viewpoint controls. This may include activities like walking, running, swimming, driving, and flying. Navigating and steering through these activities may require multitouch systems and automated tools that use sensors and software to detect what a user is doing at any given moment—or what he or she is intending to do. Consequently, designers must consider everything from the viewpoint of the person using an app to the relationship between objects and things—and place the user at the appropriate spot in the activity. In some cases, the user may control the environment while in other cases the software may aid or manage things in response to what the user is doing—or prompt the participant to change his or her behavior.

The practical result is a need for different types of input. What’s more, it may be necessary to flip between input methods. For example, at the start of a virtual activity a person might use a physical button to choose an action. Later, he or she might use a virtual menu. At another point, the user might rely on a gesture or haptic feedback to simulate picking up or tossing an object. Along the way, there might also be a need for voice commands. For instance, someone might place an item in a shopping basket by touching it in the virtual space but later decide to put it back. At that point it’s probably easier to say “remove” or “delete” along with the name of the item. Of course, the computer and software must react to this continuous stream of inputs and commands.
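
In software terms, supporting this mix usually means funneling every modality into a single stream of abstract actions, so the application logic does not care whether a command arrived from a button, a virtual menu, a gesture, or a spoken phrase. The sketch below is a minimal illustration; the event format, action names, and handlers are hypothetical.

```python
# Minimal sketch of a unified input dispatcher (hypothetical event format).
from typing import Callable, Dict

handlers: Dict[str, Callable[[dict], None]] = {}

def on(action: str):
    """Register a handler for an abstract action name."""
    def register(fn):
        handlers[action] = fn
        return fn
    return register

def dispatch(event: dict) -> None:
    """Look up the abstract action named in a raw event and run its handler."""
    handler = handlers.get(event.get("action"))
    if handler:
        handler(event)

@on("remove_item")
def remove_item(event: dict) -> None:
    print(f"removing {event['item']} (requested via {event['source']})")

# The same abstract action can arrive from very different input modalities.
dispatch({"action": "remove_item", "item": "coffee mug", "source": "voice"})
dispatch({"action": "remove_item", "item": "lamp", "source": "virtual menu"})
```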

As virtual environments evolve, the tasks become increasingly complex—but also increasingly fascinating. One key question is how to address the challenges of locomotion. It might be possible to take a step or two in any direction, but beyond that there’s a risk of bumping into physical objects, including furniture, walls, and other people. Coordinating walking or other movements within the virtual environment can involve both physical and virtual tricks. Omnidirectional treadmills and holodecks are increasingly valuable tools. But designers might also rely on teleportation techniques, such as riding a bird or aircraft, that transport a user from one place to another inside the virtual world. They might also include other visual and sensory tricks to create an illusion—and feeling—of movement when there is no actual movement.

Another challenge is to create the illusion that an event is taking place—say, flying an airplane or traveling in a spaceship—but avoid the sensory overload of actual g-forces (the pressures created by acceleration and rapid motions). The need for a muted experience is obvious. This technique helps reduce sensory conflicts that lead to dizziness or motion sickness. Engineers have a name for this concept: rest frames. A rest frame is a stationary point of reference that helps the user maintain a sense of spatial orientation. For instance, a race-car driver or spaceship pilot would normally feel the vehicle rattling, vibrating, and pitching as it moves at high speed. Inside the virtual world, the car or spacecraft doesn’t move. Instead, the outside environment flashes by at the speed a race car or spacecraft would travel while the participant receives other sensory stimulation, such as vibrations and sound.

Rest frames rely on a basic but important principle: they combine a dynamic field of view with a static field of view. In a practical sense, this means that parts of the display are in motion while other parts are not. By making objects fade in and out across frames, it’s possible to trick a user’s mind. Not surprisingly, there’s a delicate line between too much and too little stimulation in a VR space. Users, and particularly those playing games, desire an adrenaline-provoking experience. What’s more, teleporting, while valuable for certain situations, isn’t always desirable, because it reduces or eliminates part of the virtual world. Dynamic rest frames create a sense of balance because they deliver the excitement and immersion of the activity without creating mental and physiological overload.

An added benefit of dynamic rest frames is that they can adjust to the user’s movements and inputs dynamically. Because of this, they can be used to direct focus in different directions. This is especially valuable for cuing users about which way to look or how to navigate a space. Yet, this isn’t the only visual and sensory trick VR developers use. They also rely on a simpler method called shrinking field of view, which involves focusing more narrowly on what the participant sees. The technique is widely used in games and it appears in old silent movies. In some cases, developers may use both techniques, though dynamic rest frames deliver a far greater feeling of stability.
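
Both dynamic rest frames and a shrinking field of view can be driven by the same signal: how fast the scene is moving relative to the user’s real head. The sketch below is a minimal illustration of that mapping; the speed thresholds, opacity ramp, and field-of-view range are assumptions for demonstration rather than values from any shipping system.

```python
# Minimal comfort-technique sketch (thresholds are assumptions, not tuned values).
# As purely virtual motion speeds up, a static rest frame (such as a cockpit
# or grid) fades in and the visible field of view narrows.
def comfort_settings(virtual_speed_m_s: float,
                     comfort_speed: float = 2.0,
                     max_speed: float = 10.0) -> dict:
    # How far the current speed exceeds the comfortable range, clamped to 0..1.
    excess = (virtual_speed_m_s - comfort_speed) / (max_speed - comfort_speed)
    excess = max(0.0, min(1.0, excess))
    return {
        "rest_frame_opacity": excess,              # 0 = invisible, 1 = fully shown
        "field_of_view_deg": 100 - 40 * excess,    # narrow from 100 to 60 degrees
    }

for speed in (1.0, 4.0, 12.0):                     # slow walk, jog, fast flight
    print(speed, comfort_settings(speed))
```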

A Touch of Reality

Another key to creating a realistic virtual experience is haptics. Producing a virtual replica of an actual tactile event is no small task. The human body experiences the sense of feel in complex ways. Nerve endings in the skin feed information about pressure, heat, and other sensations to the brain, which instantaneously decides what the feeling is and what it means. This helps humans understand how to act and react within a physical environment, such as when a coffee cup is too hot to hold, or a silk fabric caresses the skin. To be sure, these sensations can lead to pleasure or pain. This feedback loop is referred to as the somatic sensory system.

A haptics system must take numerous factors into account. The human hand consists of 27 bones connected by joints, tendons, muscles, and skin. This physiological structure delivers what is known as 27 degrees of freedom (DOF). This includes four in each finger (three for extension and flexion and one for abduction and adduction); the thumb has five DOF, and there are six DOF for the rotation and translation of the wrist.7 Consequently, humans can press, grasp, pinch, squeeze, and stroke objects. When a person touches or holds an item, two basic types of response occur: tactile, the basic sense of contact with the object, and kinesthetic, which involves the sense of motion along with pressure. As the nervous system transmits signals to the brain, a person constantly adjusts to handle the task at hand.
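
The tally behind that figure of 27 is easy to verify. The short sketch below simply restates the counts from the paragraph above; it is not a model of any particular tracking device.

```python
# Tallying the hand's 27 degrees of freedom as described above.
FINGER_DOF = {"flexion_extension": 3, "abduction_adduction": 1}   # 4 per finger
THUMB_DOF = 5
WRIST_DOF = 6                            # rotation and translation of the wrist

finger_total = 4 * sum(FINGER_DOF.values())      # index, middle, ring, little
total = finger_total + THUMB_DOF + WRIST_DOF
print(f"fingers: {finger_total}, thumb: {THUMB_DOF}, wrist: {WRIST_DOF}, total: {total}")
# -> fingers: 16, thumb: 5, wrist: 6, total: 27
```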

Mapping this level of functionality to a digital environment requires sensors, artificial muscles and tendons, and sophisticated software. Joysticks and other basic computing input devices cannot deliver more than two or three DOF. Other haptic devices that have emerged from research labs have improved the DOF factor to six and occasionally seven. By contrast, HaptX technology delivers tracking accuracy to about a third of a millimeter at each finger with six DOF per digit, 36 DOF total in the hand. This means that a person using the system can engage in a wide array of tasks that involve detailed motions along virtually any axis.

It’s not necessary to duplicate the exact physiology of the hand. “The system tricks the brain into thinking you’re touching something real,” HaptX CEO Jake Rubin says. The technology is much more precise than systems using motorized components. In the future, the technology could be built into full body suits and exoskeletons that create sensation beyond the fingers and hands. Used with high-fidelity motion tracking and other systems, such as an omnidirectional treadmill or specialized chamber, a person could virtually experience activities from the physical world. This might include parachuting out of an airplane, deep sea diving, or rock climbing up a sheer cliff.

A Sense of Scent and Taste

The quest to create realistic virtual reality also leads to scent and flavor. This might include an ability to smell or taste a dish at a restaurant before setting foot in the eatery or training a HAZMAT worker to identify potentially dangerous fumes. It might also allow a travel agency to create a virtual lavender field in Provence or a patisserie in Paris so that potential travelers can gain a sense of what the actual experience might be like. Of course, by combining scent and taste with sight and touch it’s possible to reach a major goal of virtual reality: a completely immersive and convincing experience.

Technologies that produce virtual scent are advancing. These include specialized oils, chemicals, and pellets that are typically activated by pressure or electrical signals and released into a scent chamber.8 Stanford University’s Virtual Human Interaction Lab, for example, has concocted virtual doughnuts that look and smell real.9 A company called FeelReal has built a sensory mask that produces scents while also mimicking other sensory stimuli, including heat, mist from water, vibration, and the feeling of wind. The company touts an ability to generate the “smell of gunpowder” while playing a VR game, or an ability to “smell flowers” while roaming through a 3D virtual garden. The wireless sensory mask, which fits over head-mounted displays such as the Oculus Rift and models from Sony and Samsung, generates other scents from a base of chemicals. These include a jungle scent, burning rubber, fire, and the ocean.10

Researchers are also looking to incorporate taste into virtual environments. For example, at the National University of Singapore, a group of researchers has developed a digital lollipop that can emulate different tastes.11 The goal, according to research fellow Nimesha Ranasinghe,12 is to build a system that allows people to taste a recipe they see on a cooking show on TV or on the internet. The technology uses semiconductors to manage alternating current and deliver slight changes in temperature to a user. This approach fools the taste receptors in the mouth—or more specifically the brain—into believing that an electrical sensation is a taste. The same research group is also exploring ways to transmit tastes over the internet using electrodes.

Out of the Lab and into the Virtual World

The real-world challenges of putting all the pieces together and creating immersive, lifelike applications are enormous. Virtual objects must be able to move but also rotate, scale, and change position. What’s more, a user must be able to select objects and do things with them. This requires sophisticated controllers as well as software that can manage the data stream and render the environment correctly. While a realistic experience is important for a computer game, it’s nothing less than critical for training a police officer to defuse bombs or helping a surgeon understand exactly how to perform the cuts required for an operation.

A trio of researchers at Walt Disney Imagineering noted in a 2015 academic paper that the potential of extended reality is inhibited by the sheer number of factors that must be addressed. This includes “the lack of suitable menu and system controls, inability to perform precise manipulations, lack of numeric input, challenges with ergonomics, and difficulties with maintaining user focus and preserving immersion.”13 The goal of system designers, they noted, “is to develop interaction techniques that support the richness and complexity required to build complex 3D models, yet minimize expenditure of user energy and maximize user comfort.”

Part of the problem is that commercial off-the-shelf and widely used desktop modeling tools for VR, such as Maya and SketchUp, do not provide the wealth of spatial information required to create robust virtual environments. Some tools use a 2D interface to create a 3D space. Further complicating matters, while there is plenty of evidence that 3D interaction offers a better way to build 3D virtual environments, most software in this category lacks the robustness of 2D toolkits. It’s a bit like trying to draw a detailed portrait of a person in a computer using a mouse rather than a digital pen. Lacking the right tool, it’s impossible to achieve the level of detail that makes the portrait a good piece of art.

There is another fundamental challenge: legacy input devices—including game controllers, wands, and other devices with buttons, triggers, and a joystick—were never designed to operate in a 3D virtual world. This inhibits—and often limits—the way input and design take place. The end result is a VR environment that is somewhat awkward. “This can limit the expressiveness of user interaction due to the simplistic nature of these types of inputs while simultaneously complicating the interface due to the large number of inputs and the need for the user to remember complex functional mappings,” the Disney researchers wrote.

The Disney team has focused its attention on developing a hybrid controller that combines a touch display with a casing of physical buttons and delivers full six-degrees-of-freedom input. They have also focused on developing more robust software that allows a user to select tools and widgets within the virtual space. Their stated goal is to produce a navigation system—complete with touch screens and floating virtual menus—that enables interaction across a spectrum of screens and displays, from desktop PCs to head-mounted displays and CAVE environments.

Others, such as the startup Neurable, are working to advance brain–computer interfaces to the point where the computer uses direct neurofeedback from the brain to eliminate controls altogether. This requires a greater understanding of brain signals—and an ability to reduce the noise from the brain and in electronic systems—in order to produce a system that can operate in real time. “We believe that science has reached the early stages of a neurotechnology revolution that will eventually bring BCIs to everyday life. Neurotechnology holds tremendous potential to assist and augment human cognitive functions across a wide variety of tasks,” the company says on its website.14

Still other researchers and commercial firms are exploring completely different ways to take virtual reality to a more realistic level. Google has experimented with light-field photography—a technique that captures rays of light arriving from a scene along many directions rather than only what passes directly through a single camera lens—to produce more realistic virtual graphics.15 The system uses a rig of 16 GoPro cameras arranged in an arc to capture far more dimensional data about a scene than a single camera can achieve. The goal is to produce “beautiful scenes in stereoscopic 360 VR video,” Google noted. These sights, whether they depict the Incan ruins of Machu Picchu or the International Space Station, can be viewed on high-end HMDs as well as inexpensive Google Cardboard displays.

Virtually There

In an ideal virtual world, a person picks up an AR or VR application and uses it without a manual or any guidance. The interface is intuitive while the computer graphics are completely believable. This goal is the same whether the representation is a realistic place or a fantasy world that could never exist in the physical world. For app developers and digital artists, this may require inventing new plants, animals, and creatures; introducing different types of avatars; and producing entirely new virtual settings and ways to move around that defy the physics of the real world—flying or swimming under the ocean, for example—while still adhering to the physics of movement. Digital artists must also think about how to focus attention and redirect the brain and senses through light, patterns, and sensory cues.

Virtual environments ultimately hinge on two critical factors: depth of information and breadth of information. This framework, introduced by computer scientist and communications expert Jonathan Steuer, revolves around the idea of maximizing immersion and interactivity.16 Depth of information refers to the richness of the environment as defined by display graphics, resolution, audio quality, and overall integration of technologies. The breadth of the environment involves the spectrum of senses engaged by the overall VR technology platform.

In practical terms, this means that an AR or VR application that operates accurately 90 percent of the time isn’t good enough. A character that doesn’t respond correctly even one out of five times in a game, or one that periodically disappears from view for a few seconds, is enough to undermine the entire experience. Likewise, an augmented overlay that serves up a hodgepodge of data—or the wrong graphics and information—may confuse an engineer or technician and lead to an error. Getting to the 100 percent level in performance, usability, and plausibility, however, extends beyond current technology and knowledge. For now, the goal is to create a space or app that’s simply plausible. If a person can suspend his or her disbelief, the application works.
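
A quick back-of-the-envelope calculation shows why even seemingly high reliability falls short. At a typical 90 frames per second, a small per-frame failure probability still produces visible glitches every minute. The rates below are illustrative assumptions, not measurements of any particular system.

```python
# Why "correct most of the time" is not good enough (illustrative rates).
frames_per_second = 90

for per_frame_success in (0.90, 0.99, 0.999, 0.99999):
    failures_per_minute = (1 - per_frame_success) * frames_per_second * 60
    print(f"per-frame success {per_frame_success:.5f}: "
          f"~{failures_per_minute:.1f} expected glitches per minute")
```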

Of course, advances in digital technology guarantee that future systems will be far more sophisticated—and components will be more tightly and seamlessly integrated. Pixel by pixel, frame by frame, and app by app, augmented reality, virtual reality, and mixed reality are changing the world. In fact, these virtual technologies are already reshaping almost every industry and sector—from manufacturing to entertainment and from engineering to medicine. Says Marc Carrel-Billiard, global senior managing director of Accenture Labs: “The world is moving from a flat screen to immersive 3D.”