A task completed using a computer will generally involve at least three models: one for the data that describes the task; one for the user actions that can be carried out on that data; and one that establishes the cultural context in which it is carried out. The data model determines what is physically possible through the selection of parts and their relationships, the same way that different vehicles (car, bicycle or truck) emerge from choices of frame, axles, tires, power train, and power source. The data model determines, to a great extent, the affordances of the resulting vehicle (how much it can carry, how you steer it), and those affordances establish an interface between the mechanism and the user. Finally, the user’s goals and the vehicle’s affordances interact with the cultural context—the rules of the road—to further establish whether and how the task can be accomplished.
We don’t have to be a car designer or a mechanic to drive, but we do need to know where the tires go, how to add fuel, and when to lubricate it; we learn how to steer, brake, use a manual transmission, parallel park, and change a tire, as well as how fast to go and how to signal a turn. We understand the fundamental components and their relationships to each other. For cars, that produces a certain mechanical literacy; for computers it is often called “computational thinking” and is the subject of this chapter. Because many users learn to compute in a largely task-oriented mode, even experienced and capable users may have incomplete understanding of the system they use, so the focus here is on the low-level model, the mechanics of the system.
A computer is a general-purpose machine for processing information, but its heart is something very much like a simple calculator—enter a couple of numbers, pick a math operation, and it’ll tell you a result. To add a list of numbers written down on a pad of paper, you must combine the basic computations with other information (the list) and actions (reading the list, keeping track of your place in the list, knowing when you’re done) in a sequence, forming an algorithm. Expressed in code the computer can understand, an algorithm becomes a computer program.
Some points to note about even this simple example: (1) neither the initial list of numbers, nor the pattern of operations (“enter, add, repeat”) reside in the calculator itself, they reside on the pad of paper or in the user’s memory; (2) the user will need to make decisions along the way (e.g., “What’s the next number?” “Are we done yet?”); and (3) if you need the final total later, you had better copy it down onto your piece of paper at the end! In the example, the calculator functions as the central processing unit (CPU). The pad of paper and the steps carried out by the human operator are also part of the system. In a real computer the role of the pad of paper is played by memory or disk storage and the central processor or CPU includes hardware to interpret and follow the instructions that make up the algorithm, retrieving each data item in turn and adding it to the intermediate result. In fact, another name for the CPU is “arithmetic and logical unit.”
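To make the example concrete, here is a minimal sketch of that algorithm written in Python (any language would do); the list of numbers stands in for the pad of paper, and the variable names are ours rather than anything standard:

# The "pad of paper": a list of numbers to be totaled.
numbers = [12, 7, 3, 41, 8]

# The algorithm: enter, add, repeat until the list is exhausted.
total = 0
for n in numbers:
    total = total + n    # add each number to the running result

print(total)             # copy the final answer down: 71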
The other important quality this example illustrates is the duality of data and algorithm. Both are needed, and they are interdependent. By combining data and algorithms, computers do amazing things all by themselves, but most of the time they interact with humans (in fact, our smartphones are more about interaction than computation). Their screens can be used to display words, numbers, photographs, and drawings, and their speakers can play sounds. Keyboards and mice let us input information. They are very reliable, fast, and accurate. They don’t get bored or tired. They are great partners but they do not make intuitive leaps or get ideas, though they may juxtapose two pieces of information that cause a human to do so.
The different programs you run on your computer control its information-processing behavior, allowing it to switch from being a web browser or CAD program to being a spreadsheet or video game. This characteristic of acting like or presenting the behavior of something else is referred to as virtuality. It’s as if all of our kitchen appliances (stove, fridge, sink) were replaced with one piece of hardware that delivers each of these functions when requested.
The computer itself consists of sources of input (including mice, keyboards, cameras, disk drives, and network interfaces, as well as cell phone radios, accelerometers and temperature sensors), the CPU for manipulating the information, short-term memory and long-term disk storage for the information, and a means of communicating information to the outside world (including a display screen, speakers, printer, and network interfaces). The CPU does the actual computation, depending on the instructions in the programs. Information (data) and instructions (software) are both stored in memory during use, where they are quickly accessible by the CPU. The main memory is fairly expensive and volatile (it forgets everything if power is lost), so it is supplemented with non-volatile secondary storage that does not require electricity to retain stored information. Until quite recently rotating magnetic surfaces were used for this, using various mechanisms ranging from cheap low-capacity “floppy” disks to fast high-capacity “hard drives.” More recently flash memory thumb drives and solid-state disks made from computer chips have begun to take over this role, though they continue to be referred to using the old vocabulary, and play the same role in the machine.
When the machine is first turned on, main memory is empty except for a simple program that is actually built into a small area of memory called read-only memory (ROM). On power-up the hardware automatically accesses and runs the instructions in ROM. In a process called booting (derived from the phrase “to pull yourself up by your bootstraps”) this program retrieves the operating system software from the secondary storage and loads it into memory for active use. The operating system is the software bureaucracy with which you interact and against which all your other applications/programs execute.
When you start up (launch) a program the operating system copies (loads) its instructions from the secondary storage into an unused area of memory and the machine begins executing those instructions. The program may, in turn, copy a data file into other unused memory (opening and reading the file), connect to another computer over a network in order to retrieve a file, or begin creating a new file. Because it is a relatively slow process, changes are usually not recorded to the hard disk until you explicitly save your work, so program failures (crashes) may obliterate whatever work you have done since the last save. When the program is done it gives control of the machine back to the operating system, which recycles that memory for the next task.
In digital computers, everything is represented using bits, shorthand for “binary digits.” Each bit can take one of two states or values. In the early years (1950s to 1970s), when memory was literally magnetic, bits were actually stored as N or S magnetism by pulsing a current through an iron ring. Similarly, punch cards and paper tape either had a hole or no hole in a particular spot. In modern systems bits take the form of high or low voltages sustained in tiny circuits etched by the millions onto chips. Whatever the physical mechanism, each bit is unambiguously one value or the other. There is no “in between” value. Reflecting this, we usually refer to the individual values with dichotomies like “True/False,” “on/off,” or “1/0.”
As the above list of dichotomies suggests, a bit doesn’t actually mean anything by itself, but can be used to store one of two values. The meaning arises from the context in which they are used. That is, they must be interpreted, a little like a secret code. This explains why you usually need the same program that created a file in order to change the file. The exceptions arise when the code is widely known, allowing different programs to manipulate the data in the file, as is the case with a plain text file.
Since individual bits store so little info, most of the time we use bits in groups. An 8-bit group is called a byte. (Humorous fact: a 4-bit group is called a nibble.) The range of meaning possible in a group depends on the number of bits used together. Figure 4.1 uses groups of four circles to illustrate the 16 unique on/off patterns a nibble may have. If you look at it carefully, you’ll see that the pattern of filled circles also follows a familiar pattern—starting at the right side, count from 0 to 1; when you run out of digits carry over 1 to the next place (bit) to the left. Repeat. This is counting with binary numbers. It works just like counting with decimal numbers, except binary numbers have a 1s place, a 2s place, a 4s place, and an 8s place, each allowed to hold only a 0 or a 1, rather than the more familiar 1s place, 10s place and so on with digits 0 to 9. Arithmetic doesn’t change, just the representation: Look carefully and you’ll find that 2 + 5 = 7 here too.
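Most programming languages will let you experiment with this directly. A small Python sketch (our own example) repeats the 2 + 5 = 7 case using binary notation:

a = 0b0010           # 2: a single 1 in the 2s place
b = 0b0101           # 5: a 1 in the 4s place and a 1 in the 1s place

print(a + b)         # 7, because the arithmetic is unchanged
print(bin(a + b))    # 0b111: a 1 in the 4s, 2s, and 1s places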
Unfortunately, while more bits can store more information, programs are usually built with fixed-size assumptions about the most demanding situation they will encounter—the "biggest number," "number of colors," "maximum number of rooms," etc. These numbers reflect the underlying use of bits to store the information (16 values with 4 bits, 256 with 8, 16 million with 24 bits, etc.). The progression includes 2¹⁰ = 1024, which is so close to 1000 that the pseudo-metric notation of a KB (kilobyte) is often used for measuring file size, transfer speeds, etc. Some hardware limits reveal their binary nature through phrases like "32-bit color" displays, "10-bit analogue to digital conversion," etc. If the programmer of your social networking site only allocated 8 bits to store your age, you can't get older than 255! At this time this isn't a problem, but it is worth noting that Microsoft built the original MS-DOS operating system such that the amount of memory a program could talk to directly was only 640 KB and it became a major headache as hardware grew more capable. Today's laptops have about 10,000 times this amount of RAM.
Computer memory is usually accessed, and contents moved or updated, in 8-bit bytes or in multiples of 4 or 8 bytes (32 or 64 bits) at a time. These bigger groupings, called words, allow the machine to count to some pretty big numbers, but they’re still limited and literal. Humans might accept the notion that the ratio 1/x approaches something called infinity as x approaches zero, but computers will crash if presented with such information (it’s called overflow, meaning the number is too big to be represented with the bits assigned to it) because the concept of infinity is an abstraction. It doesn’t have a representation in bits.
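The limit is easy to demonstrate. In the Python sketch below (our own example) the struct module is asked to store a value in a single signed byte, which can only hold -128 to +127:

import struct

print(struct.pack("b", 127))    # fits in one 8-bit byte
try:
    struct.pack("b", 128)       # one too many: no 8-bit pattern exists for it
except struct.error as err:
    print("overflow:", err)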
Unlike human memory, which appears to be associative in nature (your memories of the feel of sand, the shape of beach-balls, and the sound of sea gulls are probably all linked together by connections between your brain’s neurons), all varieties of computer storage are organized as sequential storage locations. However, since program instructions and data are all mixed up in memory, it is designed so that any random location can be accessed in the same amount of time, giving rise to the name random access memory (RAM). Each word in RAM has a unique numeric address that identifies it. Numbers or text stored sequentially in time (as when typing) are often found sequentially in memory. If we know how to find the first character in a sentence, the odds are the second character is stored in the next byte. This implicit sequentiality is consistent with the way text appears on the printed page or with manipulating a list of values. It’s also one of the keys to writing computer viruses (malware) since adjacent areas of memory may be occupied by instructions or data with equal ease and utility.
However, there is no certainty that related information will be stored in adjacent locations. For example, two lines of a CAD drawing that occur in the area we might call the “northwest corner of the kitchen,” maybe even touching each other, might be stored anywhere in a multi-megabyte file.
Computer programming occurs in many forms. In each, it consists of a flexible means of organizing elemental computational actions into sequences or procedures that accomplish information-processing tasks. Because machines take their instructions literally and don’t have the opportunity to develop an understanding of your wishes through conversation, this requires careful reasoning about the task and adherence to some language (textual or graphical) of expression during programming. Aside from the (always frustrating) challenges of syntax mastery, programming tools also require us to construct representations, manipulate them in consistent ways, and develop a workable fit between the user (even if it’s ourselves) and the information available.
Ivan Sutherland’s Sketchpad program (1963) allowed a designer to draw and edit simple graphics. In this sense it was a precursor to today’s CAD systems, but Sketchpad also allowed the designer to create persistent relationships between parts (e.g., end-point matching and equidistant distribution of points around a circle). These features are rarely present in CAD editors today, though modern scripting languages and constraint-based modeling such as that found in some building information modeling (BIM) software begin to reproduce this functionality. When present, the result is an “executable design” that may respond to complex physical, social, aesthetic, or environmental factors and adjust the building footprint, or the façade, or just the ceiling of the lobby, with no added effort.
Though full-featured programming languages can be very complicated, and individual programs can be hundreds of thousands of lines long, the elemental actions used to construct a program are simple and fairly limited. Each program is like a recipe. If you collect the right ingredients (data) and carry out the steps in sequence, you’ll get the desired result. The data we select to store and manipulate will constitute the data model for the program; the affordances we create for acting on that data will become the interaction model.
A more detailed discussion of basic data types is provided in the next chapter, but for now what matters is that different pieces of data are converted to patterns of bits using different encoding rules, each creating a different type of data, with names like Boolean (for true/false information), integer (for counting numbers), floating point (for decimal numbers), and character or string (for text). Because their encoding rules are different, different types are generally handled separately, though there may be ways of converting between them. Combinations of these basic, or primitive, data types can be used to represent more complex data, such as coordinates in space (three floating point numbers), pixels in a photograph (red/green/blue values for each pixel), music samples, etc.
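In Python, to pick one concrete language, the primitive types and a simple combination of them might be written as follows (a sketch of our own; the names are invented for illustration):

is_visible = True               # Boolean: true/false
room_count = 14                 # integer: counting numbers
wall_height = 2.75              # floating point: decimal numbers
layer_name = "Exterior walls"   # character string: text

# a more complex piece of data built from primitives:
# a point in space as three floating point numbers
point = (12.5, 3.0, 0.0)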
Ultimately, everything gets turned into a number and then a pattern of bits, even text (every character has a numerical equivalent), sounds (volume, by frequency), and colors (intensity of primary colors). This means that complex concepts, especially those involving relationships among parts that cannot be directly represented as numbers, such as beauty or symmetry, are difficult to compute about.
Programmers build more complex structures from these simple elements. For instance, every “album” of music on your computer likely has a title (text), copyright holder (text), year (integer), artist (text), and a list of cuts (each with title (text) and duration (float)). And, of course, there’s the music. These more complex structures of data are sometimes called “objects” or “classes”—a nomenclature that spills over into CAD systems via certain programming languages, as well as the Industry Foundation Classes (IFC) used for standardizing architecture, engineering, and construction (AEC) data exchange.
Computer programs may have both information that varies (variables) and information that remains constant (e.g. the value of pi). If you wish to convert someone’s weight in pounds to their weight in kilograms, start with the weight (that’s a variable), and divide that by 2.20462 pounds/kilogram (that’s a constant). It is common to give variables names like weight_in_pounds so you can refer to them later. Constants can have names too, or you can use numbers directly in a computation.
The following paragraphs will introduce a number of programming concepts. Since most programming is done using text, the examples will be expressed using text, but in order to make them easier to read, they are not written in any actual programming language; they are in pseudo-code.
In our “pounds to kilos” converter we need to perform a computation to create a new value. Let’s call it weight_in_kilos. We compute it this way:
weight_in_kilos = weight_in_pounds / conversion_factor
This is both a statement of truth such as you might encounter in algebra, and a statement of process (“divide the weight_in_pounds by the conversion_factor and store the results in weight_in_kilos”). In programming languages it is the process that matters. It is called an “assignment statement” because it stores or assigns a value to the variable on the left side.
Note that if we had ignored the units in our variable names, we might have written:
weight = weight / conversion_factor
The statement would be correct in most computer languages. As an algebraic statement of truth it makes no sense unless the conversion_factor is 1.0, but as a statement of process it still yields the correct results. You begin with weight (measured in pounds) and you end up with weight (measured in kilograms). They aren’t the same number; the weight in pounds is destroyed when the assignment happens.
If this seems a little confusing, you aren’t alone. To resolve this symbolic confusion some programming languages use separate symbols to distinguish between assignment and the more traditional “is the same as” or “is equal to” algebraic reading. For example, JavaScript, Java, PHP and C all use “=” for assignment and “==” for “is the same value as.”
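A brief Python sketch (Python uses the same two symbols) makes the distinction concrete, reusing the names from the conversion example:

conversion_factor = 2.20462     # pounds per kilogram (a constant)
weight_in_pounds = 150.0        # a variable

# assignment: compute the right-hand side, then store it under the left-hand name
weight_in_kilos = weight_in_pounds / conversion_factor

# comparison: ask whether two values are the same, producing True or False
print(weight_in_kilos == weight_in_pounds)    # False
print(round(weight_in_kilos, 1))              # 68.0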
Computing recipes (algorithms) usually consist of multiple steps. Sequencing is implied by the order in which the steps are listed in the source text file (if it is text) or some sort of dependency graph (in the case of spreadsheets and some visual programming languages). Often, steps involve doing simple arithmetic on variables, but sometimes we need to compute more complex values like the trigonometric sine of an angle, a computation that takes several steps itself. Languages usually have a library of such well-defined operations, or functions, each of which has a name that can be used along with one or more parameters in a statement, like x = sin(angle) where sin is the function and angle is the parameter. You may be more familiar with function names from cooking, such as dice, chop, whip, simmer, and brown. As with programming recipes, chop can be done to onions, carrots, potatoes, etc. The recipe tells us what to chop via the parameter, as in chop(onions).
Functions allow us to condense the text of our program, making it easier to write and keep track of in our heads. In the end we might express a simple soup recipe as
soup = simmer(chop(onions) + chop(carrots) + broth)
Where sequencing is not obvious, most languages use parentheses to imply “do this first.” In this case, you simmer a combination of chopped onions, chopped carrots, and some broth, and call the result soup.
Two of the most powerful affordances of a programming language are the ability to define your own unique data and your own functions. Returning to cooking for a moment, we might note that it isn’t uncommon to both peel and chop. What if your library doesn’t have a function for that? Must we always remember to write down both, in sequence? Couldn’t we just say peel_n_chop(stuff)? In fact, in much the same way that CAD software allows you to define a “block” (of data) by naming a collection of graphic primitives, we can use a “function declaration” to define a new operation. It takes the form of a vocabulary declaration that defines the list of steps needed to accomplish the action, but it doesn’t actually do the task:
define peel_n_chop(stuff)
    peeled_stuff = peel(stuff)
    chopped_stuff = chop(peeled_stuff)
    return(chopped_stuff)
end
Notice how “stuff” is used. In the first line it is part of the definition, indicating that our peel_n_chop function will have a single input parameter, stuff. We then use stuff anywhere in the function definition where that information should go. Whatever information (or vegetable) we eventually insert for “stuff” when we use our function will be passed sequentially through a peeling and a chopping process, and then returned to the spot where peel_n_chop was used. With this definition in place we could write:
gazpacho = peel_n_chop(tomatoes) + chop(jalapenos) + peel_n_chop(onions)
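For comparison, a rough Python rendering of the same definition might look like this; since peel and chop are operations our pseudo-language simply assumes, they appear here only as placeholder functions:

def peel(stuff):
    return "peeled " + stuff            # placeholder for the real operation

def chop(stuff):
    return "chopped " + stuff           # placeholder for the real operation

def peel_n_chop(stuff):
    peeled_stuff = peel(stuff)
    chopped_stuff = chop(peeled_stuff)
    return chopped_stuff

print(peel_n_chop("tomatoes"))          # chopped peeled tomatoes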
By building up and combining a sophisticated vocabulary of operations, acting on a complex data description, our functions can be made very powerful, but we need a few more tools, chief among them being conditional execution.
We don’t always do every step in a recipe. For instance, a recipe for cooking a roast might say: “If the meat is frozen, allow it to thaw in the refrigerator overnight.” Obviously, if it is not frozen, you don’t need to do this, but if it is frozen, maybe there is a way to thaw it faster? So, maybe the recipe could say: “If the meat is frozen, allow it to thaw in the refrigerator for a day or thaw for 10 minutes in a microwave on low power.” Again, if it is not frozen, you skip over all of this; but if it is frozen, the instructions give you two alternatives but no explicit means of selecting between them except the implied “if you don’t have a day to spend thawing the meat.” So there are actually two conditions: (1) thawed or frozen; and (2) with time to thaw in the fridge or not. The second only applies if the answer to the first is “frozen.” This is called a “nested if” condition. We might write it as
If (meat is frozen) then
    if (there is enough time) then
        thaw it in the fridge
    else
        thaw it in the microwave
    endif
endif
Conditional execution, or if–then, statements generally involve some sort of test (in parentheses in the example above) that produces a true or false result and some way of indicating the steps that get skipped or executed. As shown in the inner if–then–else statement above, they may indicate both what happens when the test is true and what to do when it is false (else). The endif marks the end of the conditional. Regardless of the conditional test, the next step in the recipe is the one after the corresponding endif.
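In Python the same nested condition might be sketched like this, with the two tests reduced to simple true/false variables for illustration:

meat_is_frozen = True
enough_time = False

if meat_is_frozen:
    if enough_time:
        print("thaw it in the fridge")
    else:
        print("thaw it in the microwave")
# whatever happened above, the next step of the recipe continues here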
The flip side of conditional execution is repeated execution. What if our language for cooking does not have chop in it but it does have slice? We could define our own chop function as we did above for peel_n_chop, by using slice repeatedly. One way to do this would be:
define chop (stuff)
    slices = slice ¼ inch off the end of stuff
    slices = slices + slice ¼ inch off the end of stuff
    slices = slices + slice ¼ inch off the end of stuff
    slices = slices + slice ¼ inch off the end of stuff
    slices = slices + slice ¼ inch off the end of stuff
    return (slices)
end
This is awkward because it is inefficient to type, hard to update, and worst of all, it is not clear how many slices we need to chop all of stuff! We would like the function to work for both carrots and potatoes. In fact, we can express this more succinctly and more powerfully as:
define chop (stuff)
    slices = nothing
    while(length of stuff > ½ inch)
        slices = slices + slice ¼ inch off the end of stuff
    endwhile
    return (slices)
end
Programming languages usually have multiple ways of saying this (variously referred to using names like do loop, for loop, or while loop). They are similar to a function in that they define a group of one or more actions that will be repeated. Simple loops just repeat an action some number of times, while others include conditional execution tests made at the beginning or the end to decide whether to do the actions again.
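A rough Python version of the looping chop shows the test controlling the repetition; slice_quarter_inch is a placeholder standing in for the slice operation assumed above:

def slice_quarter_inch(length):
    return length - 0.25                 # placeholder: remove a quarter inch

def chop(stuff_length):
    slices = []
    while stuff_length > 0.5:            # keep slicing while stuff remains
        stuff_length = slice_quarter_inch(stuff_length)
        slices.append(0.25)              # collect another quarter-inch slice
    return slices

print(len(chop(6.0)))                    # a 6-inch carrot yields 22 slices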
The fragments of pseudo-code presented here are meant to convey the linear, sequential character of programming, but aren’t written in any specific language. While the underlying nature of program execution remains the same, there are other ways of writing the instructions. The Logo programming environment included a mechanical “turtle.” The turtle was part robot, part mouse. Users (usually children) could move the turtle through a pattern of turn-and-advance operations that were captured and then replayed in loops. With a pen attached, the turtle could even draw intricate patterns.
Similarly, many applications allow you to record a series of normal manipulations, such as searches or edits, and then replay them as “macros” or “scripts,” a feature which is very similar to the idea of a function described here.
In recent years, visual programming extensions of 3D modeling programs have become popular (e.g., Grasshopper, Generative Components, and Dynamo). In these systems, computational steps are represented by rectangles, with input values on one side and output on the other. Lines connecting the boxes indicate how data flows from one operation to the next, implicitly dictating the sequence of operations. The graphical presentation of such environments has proven very attractive to the visual design community.
Data, combinations of data, sequences of operations, functions, loops, and conditional execution—that’s pretty much the basics of constructing a computer program. The obvious gap between this simple description and the sophistication of most tools we use reflects the complex logic, thousands of hours of work, hundreds of thousands of lines of code, and elaborate system libraries behind such seemingly simple operations as drawing on the screen or receiving mouse input. Still, all that complexity boils down to these fairly simple operations and representations.
Sequences, functions, loops, and conditionals describe the actions or verbs of computer programs. The nouns are found in the patterns of bits stored in memory. Certain standard representations have emerged over the years and are shared by most computers. Standard representations are essential to the sharing of data between computers produced by different manufacturers and between different programs within a single computer, and they are the fundamental elements out of which more complex representations are built. They make it easier for programmers to move from job to job and for program source code to be ported between differing systems. The representations simply define how to store numbers and text, but there are important subdivisions within those as well.
The existing standards are the product of government and industry experts working together over a period of time. New representations rarely appear as new standards right away, because intellectual property rights and proprietary commercial interests protect them. However, where a single corporate entity dominates a market, such representations often become de facto standards over time as other software developers license or reverse engineer their behavior. Examples include PDF (by Adobe) and DWG (by Autodesk).
The most direct representation possible in memory is that of a counting number, where each bit in memory represents a 0 or 1, and a collection of bits represents a multi-digit number. The resulting binary numbers have a 1s place, 2s place, 4s place, etc. (each time you add a bit you double the number of unique ways you can configure the collection of bits). An 8-bit byte can store 256 patterns (2 raised to the 8th power), such as the counting numbers in the range of 0 to 255. Two bytes (16 bits) get you from 0 to 65,535.
Negative numbers require a slightly different representation. The positive or negative sign of a number is a duality like true and false, so it is easy to see that if we dedicate one bit to remembering the sign of the number we can use the rest of the bits for normal counting. While preserving the number of unique representations, they are divided between the positive and negative portions of the number line. One such scheme in common use is two’s complement encoding under which an 8-bit byte can store a “signed” value between –128 and +127 or an “unsigned” value (always positive) between 0 and 255.
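The two readings of a single bit pattern are easy to see in Python; this sketch (our own) interprets the byte with all eight bits turned on both ways:

pattern = bytes([0b11111111])    # one byte, all eight bits on

print(int.from_bytes(pattern, "big", signed=False))    # 255  (unsigned reading)
print(int.from_bytes(pattern, "big", signed=True))     # -1   (two's complement reading)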
The positive and negative counting numbers are called integers. Such numbers have exact representations in computer memory. The range of values they can have depends on the amount of memory allocated to store each one.
A chicken counting her eggs could use an integer, but an astronomer figuring the distance to the nearest star needs a different representation because it takes too many digits to record astronomical distances even if they are measured in kilometers rather than inches. That’s why scientific or exponential notation was invented! Using this scheme the distance to Alpha Centauri can be written as 4.367 light years, and since a light year is 9.4605284 × 10¹⁵ meters, that makes a total of 4.13 × 10¹⁶ or 413 × 10¹⁴ meters after rounding off a little. Interestingly, that last way of writing it uses two much smaller integers (413 and 14), at the expense of rounding off or approximating the value. Using negative exponents we can also represent very small numbers. Because these kinds of numbers have a decimal point in them, they’re generally called floating point or real numbers. Some of the bits store the exponent and some store the significant digits of the numeric part (called the significand). In a 32-bit single-precision floating point number, 24 of the bits will be used for the significand, and 8 bits will be used for the exponent. That translates into around seven significant digits in a number ranging in value from –10³⁸ to 10³⁸. To do math with real numbers is a bit more complicated than doing math with integers, but dedicated hardware takes care of the details. Note that while they can represent very large or very small values, floating point numbers are necessarily approximate.
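One visible consequence shows up in Python, or in almost any language that uses binary floating point (a small example of our own):

print(0.1 + 0.2)           # 0.30000000000000004, not exactly 0.3
print(0.1 + 0.2 == 0.3)    # False: both values are stored as approximations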
There is a difference between a number (e.g., 12) and a measurement (12 meters) that includes a unit of measure, because the standards for computer representation only cover the number. The number can represent a measurement in any unit of measure. The standards guarantee that numbers shared between programs represent the same values, but there is nothing in the standard representations that says what units the values are measured in. Unfortunately, the units of measure are often implicit in the software (CAD programs in the United States default to inches or sometimes feet), or left up to the user (as in a spreadsheet). In 1999 the simple failure to convert a measurement from imperial to metric (SI) units when sharing data within an international design team led to the loss of the multi-million-dollar Mars Climate Orbiter (Hotz 1999).
Because units are usually implicit in CAD software, and exchange formats generally transfer numeric coordinates rather than unit-oriented dimensions, architects sometimes find that drawings exported by a consultant need to be scaled up or down by 12 (to convert between feet and inches), or 1000 (to convert between meters and millimeters), or by 2.54 (to get from inches to centimeters or back).
The important point about the range of a representation is that it cannot capture the full splendor of reality. If you establish a coordinate system for our solar system (centered on the sun) using single-precision real-number coordinates measured in inches, coordinates are precise to about eight digits. For points near the sun, that’s accurate to a tiny fraction of an inch, but for positions 93 million miles (5.9 × 10¹² inches) away from the origin, say on the surface of the Earth, that means points 10⁴ inches (833 feet, about 254 meters) apart have indistinguishable coordinates. Not a big problem for architects who usually pick an origin closer to their projects, but a challenge for space scientists and molecular engineers. Larger memory allocations (64 bits per number rather than 32) reduce the problem, but it never completely goes away.
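The effect can be imitated with Python's struct module, which can squeeze a value through the 32-bit single-precision format; this sketch reuses the distances from the text:

import struct

def as_float32(x):
    # round-trip a number through the 32-bit single-precision format
    return struct.unpack("f", struct.pack("f", x))[0]

sun_to_earth = 5.9e12                # inches from the origin at the sun, roughly
nearby_point = sun_to_earth + 1e4    # a point about 833 feet farther away

print(as_float32(sun_to_earth) == as_float32(nearby_point))    # True: indistinguishable in 32 bits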
Integers are generally faster to compute with, but not useful for things like spatial coordinates, because we often need fractional coordinates, and we may need big coordinates.
The curious thing in this situation is that floating point numbers, which allow us the most precise-appearing representation (e.g., “3.1415926” for pi, or “1.0”) are using an internal representation that is an approximation to the value, while the numbers we associate with approximation (e.g., “37 feet” or “12 meters”) are actually exact representations (if you rounded off the measurement when you wrote it down, that approximation is on you—the internal representation of the number is exact).
By convention, integers generally appear without a decimal point (so “37.0” is a floating point number, while “37” is an integer), but this is not a requirement.
Humans write text in many ways—left-to-right, right-to-left, top-to-bottom—and we use many different symbol systems, some with a handful of unique symbols, like American English, and some with thousands, like Japanese Kanji. This complicates the creation and entry of text in ways that Western Europeans and their American cousins don’t always appreciate, but since American companies dominated computing in the 1980s we started out doing it their way. To simplify things and make it manageable, the early systems ignored all text except left-to-right, top-to-bottom. This reduced the complexity down to just two problems, analogous to typing: picking the individual characters and figuring out when to start a new line of output.
Remember, bytes store patterns of on/off, so they don’t actually store a text character. The link between the pattern and the meaning is one we create. It is arbitrary, but it can be rational. To simplify the problem, early developers ignored much punctuation and many letters. Given a list of acceptable characters, they created mappings from on/off patterns of bits in memory to the selected characters—very much like setting up a secret code. A few of these, from the American Standard Code for Information Interchange (ASCII, pronounced “ass-key”), are shown in Figure 4.1, where you might note that “A” and “a” are not the same character, and that interpreting the bits of “B” as an integer yields a larger number than “A”—a fact that helps with alphabetizing.
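Python exposes this mapping directly through its ord and chr functions, as the following quick sketch shows:

print(ord("A"), ord("a"))    # 65 97: upper and lower case are different characters
print(ord("B") > ord("A"))   # True: the encoding preserves alphabetical order
print(chr(66))               # B: the character whose code is 66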
The appearance of the letters when printed depended entirely on the printer—there was no such thing as a font. Even worse, different companies (DEC, IBM, CDC) came up with different mappings, so an “!” (in EBCDIC) on an IBM mainframe had the same bit pattern as a “Z” (in ASCII) on a DEC mini-computer. This significantly complicated the process of copying files from one computer to another, but since much data stayed in one place it wasn’t a huge problem. When the PC revolution began and people wanted data on their PC that started out on their company mainframe computer, the incompatibilities between systems became both noticeable and painful.
The rise of the PC saw the demise of all the encoding schemes except ASCII. ASCII rigidly defines 128 characters, including both upper-case and lower-case letters, numerals, and a variety of punctuation, plus some characters like “form-feed” (to start a new page), “line-feed” (move down a line), and “carriage return” (move back to the start of the line).
ASCII uses an 8-bit byte to store each character, so there are 256 possible characters, but the standard only defined the first half of them. The eighth bit was originally set aside to provide a way to validate character transmission between systems, but robust hardware rendered that use unnecessary and vendors began to use the “upper 128” characters for other things. Different vendors (IBM, Microsoft, Apple, etc.) filled in the missing characters but not in the same way, so while ASCII is a standard, it is a flawed standard in practice. The flaws appear most obviously today when you use certain characters in email or blog posts, characters such as curly (or smart) quotes, accented characters (éøü), em-dash (—), etc. The vacuum of undefined space had some positive effects too; in the 1980s those upper 128 characters allowed researchers at the University of Hawai‘i to develop a script system for recording Pacific Northwest Native American languages and stories, helping preserve languages that were rapidly disappearing (Hsu 1985). But ASCII is no good if you want to use a writing system from the Middle East or the Far East. Those users literally could not write things down in a familiar and standardized script on a computer until unicode came along.
Unicode was created to address the shortcomings of ASCII and its various extensions. The goal was to provide a unique character encoding for every writing system in the world (Unicode 2016). Because memory and storage space have been scarce resources, and in order to make the new standard backwards compatible with ASCII, unicode actually comes in three slightly different encodings: utf-8, which uses a single byte for the ASCII characters and more bytes for everything else; utf-16, which uses 2 bytes (occasionally 4); and utf-32, which always uses 4 bytes per character. Individual characters, called code points, are assigned in language-oriented groups.
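The sizes are easy to check in Python; the sketch below encodes a plain ASCII letter and an accented one (the -le suffixes merely fix the byte order so that no extra byte-order mark is added to the count):

for text in ("A", "é"):
    print(text,
          len(text.encode("utf-8")),       # 1 byte for plain ASCII, 2 for é
          len(text.encode("utf-16-le")),   # 2 bytes each
          len(text.encode("utf-32-le")))   # 4 bytes each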
While unicode resolves the character representation problem, there is another problem with text files. The ASCII standard doesn’t specify how the end of a line is marked in a text file. With the development of word processors and automated word-wrap, this has morphed into a question about how the end of a paragraph is marked too. Unfortunately, three incompatible answers evolved in the industry:
Microsoft Windows uses CRLF (a carriage return followed by a line feed), Unix (and thus OS X) uses LF alone, and the original Apple Macintosh used CR alone. If you have ever opened a text file to find it unaccountably double-spaced, or with some odd invisible character at the start of every line after the first one, or where each paragraph is one very long line or the entire file turns into one giant block paragraph, you’ve encountered the side-effects of this problem. HTML rather brilliantly dodges the whole mess by lumping all three into the general category of “white space” and treating them the same as a space character.
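Programs cope with the mess in much the same spirit. Python's splitlines method, for example, accepts all three conventions, as this small sketch shows:

text = "one\r\ntwo\rthree\nfour"    # Windows, old Macintosh, and Unix endings mixed
print(text.splitlines())            # ['one', 'two', 'three', 'four']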
The name of each letter (e.g., “upper-case A”) is distinct from the graphic mark (or glyph) with which it is printed. A collection of glyphs is called a font. In the early days text was printed by striking a character image against an inked ribbon, making what we now call “plain text”—all characters were the same width (ten characters per inch); 80 characters filled a line; and there were six lines per inch vertically. All upper-case “A” characters looked the same, but their font might vary from terminal to terminal or printer to printer, depending on the manufacturer, not the computer or file. In the 1980s the appearance of raster output devices such as dot-matrix and laser printers changed all that, giving us proportionally spaced text of many sizes and appearances that flows into variably spaced lines and paragraphs.
Even today, font data largely remains separate from the character data, and because fonts can be protected by copyright, not all fonts are available on all computers, which means exchanging files (both text documents and web pages) between operating systems can cause unexpected font substitutions. When an alternative font is used, lines of text usually turn out to have different lengths. This can keep the edges of fully justified paragraphs from lining up, and can shift text on slides and in the notes or dimensions of drawings and illustrations, possibly causing overlaps with other line-work or graphics. Even though fonts may now be attached to web pages and can be embedded in PDF files, intellectual property restrictions on some copyrighted fonts remain a challenge.
The different virtual devices that your computer can become are encoded in the different programs it has stored on the secondary storage system—the hard drive—of your computer. Similarly, the different documents you might compose with your word processor are stored separately so you don’t have to complete one document before starting another. Instead, you save and open them as needed. Each document or application is assigned space in secondary storage called a file, and files are organized into directories or folders.
As a container, each file is standardized and cataloged by the operating system, but the contents of the file, the pattern of text and numbers called the file format, is specific to the program that created it, and those programs probably come from different companies. Thus, your word processor can’t make sense of your CAD data and vice versa. File extensions, suffixes to the file name such as .doc or .html, are commonly used to indicate which program goes with which file.
The one universal file format is plain text (usually ASCII). Most programs can read and write text files, though saving data to plain text may cause important features to be discarded. Chapter 8 explores the question of representation in more depth.
Patterns of bits stored in files represent text and numbers, as well as program instructions. Programming languages, with their data types, variables, branches, functions, and loops, give us the means by which to express and organize sequences of operations to do useful work, creating entire new functions and programs, or scripts to run inside existing programs. Standardized representations for text (ASCII and unicode) and numbers (integers and floating point numbers) provide portability, but only cover fairly simple data types. History and commercial interests have made more sophisticated data exchange more difficult, but network connectivity is encouraging the development of exchange standards. Chapter 5 will explore standards emerging in the AEC industry.
Sutherland, Ivan. 1963. SketchPad: A man–machine graphical communication system. AFIPS Conference Proceedings 23: 323–328.
Hotz, Robert Lee. 1999. Mars probe lost due to simple math error. Los Angeles Times, October 1, 1999.
Hsu, Robert. 1985. Lexware manual. Manoa: University of Hawai‘i Department of Linguistics.
Unicode. 2016. What is unicode. www.unicode.org/standard/WhatIsUnicode.html