"Each thing we see hides something else we want to see." -Rene Magritte
Whether you are preparing a ten-page pamphlet or a 300-page book, the process of creating and producing an electronic document can be viewed in many different ways. Each software tool presents a particular conceptual model of the publishing process. This philosophical point of view greatly influences the functionality and usability of the software. The better you understand the many points of view, the more effective you will be in choosing and using the available software tools.
Some systems are page-oriented. Others focus on the entire document. Some are WYSIWYG (what you see is what you get). Others are language-oriented, and still others are oriented toward on-line display and interaction. Learning and using the publishing tools is easier if you are aware of the philosophy (the point of view) that a system supports.
One way to understand the value of new technologies is to create a metaphor or catchy phrase for a concept. The iconic user interface of the Macintosh is known as a desktop. The use of printing software and hardware on this metaphorical desktop is known as desktop publishing. This usage brings to mind miniature Gutenberg presses right at your fingertips. Many varieties of desktop metaphors have been created: desktop machining, desktop forgery, desktop prepress, desktop broadcasting, and so on.
The newer technologies of electronic publishing also need new metaphors to cover the issues of document processing, electronic distribution, archival storage, and so on. (1)
The multifaceted world of electronic publishing needs a catchy phrase to describe it. The many points of view used to examine electronic publishing are necessary because there is no single satisfying metaphor.
The term electronic publishing means different things to different people. Many of the standards discussed in this book suggest other possibilities such as hypertext, on-line information browsers, and so on. These applications, as well as databases, CD-ROMs, and other electronic repositories, are all part of the domain of electronic publishing. At their core, electronic documents start as text organized into chunks of information, paragraphs, pages, and so on.
In this chapter, we examine many approaches to looking at electronic documents. The views we examine are (1) Visual and Logical Views, (2) The Design Point of View, (3) Communications Views, (4) The Engineered View, (5) The Database View, (6) Specialized Views, and (7) On-line Views. We examine how the creation of electronic documents is influenced by each point of view.
3.1 Visual and Logical Views
"It’s a small world, but I wouldn’t want to paint it." - Steven Wright
Documents have many components: characters, words, paragraphs, chapter headings, sections, and subsections. We can examine each component in two complementary ways, the visual and the logical.
The logical aspect of a component refers to its semantically meaningful part, such as the fact that a collection of characters is a word that can be checked for spelling or that a chapter is divided into sections. The visual aspect of a document component refers to the size, position, and fonts used to form its physical appearance. The visual components of document elements will be discussed further in Section 3.2 The Design Point of View.
In this section, we examine document components of increasing complexity, starting with the character and progressing through to an entire enterprise. Each document component has a visual aspect and logical aspect; some lean more toward one than the other.
Putting these document components on a scale, from the simplest to the most complex, provides us with a useful frame of reference in which to discuss these issues.
You can manipulate each item on this scale using software tools. Of course, some tools cover several items on the scale. The orientation of a particular tool (the point of view it supports) will probably be centered around one particular item. In the following sections, we go through the scale by examining each document component individually.
The first level of our document scale is the character and its manipulation. Characters, as logically meaningful entities, have values that are represented in the computer according to well-known and established character codes. Character codes are the fundamental representation of text. ASCII is the best-known established character encoding.
Normally you don't have to be concerned about the character code used in your particular system. However, when you want to interchange documents with other systems, the character code may become a problem. In particular, when interchanging documents with systems in countries that use other character codes, you must pay attention to these codes. Many Asian languages (for example, Japanese) require other character codes, which are necessary to support hundreds or even thousands of characters. Localization is the process of taking software written for one system and porting it to another system that uses another language and possibly another character code.
Also on the logical (as opposed to visual) side of the discussion is the ability to associate attributes or tags with individual characters. Essentially, tags are names you can associate with characters for whatever purpose you like. For example, the FrameMaker publishing system allows the definition of character tags. Each tag defines a particular font family, size, weight, and other properties, which can be applied to any character. These tagged characters may then be manipulated as a group if necessary.
Named attributes or tags such as these provide a convenient mechanism for manipulating the visual appearance of characters throughout a document. You can also use them for semantic purposes. For example, you could associate the name "placeHolder" with particular characters you wish to use temporarily. You can search for the tag "placeHolder" to locate the particular text. You can even print a report listing all occurrences of the "placeHolder" tag and where they occur in the document, creating an automated list of work to be done.
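As a rough illustration of the tag report described above, the following Python sketch models a document as a list of text runs, each carrying an optional character tag. The `Run` class, the `tag_report` function, and the sample text are hypothetical; they are not taken from FrameMaker or any other product.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Run:
    """A stretch of characters sharing one (hypothetical) character tag."""
    text: str
    tag: Optional[str] = None


def tag_report(runs: List[Run], tag: str) -> List[Tuple[int, str]]:
    """Return (character offset, text) for every run carrying `tag`."""
    report = []
    offset = 0
    for run in runs:
        if run.tag == tag:
            report.append((offset, run.text))
        offset += len(run.text)
    return report


# Sample document: two runs are tagged as temporary placeholder text.
document = [
    Run("The printer ships in "),
    Run("MONTH YEAR", tag="placeHolder"),
    Run(" and costs "),
    Run("PRICE", tag="placeHolder"),
    Run("."),
]

for offset, text in tag_report(document, "placeHolder"):
    print(f"offset {offset}: {text!r} still needs real content")
```

Because the tag is stored with the characters rather than inferred from their appearance, the same lookup works no matter how the placeholder text happens to be formatted.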
For the visual side of characters, many font manipulation tools are available that could be considered part of font definition software. If you want to change the appearance of the character T, for example, you would use a font definition tool.
There are many more issues concerning the visual aspects of characters and fonts. Please see Section 3.2.1 Fonts and Typography later in this chapter for a discussion of these issues.
The act of writing takes place at the word level of the document component scale. Most of the discussion about writing is in Section 3.3 Communications Views, later in this chapter. Spelling checkers and grammatical aid systems are some of the electronic publishing tools that help with writing. The growing popularity of computer-assisted writing aids attests to their growing sophistication.
Another manipulation of words is automatic hyphenation. This is a manipulation of the logical or semantically meaningful aspects of words. Often, publishing systems allow the user to modify some variables to control the precise way automatic hyphenation is performed. For example, these could be variables to control the minimum and maximum number of characters before and after the hyphen. In addition, electronic publishing systems that support several languages must also have hyphenation dictionaries appropriate for each language. Hyphenation algorithms differ among publishing systems. (2) The same document in two systems may not appear exactly the same, even if the fonts and page margins are identical, because the hyphens will break the words at different places. Hyphenation is part of the process of formatting and can hinder efforts to interchange documents with perfect fidelity. It's amazing how complicated these little details can be!
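To make the effect of those hyphenation variables concrete, here is a minimal Python sketch. It assumes the break points of a word are already known from a hyphenation dictionary. The function name, the parameter names `min_before` and `min_after`, and the dictionary entry are all hypothetical.

```python
def allowed_breaks(syllables, min_before=2, min_after=3):
    """Return the hyphenated forms permitted by the two limits.

    `syllables` is the word split at every dictionary break point.
    `min_before` and `min_after` are the minimum number of characters
    that must remain before and after the hyphen.
    """
    word = "".join(syllables)
    results = []
    position = 0
    for syllable in syllables[:-1]:
        position += len(syllable)
        if position >= min_before and len(word) - position >= min_after:
            results.append(word[:position] + "-" + word[position:])
    return results


# Hypothetical dictionary entry listing candidate break points.
entry = ["hy", "phen", "a", "tion"]

print(allowed_breaks(entry))
print(allowed_breaks(entry, min_before=3, min_after=5))
```

With the default limits all three candidate breaks are allowed; with the stricter limits only one survives. Two systems configured differently can therefore break the same line at different places, which is why identical fonts and margins do not guarantee identical output.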
3.1.3 Paragraphs with Tags and Styles
Moving up the complexity scale, we now come to the paragraph. One of the most powerful document processing tools is the ability to attach attributes, tags, or styles to paragraphs. I use the term tags to refer to the logical aspect of paragraphs and the term styles to refer to their visual aspect. When writing, we generally treat the content and appearance of paragraphs uniformly. Individual paragraphs have the same margins and typefaces (they should also contain a coherent idea). Many software products treat the paragraph as an entity that can be manipulated as a unit.
When manipulating a paragraph, it is important to distinguish the logical aspects from the visual. The logical use of a paragraph tag might be to identify all chapter headings. The publishing system may support the intent of a document structure and not allow the creation of a chapter heading in the middle of a table. Identification of the logical structure of a document is one of the major features of formal document standards and is discussed in detail in the Document Standards chapter. (See Section 5.3 SGML in Chapter 5 Document Standards for a discussion of document structure.)
Another logical use of tag names is the actual name itself. The name "Body text" conveys the meaning that the body copy in a document will be associated with the tag "Body text." It is important to select meaningful tag names. Cryptic, "cutesy" names obscure the intent of the tag or style. Spend the painful time creating good names that will be meaningful to others in your organization.
The development and use of a consistent set of paragraph tags can be of tremendous value. This task should be done at the start of any significant project. Visual consistency can be achieved by using the same tags in the same places. Just as important, changes can be applied to specific tags or styles in one place and then applied to the entire document. The concept of a style sheet is intended specifically to allow changes in one place to migrate to the rest of the document. Changes made to a style sheet can also be applied to other documents, helping to automate and keep consistent all documents of a particular project or organization. Coherent tag names allow the logical aspects of the document to guide the visual appearance. Style sheets are just starting to appear for Web page authoring. (See Section 1.5 Authoring in Chapter 1 World Wide Web for more information on Web style sheets.)
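The idea that a change in one place migrates to the whole document can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the tag names, font values, and the `render` function are hypothetical, and no real publishing system's style sheet format is implied.

```python
# A hypothetical style sheet: tag name -> visual properties.
style_sheet = {
    "Chapter heading": {"font": "Helvetica", "size": 18, "weight": "bold"},
    "Body text": {"font": "Times", "size": 10, "weight": "regular"},
}

# The document stores only tag names, never fonts or sizes.
document = [
    ("Chapter heading", "Points of View"),
    ("Body text", "Some systems are page-oriented."),
    ("Body text", "Others focus on the entire document."),
]


def render(document, style_sheet):
    """Pair each paragraph with the style its tag currently resolves to."""
    return [(text, style_sheet[tag]) for tag, text in document]


# One change in the style sheet reaches every "Body text" paragraph.
style_sheet["Body text"]["size"] = 11

for text, style in render(document, style_sheet):
    print(f'{style["font"]} {style["size"]}pt {style["weight"]}: {text}')
```

Keeping appearance out of the paragraphs themselves is what lets the same style sheet be reused across every document in a project.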
Unlike the other document components, the page is purely visual and has no meaningful logical aspects. Indeed, we could also have this discussion for a "screen" of information. Pages and screens are convenient, well-understood units of information content. Pages are the physical spaces in which textual content appears. Page sizes can be altered and documents can be reprinted in different sizes and formats for on-line browsing and so on, with no effect on the content. Pages do not have any logical aspects other than their very existence. They represent a canvas upon which the content is painted.
From a visual point of view, the page provides a place for a number of items. Headers, footers, body text, and page numbers are some of these items. They are placed on a page, in a consistent position throughout the document. The positioning of these items is primarily a matter of design; but there are also computational factors. Some of the page-specific items, such as the page numbers, running headers, and running footers, can be computed or extracted from the text. The content of these items can be changed, based on the specifics of the page.
Although a page has a specific size that is rarely changed, paying attention to the size is sometimes crucial. Many systems support specific page sizes implicitly. This implicit assumption can cause a nasty problem if you need to interchange documents with an organization that uses a different standard page size. This might happen when a U.S. organization exchanges documents with an organization based in Europe, because the U.S. standard page size (8.5 x 11 inches) differs from the ISO A4 size (210 x 297 mm, about 8.27 x 11.69 inches) used in Europe. The document will probably not print correctly unless you adjust for page size. On-line documents formatted for a VGA PC screen or a large-screen workstation encounter visual problems more and more often as Web browsers dynamically reformat content.
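The size mismatch is easy to quantify. The short Python sketch below converts U.S. letter to millimetres and computes the largest area that fits on both letter and A4 pages. The `common_area` helper is a hypothetical name, and treating that shared area as a "safe" layout region is an illustrative assumption rather than an established rule.

```python
MM_PER_INCH = 25.4

# Page sizes in millimetres (width, height).
LETTER = (8.5 * MM_PER_INCH, 11 * MM_PER_INCH)  # U.S. letter
A4 = (210.0, 297.0)                             # ISO A4


def common_area(first, second):
    """Largest width and height that fit on both page sizes."""
    return (min(first[0], second[0]), min(first[1], second[1]))


width, height = common_area(LETTER, A4)
print(f"Letter: {LETTER[0]:.1f} x {LETTER[1]:.1f} mm")
print(f"A4:     {A4[0]:.1f} x {A4[1]:.1f} mm")
print(f"Shared: {width:.1f} x {height:.1f} mm")
```

Letter is the wider page and A4 the taller one, so a layout designed for either size can lose content at an edge when printed on the other.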
The layout and overall design of components such as text, graphics, and illustration are best manipulated in a page layout program. The quintessential example of this type of software is Adobe PageMaker (formerly Aldus). One of the keys to PageMaker's success is that this software speaks the language of designers. It presents the user with a simulation of a pasteboard (an underlying grid for creating the proportions and overall structure of the document), a commonly used graphic design tool.
One distinction that must be applied only to the page is handedness: whether the content of a page is to appear on the right- or left-hand side of the printed document. Margins, columns, headers, footers, and page number positions are sometimes shifted on the page, depending on whether they are to appear on a left- or right-hand page. The more powerful electronic publishing systems provide tools to control handedness of particular parts. One example is the ability to force the start of each document (for example, chapters) on a right-hand page.
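Handedness rules are mechanical enough to sketch in code. The Python below assumes the usual convention that odd-numbered pages fall on the right. The margin values and the function names are hypothetical, chosen only for illustration.

```python
def is_right_hand(page_number):
    """Odd-numbered pages conventionally fall on the right-hand side."""
    return page_number % 2 == 1


def margins(page_number, inner=30, outer=20):
    """Return (left, right) margins in millimetres.

    The inner margin sits next to the binding, so it swaps sides
    depending on the handedness of the page.
    """
    if is_right_hand(page_number):
        return (inner, outer)
    return (outer, inner)


def chapter_start_pages(chapter_lengths):
    """Start every chapter on a right-hand page.

    A blank left-hand page is skipped whenever the previous chapter
    ends on a right-hand page.
    """
    starts = []
    page = 1
    for length in chapter_lengths:
        if not is_right_hand(page):
            page += 1  # skip one page so the chapter opens on the right
        starts.append(page)
        page += length
    return starts


print(margins(1), margins(2))
print(chapter_start_pages([5, 4, 6]))
```

In this example the first chapter ends on page 5, a right-hand page, so page 6 is left blank and the second chapter opens on page 7.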
Text flow is yet another term that really crosses the boundary from a page to a document. Newspaper articles leave pointers to the connecting text, such as "see Bozos column 5, page 22". These pointers tell the reader where the text is continued. The visual shape of this flow is either rectangular or follows the shape of graphic elements. Page layout or page makeup programs such as QuarkXPress and Aldus PageMaker provide tools that allow text flows to travel automatically around graphic elements.
Frames are another frequently encountered term with a strong relationship to the page. In a sense, a frame is a subdivision of a page. It is an invisible boundary in which content appears, just like a page. Frames, however, are not physical things; they are areas that can be manipulated while using the publishing system. Text can flow automatically from one particular frame to another. Corel Ventura (formerly called Ventura Publisher) and FrameMaker (now from Adobe) both use this concept. The Netscape 2.0 Web browser has an on-line frames capability that allows for more flexible Web page layouts and interactions.
Last, but not least, Interleaf generalizes many of the aspects of a page in a feature known as a microdocument. Microdocuments are "little" documents, inserts embedded in the pages of other documents, that can independently retain stylistic characteristics. All the styles associated with a particular document can be retained intact with microdocuments, but the microdocument can be no larger than a page.
The document in its entirety is the next stop in our analysis of document components. From a visual point of view, the document is a physical object with a particular design. From a logical point of view, the document is composed of a certain structure. The visual design and construction of documents(3) is a topic beyond the scope of this book. However, electronic publishing systems can play an essential role in the manipulation of the logical aspects of a document.
The logical structure of a document is an important characteristic of the document. We can use that structure as a framework to evaluate document processing tools. Some questions to ask in determining the suitability of a particular publishing system are:
Can the system automatically generate a table of contents?
Can the system generate lists of various elements such as tables and figures?
What kind of graphics can be integrated easily with the text?
How robust are the indexing capabilities, if any?
Is there good bibliographic and cross-referencing support?
Technical publications, in particular, need robust document-oriented tools. The more automated the tools, the better. It is essential that the publishing system provide support for automatic section numbering, running headers and footers, styles or tags, and change control. In addition, support for global changes (changes to many files that are part of a larger document) is a major time saver.
Several publishing systems present the user with the idea of a book(4) as an organizational tool. Books are made up of collections of files. If a change is made to the book, then the change is actually made to all the files that make up the book. If your publishing projects routinely deal with hundreds of files, this type of support will be an important requirement for any publishing system. An on-line equivalent of a book is a Web site with its web of interconnected pages.
As the sheer size of the document grows, we start to see a significant distinction between WYSIWYG (what you see is what you get) and batch, language-oriented systems. Often you don't want to see extensive, repetitive, massive changes. If you are forced into too many hand manipulations, the publishing system may be unwieldy for the particular publishing application. The higher-end publishing systems try to balance WYSIWYG capabilities with the often awkward and complicated commands of a batch-oriented system. (See Section 4.1 Types of Document Processors in Chapter 4 Form and Function of Document Processors for a more thorough discussion of WYSIWYG versus batch document processing.)
When we discuss the multivolume or encyclopedic scale of documents, our focus shifts from document manipulation to the concept of a data repository. Manipulation of large quantities of related material is one of the strengths of batch-oriented document processing systems. Offline automated processing is a virtual requirement for this scale of manipulation.
This level, in our document component scale, also represents the highest point at which a collection of documents is part of a coherent whole. Representative examples of documents at this level are the many manuals of an operating system, the volumes of an encyclopedia, and the maintenance manuals for a jet engine. Interleaf is a good example of a system with capabilities at this level. It uses the concept of a cabinet that contains collections of other documents.
Only when a publishing system supports the manipulation of multiple volumes as a unit is the multivolume category qualitatively different from the previous category. The large volume of data and high capacities required for such manipulations are supported only by the high-end publishing systems.
Again, Interleaf is an example of a publishing system that supports different types of style sheets: one that can be applied to individual documents, and a master style sheet that is used to modify other style sheets. Master style sheets are an important feature when massive and consistent changes are required. The language-oriented document processing systems such as troff and TeX (see Section 4.1.2 Language Characteristics in Chapter 4 Form and Function of Document Processors) are also effective at working with massive amounts of material. Automated scripts can be created and documents processed without human intervention. In general, however, skilled technical users must create these scripts, so these systems require a different type of staff than the turnkey, but more expensive, systems.
An enterprise (no, I'm not talking about Star Trek), the final level in our document component scale, is discussed here because it relates to the topic of text retrieval. When maintaining or creating a library of documents or other large archival collections of documents, the technical issues are primarily ones of access. Finding information quickly and easily is the primary issue.
The most important area in which to address these issues is that of classification. Classification and searching systems are integral parts of library science. A good classification system enables users to locate the information they desire and aids in the management of the documents. After all, if you can't find the information you need, when you need it, you may as well not have it at all. One area where document processing and searching systems intersect is that of full-text searching.
Full-text searching is the ability to search for any word in an entire collection of documents. The searching is usually accomplished through the use of a document browser. The emphasis in full-text searching is on speed at the sacrifice of space. It is not unusual for the indexes used to locate the text to take up as much space as the text itself. The combination of a good document browser and full-text searching really makes the entire field of electronic books a useful practical commodity, rather than just an interesting toy.
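One common way to trade space for speed is an inverted index, which records for every word the documents that contain it. The Python sketch below is a minimal illustration of that general technique, not a description of any particular retrieval engine. The sample manuals and the function names are hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map every word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical collection of maintenance manuals.
library = {
    "manual-1": "Replace the fuel pump before each engine test.",
    "manual-2": "The engine test requires a calibrated gauge.",
    "manual-3": "Store the gauge in a dry cabinet.",
}
index = build_index(library)

print(sorted(search(index, "engine test")))
print(sorted(search(index, "gauge")))
```

A query only touches the index entries for its own words, so lookup time does not depend on rereading the text. The cost is that the index holds an entry for every distinct word, which is why such indexes can rival the text itself in size.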
Full-text retrieval engines are widely used in the creation of systems that manage large quantities of text. These retrieval engines are becoming quite prevalent in the CD-ROM and Web site industry(5) and are a key technology to enable access to a library full of information. The large capacity of CD-ROMs is an ideal complement to the large space requirements of full-text retrieval systems.
Text retrieval is a complex field that is growing in importance as the world gets interconnected ever more tightly with networks.(6) Internet Starting Points used with Web browsers all have one form or another of a text retrieval engine. The possibility of indexing the Web challenges the computer science of text retrieval.
The increased capacity of low-cost storage devices like CD-ROMs is also a major factor in text retrieval, because entire databases can be put on-line right at your very own PC. (For more information on text retrieval, see Section 8.5.2 Text Retrieval in Chapter 8 Document Management.)
The enterprise document level is the largest in scope of the seven levels. This level covers a collection of documents and the tools for managing an entire organization's documents. Some vendors even offer tools that help manage an enterprise's information resources.
Open Text, a company with a long history of text retrieval software, now offers a Web server that can index an internal Web, an "intranet". In fact, internal enterprise Webs are an increasingly popular use of the Web for project management, status reports, meeting scheduling, meeting minutes, and so on.
In "The Web and its Many Uses," an article in Advanced Systems Magazine (May 1995), Chuck Musciano (chuck.musciano@advanced.com) argues for the use of the Web for a variety of organization-wide functions: e-mail archives (via mail2html), meeting minutes, and reports. Concerning the Web as a front end to SCCS, he says, "From simple things like on-line mail archives and team document collections to fancy tools that track customer queries and project status, the Web has a place at every level of your development organization."
In addition to increasing collaboration within an organization, internal Webs can be used to test out new technologies. As reported in Web Week,(7) AT&T is using its internal Web to shake out digital payment technologies. Primarily geared toward internal purchasing, the trial is also functioning as a testbed for the various types of digital payment technologies.
Another product, AnchorPage, will index your internal Web and allow visitors to search the content. As the scope of your Web grows, finding information becomes even more critical than simply adding information to the Web. Interleaf, a long-time electronic publishing software vendor, offers a high-end product. Its Web publishing product, Cyberleaf, addresses, in a comprehensive manner, not only the composition and Web page creation issues, but also organizational workflow issues.
Lotus's InterNotes Web Publisher converts Notes databases into Web-publishable documents. Notes is probably the preeminent "groupware" product. It enables groups of people to collaborate by placing and updating information in a Notes server. (For more information on groupware, see Section 8.3 Groupware in Chapter 8 Document Management.) The contents of the Notes server are a valuable resource for an organization. The InterNotes Web Publisher enables users of Notes to publish their Notes databases on the Web, widening the availability and utility of the database.
That about wraps up our analysis of document components. The Web is forming new information structures, creating a collection of global networked information. The rapidly solidifying collection of information, accessible via networks, may quite realistically form a global library. The technical barriers to such a fantasy are quickly disappearing. Only the legal concerns (which are not minor) of intellectual property rights, copyrights, and patent law remain as murky unknowns. (For a more thorough discussion of the possibilities of networks, see Section 7.4 Electronic Distribution in Chapter 7 Applying Standards.)
© Prentice-Hall, Inc.
A Simon & Schuster Company Upper Saddle River, New Jersey 07458
3.2 The Design Point of View
Design is another point of view that must be considered as we examine ways of approaching the document-creation process. The way a document is visually presented (how it grabs the audience visually) is a critical factor in the overall perception of a document. After all, the end product is an object to be viewed. The aesthetic components that make up the pages, fonts, layout, and color all contribute to the overall goal of producing a document that communicates ideas clearly. (A thorough treatment of document and Web design(8) is beyond the scope of this book, but for a list of good books see section Publications in the appendix Resources.) The remainder of this section will introduce some of the basics of document design and other topics that have strong relationships to document processing.(9)
3.2.1 Fonts and Typography
"Typography is to writing what a soundtrack is to a motion picture." -Jonathan Hoefler
Open any computer magazine about desktop publishing and you will see many ads for fonts and font-manipulation software. It may seem that the world has gone a little font crazy. Fonts, specifically, and typography, in general, are extremely important.
In some sense, typography is something that is so obvious, so visible, and so all-encompassing that most people simply don't notice it. However, it is precisely because typography is so pervasive that it is so important.(10) Fonts are not simply the shape of letters for creating words; they are letterforms with carefully designed shapes and subtle differences that relate to each other and that combine to make a pure visual statement.
Some software tools pay more attention than others to the role of fonts and typography. Depending on your specific needs, these tools may or may not be important. However, an awareness of the crucial factors can only help when judging the capabilities of a particular tool. In general, page makeup and page layout programs have much more flexible typographic features than their batch-oriented counterparts. The WYSIWYG nature of page makeup systems is more suitable to ad hoc design and experimentation.
If you are faced with selecting a font, it is important to consider the number of variations available in a font family. Some font families have more than a dozen variations. This within-family flexibility can only make the designer's job easier. Using several variations within a single font family is almost always aesthetically safer than mixing arbitrary fonts.
Many tools are available for font manipulation. These tools allow precise adjustments of kerning tables (the spacing between letters), the creation of new letterforms, the extraction of outlines, distortions, and so on. One important reason that such a variety of detailed tools exists is that font design has such an important impact on the document as a whole. Letterforms are a key ingredient in a document, and designers use them as the raw material to be manipulated by their designing tools.
Of course, it's important not to get too carried away with these tools.
Individual characters may also be used as graphic components. The line between font manipulation and graphic illustration can blur quite easily.
The many software tools available for font manipulation allow such a wide variety of choices that the traditional letterform is no longer sacred. Characters used as illustrative elements bring us back to the age of illustrated manuscripts filled with carefully crafted characters. There is of course the added danger of "font junk," the use and abuse of font manipulation tools by the amateur.
Fonts are also one of the more problematic aspects of document interchange. A font used in one document may or may not exist as a "system" resource on another computer system. Sometimes, if a document depends on the system to provide the font, and it's not there, an available font is substituted and the look of the document changes. Adobe's Multiple Master font technology addresses some of these problems and is a key component of their Acrobat line of products. Multiple Master fonts are able to parameterize more of the font than other font technologies. (For more information on Acrobat, see Section 7.4.2 Electronic Page Delivery in Chapter 7 Applying Standards.)
Another somewhat abstruse but powerful character manipulation system is the METAFONT language.(11) METAFONT provides a precise mathematical description of fonts; in many ways, it models the way ink is placed on paper by a pen. METAFONT is the creation of Donald Knuth, the same man who brought you TeX (see Section 4.1 Types of Document Processors). METAFONT is a language for describing characters in excruciatingly precise terms. After creating or modifying a description, the system chews away on the "code" and spits out a new font. These fonts can then be used by TeX, turning this interesting academic exercise into a practical and useful tool.
3.2.2 Layout and Composition
The placement of the various components of a document on a page is the layout. Document layout and composition are critical pieces of the design puzzle. Unfortunately, the only help electronic publishing tools have to offer is assistance through the use of templates. Some tools, like Microsoft's Wizards in MS Word, lend you a helping hand to fill in templates. Tools that aid in the overall layout and structural composition of documents exist only in research laboratories. Automated aids for global design features such as overall balance, proper use of white space, and so on, do not exist as product features.
Typical document processing systems have style sheets or master pages that define a particular visual layout. The visual layout of document elements on the style sheets can be applied to the entire document. The number of master pages and the flexibility in working with them are important capabilities of a document processing system. Often, global changes to a document are accomplished using these types of pages or styles. Careful use of master pages and style sheets is a significant help in the management of overall document consistency. (For a more thorough discussion of document management issues, please see Chapter 8 Document Management.)
In the future, it may be possible to have design "helpers" in much the same way that grammar checkers now help. Such suggestions are not pure fantasy. We are already starting to see the application of image-recognition systems in the pen-based portable computer field. Users can create rough sketches, and the system cleans up the drawing on PDAs (Personal Digital Assistants) like Apple's Newton. Image recognition is being taken a step further with the concepts of shape grammars.(12) In the architecture and computer graphics domains, shape grammars have been used to create simulated buildings in the style of Frank Lloyd Wright(13) and paintings by Kandinsky.(14) The concept is to create a grammar, a language, from a set of shapes as well as the allowable operations upon those shapes. Many interesting grammars have been created to describe the styles of architects and artists.
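The flavor of a grammar over shapes can be suggested with a purely symbolic Python sketch. It is a deliberate simplification: real shape grammars match and transform geometry, whereas this example only rewrites shape names. The rules and names below are hypothetical and do not reproduce any published grammar.

```python
# Hypothetical rules: each shape may be replaced by an arrangement of shapes.
RULES = {
    "house": ["roof", "storey"],
    "storey": ["wall", "window", "wall"],
}


def derive(shapes, rules, steps):
    """Apply every applicable rule once per step, left to right."""
    for _ in range(steps):
        next_shapes = []
        for shape in shapes:
            # Shapes without a rule are kept unchanged.
            next_shapes.extend(rules.get(shape, [shape]))
        shapes = next_shapes
    return shapes


print(derive(["house"], RULES, 1))
print(derive(["house"], RULES, 2))
```

Starting from a single shape, repeated application of the allowable operations yields progressively more detailed arrangements. A different rule set over the same shapes would yield a different "style".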
[SECTION 3.3] [TABLE OF CONTENTS]
© Prentice-Hall, Inc.
A Simon & Schuster Company Upper Saddle River, New Jersey 07458
"When the writer becomes the center of his attention, he becomes a nudnik. And a nudnik who believes he’s profound is even worse than just a plain nudnik." -Isaac Bashevis Singer
First and foremost, a document is a tool to communicate information. The type of information affects the type of communication. Information types include entertainment, reference, scanning, mandatory versus optional, sales, friendly, and formal. Each information type has customary visual conventions. Used poorly or too often, these conventions will make your document look like just more pieces of paper. Used judiciously and with imagination, they can be a valuable aid.
But ultimately, the content expressed in the document is what really matters. If the reader understands the content, your communication was successful.
Often the main trick to successful communication is getting the reader to pay attention. Look at some of your junk mail; innumerable attention-getting devices will come into view. Colored stamps, fake telegrams, pop-ups, personalized names, metallized envelopes, and more are all attention grabbers.
In the domain of electronic documents, clip art collections of all sorts can help you draw attention to your documents. Clip art collections with all sorts of specialty images (see the section Clip Art in the appendix Resources), from military symbols to biological parts to cartoons, can convey a message to the reader. Clip art and unimaginative attention-getting devices can cut both ways, however.
Customizing the content of an article for a particular audience is a good way of improving communication. Of course, doing this is extremely difficult for large-volume publications, such as newspapers and magazines. One interesting technique used by the Washington Post (and others) is called zoning. The Post has a column called Dr. Gridlock that describes the trials and tribulations of travel in the Washington, D.C. area. The content of this column is modified for specific areas according to the delivery zones of readers' addresses.
SCIENCE, POLITICS, and FOOD PYRAMID GRAPHICS
Although design doesn't mean everything, it can have important and even political impact. For instance, take the case of the food pyramid.
In April 1991, the U.S. Department of Agriculture (USDA) was going to publish a replacement of the basic four food groups wheel, a staple of classrooms since the 1950s. The idea was to increase the importance of grains, fruits, and vegetables and to reduce the importance of meat and dairy products, following more recently discovered good nutritional practices. As you might imagine, the beef and dairy lobbyists were not too happy about this turn of events. After a great deal of criticism, publication of the pyramid was halted. According to one nutritionist angered by the USDA reversal, "It was the visual that made the impact. That's what upset people; it clearly showed you should not have as much meats and dairy products as you should grains, fruits, and vegetables, which is the truth." (15)
One year later (and $855,000 more), the USDA unveiled a refined pyramid and had more data supporting its case. In the end, good science won out, and the lobbyists had to live with the design of the food pyramid.(16) Now the Food Pyramid is a classroom staple and also appears on the packaging of many products in your supermarket.
3.3.1 Aid for Grammarless Writers
"A man's grammar, like Caesar's wife, must not only be pure, but above suspicion of impurity." -Edgar Allan Poe
Among the ways in which technology can help in the communication of ideas, publishing systems provide a number of tools to aid grammar. At times the technology of word processing and desktop publishing systems is more fun than writing. Integrated graphics with text, WYSIWYG displays, and font manipulations can divert the writer from the communications task at hand. In a Washington Post article titled "Does Technology Contribute to Bad Writing? Perhaps It Might Probably Could - Or NOT," Michael Schrage, a columnist for the Los Angeles Times, commented:
Indeed, some people argue that word processing technology makes the physical task of writing so much easier that some people toss self-discipline to the electrons and hedonistically indulge themselves by larding their prose with everything but the kitchen sink. Conversely, the "perfectionists" turn into digital Flauberts, writhing in agony over which comma should go where and if that semicolon is really the best way to go.
Some products, used judiciously, aid the process of writing correctly and with good grammar, but nothing can stop the rambling author from rambling with run-ons and going on and on and on.
Products such as RightWriter (Cue Software), Grammatik (Reference Software) and Correct Grammar (Lifetree Software) rate documents for readability. Grammar checker systems can generate reports about average sentence and paragraph length, the use of passive voice, the use of jargon, and other writing aspects. They also provide suggested changes. These packages use readability scores to rate the document as appropriate for a particular reading grade level.
A few readability indexes are widely recognized. Chief among these are the Flesch-Kincaid Score and the Fog Index. According to the RightWriter (a grammar checker) manual: (17)
The Flesch-Kincaid formula is the United States Government Department of Defense standard (DOD MIL-M-38784B). The government requires its use by contractors producing manuals for the armed services. The Readability Index is equivalent to the Overall Reading Grade Level (OGL) for the document.
Grade Level = (.39 x ASL) + (11.8 x ASW) - 15.59.
ASL = average sentence length (# of words /# of sentences).
ASW = average # of syllables/word (# of syllables /# of words).
A good range is 6-10.
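The formula quoted above is simple enough to try by hand. The following is a minimal sketch in Python, not the method used by RightWriter or any other product: the function names are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here only for illustration (real checkers are likely to use dictionaries or better rules).

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (a rough, assumed heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent ("made"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def grade_from_averages(asl: float, asw: float) -> float:
    """Grade Level = (.39 x ASL) + (11.8 x ASW) - 15.59, as quoted in the text."""
    return 0.39 * asl + 11.8 * asw - 15.59


def flesch_kincaid_grade(text: str) -> float:
    """Compute ASL and ASW from raw text, then apply the formula."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    asl = len(words) / len(sentences)                           # average sentence length
    asw = sum(count_syllables(w) for w in words) / len(words)   # average syllables per word
    return grade_from_averages(asl, asw)


if __name__ == "__main__":
    sample = "Documents are complex objects. They must be engineered together."
    print(round(flesch_kincaid_grade(sample), 2))
```

Because the syllable estimate is only approximate, scores from a sketch like this may differ somewhat from those reported by a commercial grammar checker.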
AT&T sells a writing tool called WWB, the Writer's Workbench software, that runs under the UNIX operating system. It is an interesting collection of utilities that helps analyze writing style and suggests changes to fix grammatical problems. It can look for problems with punctuation, sentence length, readability, split infinitives, and overall organization. WWB even has a utility to compare your language style with that of another document, facilitating consistency over large numbers of documents.
3.3.2 Random Writing Tools
Aside from the various grammatical aids previously mentioned, spelling checkers are certainly the most frequently used writing tool. Spell checkers vary from ones that simply list the words not found in a dictionary to ones that make suggested corrections. The better spell checkers can work with several dictionaries and may be able to use a general dictionary, a site-wide (organization) dictionary, one for a user, and one for the particular document.
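The idea of layering several dictionaries can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the design of any particular product: the function names and the tiny word lists are hypothetical, and the suggestions come from the standard library's `difflib.get_close_matches`, which is only one of many possible ways to rank candidate corrections.

```python
import difflib
from typing import Iterable, List, Set


def find_unknown_words(words: Iterable[str], *dictionaries: Set[str]) -> List[str]:
    """List words found in none of the dictionaries, in first-seen order."""
    known = {entry.lower() for dictionary in dictionaries for entry in dictionary}
    unknown: List[str] = []
    for word in words:
        key = word.lower()
        if key not in known and key not in unknown:
            unknown.append(key)
    return unknown


def suggest(word: str, *dictionaries: Set[str]) -> List[str]:
    """Offer up to three close matches drawn from all dictionaries combined."""
    candidates = sorted({entry.lower() for d in dictionaries for entry in d})
    return difflib.get_close_matches(word.lower(), candidates, n=3)


if __name__ == "__main__":
    # Hypothetical word lists standing in for the four dictionary levels.
    general = {"the", "page", "layout", "receive"}
    site_wide = {"interleaf"}
    user = {"nudnik"}
    document = {"opendoc"}

    text = ["The", "nudnik", "recieve", "OpenDoc", "layout"]
    for word in find_unknown_words(text, general, site_wide, user, document):
        print(word, "->", suggest(word, general, site_wide, user, document))
```

Keeping the dictionaries separate, rather than merging them into one file, is what lets an organization or a single document add its own vocabulary without touching the general word list.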
Most of the widely used word processing packages provide or work with a built-in thesaurus. These are always useful when searching for that hard-to-think-of word, utterance, expression, maxim, term, slogan, verbiage, declaration, idiom, phrase, remark, statement, comment, and so on.
One innovative writing tool introduced back in 1987 is the Microsoft Bookshelf. It was one of the first serious mass market CD-ROMs and was aimed at writers. The storage capacity of the CD-ROM enabled Bookshelf to contain 11 reference books and information data sets. Among these were The American Heritage Dictionary, Roget's II: Electronic Thesaurus, Bartlett's Familiar Quotations, The Chicago Manual of Style, and the U.S. ZIP Code Directory. The combination of these reference materials in the context of a PC and a word processor is a powerful tool.
Budding poets can also be computerized. The "Rhymer" from WordPerfect Corporation is a rhyming dictionary available for use with WordPerfect on PCs. You can search for words by a number of phonetic characteristics. Act like a bloodhound and search for a sound; it will simply astound, not confound. Just imagine the possibilities of rhyming for searched quotes with words found in the thesaurus! Onward, writers! Now you have as many tools to abuse as graphic designers do!
[SECTION 3.4] [TABLE OF CONTENTS]
Documents are complex objects. Let's now examine the document as an object composed of a variety of pieces that must be "engineered" together.
Often, the only time all pieces of a project come together is when the final report is due. All the information gathered from a variety of sources must be assembled into a coherent, deliverable product. Most likely, many people contribute to the final report. Their individual idiosyncratic uses of publishing tools must be integrated into a consistent product. Data created by spreadsheets or images from drawing tools are also often included in completed documents. The assembly of all these components brings us to the topic of the compound document.
The compound document, as its name suggests, is a document composed of many parts. These parts may originate from vastly different systems and exist in many different formats. From a technical standpoint, the integration of these pieces into a coherent whole is a formidable task. Each part must be integrated seamlessly into what appears to be a single consistent document. Even more difficult is the often necessary requirement to go back to the original system that created the data, such as a spreadsheet, to edit the data.
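One way to picture a compound document is as a container of parts, each of which remembers where it came from. The sketch below is a hypothetical data structure in Python, not the storage model of any product discussed in this chapter; the class names, field names, and sample application names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Part:
    """One piece of a compound document (all field names are illustrative)."""
    name: str
    media_type: str          # e.g. "text", "spreadsheet", "image"
    source_application: str  # the system to return to when the part must be edited
    content: str


@dataclass
class CompoundDocument:
    """A document assembled from parts that originate in different systems."""
    title: str
    parts: List[Part] = field(default_factory=list)

    def add(self, part: Part) -> None:
        self.parts.append(part)

    def editor_for(self, name: str) -> str:
        """Find which original application should be used to edit a given part."""
        for part in self.parts:
            if part.name == name:
                return part.source_application
        raise KeyError(name)

    def render(self) -> str:
        """Present the parts as what appears to be a single document."""
        return "\n".join(part.content for part in self.parts)


if __name__ == "__main__":
    report = CompoundDocument("Final Report")
    report.add(Part("summary", "text", "word processor", "Sales rose this quarter."))
    report.add(Part("figures", "spreadsheet", "spreadsheet program", "Q1: 10, Q2: 14"))
    print(report.render())
    print(report.editor_for("figures"))
```

Recording the source application with each part is what makes it possible to go back to the original system, such as a spreadsheet, when the data needs editing.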
Electronically created compound documents resemble information quilts patched together from a variety of information sources. You may use information created for one purpose in one particular system in several systems. You may also use the information for a different purpose than was intended. Documents created with such information can quickly become impossible to maintain and update.
The original data sources become an integral part of the creation process, and great care must be exercised to maintain those data sources for future versions of the document. Text, graphics, and scanned photos may be assembled for one purpose and later reassembled for another (e.g., a Web site). If proper care is taken of all the various data sources, you can reuse the document content. Reusing the content allows an organization to profit from the publication of the content again and again.
Before we get into more detail, let's take a look at the forest before starting a hike through the trees. Many technologies created in the last several years impact compound documents. The concept, however, is simple and elegant. The user should be allowed to read or write a document. Inside the document are all sorts of media types that the user may want to mess around with as part of the editing process.
The world starts getting complicated when vendors, of necessity, address issues concerning the storage and interoperability of these complex compound documents. For example, if a document contains a variety of embedded spreadsheets, it is comforting for the user to know that each spreadsheet will be updatable. The document itself becomes the focus of a user's attention and becomes the principal vehicle for systemwide data integration. One trend has been to represent the various media types as "objects." Then you can use and reuse the objects and the software that operates on them. A wide variety of object storage mechanisms have appeared with no clear winner on the horizon. Expect confusion to be the norm for several more years, at least.
Two major integration strategies are Microsoft's OLE 2.0 and Apple's OpenDoc.
OpenDoc is a collaboration between Apple and IBM and was designed for multi-platform operations. A somewhat dated, but still valuable comparison of OpenDoc to OLE is available from IBM at: http://www.austin.ibm.com/pspinfo/odoc-ole.html.
From the OpenDoc FAQ:
What is OpenDoc? OpenDoc is a multi-platform, component software architecture that enables developers to evolve current applications into component software or to create new component software applications. OpenDoc software will run on Apple Macintosh personal computers, as well as Windows, Windows NT, OS/2, and AIX systems. With software enabled by OpenDoc, users will be able to mix and match software to fit their needs, combining text, graphics, video, spreadsheets, and many other types of data into a single document.
Individual elements, called components, may be edited by "component editors." A component editor is an "independent program that manipulates and displays a particular kind of content."
The object representation for OpenDoc is called the System Object Model (SOM) and is from IBM. Again from the OpenDoc FAQ, it is a "platform-independent framework for allowing component software to exchange data and instructions. It is a highly efficient dynamic linking mechanism for objects, which supports multiple languages and provides a gateway to distributed object servers."
Another element of the OpenDoc Architecture is Bento, a portable compound document and multimedia storage library and format. Finally there is also "Component Glue," an acknowledgment that Microsoft exists. Component Glue "enables interoperability with Microsoft Corporation's Object Linking and Embedding (OLE) technology for interapplication communication. OpenDoc's significantly simpler API allows developers to program Microsoft OLE much easier via OpenDoc." (See the OpenDoc Web site for more gory details at: http://www.opendoc.apple.com.)
OLE 2.0 from Microsoft is based on yet another object model called the Component Object Model (COM). It is more appropriate to compare COM to CORBA (Common Object Request Broker Architecture) than to OpenDoc. COM and CORBA are also not attacking exactly the same problems, so a comparison here is also flawed. In an excellent article, "OLE and COM vs. CORBA" by Michael Foody in the April 1996 issue of UNIX Review, Foody points out that, "In general terms, COM...is used in desktop applications to provide a binary standard for software component interoperability and ORBs are used as the infrastructure to construct larger-scale distributed systems. Of course, Microsoft is working on a distributed version of COM, designed for use in enterprise-class distributed systems, while IBM is busy working with Apple to use SOM as the basis for a desktop component model called OpenDoc."
Both IBM and DEC have had other software projects that address the challenge of compound documents. IBM's MO:DCA (Mixed Object Document Content Architecture) is a combination compound document and object architecture. DEC's CDA (Compound Document Architecture) is a system resembling the philosophical approach of ODA. (For more information on the Office Document Architecture standard, see Section 5.5 ODA in Chapter 5 Document Standards.)
As we've just seen, the concepts of compound documents have been around for quite some time. The coming of the Web, however, makes the creation and use of compound documents a commonplace occurrence. With all the advantages the Web has brought, it has also magnified some of the problems of conformance, performance, and standardization. Vendors are trying to differentiate themselves by introducing hot new technologies. Content creators are placed in a bind because the use of these new technologies, although compelling, limits the audience and distribution possibilities. There are no simple answers; just be aware of what's going on so you can make educated choices.
The various architectural approaches discussed in the previous section permit the creation of new types of document processing. One new type is the active document. A number of publishing systems already tout this capability, but may call it different things. For example, a pie chart of data from a spreadsheet, included in a document, may update itself when the spreadsheet changes. In another case, a paragraph just rewritten may initiate an electronic mail message to a manager, informing the manager of the change and requesting approval. The document is no longer a passive object; it is doing things. The notion of a document with active components is another step in the direction of a totally integrated information environment.
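The self-updating pie chart can be illustrated with a simple publish-and-subscribe arrangement. The following Python sketch is only an assumption about how such a link might be modeled; it does not describe OLE, OpenDoc, or any publishing system's actual mechanism, and the class and method names are hypothetical.

```python
from typing import Callable, Dict, List


class Spreadsheet:
    """A data source that notifies subscribers whenever a value changes."""

    def __init__(self, values: Dict[str, float]) -> None:
        self._values = dict(values)
        self._subscribers: List[Callable[[Dict[str, float]], None]] = []

    def subscribe(self, callback: Callable[[Dict[str, float]], None]) -> None:
        """Register a listener and immediately hand it the current data."""
        self._subscribers.append(callback)
        callback(dict(self._values))

    def set(self, label: str, value: float) -> None:
        """Change one cell, then publish the new data to every subscriber."""
        self._values[label] = value
        for callback in self._subscribers:
            callback(dict(self._values))


class PieChart:
    """A document element that recomputes its slices when the data changes."""

    def __init__(self) -> None:
        self.slices: Dict[str, float] = {}

    def refresh(self, values: Dict[str, float]) -> None:
        total = sum(values.values())
        self.slices = {
            label: (value / total * 100 if total else 0.0)
            for label, value in values.items()
        }


if __name__ == "__main__":
    sheet = Spreadsheet({"text": 1, "graphics": 1})
    chart = PieChart()
    sheet.subscribe(chart.refresh)
    sheet.set("graphics", 3)   # the chart in the document updates itself
    print(chart.slices)
```

The same arrangement could trigger other actions, such as sending a message to a manager, simply by subscribing a different callback.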
Several technologies are available for inter-process and inter-application communication. Publishing systems approach the problem of application communications in several ways. Ultimately, the publishing system depends on the services provided by the operating system. Most operating systems provide some mechanism for inter-application communication, and these mechanisms are exploited by some of the publishing systems. For example, on MS-DOS platforms running MS Windows, a facility called OLE (Object Linking and Embedding) is used by MS Word for Windows to include "live" Excel spreadsheets. The Macintosh's System 7 operating system has a "Publish and Subscribe" facility for inter-application communication. Interleaf and FrameMaker on UNIX platforms use RPC (Remote Procedure Calls) to allow an AutoCAD drawing in a document to be linked to the AutoCAD application.
Interleaf’s active document technology is one of the more ambitious implementations of the active document approach. Document sections can behave in certain ways and take various actions. For example, a document can be directed to send e-mail to various managers for approval before permission is granted for the public to view the document.
In fact, one of Lotus Notes' strengths is that it allows the organization of this type of work flow procedure with various types of documents. (See Section 8.3 Groupware in Chapter 8 Document Management for more information on work flow issues.)
This feature could prove invaluable to organizations that require complex configuration management of documents, because documents are just one portion of an engineering effort. For example, the production of an airplane must correspond accurately to the various designs and tests of the airplane. The ability to embed "intelligence" into documents is an interesting approach to the configuration management problem. (For more discussion on this topic, see Section 8.2 Configuration Management in Chapter 8 Document Management.)
Here again, the Web provides ample examples of the ability to take older concepts and apply them to newer implementations. Active document technology is perhaps best exemplified in the Web with the emergence of Java. The ability to transmit little programs called applets has taken the Web by storm. The enthusiasm with which the Net has embraced Java is a credit both to Sun's technology and to Sun's ability to market it in a Net-friendly manner. Java applets allow authors to wake up their documents. No longer passive reading material, a Java-cized document can shout, sing, and interact with the reader. Active documents have hit the mainstream.
[SECTION 3.5] [TABLE OF CONTENTS]