3 . 5 The Database View

Let's move now to an examination of the relationship of documents to databases. Documents can relate directly to databases in two main ways. The first is as the report or simple printout of a database; this is known as database publishing. The second, and more interesting, is the use of a database to hold the document content. The various components that make up a document can be placed into a database. The publishing system can then query the database, and out pop the printed pages (if only it were that simple). These types of systems are possible today; however, there are no hard and fast rules for implementing them. Each organization's needs and requirements must be carefully analyzed, and no one solution will fit everyone's needs.

Reusing a document's components is becoming increasingly possible. Reuse is possible only if you can identify and reassemble pieces of content. Mechanisms to break apart the original document into meaningful component parts can be developed using standards and well-defined recommended practices. Reuse is another potential benefit of using a document structuring standard like SGML.

Sometimes the seemingly simple task of finding the relevant material may, in fact, be the most difficult aspect of reusing content material. Document repositories must be created with appropriate key words or embedded tagging mechanisms that enable meaningful retrieval. Electronic imaging systems, which scan reams of documents and store the images on optical disks as a replacement for microfilm, are a growing industry. Without sufficient tagging, these systems are a small step forward from microfilm technology. An image of a page without the means of asking for information about what is on the page is no better than a picture. About the only savings is in physical storage space (which may, in fact, be significant for an organization).

3 . 5 . 1 Database Publishing

An extension of report-generation capabilities, which have been around for many years, database publishing adds a level of integration between the database and the publishing system. A report is one visual representation of a database. Slick publications produced by pouring data into visual templates might be another representation of the same database.


Database publishing tools allow you to choose particular fields in a database for printing. Particular styles can be applied to these fields and selectively printed. Such tools are invaluable for catalogs with thousands or hundreds of thousands of entries, such as yellow-pages directories and parts catalogs, which must be updated regularly. Information from inside the database is extracted and combined with the publishing system to produce good-looking documents, not simply printouts.
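The idea of choosing fields and attaching a style to each one can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `parts` table, its columns, the sample rows, the `STYLES` mapping, and the `publish` function are all hypothetical, and the "styles" are plain format strings standing in for real typographic styles.

```python
import sqlite3

# Hypothetical parts catalog held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (number TEXT, name TEXT, price REAL, supplier TEXT)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?, ?)",
    [("A-100", "Widget", 2.5, "Acme"), ("B-200", "Gadget", 10.0, "Bolt Co.")],
)

# One "style" per publishable field; format strings stand in for fonts and layout.
STYLES = {"number": "[{}]", "name": "{}", "price": "${:.2f}"}

def publish(connection, fields):
    """Extract only the chosen fields and format each with its style."""
    for name in fields:
        if name not in STYLES:  # only known field names reach the query
            raise ValueError("no style defined for field: " + name)
    query = "SELECT {} FROM parts ORDER BY number".format(", ".join(fields))
    lines = []
    for row in connection.execute(query):
        lines.append("  ".join(STYLES[f].format(v) for f, v in zip(fields, row)))
    return lines

catalog = publish(conn, ["number", "name", "price"])
print("\n".join(catalog))
```

Note that the supplier field stays in the database but never reaches the page; selecting a different list of fields yields a different publication from the same data.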

The most common form of database publishing involves merge facilities. A merge facility combines regularly structured information with a template document. For example, a list of names and addresses, one per line with tab separators, might be combined with a form letter that contains special codes that indicate, to the document processor, when to insert data from the data file.
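A merge of this kind might look like the following Python sketch. It assumes a tab-separated data file and a form letter whose special codes are `$`-prefixed placeholders; the field names, the sample records, and the `merge` function are invented for illustration and do not describe any particular product.

```python
import csv
import io
from string import Template

# Hypothetical form letter; the $-prefixed codes mark where data is inserted.
FORM_LETTER = Template("Dear $name,\nWe have updated our records for $street, $city.\n")

# Hypothetical data file: one record per line, fields separated by tabs.
DATA = (
    "Ada Lovelace\t12 Analytical Way\tLondon\n"
    "Alan Turing\t1 Bletchley Park\tMilton Keynes\n"
)

def merge(template, data_text, field_names):
    """Combine each tab-separated record with the template document."""
    reader = csv.reader(io.StringIO(data_text), delimiter="\t")
    letters = []
    for row in reader:
        record = dict(zip(field_names, row))
        letters.append(template.substitute(record))
    return letters

letters = merge(FORM_LETTER, DATA, ["name", "street", "city"])
print(letters[0])
```

`Template.substitute` raises an error if a record lacks a field the letter asks for, which is usually what you want: a missing address should stop the run rather than produce a letter with a hole in it.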


A tighter link between the database (information source) and the document (information sink) aids the communication process in several ways. Information in a database will eventually need to be explained, summarized, and otherwise communicated to someone. With a link, the information can reach the document faster and with fewer potential errors. Translations and transcriptions of database information to documents are error-prone tasks; the more direct this process can be, the better.

If you take the concepts of database publishing one step farther, you arrive at the concept of a document that functions as a front end to a database. The information in a document that came from a database can serve as the interface to that database. Database queries via the document and automatic updates bring the database/document connection full circle. Live links and active documents provide the technological foundations for a tight linkage.

3 . 5 . 2 Customized Publishing

If a collection of information (the content of a document) is kept in the proper type of database, publishers can reuse the content and create customized texts. Hardware and software advances are both contributing to a new publishing technology that enables documents to be custom-made for particular audiences.

In 1990, a partnership between McGraw-Hill and the University of Southern California was described as follows:

Textbook publishers are offering new computer and printing systems that allow professors to custom-design textbooks by handpicking course materials from electronic databases stocked with traditional textbooks, magazine articles, and other published information.

These customized books can be printed in limited quantities by the campus bookstore and distributed to students, sometimes within hours (not weeks or months) after ordering.(18)

One enterprising Washington, D.C., based company is taking another direction in custom printing. You might call it "just-in-time" printing. It produces an hourly newspaper, called "The Latest News," for people who travel the Washington to New York air shuttle.(19) Information from wire services is fed into document processing systems and formatted right away. This approach blurs the line between printed media and radio.

The ultimate in customized publishing is represented by some of the research at MIT's Media Lab. One interesting project unites information from wire services and television news. Using a computer screen with a touch screen interface, the reader can interact with this "newspaper." Fingering topics brings articles into view, touching a color picture can bring it to life as video. This work and other projects at the Media Lab are pointing the way to personalized interactive information sources beyond the newspaper...but I'd still like to read it in bed.(20)

Finally, to no one's surprise, the Web, the great technology integrator, provides ample examples of interactions with and publications of databases. The most common use of database publishing on the Web is the Internet searching starting points (see Section 1.9 Internet Starting Points) that allow users to query a database. The result is an instant database publication of the search results.

A great example of customized publications is the ability to customize your own starting page when using Microsoft's Web browser, the Internet Explorer. The Internet Explorer lets you pick a set of favorite links (national or world news, weather, comics, and television listings) and have them displayed on your starting page. Of course, you must have the Microsoft Network set as the "home" page for this to work, but it's a compelling feature.


A Simon & Schuster Company Upper Saddle River, New Jersey 07458

Legal Statement

3 . 6 Specialized Views

Full-featured document processing systems often include specialized areas that have their own mini-processors. For example, mathematical equations, tables, and flow charts are all elements that can make up a document. Specialized document processors are available for each element.

Specialized document processing tools support many of the semantics needed to edit these particular elements. Embedding such knowledge in the programs allows manipulations that are more natural for the particular type of document element. For example, moving a box in a flow chart could cause all connected lines to remain attached to the box. A table editor may allow for the insertion of rows of data with a simple command. (For a more thorough explanation of these systems, please see Section 4.1 Types of Document Processors in Chapter 4 Form and Function of Document Processors.)

Tables represent a particularly common document element that many systems support. Table editors exist in all kinds of electronic publishing systems, ranging from the low-end word processors (such as MS Word), through page layout systems (such as PageMaker), to the higher-end systems (such as Interleaf).

Another interesting and growing specialized document processing field is legal document assembly. The high-end document assembly packages, such as CAPS by CAPSoft or WorkForm by Analytic Legal Programs, allow Joe Lawyer to produce document templates that can be used by others on the staff. The templates are produced by answering a series of questions to direct the software to assemble the document by pulling in the correct text from its textual database. Accuracy is of the utmost importance in legal documents. One misplaced word can be the source of litigation or of numerous other complexities.(21)
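The question-and-answer style of assembly can be sketched roughly as follows. This is a toy illustration in Python, not a description of how CAPS or WorkForm actually work; the clause database, the clause wording, the question names, and the `assemble` function are all assumptions made up for the example.

```python
# Hypothetical textual database of pre-approved clauses.
CLAUSES = {
    "opening": "This agreement is made between {landlord} and {tenant}.",
    "pets_allowed": "The tenant may keep pets on the premises.",
    "pets_forbidden": "No pets may be kept on the premises.",
}

def assemble(answers):
    """Pick clauses according to the answers and fill in the names."""
    parts = [CLAUSES["opening"].format(**answers)]
    # A yes/no answer selects which approved clause is pulled in.
    parts.append(CLAUSES["pets_allowed" if answers["pets"] else "pets_forbidden"])
    return "\n".join(parts)

lease = assemble({"landlord": "A. Smith", "tenant": "B. Jones", "pets": False})
print(lease)
```

Because the wording lives only in the clause database, nobody retypes legal text by hand, which is where misplaced words tend to creep in.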

Maintenance manuals used by the military raise another legal issue. These manuals usually contain lots of "WARNING" boxes indicating some important message. For example, "applying more than 10 fps torque will cause death and destruction." It is legally mandated that the WARNING be placed before the text of the section covering that topic.

The placement of mandatory items has some interesting ramifications for on-line reading, exemplified by hypertext browsers and the Web.(22) An on-line document browsing system must be designed to display the WARNING before allowing the display of the associated text. Random browsing through the document must factor in this requirement. The trick, of course, is to do this without interfering with the flexibility of browsing and searching, which is desirable in the on-line document viewers.
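One way to think about this requirement is to make the warning part of the only path by which section text can reach the screen. The Python sketch below is a minimal illustration under that assumption; the `Section` class, the `render` and `jump_to` functions, and the sample manual are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    title: str
    body: str
    warnings: list = field(default_factory=list)

def render(section):
    """Return display lines, always placing warnings ahead of the body."""
    lines = [section.title]
    lines += ["WARNING: " + w for w in section.warnings]
    lines.append(section.body)
    return lines

def jump_to(manual, title):
    """Random access by title still goes through render(), so browsing
    or searching cannot bypass the warning."""
    for section in manual:
        if section.title == title:
            return render(section)
    raise KeyError(title)

manual = [
    Section("Overview", "General description of the assembly."),
    Section("Torque settings", "Tighten each bolt in sequence.",
            ["Excess torque can damage the housing."]),
]
page = jump_to(manual, "Torque settings")
print("\n".join(page))
```

The reader remains free to jump anywhere in the manual; only the order of display within a section is fixed.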

Another interesting specialized writing category is programs for children. As the cost of computing power has dropped, a market for children's software has developed. Microsoft has a particularly appealing program called Creative Writer. Taking a cue from KidPix by Broderbund, a kids-oriented paint program, Creative Writer uses sound for all types of feedback. When you drag an object, a scratching sound plays; when you erase with the vacuum cleaner, it sounds like one. Button and lever selections have colorful, playful visual feedback as well.


Microsoft's Creative Writer - word processing for kids.

For beginning users, a character called McVee pops up in a manner similar to Apple's balloon help, but more kid-oriented, and leads you through various procedures. One novel feature is the "magic apple." After you write a story, you can select the apple; many words are recognized, and little pictures are placed beside them. The look is like that of an old children's storybook. It is very effective.

Budding movie and television authors can get script writing software. These packages format the document according to industry norms. They also help in the process of character and scene development.

ScriptRighter screenplay formatting software is a Windows-based product.

Some of the control buttons in ScriptRighter.

Scriptware and MovieMaster are two more word processing tools specifically set up to aid the development of scripts.

MovieMaker script writing software.

All of these writing tools understand the semantics of scriptwriting. Characters, dialogue, action, and scenes are all meaningful document processing terms to these programs. Of course, you still have to have the imagination to write something interesting in the first place.

Finally, there is a Web site called "The Writers Computer Store" at http://www.hollywoodnetwork.com/hn/shopping/kiosk/index.html. It contains good pointers for locating these types of specialized writing software products.


3 . 7 On-line View

The increasing use of the Web has moved the once highly specialized field of on-line documents into ordinary computing practice. The design of a document and its delivery are becoming more intertwined with the use of the Web as a principal medium for information dissemination.

In the design of a document for print, information is often constructed and organized around reasonable chunks: chapters, sections, subsections, and pages. The on-line form of a document also has these types of units, but the organization must take into account factors such as the bandwidth of communications and the speed of displays. In general, one could say it is "better" to break a large document into a bunch of little ones for on-line viewing. It's very annoying to come upon 20-30 page documents that have been moved onto the Web as a single file. The browser spends costly time sucking up the whole document, and programs tend to crash even more than usual under the load.
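Breaking a large document into smaller on-line units can often be automated if the section headings are recognizable. The following Python sketch assumes headings of the form "3.7 On-line View" at the start of a line; the regular expression and the `split_by_heading` function are illustrative assumptions, and a real document would need a pattern matched to its own conventions.

```python
import re

# Assumed heading style: a dotted section number, a space, then the title.
HEADING = re.compile(r"^\d+(\.\d+)+\s+\S")

def split_by_heading(text):
    """Split one long document into chunks that each start at a heading."""
    chunks = []
    current = []
    for line in text.splitlines():
        if HEADING.match(line) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

long_document = (
    "3.6 Specialized Views\n"
    "Text of the first section.\n"
    "3.7 On-line View\n"
    "Text of the second section."
)
pages = split_by_heading(long_document)
print(len(pages))
```

Each chunk could then be written out as its own small file, with links between neighbors, instead of one 30-page monolith.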

The key difference between on-line and paper documents is the issue of interaction: the user interface. A frequently used Web browser provides a well-understood interface. The documents, however, must also be organized to take advantage of the interactivity without distracting the user.

The Web is in some sense in the same evolutionary state that WYSIWYG document systems were 10 years ago. The Mac provided a nice user interface with lots of nice fonts; and it encouraged "by God if I've got those fonts I'm going to use 'em" attitudes. The features offered by Web browser vendors are compelling and cry to be used. But they tempt, like the Sirens, and lead to clutter, obfuscation, and poor usability.

The problem is that most people are clueless about graphic design, layout, and typography, and it shows. A document with 15 different fonts, all on one page, is cluttered, noisy, and distracting, and the surface finish overpowers the message intended by the author.

In early 1996, Netscape changed its clean home page to one that used a series of "Frames" and Java-generated activity. These frames forced the user to switch from the familiar "back" button to the mouse "back frame" selection. In addition, the performance was significantly worse, partially due to Java, but mostly due to the drawing and redrawing of the content in the frames. The graphic and user interface design was very good and cleanly laid out; however, the other problems overwhelmed the benefits (in this author's opinion). It was clearly a case of feature-driven design. Netscape was going to show off the capabilities and damn the consequences. After less than a month of the new frame-based design, they reverted to the older non-frame-based design as the default and let the user turn on the new design if desired, a much better migration path.

Netscape's frame-based home page design

Web authors are currently intoxicated with the growing list of features. Tables, frames, Java, and VRML are all useful technologies when used in moderation and used for a purpose. The use of technological features for the sake of using features has led to many ugly Web pages. Restraint is the best rule of thumb.


There is no single correct way to look at document processing issues. Each project has unique constraints and circumstances. However, it is important to appreciate that different points of view exist and are useful.

For one project, design may be paramount; for another, the logical structure may be critical. In the end, any evaluation of a publishing system depends on what you need for a particular project.

In any evaluation of a system, half the battle is to ask good questions. The various points of view discussed in this chapter provide a useful frame of reference that will help you to ask good questions.


Chapter 4: Form and Function of Document Processors

"Form ever follows function." -Louis Henry Sullivan

Document processors take a number of forms and perform a variety of functions.

What does the term document processing mean? After all, a document is something to be written and then read. Talking about "processing" a document can seem out of place.

When we use a computer to write or edit, we use a publishing tool such as a word processor or a page layout system to place our thoughts onto paper or display. The concept of processing is central to this task.

As we think about document processing we can view a document in two ways. First, a document functions as data, which are entered by the author and processed by the document processor. Second, the document functions as a piece of software, which directs the function of the document processor. Each document processing system tends to lean toward one approach or the other.

We now ask, when is a document data, and when is it software?

Documents function like software when they direct a document processor to perform according to procedures that are embedded in the document. Such documents are usually created with a simple text editor, with a variety of special commands embedded in the text.

In contrast, documents that function more like data are created using a word processor or a WYSIWYG (What You See Is What You Get) editor. The user is not given the opportunity to enter data that will corrupt the document or stop the formatting abruptly. Thus, the user interface prevents most basic errors and bad data. Of course, this is an oversimplification of real life. Users can and will do all sorts of nasty things that can break documents; however, they must try hard.

Let's examine a few ways in which writing a document with a document processor is like writing software. The content (source code) of a document must be put into a form that can be processed (compiled). The actual printing (execution) or display of the document is the final step. The document processing program interprets the content and produces a printable form. All the various codes in a document, sometimes hidden, sometimes visible, must be syntactically correct, or the document will not process correctly. Rigorous syntactic checking is possible when documents are created using certain international standards such as SGML or HTML (see Chapter 5 Document Standards).
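To give a flavor of what syntactic checking means, here is a small Python sketch that verifies only that tags open and close in matching order. It is a deliberate simplification and an assumption on my part: real SGML or HTML validation checks a document against a formal document type definition, which this does not attempt, and elements that legitimately have no end tag would be reported as unclosed. The `BalanceChecker` class and `check` function are hypothetical names.

```python
from html.parser import HTMLParser

class BalanceChecker(HTMLParser):
    """Track open tags on a stack and record mismatched end tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append("unexpected </" + tag + ">")

def check(markup):
    """Return a list of problems; an empty list means the tags balance."""
    checker = BalanceChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors + ["unclosed <" + t + ">" for t in checker.stack]

print(check("<h1>Title</h1><p>Some text.</p>"))
print(check("<b>bold text"))
```

The compiler analogy holds up: the checker reports errors in the "source" before anyone wastes paper on the "execution."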

Another important similarity between document processing and writing software is in the area of debugging. Sometimes, when a document is printed, it looks crazy. It is not unusual to produce a document with very large errors, such as margins off by a couple of inches, illustrations that don't appear, and fonts of the wrong size. Often this is due to a "bug" in the document. For example, a table of contents might be generated by looking for all paragraphs tagged with the name SUBSECTION. If a particular subsection was not tagged correctly (i.e., with the tag SUBSECTION), then it would not appear in the table of contents. These types of problems can become insidious. Usually, they are not easy to spot and can be very difficult to track down. Wouldn't it be nice if document processing systems included debugging tools as part of the system?
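The table of contents "bug" is easy to reproduce, and so is a rudimentary debugging aid. In this Python sketch, a document is modeled as a list of (tag, text) pairs; the tag names, the deliberately mistyped tag, and the `table_of_contents` and `lint` functions are all hypothetical.

```python
# Each paragraph is a (tag, text) pair; the tag names are hypothetical.
document = [
    ("SECTION", "Types of Document Processors"),
    ("SUBSECTION", "WYSIWYG Features"),
    ("SUBSECTON", "Language Characteristics"),  # mistyped tag: the "bug"
    ("BODY", "Each class of programs has its advantages."),
]

KNOWN_TAGS = {"SECTION", "SUBSECTION", "BODY"}

def table_of_contents(paragraphs):
    """Collect the text of every paragraph tagged SUBSECTION."""
    return [text for tag, text in paragraphs if tag == "SUBSECTION"]

def lint(paragraphs):
    """A tiny 'debugger': report the position of any unrecognized tag."""
    return [(i, tag) for i, (tag, _) in enumerate(paragraphs) if tag not in KNOWN_TAGS]

print(table_of_contents(document))
print(lint(document))
```

The generated table of contents silently omits "Language Characteristics," while the lint pass points straight at paragraph 2. A check like this cannot catch a subsection tagged with the wrong but valid tag, which is part of why these problems are insidious.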

The two major differences between document processing systems are how the user enters the information and how that information is interpreted. Therefore, writing with one system is different from writing with another system. The internal capabilities of the system will affect writing, design, and production. For example, many technically oriented publishing systems do not support a feature such as the automatic flowing of text around a graphic. (See Section 3.1.4 Page in Chapter 3 Points of View for an illustration of flowing text.) It is unwise to design a layout that needs this feature if the publishing system doesn't permit that type of text flow.

Let's turn now to a discussion of the various types of document processing systems.

4 . 1 Types of Document Processors

Build a system that even a fool can use, and only a fool will want to use it. -George Bernard Shaw

There are many different types of document processors. A useful way to analyze them is to put them along a line that goes from simple text editing to WYSIWYG.

The path from text editors to WYSIWYG systems is not a path representing systems of increasing functionality. WYSIWYG systems do not have all the functionality of language-oriented systems; similarly, language-oriented systems do not have the functionality of structure editors. Specific products such as WordPerfect and MS Word blur the line between traditional word processors and WYSIWYG systems with new versions running under a Graphical User Interface (GUI), such as Windows 3.1 or the Macintosh.

4 . 1 . 1 WYSIWYG Features

As the name implies, the main characteristic of a WYSIWYG (What You See Is What You Get) system is not only that you can see the document before you print it, but also that you can edit it while you are looking at it. The display happens in real time, without any significant delay.

Most WYSIWYG systems use one of two approaches to editing specialized document types, such as tables, equations, and graphics. Some systems try to provide everything by having graphics and equation editors always available. Others automatically pop up specialized editors when the document type is selected. From a user's perspective, either approach can be implemented smoothly. A smooth, consistent user interface is particularly important in a WYSIWYG system.

Another way to handle a variety of document types is to actually launch other applications that are external to the publishing system. For example, if a spreadsheet included in a document is selected for editing, the spreadsheet program might be started when the figure is selected. (See Section 3.4 The Engineered View in Chapter 3 Points of View for a discussion of these issues.) Web browsers use the concept of "helper" applications to accomplish the viewing of data types not explicitly understood by the Web browser.
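The helper-application idea amounts to a lookup table keyed by data type. The Python sketch below illustrates the dispatch logic only; the set of natively handled types, the helper table, the program names in it, and the `dispatch` function are assumptions for the example, not the configuration of any particular browser.

```python
# Types the (hypothetical) browser can display on its own.
NATIVE_TYPES = {"text/html", "image/gif"}

# Illustrative helper table: data type -> external program to launch.
HELPERS = {
    "application/postscript": "postscript_viewer",
    "video/mpeg": "movie_player",
}

def dispatch(content_type):
    """Decide what to do with incoming data of the given type."""
    if content_type in NATIVE_TYPES:
        return "display in browser"
    helper = HELPERS.get(content_type)
    if helper is None:
        return "offer to save to disk"
    return "launch " + helper

for kind in ("text/html", "video/mpeg", "application/x-unknown"):
    print(kind, "->", dispatch(kind))
```

Adding support for a new data type then means adding one line to the table rather than changing the browser itself.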

Most WYSIWYG systems provide tools to associate visual styles to document elements. For example, a paragraph tag called CODE may be used to visually set off the computer source code, in a set of software documentation, with a particular font.
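The association between tags and visual styles can be pictured as a simple table. In this Python sketch, the style names, the fonts and sizes, and the `apply_styles` function are hypothetical; a real system would carry many more attributes per style.

```python
# Hypothetical style table: paragraph tag -> visual attributes.
STYLES = {
    "BODY": {"font": "Times", "size": 10},
    "CODE": {"font": "Courier", "size": 9},
}

def apply_styles(paragraphs, styles, default="BODY"):
    """Pair each paragraph's text with the style named by its tag."""
    return [(styles.get(tag, styles[default]), text) for tag, text in paragraphs]

manual = [
    ("BODY", "Call the function as shown:"),
    ("CODE", "print('hello')"),
]
for style, text in apply_styles(manual, STYLES):
    print(style["font"], style["size"], "|", text)
```

Because the document stores only the tag, changing the CODE entry in the table restyles every code sample at once.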

One of the more significant challenges for WYSIWYG systems is to create visual systems for global processes. The management of large numbers of files is not something you really want to do in a WYSIWYG form. When handling large volumes of documents, you certainly don't want to be forced into more input interaction than is absolutely necessary. Changing the layout of several thousand documents needs to be an automatic process.

WYSIWYG systems tend to be more closed than their language-oriented cousins. To have real-time WYSIWYG editing, publishing systems use their own proprietary formats for the sake of efficiency.

However, WYSIWYG systems are not necessarily closed. Some systems provide interfaces to programmatically get at the internal representations. More commonly, systems define and use published interchange formats (see Section 5.1.3 Lots 'O Formats in Chapter 5 Document Standards), which can be used as the basis of translators.

Now that we've seen WYSIWYG, let's turn our attention to the more complex language-oriented publishing systems.

4 . 1 . 2 Language Characteristics

Language-oriented document processing systems are much better than their WYSIWYG relatives at performing global or bulk actions. Manuals with thousands of pages of reference material do not necessarily have to be seen to be formatted. In fact, "What WYSIWYG advocates forget is: sometimes you don't want to see it at all."(1)

Some documents span many thousands of pages. Typical among these are technical documents such as maintenance manuals and software documentation. Two major classes of tools can handle these documents. One is the class of older (some would say more mature) markup, command-driven, language-oriented document processors typified by programs such as troff and TeX. The other class consists of WYSIWYG publishing packages, typically running on workstations, exemplified by Arbortext's The Publisher, FrameMaker, and Interleaf.

Perhaps the best argument in favor of language-oriented document processing systems, according to Brian Kernighan, is that, "Once a task is well understood, it should be relegated to batch processing."(2) Nothing is quite so frustrating as being forced into repeated cumbersome interactions with a system to accomplish routine tasks. The ability to automate your routine tasks, which may, in fact, not be routine to anyone else, is critical. The specification of these tasks may take some time and be difficult. However, once they are specified, they are easily used again and again.
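Here is a rough Python sketch of what relegating a task to batch processing can look like: one global change specified once and applied to every document without any interaction. The dot-command syntax, the tag names, the sample documents, and the `retag` and `batch_apply` functions are hypothetical and are not the commands of troff, TeX, or any other real system.

```python
def retag(text, old, new):
    """Rename a dot-command at the start of a line, e.g. '.NOTE' to '.CAUTION'."""
    lines = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] == "." + old:  # exact match, so '.NOTES' is untouched
            line = "." + new + (" " + parts[1] if len(parts) > 1 else "")
        lines.append(line)
    return "\n".join(lines)

def batch_apply(documents, change):
    """Apply the same change to every document, with no user interaction."""
    return {name: change(text) for name, text in documents.items()}

docs = {
    "engine.txt": ".NOTE Check the oil level\nBody text.",
    "brakes.txt": ".NOTES appendix\n.NOTE Bleed the lines",
}
updated = batch_apply(docs, lambda text: retag(text, "NOTE", "CAUTION"))
print(updated["engine.txt"])
```

Writing `retag` takes longer than fixing one file by hand, but it costs the same whether the set holds two documents or two thousand.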

The features provided by a publishing system vary according to the scale of documents it was designed to handle. A system intended for simple reports will not be able to manage multi-author documents with thousands of pages. Large-scale systems must support more rigorous forms of change control and the ability to make global changes across entire sets of documents.

4 . 1 . 3 Specialized Languages

Various document types such as tables, equations, and graphs have their own special properties. Several specialized languages describe and create these document elements. In the Web world, a good example is the extensive set of markup tags to define tables that have already been developed and implemented.

Specialized document processing systems are a microcosm of the general case of document processing. For example, there are WYSIWYG flow charting systems and language-oriented flow charting systems. The same is true for equation processors and tables. (See Section 3.6 Specialized Views in Chapter 3 Points of View for a discussion and illustrations of these specialized processors.) These specialized languages, sometimes called "little languages,"(3) are used to perform very specific functions.(4)

Graphs, an important part of business and scientific publishing, deserve their own document processing language. The little language, grap, serves this need. Grap is a troff preprocessor language for the specification of graphs.

"graping around" - A sample grap specification and the resulting graph.

For the sake of completing the major UNIX troff preprocessor languages, eqn (to specify equations) and tbl (to specify tables) must be mentioned. Eqn and tbl are the "elder statesmen" of troff preprocessors and have been used for almost 20 years. They are robust, industrial-strength languages and are part of the standard UNIX document processing tools.

Bibliographies represent another document type for which specialized languages ease the processing. The management of bibliographies is an area where language-oriented systems are far superior to their WYSIWYG counterparts. TeX users have a set of macros called BibTeX, and troff users can use refer to manage references and bibliographies. The appearance of a reference in the text can be modified by the author. These packages can also work with relational databases or other filing mechanisms to store bibliographies used by an entire organization. WYSIWYG publishing systems are still playing catch-up in this domain.

Many language-oriented document processing systems can be used with previewing software. This software lets you look at the document on the screen before printing, saving some trees in the process. Previewers represent a midway point on the line between language-oriented and WYSIWYG document processing systems, discussed earlier in this section. Previewing systems allow you to look at the printed form of the document on the screen, just as you do in a WYSIWYG system; however, you cannot modify the image, and the display may take a while. Ghostscript, the GNU project's free PostScript interpreter, will let you preview PostScript files and is available on many computer platforms including PCs.

Typically, you use a previewing system by observing approximate views of the page, making some edits, observing again, making more edits, observing the page, and so on, finally printing the page. Switching back and forth between the previewing mode and the editing mode can be time-consuming. However, it is faster and more flexible than printing each page. TeX and troff previewers are widely available. Previewers are great aids, especially if the speed and availability of printers in an organization are poor. More importantly, previewers help to balance the often rigid complexities of language-oriented document processing systems with the ad hoc nature of WYSIWYG systems. A relatively new system, Adobe's Acrobat (see Section 7.4.2 Applying Standards), can also be used this way.

4 . 1 . 4 WYSIWYG versus Languages

Each class of programs has its advantages and disadvantages. The features you need depend on what you are trying to do.

The WYSIWYG class of document processors is more appropriate for seat-of-the-pants design and for rapid changes with less rigidity. However, the language-oriented document processors have a significant advantage when it comes to global changes. This is a desirable characteristic if you are concerned with uniformity over many thousands of pages.

Language-oriented document processors are awkward to use and are anything but intuitive. The publishing department's staff will need a significant amount of time and experience to become familiar with the idiosyncrasies of the software. Writing a document with such a package is remarkably similar to programming and requires as much attention to detail. However, the payoff on the investment in time and training is significant. In particular, if your organization must repeatedly accomplish the same complicated tasks, you may be able to automate much of the process using language-oriented document processors. You can create style sheets with specialized commands for your own needs. These commands will not only be relatively simple to use but will also ensure conformance to your organization's requirements. The open nature of the language-oriented document processors is also extremely important for those exceptions to the rule, which seem to happen every other day.


Numerous WYSIWYG software packages address the needs of the technical documentation ("techdoc") market.(5) Interleaf, FrameMaker, and Arbortext's The Publisher are three of the more significant ones, each with interesting characteristics.

Interleaf is oriented toward the dedicated publishing department. It has robust facilities for sharing documents and handling publication series with large collections of documents. If you use Interleaf on any number of computing platforms, the user interface will be virtually the same and familiar to the user.(6)

FrameMaker takes the good-neighbor approach to user interfacing. It is integrated with the normal working environment and window system's look-and-feel for any particular host machine. For example, if you use FrameMaker on a DECstation under a Motif look-and-feel window system, then FrameMaker behaves as a normal Motif application. On the Macintosh, FrameMaker looks and feels like a normal Macintosh application.

Arbortext's ADEPT*Publisher is a good example of a system that balances the closed nature of turnkey systems with the complexity of open systems. It provides a number of specialized WYSIWYG editors, such as table and equation editors. Using the SGML version of the product, the entire document can be output as a validated SGML document. Validation is done in an interactive manner, not as an after-the-fact process. In addition, Arbortext offers a feature called the ADEPT Command Language (ACL) for language-oriented capabilities, such as the automated editing of large documents.

All these systems walk the tightrope between WYSIWYG and languages. Too much emphasis on the language side, and the user interface suffers; too little, and global automation is difficult.

Let's turn our attention now to a comparison of the functionality that WYSIWYG and language-oriented systems offer.

4 . 1 . 5 Comparative Functionality

How do document processing systems compare with respect to functionality? In particular, how does the WYSIWYG versus language orientation affect the functionality? As in Chapter 3: Points of View, it is useful to examine functionality in terms of the various points of view, such as design, communications, engineered, and specialized.

Most often, the primary function of any document processor is to format a document. The inherently visual nature of a WYSIWYG system allows for a more interactive, ad hoc design of a document. More often than not, it also gets in the way of routine repetitive tasks. Language-oriented systems provide more opportunities for automation.

DESIGN

The fonts used by a system can profoundly affect the look and portability of the document. WYSIWYG systems have an inherently more difficult time dealing with fonts because they must have versions that can be displayed on the screen as well as printed.

Adobe Type Manager solves this problem for the Adobe PostScript fonts, and TrueType from Microsoft and Apple uses the same font description both for printing and display.

The selection and adjustment of fonts and the overall typography of a document are easier with WYSIWYG systems than with language-oriented systems. With a language-oriented system, the tedious trial-and-error cycle of editing, printing, and editing again and again is very time-consuming.

Many implementations of both major types of document processing systems, language-oriented and WYSIWYG, allow the creation and use of document templates. The terminology for templates varies. Sometimes they are called styles; other times, macros. However, they usually have the same function. A template is a particular document format that can be used over and over again. Some systems refine the concept of templates, categorizing them by types. Fonts, paragraphs, and page layouts can be categorized with individually named styles for particular use. These named styles can be used for many documents.

COMMUNICATIONS

When considering the communications point of view, WYSIWYG systems provide most of the tools. Spell checkers and a thesaurus usually require interaction. Language-oriented grammar checkers can also provide useful reports, which you can use later to help edit the document.

The ability to search the text and replace it with another piece of text is one of the most basic functions that an editing system should provide. However, search and replace functionality can become much more sophisticated once we remove the limitation of text-only search and replace. Some systems allow the searching of tags and styles. The searching itself can use flexible "wild cards" or regular expressions to match patterns of text. (See Section 3.1.7 Enterprise in Chapter 3 Points of View for an explanation of regular expressions.) The ability to search for an item, such as a cross reference, is also useful.
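To make pattern-based search and replace a little more concrete, here is a minimal sketch in Python. It is not taken from any particular publishing system; the sample sentence, the pattern, and the function name normalize_figure_refs are all invented for illustration. The pattern matches several spellings of a figure reference and rewrites each one in a single consistent form.

```python
import re

# Hypothetical sample text with inconsistent figure references.
text = "See Figure 3 and figure 12; also see Fig. 7."

# Match "Figure", "figure", or "Fig." followed by whitespace and a number.
# The number is captured so that it can be reused in the replacement.
pattern = re.compile(r"\b(?:[Ff]igure|Fig\.)\s+(\d+)")


def normalize_figure_refs(s: str) -> str:
    """Rewrite every matched figure reference as 'Figure N'."""
    return pattern.sub(r"Figure \1", s)


result = normalize_figure_refs(text)
print(result)
```

A plain text-only search for "Figure" would miss two of the three references in the sample. The pattern finds all of them in one pass.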

ENGINEERED

From an engineering functionality point of view, WYSIWYG and language-oriented systems provide similar capabilities. We are all familiar with the ability of even simple document processing systems to generate page numbers. Tools for large documents are often able to generate much more, including tables of contents, lists of figures, and lists of tables. The careful, consistent use of tags or styles enables these systems to determine the textual items to include in the various lists.
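The following sketch suggests how consistent tagging makes such lists possible. It assumes a document already reduced to (tag, text, page) triples; the tag names heading1, heading2, and figure_caption, and the sample content, are hypothetical and do not come from any real product.

```python
# A hypothetical document: each item is (tag, text, page number).
document = [
    ("heading1", "Document Processing", 1),
    ("body", "Some text.", 1),
    ("heading2", "Comparative Functionality", 4),
    ("figure_caption", "A WYSIWYG editor", 5),
    ("heading2", "Stages of Document Processing", 9),
]


def collect(document, tags):
    """Return the items whose tag is one of `tags`, in document order."""
    return [(tag, text, page) for tag, text, page in document if tag in tags]


def table_of_contents(document):
    """Build table-of-contents lines, indenting by heading level."""
    indent = {"heading1": 0, "heading2": 2}
    return [
        f"{' ' * indent[tag]}{text} ... {page}"
        for tag, text, page in collect(document, indent)
    ]


def list_of_figures(document):
    """Build list-of-figures lines from the figure captions."""
    return [
        f"{text} ... {page}"
        for _, text, page in collect(document, {"figure_caption"})
    ]


print("\n".join(table_of_contents(document)))
print("\n".join(list_of_figures(document)))
```

If a heading were typed as bold body text instead of being tagged, it would simply never appear in the table of contents. This is why the consistency of the tagging matters so much.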

The ability to create running headers and footers, such as those used in this book, is a valuable feature for larger documents. They give the reader an instant context for the page being read, as well as an overall professional appearance. Document processing systems can automate this entire task by using particular tags or styles to identify the header and footer text. Language-oriented systems are much easier to automate.

A flexible autonumbering mechanism is a must for technical documentation. (How else could we reliably refer to Section 1.3A.4.2.9a?) A range of choices for numbering is important. For example, you may want some sections to be numbered in roman numerals (I, II, XI, VI, L), others with letters (a, b, c, aa, ab, ac), and still others with digits (1, 2, 2.1, 2.2, 3).
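As a rough illustration of what an autonumbering mechanism has to do, here is a small Python sketch of the three numbering styles just mentioned. The function names are invented for this example, and a real system would offer many more options, such as prefixes, suffixes, and mixed styles like 1.3A.

```python
def to_roman(n: int) -> str:
    """Convert 1..3999 to an uppercase roman numeral."""
    if not 0 < n < 4000:
        raise ValueError("roman numerals are defined here for 1..3999")
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def to_letters(n: int) -> str:
    """Convert 1 -> a, 26 -> z, 27 -> aa, 28 -> ab, and so on."""
    if n < 1:
        raise ValueError("letter numbering starts at 1")
    out = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out.append(chr(ord("a") + rem))
    return "".join(reversed(out))


def next_number(counters, level):
    """Advance the counter at `level` (0 = top) and drop deeper levels."""
    counters = counters[: level + 1] + [0] * max(0, level + 1 - len(counters))
    counters[level] += 1
    return counters


# Number a run of headings at levels 0, 0, 1, 1, 0.
counters = []
labels = []
for level in [0, 0, 1, 1, 0]:
    counters = next_number(counters, level)
    labels.append(".".join(str(c) for c in counters))
print(labels)
```

The key design point is in next_number: starting a new section at one level discards the counters of every deeper level, so the first subsection of Section 3 is 3.1 no matter how many subsections Section 2 had.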

Another extremely important feature is index generation. The system should support several different indexing schemes. Critical among these is the ability to sort the index on a variety of criteria. In more extensive indexes, the ability to highlight the primary entry makes the index even more useful.

For example, the following index entries show how terms may appear under several headings.

In addition, one particular entry may be the major one. If so, it is often highlighted.

The ability to put cross references in the index is also useful. A good index is vital to any large reference document. The more flexible the document processing system is, the better.

From a functional point of view, index generation is virtually identical for WYSIWYG and language-oriented systems; you must always mark the item to appear in the index by hand. If, however, you do want to develop a semiautomated scheme, it would be easier with language-oriented systems because of their inherently more open nature.
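A semiautomated scheme of this kind might look something like the sketch below. It assumes that the text is available page by page and that someone has supplied a list of terms to look for; both assumptions are mine, as are the function name build_index and the convention of marking a primary entry with asterisks. The output would still need review by a human indexer.

```python
from collections import defaultdict


def build_index(pages, terms, primary=None):
    """Find each term on each page and return sorted index lines.

    pages:   dict mapping page number -> page text
    terms:   list of terms to look for (case-insensitive match)
    primary: optional dict mapping term -> page of its major entry
    """
    primary = primary or {}
    found = defaultdict(set)
    for page_number, text in pages.items():
        lowered = text.lower()
        for term in terms:
            if term.lower() in lowered:
                found[term].add(page_number)

    lines = []
    # Sort entries alphabetically, ignoring case.
    for term in sorted(found, key=str.lower):
        refs = [
            f"*{p}*" if primary.get(term) == p else str(p)
            for p in sorted(found[term])
        ]
        lines.append(f"{term}, {', '.join(refs)}")
    return lines


# Hypothetical three-page document.
pages = {
    1: "Markup and SGML are related.",
    2: "SGML is a standard.",
    3: "Fonts matter; markup too.",
}
index_lines = build_index(pages, ["SGML", "markup", "fonts"], primary={"SGML": 2})
print("\n".join(index_lines))
```

Simple substring matching will produce false hits (a search for "font" also finds "fonts"), which is one reason the scheme is only semiautomated.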

Cross referencing is another area where larger document processing systems excel. Traditionally, cross references have been a prime source of errors. This problem seems quite natural; the section you referred to in one part of the document moves or is even eliminated over the course of editing. To ensure accurate cross references, document processing systems are almost essential. (Of course, nothing can replace a good copy editor.) WYSIWYG and language-oriented systems are virtually the same here, although selecting a target reference is somewhat easier with the WYSIWYG system.
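The idea behind automated cross references can be sketched in a few lines of Python. The {ref:label} notation used here is made up for the example and is not the syntax of any system discussed in this chapter. The text refers to a label rather than to a section number, and the number is filled in only when the document is processed.

```python
import re

# Hypothetical reference syntax: {ref:some-label}
REF = re.compile(r"\{ref:([a-z0-9-]+)\}")


def resolve_refs(text, labels):
    """Replace each {ref:label} with its current section number.

    Returns the resolved text and a list of labels that no longer
    exist, so that dangling references can be reported.
    """
    missing = []

    def replace(match):
        label = match.group(1)
        if label in labels:
            return f"Section {labels[label]}"
        missing.append(label)
        return match.group(0)  # leave the unresolved reference visible

    return REF.sub(replace, text), missing


# The label table would be rebuilt every time the document is processed.
labels = {"markup": "4.3"}
resolved, missing = resolve_refs("See {ref:markup} and {ref:fonts}.", labels)
print(resolved)
print(missing)
```

If the markup section later moves, only the label table changes; the text that refers to it stays the same. A reference to a section that has been eliminated shows up in the list of missing labels instead of silently pointing at the wrong place.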

One final aspect of the engineering point of view is structural validation. Structure editors, which use standards-based markup, can let you know if the document is structurally valid. They can tell you if a document has all the right pieces and if they are in the right order. In the case of Web documents that use HTML, there are HTML validation programs (and services) that can tell you if the structure of your document is correct.(7)
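Real structure editors validate against a formal document type definition, which is well beyond a short example. Still, the two questions, "are all the right pieces present?" and "are they in the right order?", can be illustrated with a toy check in Python. The part names and the rules below are assumptions chosen for the illustration.

```python
# Hypothetical rules: the order in which parts may appear, and which
# of them every document must contain.
EXPECTED_ORDER = ["cover", "contents", "chapter", "appendix", "index"]
REQUIRED = {"cover", "contents", "chapter"}


def validate(structure):
    """Return a list of problems; an empty list means the structure is valid."""
    problems = []

    for part in structure:
        if part not in EXPECTED_ORDER:
            problems.append(f"unknown part: {part}")

    known = [part for part in structure if part in EXPECTED_ORDER]

    for part in sorted(REQUIRED - set(known)):
        problems.append(f"missing part: {part}")

    # Each part's rank must never decrease as we move through the document.
    ranks = [EXPECTED_ORDER.index(part) for part in known]
    if ranks != sorted(ranks):
        problems.append("parts are out of order")

    return problems


print(validate(["cover", "contents", "chapter", "chapter", "index"]))
print(validate(["contents", "cover", "chapter"]))
print(validate(["cover", "chapter"]))
```

An interactive structure editor effectively runs this kind of check after every change, which is what allows it to stop you from putting a piece in the wrong place to begin with.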

SPECIALIZED

Documents are rarely composed of text only. A good document processing system must be able to handle images, graphics, tables, and equations. The degree to which a system allows manipulation of these specialized items may prove significant for your particular application.

Most systems allow the exact position of the graphic or table to float. This means that the system will move the position of the item, sometimes to the next page, to avoid large areas of white space.

Some WYSIWYG publishing systems provide the usual sort of drawing manipulation tools. These include cutting and pasting, along with rotating, stretching, and scaling. The system may also support positioning functions such as alignment and distribution of many items. You should also look for the capability to manipulate images with respect to brightness, contrast, and other factors. These capabilities, however, are not easily translated to language-oriented systems, so a strict comparison is not appropriate. If you are using a language-oriented system and need these functions, the best solution is to process the items in an auxiliary system and import them into the publishing system.


A Simon & Schuster Company Upper Saddle River, New Jersey 07458


4 . 2 Stages of Document Processing

What are the stages a document passes through as it moves toward completion? What happens in each stage and what role does a document processing system play? We will now examine these questions, as well as some other useful practices.

4 . 2 . 1 The Phases of the Process

Let's examine the six phases in the document creation process. These phases are design, writing, illustration, editing, production, and distribution.

DESIGN

Using a document processing system to design a publication invites many possibilities. The document has both a visual appearance and a logical design. The order of the items, such as the cover page, the table of contents, the chapters, the appendixes, and the index, makes up the logical structure of a document.

Style sheets and project-wide templates define the document's visual appearance. They must work within the framework of a document's logical structure. Properly used styles can help make a document conform to a specified document structure. This structure may be mandated by corporate standards or other factors.

WRITING

Writing with a document processing system is different from writing without one. The supplementary tools such as grammar and spelling checkers, thesaurus, and reference guides aid the process of writing.

Sometimes these tools are part of the system. At other times, they are utilities that can be used with many word processors that are not directly tied to a particular system. However, you can invent system-specific personal tricks to take advantage of system capabilities. For example, some systems allow text to be hidden based on some condition, such as a comment or other user-defined property.

To take another example, while writing this book, I created a paragraph tag called editorial, which I used to keep temporary comments to myself. Sometimes I would print out all paragraphs with the tag editorial, effectively producing a "to do" list of tasks left on the book.

You can use this trick, in one form or another, on many systems. This and other capabilities were created for other purposes, but as you become more experienced with a particular system, you learn tricks and use them as you write.
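In a system that exposes its paragraph tags, the editorial trick comes down to a simple filter. The sketch below assumes that the document is available as a list of (tag, text) pairs. That representation, the sample notes, and the function names are hypothetical and are not a description of how any particular product stores its paragraphs.

```python
# A hypothetical document as (paragraph tag, text) pairs.
paragraphs = [
    ("body", "Markup is information embedded in the text."),
    ("editorial", "Check the footnote numbering in this section."),
    ("body", "Specific markup tells the system how text should look."),
    ("editorial", "Add an example of generic markup here."),
]


def todo_list(paragraphs, tag="editorial"):
    """Collect the text of every paragraph with the given tag, numbered."""
    notes = [text for para_tag, text in paragraphs if para_tag == tag]
    return [f"{n}. {note}" for n, note in enumerate(notes, start=1)]


def printable_text(paragraphs, hidden_tag="editorial"):
    """Return the document text with the hidden paragraphs left out."""
    return [text for para_tag, text in paragraphs if para_tag != hidden_tag]


print("\n".join(todo_list(paragraphs)))
```

The same tag serves two purposes: selecting on it produces the "to do" list, and excluding it produces the clean text for printing.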

ILLUSTRATION

Integrating graphic illustrations or photographic images with the text of a document is one of the more troublesome and complex areas of electronic publishing. The publishing system must be able to include graphics, but it need not necessarily be able to display the graphics on the screen.

As for clip art, you can't simply buy a collection blindly. You must know whether it will work with your publishing system and what kinds of manipulations you will be able to accomplish. (See the Clip Art section in the Resources appendix for more information.)

EDITING

Electronic publishing systems don't really provide much help in the editing phase. Instead, someone must review the text and check the content. However, on some systems, you can mark up the text electronically using underlines, strikeouts, and color.

For electronic markup to work, everyone on the project must agree on its meaning. When several people are involved with the same document, one important consideration is access. Permissions for access to the files must be properly set up. It's also important to have some sort of versioning or lockout system so that people don't accidentally write on each other's files.

Another problem that occurs when many people work on a document stems from font usage. The WYSIWYG systems can display only the fonts that are available on the computer. Everyone on the project must have the same set of fonts so that the document will print and appear correctly on the screen.

PRODUCTION

The usual way of preparing a document for printing is to create a series of PostScript files. (Yes, it's true that not everyone must create PostScript, but it's as close to a universal standard as the world has.) If your publication will be printed at a service bureau, you must be sure that the bureau has all the fonts you need. High-quality printers of 1200 dpi (dots per inch) or more will also print patterned areas very differently than standard laser printers do. A 50% gray pattern will appear much darker on a standard 300-dpi laser printer than on a 1200-dpi printer. Halftone images will also have a very different overall lightness. Color printing is a totally specialized art. If you're using color printing, don't try it yourself; get a trained professional.(8)

DISTRIBUTION

You can distribute electronically produced documents in two ways: through traditional paper distribution channels and through electronic distribution. The Web and Internet have become the medium of choice for the electronic distribution of documents. CD-ROMs still provide a good mechanism for mass distribution without the network hassle.(9) If you are writing a document that will be electronically browsed, you will probably want to arrange the visual appearance appropriately.

Electronic distribution also brings up the problem of run-time software. To broaden your potential market, the electronic document you want to distribute must run on as many systems as possible. Web documents that take advantage of specific vendor "enhancements" will not be viewable with all browsers. (See Section 9.11 The Internal Revenue Service (IRS) in Chapter 9 Case Studies for an example of a Web site that takes account of different browser capabilities.) You may want to explore the possibility of converting the document to several formats.

4 . 2 . 2 Recommended Practices

Just as software engineering practices provide a method for controlling and managing the software creation process, document engineering provides a method for controlling and managing the document creation process. Document engineering is not a genuine field of study...yet.(10) But let's discuss what may be the key elements of this new field.

Good conventions for naming the document elements, such as paragraph tags and styles, are as important as good naming conventions in software development. Although a strict comparison to software engineering quickly falls apart, keep in mind that the document you are creating must be processed before it can be printed or displayed.

Concurrent engineering (CE)(11) is another field of study from which you can draw a number of parallels to electronic publishing. Design for manufacturing, one aspect of CE, is an approach in which a designer of an electronic circuit board, for example, selects component parts, based not only on functionality but also on availability. An amusing story from the book A Whack on the Side of the Head illustrates this point:

One of my manufacturing clients has a "single-sourced" capacitor designed into a circuit-board his company was producing. Manufacturing people typically go out of their way to avoid single-sourced parts, i.e., those produced by only one outside vendor. They reason that if only one vendor is producing a particular sub-component, then an entire manufacturing group can be idled if anything happens to the vendor's capability to produce.

Things were fine until the vendor had production problems and could no longer meet demand. My client spent a lot of time attempting to track down more capacitors, but was unsuccessful. Finally, he went back through five layers of management to the design department to see how critical this capacitor was, and if it would be possible to use a replacement. When the design engineer was asked why this particular capacitor had been chosen, he replied, "I chose it because it's blue, and it looks good on the circuit board."

The designer had never bothered to consider what impact such a choice would actually have on getting the product out the door. His tunnel vision had prevented him from even looking for such a problem.(12)

Similar problems occur in electronic publishing. A graphic arts department may design a page layout without any consideration of the fonts available for printing. A complicated multicolumn layout may be virtually impossible for the system used by other staff, but simple with the system used by the designer. Similarly, design for a home repair manual may be fine for paper printing, but on-line viewing may require a larger screen and different layout.

Concurrent document engineering makes a great deal of sense. In practical terms, this means that you should find out if the people or service bureaus that will be involved with the document have all the necessary resources, such as fonts and software. If the document is intended for electronic distribution, you should check things like run-time software and platform portability. Early on, bring in the people involved in the later stages of the process. Printers may have advice on color separations, and this advice may affect the way you input images into the document. Internet service and Web site providers may have recommended Web browsers and display conventions.

Good document management is another practice you should follow. (See Chapter 8 Document Management for a more thorough discussion of these issues.) Simply put, the most important aspect of document management is to have a clear understanding of exactly what you need to manage. Fonts, collections of styles, template documents, and so on, must be clearly identified and should be placed under a central configuration control system.

Now that we've discussed the various types of document processing systems and some of their functions, let's turn to the issue of markup. Markup is the basis for the major document standards and is a fundamental concept used in virtually all document processing.


Markup is information that is embedded in the text of a document that is not intended for printing or display. It may consist of instructions to a printing device, commands for a word processor, or even comments to a coauthor. All language-oriented document processing systems require some sort of markup. WYSIWYG systems often have markup that is hidden from the user. Otherwise, all you have is text with no information for the document processor.

4 . 3 . 1 Types of Markup

The three main classifications of markup discussed in the following sections have inspired the creation of a number of standards. It is also possible to create the markup itself in a number of ways, which are discussed at the end of this section.

SPECIFIC MARKUP

Specific markup, sometimes called procedural markup, is often found in word processors and older (yet still used) typesetting systems. The function of specific markup is to tell the system how the text should look when printed. Typically, these are instructions to format a section of text bold or centered and of a particular size.

Specific markup can also be used to tell the system to perform some processing function on the text or on other items (for example, to count the number of figures). Sometimes the markup is hidden from the user; this is the case in a WYSIWYG system. TeX and troff commands embedded in a document are a form of specific markup. In effect, the markup consists of procedural commands that direct the document processing system to perform certain functions.