7.1.3 CALS and Electronic Publishing
CALS, the Computer-aided Acquisition and Logistic Support project within the Office of the Secretary of Defense (OSD) of the U.S. Government, has had a significant impact on the electronic publishing industry. One key goal of the CALS project is to reduce the use of paper, and this goal has been embraced by virtually all participants.
We're not talking about eliminating a few notebooks, either. Any significant project run by the Department of Defense (DoD) requires warehouses of documents. Extensive documentation is not a fabrication of DoD; instead, it results from the extremely large, complex, interrelated contracts with literally thousands of contractors and subcontractors involved in a single project. Managing such complexity requires careful attention to standards. The careful use of standards is a key element in the quest to reduce the cost of these projects.
SGML is one of the CALS-selected standards for document processing. Likewise, the rapid adoption of SGML by document processing vendors was significantly influenced by the notion of potential CALS customers. There exist several specific CALS document type definitions (DTDs) that define the structure of a CALS-compliant document. Once a body of text has been properly tagged, it can be used in a number of ways. (For a more thorough discussion on the variety of uses, see Section 7.3 Multiple Use in this chapter.)
In addition to the work on document processing for printed documents, the CALS(7) arena is experiencing a great deal of interest in creating on-line, interactive documents. This effort is called the Interactive Electronic Technical Manual (IETM). Its concept is to allow engineers with portable computers equipped with CD-ROM drives to interactively browse through maintenance manuals, at the site of a repair (for example, in an aircraft hangar) or in the field.
Command Carderock Division (NSWCCD). The classes are "fairly broad," and "The class definitions, however are insufficient for use on contracts." However, they are illustrative of a coherent approach to the creation of complex document requirements.
Following are the IETM class definitions:
Class 1
Electronically Indexed Pages
Display
. Full page viewing
. Page-turner/Next function
. Intelligent index for user access to page images
. Page integrity preserved
Data Format
. BitMap (raster)
. Indexing and header files (Navy Mil 29532)
. MIL-R-28001 or PostScript pages
. Generic COTS imaging system formats
Functionality
. Access pages by intelligent index/header info
. View page with pan, zoom, etc., tools
. Limited use of hot-spots
. Useful for library or reference use
Class 2
Electronically Scrolling Documents
Display
. Primary view is scrolling text window
. Hot-spot access (Hyper-links) to other text or graphics
. User selection and navigation aids (key-word search, on-line indices)
. Minimal text-formatting for display
. User selectable call to (launch) another process
Data Format
. Text - ASCII
. Graphics - whatever the viewer supports (e.g., BMP or CALS)
. Can be SGML tagged - no page breaks (browser)
. Access/index often COTS dependent with Hypertext browser
. Generic: COTS with Hypertext browser
Functionality
. Browse through scrolling info
. User selection of graphics or hot-spot reference to more text
. Hot-spot and cross-reference usually added after original authoring
Class 3
Linearly Structured IETMs
Display
. View smaller logical block of text, less use of scrolling
. Interaction through dialog boxes
. Interaction per MIL-M-87268 to extent possible
. Text and graphic simultaneously displayed in separate window when keyed together
Data Format
. Linear ASCII with SGML tags
. SGML with content vice format tags
. Maximum use of MIL-D-87269
. Generic: SGML tags equivalent to MIL-D-87269
Functionality
. Dialog-driven interaction
. Logical display of data in accordance with content
. Logical NEXT and BACK functions
. User-selectable cross-refs and indices
. Content-specific help available
Class 4
Hierarchically Structured IETMs
Display
. View smaller logical block of text, very limited use of scrolling
. Interaction through dialog boxes with user prompts
. Interaction per MIL-M-87268
. Text and graphic simultaneously displayed in separate window when keyed together
Data Format
. Fully attributed DB elements (MIL-D-87269)
. MIL-D-87269 content tags with full conformance with Generic Level
. Object Outlines (architectural forms)
. Authored directly to database for interactive electronic output
. Data managed by a DBMS
. Interactive features "authored in" vice added-on
. Generic: COTS equal to MIL-D-87269 data definition and tags
Functionality
. Dialog-driven interaction
. Logical display of data in accordance with content
. Logical NEXT and BACK functions
. Useful as interactive maintenance aid
. User-selectable cross-refs and indices
. Content-specific help available
Class 5
Integrated Data Base (IETIS)
Display
. Same as Class 4 for IETM function
. Interactive electronic display per MIL-M-87268
. Expert system allows same display session and view system to provide simultaneous access to many differing functions (e.g., supply, training, troubleshooting)
Data Format
. IETM info integrated at the data level with other application info
. Does not use separate databases for other application data
. Identical to Class 4 standards for IETM applications data per MIL-D-87269
. Coding for Expert Systems and AI modules when used
. Generic: COTS equal to MIL-D-87269 data definition and tags
Functionality
. Single viewing system for simultaneous access to multiple info sources
. Same as Class 4 for IETM functions
. Expert system to assist in NEXT functions, based on info gathered in session
One prime motivation behind the selection and development of a series of standards is the desire for meaningful exchange of documents. Contractors and subcontractors would certainly work together more efficiently if they could exchange documents electronically. Document exchange is a deceptively complex problem, as we will show in the next section.
© Prentice-Hall, Inc.
A Simon & Schuster Company Upper Saddle River, New Jersey 07458
Conceptually, document exchange is very simple. Two people who need to collaborate on a document must be able to read, write, and edit the document. The system that each individual uses must be able to manipulate the electronic form of the document. The textual information is usually not a problem, but the document's formatting and structural elements are another matter. If the two parties really need to work with a visually identical document, they must use the same collection of publishing applications, operating system, and operating system resources. Furthermore, they must use these resources in the same way.
The fundamental difficulty is that text, graphics, and images must all be used and understood by different systems in the same way. Words, paragraphs, pages, graphics, images, and so on are objects with which we associate meaning, that is, semantics. Words, for example, may be hyphenated in one system and not in another. Graphics may be editable or not. Paragraphs may be automatically numbered. All these semantically meaningful operations are difficult to transfer from one system to another. In reality, no one has been able to figure out a practical way to accomplish transfers across multiple applications and multiple platforms. Standards ameliorate the problems, but don't eliminate them.
Compound documents containing information from several applications make the exchange problem more difficult. The more dynamic aspects of such technologies as "live" links, "active" documents, object linking and embedding (OLE), Java, Visual Basic scripts, and so on exacerbate the exchange problem. The exchange of these new types of documents is an unsolved problem, and the electronic publishing world must solve it.
7.2.1 Types of Document Exchange
Because document exchange is such a difficult problem, it is instructive to break the problem up into different types of exchange. One useful classification is the exchange of a document from a purely visual point of view. Another view is the exchange of logical or structural information. This classification is similar to the two types of information dealt with in the ODA standard: layout (visual) information and logical (structural) information.
Document exchange focusing on the visual elements of a document typically concentrates on font problems, at least for starters. The system used by one person may have a different set of fonts installed than the system used by another. Usually, the system will substitute one font for another, and that may be just fine. However, the document will probably change in subtle ways. The page count and line breaks will probably be different because of the change in fonts. Even in the simplest case of document exchange, in which both parties use the same system (for example, WordPerfect), documents will not necessarily exchange identically if the parties did not agree on font usage. Document fidelity, the way a document looks, may or may not be important when two individuals collaborate; either way, it is useful to be aware of these differences.
The following story illustrates an exchange problem that was due to a lack of semantics:
I needed to simply take a small portion of one document (we'll call it the old document) and place it into another document (we'll call it the new document). A simple cut and paste job. Unfortunately, the old and new documents were written using different word processing systems. Well, no problem, in theory. I normally use a workstation with a robust window system and select and cut the portion of text I need from the old document and save it as a file. Then I go into the new document by using the second word processor and import the small file. I was NOT expecting to maintain any formatting information whatsoever; all I wanted was the text. Funny thing though: in the new document, I wound up with a bunch of words with hyphens in strange places. You see, the document processor used for the old document "understood" hyphenation but the window system didn't. When I selected the text using the mouse and window system, it simply selected the words and treated the hyphens as just more text rather than as a meaningful break in words.(9)
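The loss of semantics in this story can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the internal format of the old word processor is not known, so the Unicode soft hyphen (U+00AD) stands in for its discretionary hyphenation points, and the names naive_copy and semantic_copy are hypothetical.

```python
# Assumption: the Unicode soft hyphen marks a discretionary hyphenation
# point, standing in for whatever the old word processor really stored.
SOFT_HYPHEN = "\u00ad"


def naive_copy(text):
    """Copy text the way a semantics-blind tool might:
    every hyphenation point becomes a literal hyphen."""
    return text.replace(SOFT_HYPHEN, "-")


def semantic_copy(text):
    """Copy text while honoring the meaning of a soft hyphen:
    it is a possible break, not a character of the word."""
    return text.replace(SOFT_HYPHEN, "")


# Hypothetical stored text: two discretionary hyphens and one real hyphen.
stored = "docu\u00adment pro\u00adcessing is on-line"

print(naive_copy(stored))     # hyphens appear in strange places
print(semantic_copy(stored))  # only the real hyphen in "on-line" survives
```

Once the naive copy has been made, the two kinds of hyphen are indistinguishable, which is why the damage has to be repaired by hand.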
The second major type of document exchange is the exchange of logical or structural information. The logical or structural information in a document is rarely exchanged well. Conversion programs manage to keep paragraphs separate, and maybe even some of the paragraph numbering, but usually little else. Page layouts and such items as master pages and style sheets are often lost. Carefully chosen tag and style names are usually converted (if at all) into numerical sequences such as para0, para1, para2; these names are not all that meaningful. When many of these items are actually converted, the results usually need so much editing that the conversion process becomes extremely painful.
When exchanging structural information, SGML seems to be a natural solution. After all, SGML captures and codifies the structure of a document extremely well. Unfortunately, most documents are not SGML documents, and converting documents into SGML may or may not be simple. If the documents are not consistently structured, conversion to SGML will be as problematic as conversion to any other format. If by chance you do have an SGML document or can easily convert to it, the logical and structural information should convert easily into a publishing system; however, the visual information will now be the problem. The association of visual elements, such as font families and sizes, master pages, and document layout, depends on a particular interpretation of the SGML elements. That interpretation must also be converted if a complete visual and logical document exchange is to be successful. We can see similar problems as everyone tries to figure out how to convert their favorite format to HTML for the Web. This is a difficult problem!
Now let's look at the various pieces of a document that must go through the exchange process.
A document usually consists of several types of information. Again, it is useful to think of the two categories, visual and logical. You can also think of this as form and content.
A document looks a certain way because of the way it was composed, designed, and laid out. The various pieces of a document that look a particular way also fit into the structure of the document. A document means something because of its content.
As discussed in Chapter 5 Document Standards, SGML and ODA take two approaches on how the visual and logical information should be related to each other. SGML mostly ignores the issue, but allows you to define associations that can be interpreted by an application. DSSSL and the continuing work on DSSSL for the Web (called DSSSL Online) are attempting to address the issues of a document's visual appearance. ODA provides an architectural framework that integrates both the visual and logical information. However, ODA has a limited set of visual elements that have been defined for use with its architecture.
Font usage is one of the most important aspects of document exchange. If you expect a visually identical document to come out the end of an exchange process, you must pay attention to fonts. PostScript solves many of the practical problems of page layout. A page represented with PostScript is portable to any PostScript device. This is because PostScript is proprietary. Only one company, Adobe, licenses the technology. PostScript clones have now appeared and generally work quite well. One area of difficulty is that of font usage. When a PostScript document is moved from one machine or printer to another, it is important to know what fonts are available in the printer or system. If a font used in the original document is not available on the destination system, either you're out of luck or the font must be included with the PostScript file, which will increase the file size tremendously.
In the simplest case, a document composed with more than the four original Apple LaserWriter fonts cannot be printed on a LaserWriter unless the host machine also has that outline font available for downloading to the printer. As more and more fonts are used, the destination equipment, computer host, and printer must be contacted about font availability. Fortunately, once a particular font is identified as available, it should print just fine.
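The font check described above amounts to a set difference. The following Python sketch is illustrative only; the function name missing_fonts and both font lists are hypothetical, and a real system would have to query the printer or host for its actual font inventory.

```python
def missing_fonts(fonts_used, fonts_available):
    """Return, sorted by name, the fonts a document needs
    that the destination system does not have."""
    return sorted(set(fonts_used) - set(fonts_available))


# Hypothetical font lists, for illustration only.
document_fonts = ["Times-Roman", "Helvetica", "Garamond-Italic"]
printer_fonts = ["Times-Roman", "Helvetica", "Courier", "Symbol"]

# Fonts that must be downloaded or embedded before printing.
to_download = missing_fonts(document_fonts, printer_fonts)
print(to_download)
```

An empty result means the document should print as composed; anything else must be downloaded to the printer or embedded in the file.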
When you want to take a document to a service bureau for high-quality printing, your choice of fonts had better match what the service bureau can print. You can usually embed the fonts in the document, but you have to be conscious of these issues. The problem is worse now since the introduction of new font technologies, such as TrueType and Master Type. The service bureau must also have a collection of TrueType and Master Type fonts.
These problems are exacerbated when dealing with a WYSIWYG system. The fonts displayed on the screen are often special variations of the fonts used in the printer, and there are several flavors of screen fonts. Adobe solves the problem with Adobe Type Manager (ATM) available for both the Mac and PC platforms. ATM creates screen fonts from the same Type 1 format used for printing. TrueType, spearheaded by Apple and Microsoft, uses the same font description both for printing and display.
In many systems, the paragraph is treated as a distinct entity. It is a convenient portion of a document in which to associate visual with structural information. Some systems call the format of a paragraph a style; others, a tag. Either way, they are named entities that can be used as a mechanism for style or tag association. For example, all paragraphs tagged with the name, SectionHeading, can be given the same font, size, and positioning.
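A minimal Python sketch of this kind of association follows. It assumes a simple model in which each paragraph carries a tag name and a separate table maps tag names to formatting; the ParagraphStyle fields, the style values, and the sample paragraphs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParagraphStyle:
    """Visual properties attached to a named tag (hypothetical fields)."""
    font: str
    size_pt: int
    alignment: str


# The style table: one entry per tag name.
STYLES = {
    "SectionHeading": ParagraphStyle("Helvetica-Bold", 14, "left"),
    "Body": ParagraphStyle("Times-Roman", 10, "justified"),
}

# The document: structure only, as (tag name, text) pairs.
paragraphs = [
    ("SectionHeading", "First heading"),
    ("Body", "Some body text."),
    ("SectionHeading", "Second heading"),
]


def resolve(paragraphs, styles):
    """Pair each paragraph's text with the style its tag names."""
    return [(text, styles[tag]) for tag, text in paragraphs]


resolved = resolve(paragraphs, STYLES)
for text, style in resolved:
    print(f"{text!r}: {style.font} {style.size_pt}pt, {style.alignment}")
```

Changing the single SectionHeading entry restyles every heading at once, and the tag names are exactly the information a receiving system needs in order to keep the structure.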
The transfer of a document from one system that uses styles and tags, to another usually results in a visually reasonable document on the receiving system. Unfortunately, it also usually loses the tag and style names, because they were all renamed and converted in a simplistic way. This level of document exchange is incomplete and is the result of incompatible document exchange, which converts the visual, but not logical, information. The user will probably need to rename and identify all the tags or styles by hand.
Although this is a much better starting place than no conversion at all, it is a problem worth noting. These incomplete conversions take a significant amount of labor to correct, and discourage the overall use of conversion systems. More importantly, they prevent automatic conversion.
Documents that contain graphics introduce another level of complexity in the document exchange process. Exchanging the graphics embedded inside a document is difficult. (See Section 6.4 Standards and Formats in Chapter 6 Media and Document Integration for more information on graphic standards and the integration of various media types with text.)
The new forms of electronic publishing, which include sound annotations and video clips, raise the level of complexity for document exchange even higher. Issues of data formats must be addressed. The role of operating system and hardware resources takes on greater importance. To take a simple example, Apple now provides the ability to play little movies inside documents with their QuickTime operating system facility on the Macintosh. If I write a document heavily dependent on QuickTime, it won't be usable on another platform. However, if vendors create QuickTime viewers for other hardware platforms, and they have, the documents may be usable on other platforms. The same is true for sound annotations: the document must be viewed online and on a system that can handle the particular sound format.
All these new data types increase the functionality of the document, increase the complexity of document exchange, and decrease the portability of the document. Work in progress will standardize many of these elements. (See Section 6.4 Standards and Formats in Chapter 6 Media and Document Integration for information on these standardization efforts.)
7.2.3 Direct versus Standardized Interchange
We've examined what; now let's examine how information is exchanged to achieve complete document interchange. One tried and true method to accomplish reliable interchange is to write direct translators between two systems. This method greatly increases the probability of an accurate translation since every type of information in one system is hand-tailored to a corresponding piece of information on the other. People who write translators interpret the semantics of items in one system into items in another with semantics that match completely or in part. The two major drawbacks of this method are that (1) you are locked into those two specific systems and (2) each time a new system is introduced, several translators must be written. The number of translators grows quadratically (n^2 - n, counting one translator for each direction between each pair of systems) as the number of systems increases.
With a single common interchange format, a new system may be added by writing only its input and output translators (pre- and postprocessors).
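The difference in growth is easy to tabulate. The sketch below assumes one translator per direction between each pair of systems for the direct method, and one input plus one output translator per system for the common format; the function names are made up for this illustration.

```python
def direct_translators(n):
    """Translators needed when every system converts directly
    to every other system: n * (n - 1), that is, n^2 - n."""
    return n * (n - 1)


def common_format_translators(n):
    """Translators needed with one common interchange format:
    one input and one output translator per system."""
    return 2 * n


print("systems  direct  common format")
for n in (2, 3, 5, 10, 20):
    print(f"{n:7d}  {direct_translators(n):6d}  {common_format_translators(n):13d}")
```

With two systems the direct method needs fewer translators, with three the two methods tie at six, and beyond that the common format wins by a widening margin.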
Facilitating document interchange is only one reason to use standards. Another compelling reason to use a standard document representation is that it allows you to use document content in more than one way.
The proper use of standards enables the creation of documents that may be used in several ways: for printing, presentations, and on-line viewing. Several products can be based on a single repository of content. It is possible, indeed desirable, to automate the usually manual cut-and-paste process. A CD-ROM electronic encyclopedia and the printed version can be created from the one collection of information.(10) Document components can be extracted for the purpose of generating automatic summaries. Indexes and automatic cross references can be created with the consistent use of tagging schemes.
In the CALS domain, a significant project that is addressing the issue of multiple use is the Interactive Electronic Technical Manual (IETM) project. The IETM project is concerned mainly with the delivery and use of on-line hypertext/hypermedia documents. However, it will be possible to generate paper copies of the document from the same electronic information.
The process of preparing information for distribution in a number of electronic forms is called data preparation.
The creation of electronic documents invariably requires a clear, organized structure for the data. The tagging and markup mechanisms used by SGML and procedural markup systems greatly aid the data preparation process. Usually, the data preparation involves identifying certain structural elements of the document. The elements chosen can become a form of outline that can be used as a user interface mechanism for on-line browsers. The Table-of-Contents used as a front end for a document is common and useful.
Key tagged elements can also be used to create automatic cross-references. Creating semantically meaningful links across a large collection of text is a significant authoring task, which should not be underestimated. Tags and markup can be used to jumpstart the process by automating some of the link creation.
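As a rough sketch of how tags can drive an outline and its link targets, the Python below pulls heading elements out of tagged text. The tag names chaptitle and sectitle, the generated anchor names, and the sample text are hypothetical, and a regular expression is used only to keep the example short; a real SGML application would use a proper parser driven by the DTD.

```python
import re

# Hypothetical heading tags and the outline level each one implies.
LEVELS = {"chaptitle": 1, "sectitle": 2}
HEADING = re.compile(r"<(chaptitle|sectitle)>(.*?)</\1>")

SAMPLE = """\
<chaptitle>Sample Chapter</chaptitle>
<para>Introductory text.</para>
<sectitle>First Section</sectitle>
<para>More text.</para>
<sectitle>Second Section</sectitle>
"""


def build_outline(tagged_text):
    """Return (level, anchor, title) for each tagged heading, in order."""
    outline = []
    for number, match in enumerate(HEADING.finditer(tagged_text), start=1):
        tag, title = match.group(1), match.group(2).strip()
        outline.append((LEVELS[tag], f"ref{number}", title))
    return outline


def format_outline(outline):
    """Render the outline as an indented table of contents."""
    return "\n".join(
        "  " * (level - 1) + f"{title} [{anchor}]"
        for level, anchor, title in outline
    )


print(format_outline(build_outline(SAMPLE)))
```

The same list of anchors could serve as the targets of automatically generated cross-references, leaving the author to add only the links that need human judgment.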
7.3.2 TeX's Weave and GNU Emacs' Texinfo
Two interesting and practical examples of multiple use are TeX and GNU Emacs. TeX is one of the most widespread batch-oriented typesetting languages. (See Section 4.1 Types of Document Processors in Chapter 4 Form and Function of Document Processors.) GNU (GNU's Not Unix) Emacs is one of the popular text editing environments for UNIX and is available for free from the Free Software Foundation (FSF). Both systems use information for two different purposes.
Another format with multiple uses is the texinfo format used in GNU's Emacs. GNU Emacs is as much a user's environment as it is a text editor. Just name a bizarre, complicated, baroque function you wish to accomplish in a few keystrokes, and Emacs probably has a built-in function for just such a purpose.
One of the more interesting features of Emacs is the info system. It provides Emacs users with a robust hypertext help system. It is an Emacs mode, which allows the interactive browsing of a tree-structured collection of documentation. The info system is used as one of the principal means of documenting many of the internal Emacs modes, as well as just about anything else you like. Paper documentation is also quite nice: so, in order to kill two birds with one stone, the GNU folks have created the texinfo format. Texinfo is a collection of specific TeX macros. The texinfo document can be run through TeX to create a typeset document for printing. In addition, the texinfo document can be processed through a texinfo program, an Emacs function, to create an info system online document. The bottom line is that an author can create a single document that can be printed with high-quality typesetting and browsed online. Clever folks!
Electronic distribution depends on the proper use of standards. Whether electronic or paper, documents must be distributed to people to achieve their main purpose, communication. Distribution mechanisms for electronic documents are vital to the evolution of publishing.
Electronic documents can be distributed in many more ways than can their paper counterparts. Although much of this section is applicable to software and data in general, the techniques discussed are useful distribution mechanisms for electronic documents.
Resources such as the mail response programs and other network services discussed later in this section are possible only because of network standards. In fact, the entire domain of electronic document distribution is a good example of the proper use of standards for a particular function. The underlying standards that make these services possible are the technological glue needed by people trying to communicate. Before we examine various sorts of on-line communications, let's take a look at one significant relatively new storage mechanism, CD-ROMs.
Compact Disc Read Only Memory or CD-ROM is the fortunate spin-off of the audio CD used at home to listen to high-quality music. Music on CDs is recorded digitally. Back in the mid 1980s, people realized that the data used to represent music could represent anything else. Thus, the CD-ROM and standards for representing files were born. The international standard ISO 9660 for volume and file structure has enabled the CD-ROM to become a widely used data distribution mechanism. Approximately 660MB of data can fit on one CD-ROM.
The cost of CD-ROM replication has dropped to under $1.50 per CD-ROM because of the high volume of audio CDs. Of course, the true costs in mastering a CD-ROM are hidden in the data preparation phase. The cost of CD-ROM players has also dropped dramatically and is currently in the $100-$400 ballpark. You pay a price premium for the faster players, and players able to hold several CDs at once.
Typically, a collection of documents will be prepared for a CD-ROM by using a full-text retrieval system. These systems create a database that allows the end user to search for any word appearing in the documents and to retrieve the documents quickly. These systems vary widely in their pricing structure, and typically, some sort of run-time royalty must be paid to the developer of the text retrieval engine.(11) (For a list of vendors of
these systems, see Section Text Retrieval in the Appendix Resources.)
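The core of such a retrieval system is an inverted index, which maps each word to the documents that contain it. The Python sketch below is a toy version under simple assumptions (lowercased alphanumeric words, documents held in memory); the function names and sample documents are hypothetical, and commercial engines add ranking, phrase search, and compact on-disc storage.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map every word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, word):
    """Return the sorted names of the documents containing the word."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical document collection, for illustration only.
documents = {
    "drives.txt": "CD-ROM drives read discs.",
    "storage.txt": "Discs hold data and documents.",
}

index = build_index(documents)
print(search(index, "discs"))
print(search(index, "data"))
```

The index is built once, during data preparation, and shipped on the disc with the documents, so a search at the user's end is a quick lookup rather than a scan of every file.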
CD-Write Once (CD-WO) technology enables true desktop CD-ROM production, reducing even the initial mastering cost to under $20 (after the cost of the machine). These machines allow the user to write data to the optical disk (but only once) and to take that disk and play it on any CD-ROM drive.
CD-ROMs have become the preferred mechanism for software and document distribution for one simple reason: they save money. CD-ROMs are also more convenient from the customer's point of view. Rather than sifting through a shelf full of documentation, the customer can use an online document browser. Apple, Sun, IBM, DEC, and HP, to name a few, distribute software and software documentation on CD-ROMs. CD-ROMs are THE most cost-effective mechanism for distributing electronic information.
In addition, the read-only limitation is actually a valuable feature. The information provider does not have to worry about changes made by the users because they can't make any. CD-ROMs have become the medium of choice for electronic publishing and distribution.
The revolution in digital storage and delivery spawned by the audio CD industry is about to be repeated. A second generation CD, called DVD (digital video disk) or HD-CD-ROM (High Density CD-ROM) is expected to be released in late 1996.(12)
The breakthrough in this technology was the agreement in September of 1995 on a single standard in place of what were two competing specifications. The two groups, Sony/Philips and Toshiba/Warner, were each proposing incompatible formats. These two groups wisely sought to avoid a repeat of costly format battles such as VHS versus Beta for video in the early 1980s. They eventually created a unified format for the new CDs.
In addition, these groups clearly recognize the value of a single standard in the creation of a market. The CD-audio disc followed by the ISO 9660 standard for data created the CD-ROM industry.
The new disk will (going out on a limb here) replicate much of the phenomenal growth of the audio CD. Video tape will, in several years, disappear as a rental medium and be replaced by DVDs. The DVDs will use the MPEG-2 standard to encode high quality video and audio.
Similarly, HD-CD-ROMs will replace regular CD-ROMs. All of this is made possible by the increased storage capacity of the new discs. The preliminary specs of the unified format are as follows: a disc will have 2 sides, each side can have 2 layers, and each layer can hold up to 4.7GB. Eventually the disc's full capacity of 17GB will be reached. Systems are expected to hit the market in 1998 or 1999.
The specification allows for existing audio and CD-ROM discs to be played in the new players. The size will be the same as current audio discs, 120mm. According to a market research firm called InfoTech in Woodstock, Vermont, it is expected that three main applications will arise: linear video (a replacement for video tape playback), multimedia PCs, and interactive TV set-tops (video-game consoles). Probably the first to gain in the market will be the PC desktop, which has an insatiable appetite for storage capacity and where the initial expense ($500-$800) of the new units can more easily be absorbed.
By 1998, an erasable version called HD-CD-E is expected to open up all sorts of possibilities. Even if this revolution occurs at the same phenomenal rate as audio CDs, it will take at least 5 years, so don't get rid of your VCR yet.
7.4.2 Electronic Page Delivery
The maintenance of page fidelity is one of the most difficult issues standards committees and vendors have tried to solve. The old problem of structure versus style is the major issue. Standards such as SGML and ODA along with document processing systems such as troff and TeX provide strict structural definitions and mappings between structure and the visuals (style). In the last few years, several efforts have met with considerable success.
Page delivery systems attempt to solve the document interchange problem, while maintaining page fidelity. They offer varying features, such as searching and the level and types of editing allowed. Certainly the clear winner in the battle of these systems is Adobe's Acrobat suite of products. Adobe, using a combination of marketing and technical prowess, is attempting to market PDF as the premier page interchange file format.
Adobe's answer to the demands of on-line publishing comes in the form of an updated form of PostScript called the Portable Document Format (PDF). The Acrobat product suite offers users a robust and flexible mechanism for on-line publishing and very convenient toolswith some important restrictions. Simply put, the PDF form of a . document maintains page fidelity _ and allows for costeffective distribution of those pages. A kind of electronic paper. Any word/document processing system that can output PostScript can convert the ' output to PDF. The PDF files can be ' viewed using a freely ' distributable viewer.' ■ You cannot, however, edit the ' content of the PDF document. In ■: addition, not all font problems have been solved.
The folks at Adobe Systems are attempting to create a ■ new "revolution" in document interchange. Adobe's approach has two phases. First, create ■ a technology that allows the document to be transmitted, viewed, and printed. Once this is accomplished and users commonly interchange electronic documents, the assumption is that market forces will drive vendors to use the PDF format as building blocks for new editing applications that understand the semantics of the objects that make up the ' document. ■: ■: ■:
Any document that can be printed can be converted into a PDF document. Users will be able to view, print, attach notes, search for words, and create links between items of these documents. Adobe's approach is to let the market decide whether successful vendors will create applications that allow complex editing. Only time will tell if this approach will work.
Adobe Acrobat Exchange viewer with thumbnails and enlarged section of page.
The PDF format itself has been published by Adobe, and there don't appear to be any
proprietary tricks up their sleeves. The Acrobat viewer is being integrated with Netscape's Web browser as another plug-in. Speaking of the Web, Acrobat's relationship with the Web brings up some interesting issues. The Adobe Acrobat Plug-in is called Amber and can be downloaded from either the Netscape or Adobe Web site.
The hypertext linking capability of Acrobat has been enhanced by Adobe to allow links that are URLs. In the early days of Acrobat/Web integration, you could, in a properly configured Web browser, select a link which would point to a PDF file. Acrobat would launch as a helper application, and that was it. Afterwards, you would be left in Acrobat with no integrated way of getting back to the browser, other than through the window system. Typical helper application stuff. Now, however, you can link to a PDF document and, inside Acrobat, select a link that is really a URL, causing the Web browser to go to the new URL. This presents the user with a powerful browsing capability. It may also be somewhat confusing to the user, who now has to switch cognitively between the user interface of the Web browser and the user interface of Acrobat.
Common Ground is one of the other major contenders in page delivery systems. Acrobat and Common Ground were introduced at about the same time. Common Ground gave away its viewer for free, and Adobe charged about $50 for the Acrobat Reader. This nominal fee was sometimes a problem for educational institutions and other non-profits that wanted to ensure the lowest cost and widest possible distribution of their documents. Eventually, Adobe saw the light and changed its policy; they now give away the viewer. (You can find it at http://www.adobe.com).
A third electronic page delivery system is Envoy, from the folks at WordPerfect. With the turmoil brought on, no doubt, by the sale (twice) of WordPerfect, most recently to Corel, it remains a bit player.
Electronic bulletin board systems (BBSs) are still a widely used mechanism to distribute electronic documents. Using a modem, users simply dial up the BBS of interest, poke around, find some files of interest, and start downloading, that is, transferring a file from a remote system to the local system.
Bulletin boards have become widespread because they are extremely inexpensive. All that's required is an inexpensive PC, a modem, and some BBS software (widely available for free). Any local PC users group can provide more information on bulletin boards.
Perhaps the greatest benefit of bulletin boards is that they can put you in touch with technical experts in virtually any domain. For example, if you were having a nasty technical problem with PageMaker, simply locate a bulletin board with a discussion group
about PageMaker, post your problem, check back in a few days, and maybe someone has posted an answer. The large commercial bulletin boards, like CompuServe, are ideal for this type of interaction because they have discussion groups (forums) on all sorts of topics. The commercial bulletin boards are relatively expensive to use but are invaluable for this type of access to expertise. You might also want to hunt down an appropriate USENET newsgroup for technical questions.
As a distribution mechanism, a BBS is the electronic equivalent of the physical bulletin board at the supermarket. On both media, you can see ads, information blurbs, requests for help, and items for sale. One limitation, some would say benefit, of this form of communication is its passive nature. People must take the action to dial the BBS and look for information. Electronic mail-based systems, by contrast, can be set up to interrupt and inform you about important new documents or actions that must be taken.
Clearly the days of BBSs are numbered. The Web has certainly overtaken BBSs in terms of sheer numbers; however, it's still much simpler to set up a BBS than a Web server. Small organizations with little or no technical support and a well-defined community of users would be wise to take a look at BBS technology; it may be just the right solution to their needs.
Electronic Mail (e-mail), that modern staple of office communications, is also an effective way to distribute documents and coordinate a group's activities. Worldwide e-mail service has greatly improved in recent years to the point where many different e-mail networks are interconnected.
There are two main ways of using e-mail as a distribution mechanism. The first is an Electronic Mail Response Program (EMRP), and the second is a mailing list server. Let's examine each of these in turn.
An electronic mail response program, also known as an archive server or mail server (and by some other names), is a clever use of electronic mail. An EMRP is a program that reads mail and responds to requests specified in the mail. Usually, the EMRP can respond only to a limited set of commands. The primary command is usually send, followed by a file name. The exact syntax of these commands varies with the particular mail server; however, they are all simple to use and functionally very similar.
Virtually all mail servers can interpret the command help (if not, they certainly should) and respond with useful information. Typically, the mail server contains an index that points to directories, individual files, or both. You can request the files.
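The request-and-reply cycle of a mail server can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of any actual mail server: the archive contents, the index command, and the reply wording are all hypothetical, and real servers differ in their exact command syntax.

```python
# Hypothetical archive: file name -> contents. A real server would read from disk.
ARCHIVE = {
    "index": "readme.txt\nfaq.txt",
    "readme.txt": "Welcome to the example archive.",
}

# Hypothetical help text returned for the help command.
HELP_TEXT = "Commands: help | index | send <filename>"


def handle_request(body: str) -> list[str]:
    """Return one reply per non-blank line found in a mail body."""
    replies = []
    for line in body.splitlines():
        words = line.split()
        if not words:
            continue  # skip blank lines
        command = words[0].lower()
        if command == "help":
            replies.append(HELP_TEXT)
        elif command == "index":
            replies.append(ARCHIVE["index"])
        elif command == "send" and len(words) == 2:
            name = words[1]
            replies.append(ARCHIVE.get(name, f"No such file: {name}"))
        else:
            # Anything else gets an error plus the help text.
            replies.append(f"Unrecognized command: {line.strip()}\n{HELP_TEXT}")
    return replies
```

A message body containing the two lines help and send readme.txt would produce two replies: the help text and the contents of the requested file.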
The second e-mail distribution mechanism is a mailing list manager (MLM). The two principal MLMs are LISTSERV and majordomo. A good comparison of the different MLMs can be found at: ftp://ftp.uu.net/usenet/news.answers/mail/list-admin/software-faq. A mailing list manager is a program that maintains a list of user addresses and lets remote users access and/or send information to these addresses. Mailing list managers allow interested users to participate in discussions about topics of interest. Once you subscribe to a list, you receive all mail sent to the list. The functionality of LISTSERV is really a superset of both an EMRP and a simple mailing list manager. LISTSERV is very robust, with facilities for secure access, packaging of files into related groupings of information, and mailing list maintenance.
From an introductory file describing the Revised LISTSERV:
Although the primary function of LISTSERV is to distribute mail and files to predefined distribution lists, it may often be desired to provide the subscribers of the list with a set of data or program files to be periodically maintained by a particular person or set of persons. Apart from the obvious example of list "notebooks" (archives), working groups might want to provide minutes of internal meetings held by some of the subscribers, technical groups might want to share application programs related to some software they are all using, etc.
It was decided that the most convenient way of meeting these needs was to provide basic, non-specialized fileserver functions along with the mail-processing function of LISTSERV. Those functions would have to provide powerful yet list-based file access control and remote file updating facilities, under the control of both the list owner and the LISTSERV management.
Automatic distribution of updated materials to subscribers was another major concern,
since it makes this distribution more efficient whenever the list is supported by more than one peer server, and relieves the file maintainer of the burden of preparing the list of subscribers. The users request such distribution directly from the server without any intervention from the file maintainer.(13)
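The basic job of a mailing list manager, keeping a list of subscriber addresses and redistributing each incoming message to all of them, can be sketched as follows. This is a toy model under stated assumptions: the class name, the one-word subscribe and unsubscribe commands, and the message format are hypothetical and are not the actual command syntax of LISTSERV or majordomo.

```python
class MailingList:
    """Toy mailing list: a set of subscriber addresses plus a fan-out step."""

    def __init__(self, name: str):
        self.name = name
        self.subscribers: set[str] = set()

    def handle_command(self, sender: str, command: str) -> str:
        """Process a one-word administrative request from `sender`."""
        verb = command.strip().lower()
        if verb == "subscribe":
            self.subscribers.add(sender)
            return f"{sender} added to {self.name}"
        if verb == "unsubscribe":
            self.subscribers.discard(sender)
            return f"{sender} removed from {self.name}"
        return f"Unknown command: {command.strip()}"

    def distribute(self, sender: str, text: str) -> list[tuple[str, str]]:
        """Return (recipient, message) pairs, one for every subscriber."""
        message = f"[{self.name}] from {sender}: {text}"
        return [(address, message) for address in sorted(self.subscribers)]
```

A real manager would hand each pair to the mail transport; here the pairs are simply returned so that the fan-out is easy to see.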
E-mail access is probably the widest of all distribution mechanisms. More people can access more information via e-mail than with any other electronic mechanism. The downside, however, is slow access speed and the sometimes limited ability to transfer large information files. E-mail is generally a non-real-time process. It usually takes a few hours, and sometimes days, for e-mail to go from one site to another because of the many gateways involved. Mail systems are typically configured to send mail at some fixed interval, such as once an hour or at night when phone rates are least expensive. Fast, real-time access and the ability to conveniently transfer large files require a true network.
7 . 4 . 5 Resource Discovery Tools
As the Internet becomes larger and more interconnected, finding information is becoming one of the most challenging issues. Often one is struck by thoughts such as "I know I saw something about the new netWizard product...but where was it?" An entire discipline has been created called resource discovery.(14) Resource discovery is the formal study of net surfing.
One of the more serious problems, which is only recently being addressed, is finding information on the "net." A combination of funded efforts, clever programming, and better interconnectivity is leading to a tamer Internet. The rest of this section discusses some of these projects.
Some clever people at McGill University in Canada created one of the best early means of locating information on the Internet. They created an electronic mail response program called "archie." Archie maintains a database of archive sites and the names of their contents. The command used for searching functions follows:
prog <reg expr1> [<reg exp2>...] ...
in which prog is a keyword that means to find or search. A search of the "archie" database is performed with each <reg exp> (a regular expression) in turn, and any matches found are returned to the requestor. Note that multiple regular expressions may be placed on one line, in which case the results will be mailed back to you in one message. If you have multiple "prog" lines, then multiple messages will be returned, one for each line.
Users of archie simply send email requests containing the command prog, using the syntax specified above, to search for particular programs or documents of interest. Archie sends a response by email, telling you where to go to find the item you requested.
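The behavior of the prog command can be sketched in Python. This is a minimal illustration, not archie's actual implementation: the database contents, site names, and function names are hypothetical. It shows each regular expression being tried in turn, with one result set (one reply message) per prog line.

```python
import re

# Hypothetical database: archive site -> file names found there.
DATABASE = {
    "ftp.example.edu": ["xarchie-1.3.tar.Z", "gopher1.01.tar.Z"],
    "archive.example.org": ["tex-3.14.tar.Z", "xarchie.README"],
}


def prog(*patterns: str) -> list[tuple[str, str]]:
    """Search with each regular expression in turn; return (site, file) matches."""
    matches = []
    for pattern in patterns:
        regex = re.compile(pattern)
        for site in sorted(DATABASE):
            for name in DATABASE[site]:
                if regex.search(name):
                    matches.append((site, name))
    return matches


def handle_mail(body: str) -> list[list[tuple[str, str]]]:
    """Return one result list (one reply message) per prog line in the body."""
    replies = []
    for line in body.splitlines():
        words = line.split()
        if words and words[0] == "prog":
            replies.append(prog(*words[1:]))
    return replies
```

A request with two prog lines therefore yields two separate replies, while several expressions on a single prog line are answered together.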
A friendlier user interface exists via the program xarchie. An X window system program, xarchie lets you interactively select the database sources and pose queries. Xarchie returns the results of the query in a list with the most relevant items on the top of the list.
XARCHIE - the X Window System User Interface to archie(15)
An archie site runs a program that maintains the database by using anonymous ftp. The archie program checks the contents of several hundred archive sites over approximately 1 to 2 months. Lots of sites run archie servers.(16) Some sites allow interactive queries of the archie database via TELNET,(17) eliminating the delay inherent in email.
The Wide Area Information Servers (WAIS) project is an effort to make possible network-wide document retrieval. The project started as a joint effort of Thinking Machines Corporation, Dow Jones News/Retrieval, and Apple Computer. Now WAIS Inc. is a subsidiary of America Online (AOL).
The WAIS project leader, Brewster Kahle, tells some of the history and functionality of WAIS in this overview written in 1991:
The Wide Area Information Servers system is a set of products supplied by different vendors to help end-users find and retrieve information over networks. Thinking Machines, Apple Computer, and Dow Jones initially implemented such a system for use by business executives. These products are becoming more widely available from various companies.
What does WAIS do? Users on different platforms can access personal, company, and published information from one interface. The information can be anything: text, pictures, voice, or formatted documents. Since a single computer-to-computer protocol is used, information can be stored anywhere on different types of machines. Anyone can use this system since it uses natural language questions to find relevant documents. Relevant documents can be fed back to a server to refine the search. This avoids complicated query languages and vendor specific systems. Successful searches can be automatically run to alert the user when new information becomes available.
How does WAIS work? The servers take a user's question and do their best to find relevant documents. The servers, at this point, do not "understand" the user's English language question, rather they try to find documents that contain those words and phrases and rank them based on heuristics. The user interfaces (clients) talk to the servers using an extension to a standard protocol Z39.50. Using a public standard allows vendors to compete with each other, while bypassing the usual proprietary protocol period that slows development. Thinking Machines is giving away an implementation of this standard to help vendors develop clients and servers.
What WAIS servers exist? Even though the system is very new, there are already several servers:
* Dow Jones is putting a server on their own DowVision network. This server contains the Wall Street Journal, Barons, and 450 magazines. This is a for-pay server.
* Thinking Machines operates a Connection Machine on the internet for free use. The databases it supports are some patents, a collection of molecular biology abstracts, a cookbook, and the CIA World Factbook.
* Weather maps and forecasts are made available by Thinking Machines as a repackaging of existing information.
* The "directory of servers" facility is operated by Thinking Machines so that new servers can be easily registered as either for-pay or for-free servers and users can find out about these services.
How can I find out more about WAIS?
Contact Brewster Kahle for more information on the WAIS project, the Connection Machine WAIS system, or the free Mac, Unix Server, and X Window System interfaces. There is a mailing list that has weekly postings on progress and new releases; to subscribe send an email note to wais-discussion-request@think.com.
Brewster Kahle, Project Leader, Wide Area Information Servers, Brewster@Think.com
It is important to note that the communications to the WAIS servers are accomplished using the ANSI standard protocol for database retrieval applications, Z39.50. The decision to use a public standard is what makes this communications method truly open.
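The kind of matching described in the overview, in which the server does not "understand" the question but finds documents containing its words and ranks them heuristically, can be sketched as follows. This is a deliberately crude illustration and not the WAIS algorithm: the document collection, the function names, and the scoring rule (a simple count of matching words) are assumptions, and the sketch does not use the Z39.50 protocol at all.

```python
import re

# Hypothetical document collection: title -> text.
DOCUMENTS = {
    "cookbook": "Bread recipe: mix flour, water, and yeast, then bake the bread.",
    "factbook": "Country profiles: population and geography data.",
    "abstracts": "Yeast genetics and molecular biology abstracts.",
}


def words(text: str) -> list[str]:
    """Lowercase a string and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def search(question: str) -> list[tuple[str, int]]:
    """Rank documents by how often the question's words occur in them."""
    query = set(words(question))
    scored = []
    for title, text in DOCUMENTS.items():
        score = sum(1 for word in words(text) if word in query)
        if score > 0:
            scored.append((title, score))
    # Highest score first; ties broken alphabetically by title.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Feeding the text of a relevant document back in as the next question, for example search(DOCUMENTS["abstracts"]), is a rough stand-in for the relevance feedback mentioned in the overview.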
Gopher is a widely used method for browsing through documents on the Internet. Gopher originated at the University of Minnesota, and its usage exploded from around 1992 to 1995. Usage has tapered off with the explosion of the Web. The appeal of Gopher, however, is that it is text-only, allowing more ubiquitous access to the information.
One way of locating information on the Internet is to use Gopher servers. Gopher servers maintain collections of documents with the additional ability of full-text searching. The Gopher protocol and concept are due to the effort of the people at the Microcomputer and Workstation Networks Research Center at the University of Minnesota. One of the appealing aspects of Gopher is the simplicity with which it presents itself to the user. You, the user, are presented a "file system" just like any hierarchically organized file system, except that this file system covers all information known to the particular Gopher server. It's a simple, elegant, and powerful approach.
Client systems through which a user "speaks" to the Gopher server can have a number of user interfaces. One of the clients on UNIX systems is based on curses and will function on any terminal.
According to Mark McCahill, member of the Gopher development team:
The Internet Gopher is a distributed document delivery service. It allows a neophyte user to access various types of data residing on multiple hosts in a seamless fashion. This is accomplished by presenting the user a hierarchical arrangement of documents, a menu, and by using a client-server communications model. In addition to browsing through hierarchies of documents, Gopher users can submit queries to Gopher search servers. The search servers typically have full-text indexes for a set of Gopher documents; the response to a query is a list of documents that matched the search criteria.
Internet Gopher servers accept simple queries (sent over a TCP connection) and respond by sending the client a document or a list of documents. Since this is a distributed protocol there can be many servers but the client software hides this fact from the user. We currently use this technology at the University of Minnesota to help support microcomputer users... a couple of Gopher servers have 6000-7000 computer Q&A items that users can search for answers to their questions. In addition, there are also Gopher servers with recipes and other fun stuff.
Conceptually, a user might see something like the following illustration.(18)
In the case of an interaction with Gopher servers, however, these directories may exist anywhere on the Internet. Furthermore, the user doesn't really care where the information is, as long as it's accessible in a timely manner.
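The simple query-and-response exchange that McCahill describes can be sketched in Python. The details that go beyond the text above (TCP port 70, a selector string terminated by a carriage return and line feed, and tab-separated menu lines ending with a lone period) follow the published Gopher protocol description (RFC 1436); the function names and the host names in any sample data are hypothetical. Treat this as a sketch, not a complete client.

```python
import socket


def parse_menu(text: str) -> list[dict]:
    """Parse a Gopher menu: one item per line, fields separated by tabs."""
    items = []
    for line in text.splitlines():
        if line == ".":
            break  # a lone period marks the end of the menu
        fields = line.split("\t")
        if len(fields) < 4 or not fields[0] or not fields[3].strip().isdigit():
            continue  # skip malformed lines
        items.append({
            "type": fields[0][0],      # e.g. "0" = text file, "1" = directory
            "title": fields[0][1:],
            "selector": fields[1],
            "host": fields[2],         # may be a different server entirely
            "port": int(fields[3]),
        })
    return items


def fetch(host: str, selector: str = "", port: int = 70) -> str:
    """Send a selector over TCP and read the reply until the server closes."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(selector.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("latin-1")
```

Because every menu item carries its own host and port, a client can follow an item to a different server without the user noticing, which is how the directories can "exist anywhere on the Internet."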
MISCELLANEOUS INTERNET SERVICES
Along with the services already discussed, a few dozen more also exist. As the reliability and speed of the Internet increase, these resources are becoming increasingly valuable, especially since the Web integrates them all. Indeed, the spread of this technology raises important questions about the future of academic journals, libraries, and publishing.
Scott Yanoff has for many years provided lists of various Internet services. This has now evolved into a valuable Web site that you can find at: http://www.uwm.edu/Mirror/inet.
Another trend that is taking advantage of the improved connectivity of networks and email is the rise of the electronic journal. An electronic journal is, as the name implies, simply a genuine academic journal that is published and disseminated primarily via electronic media. These media primarily consist of archives accessible via ftp and listservers. Electronic journals represent the maturing of network connectivity.
Given the distributed nature of networks with worldwide geographic coverage and literally thousands of places where information hides, it is a challenge to find information "out on the net." One great place to find electronic journals is the "Directory of Electronic Journals and Newsletters." It was compiled by Michael Strangelove of the University of Ottawa, Canada.(19) A more up-to-date directory is published by the Association of Research Libraries.(20)
Some of the journals are peer reviewed just like their paper counterparts. They cover a wide range of topics, ranging from fine art to issues concerning the handicapped, to library science and ethnomusicology. It appears that the world of electronic networks is finally growing up!
7 . 4 . 7 FAX Boards and Modems
Well, if I called the wrong number, why did you answer the phone? James Thurber
The ubiquitous FAX is no longer limited to hard copy. Special FAX boards can be plugged into PCs and workstations to send and receive FAX documents without ever having to scan or print paper. CCITT Group 4 is the standard for representing FAX documents, and these devices can transmit and receive purely electronic FAX documents. FAX modems are widely available for a little more money than simply a modem. These days, it's hard to find a modem card that doesn't have FAX capabilities. Using this technology, electronic distribution systems can be easily set up. (See Section 6. 4 Standards and Formats in Chapter 6 Media and Document Integration for more information on FAX).
The relationship between FAX software and the normal printing functions on the computer is very interesting. For example, on most computers with window systems, FAX modems are usually implemented as a printer device. This means that the FAX appears to the operating system as just another printer. When running an application, you simply print something and the currently selected printer, in this case the FAX modem, receives the output. The printing dialog box presents the user with various options concerning phone dialing, and soon the document is on its way as a FAX. The seamless substitution of a FAX device for a printer is very appealing.
Another interesting approach is taken by vendors of FAX software for workstations. Their software often comes with a PostScript interpreter that takes a PostScript file, images it in the same way that a printer images PostScript information, and then ships the image out via FAX.
As FAX hardware gets ever closer to printing software, the distinctions start to blur. Device drivers treat FAX devices as printers. Page description languages are used to create FAX images. FAX printing and distribution are midway between electronic and traditional document distribution.
Another FAX possibility is the integration of a FAX server with a local area network or e-mail. Some FAX servers can be sent text files or some other well-defined format, such as PostScript. It is quite feasible to set up mail distribution lists, such as those discussed in the earlier section on electronic mail, to send FAXes to additional people who don't have e-mail.
Finally, of course, someone has figured out how to marry the Web with FAX. The folks at Universal Access Inc. have a service called WebFaX which (yes you heard it here first) lets you surf the Web via your FAX machine. You dial their FAX number, spell out the server on the touch tone keypad, and voila, the Web page gets faxed back to you. Actually it's not quite so simple or complete. They only have a few thousand of the more popular Web sites, and it works best if the site has a unique ID assigned by the WebFaX people. The WebFaX system is being packaged up into a more complete communications system. Stay tuned; it's a neat idea with some useful possibilities.
Related to FAX and Web integration, we also have NetPhonic Communications' Web-On-Call Voice Browser. That's right; once these folks get their software up and running, you will be able to surf the net with your telephone. So when you are at the airport away from your computer, don't worry, a voice synthesizer can read those pages to you. Actually, it can send email or a FAX of the page, so it might actually be useful. We'll see.
MIT supports a poetry server with a great deal of classical and modern poetry. Cosmic is serving descriptions of government software packages. The Library of Congress has plans to make their catalog available on the protocol.