Chapter 1: World Wide Web

"This 'telephone' has too many shortcomings to be seriously considered as a means of communication. The device is inherently of no value to us." -Western Union internal memo, 1876

1 . 1 Introduction to the World Wide Web

So what's all this fuss about the World Wide Web? What's the big deal? Why should I bother to spend my time looking at all sorts of irrelevant drivel? These questions are a typical response by the non-techie to all the hype surrounding the World Wide Web (also known simply as the Web). In fact, most of the content is drivel, and it takes far too long to get useful information, but the Web is a big deal and its implications are worth understanding.

Perhaps the most thoughtful and profound demonstration of the impact of the Web is the recently completed "24 Hours in Cyberspace"(1) project. In that event, 100 photojournalists around the world photographed events and transmitted the pictures to mission control in San Francisco, where editors typed up stories about the events, recorded telephone interviews with the photographers, composed the entire product into Web pages, and built the information into a robust, compelling instant publication, all in 24 hours. Viewed by literally millions of people, the site had approximately four million "hits" on that first day.

It was, and still is, a compelling story of how information technology is used in the daily lives of people around the world. It used the technology to tell the story of how technology can enhance their lives. It was an elegant affair.

Is this journalism, radio, broadcasting, or what? Clearly the integration of all these technologies has created something much greater than the sum of its parts. The ability to assemble and edit information, including images and sounds, and make it available for instant reading/broadcast was phenomenal. The fundamental enabling technological glue and the cause of the Internet explosion is the World Wide Web(2). So what is it?

Let's start in the middle of Web time with Mosaic. Mosaic, a Web browser, was the first "killer" Internet application. Mosaic was introduced to the net in the same way as many other university research projects: it is available for free and, with source code, for nonprofit use. Like many other applications, it is a product of the Internet community, specifically the National Center for Supercomputing Applications (NCSA). It was the product of yet another unheralded government (National Science Foundation) grant.

To understand the explosive growth of the Web, take a look at the number of Web sites discovered by Matthew Gray of net.Genesis over the past few years.

Growth of the Web

Month   No. of Web sites   % of commercial sites   Hosts per server
6/93    130                1.5                     13,000
12/93   623                4.6                     3,475
6/94    2,738              13.5                    1,095
12/94   10,022             18.3                    451
6/95    23,500             31.3                    270
1/96    90,000             50.2                    100 (estimate)

The Web's exponential growth continues. As the Web becomes more widely used, it will start to impact traditional broadcast media like radio and television.
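To put a rough number on "exponential," the Python sketch below estimates how quickly the number of Web sites in the table doubled. It assumes steady exponential growth between the first and last rows and counts 1/96 as 31 months after 6/93; both are simplifications, and the function name is only illustrative.

```python
import math

# (months after 6/93, number of Web sites), from the "Growth of the Web" table.
# Assumption: 1/96 is counted as 31 months after 6/93.
SITES = [(0, 130), (6, 623), (12, 2_738), (18, 10_022), (24, 23_500), (31, 90_000)]

def doubling_time(first, last):
    """Months needed to double, assuming constant exponential growth between two points."""
    (m0, n0), (m1, n1) = first, last
    rate = math.log(n1 / n0) / (m1 - m0)  # continuous growth rate per month
    return math.log(2) / rate

months = doubling_time(SITES[0], SITES[-1])
print(f"Sites doubled roughly every {months:.1f} months")  # Sites doubled roughly every 3.3 months
```

Between 6/93 and 1/96 the count grew nearly 700-fold, which corresponds to a doubling roughly every three and a third months.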

The developers of Mosaic did not try to invent everything. They built on a number of existing standards and systems. Prime among these was the Web developed at CERN, the European Laboratory for Particle Physics. In fact, most of the technological "breakthroughs" were the result of the WWW. The fuss and hoopla that surrounded Mosaic was due to the unified and reasonably pleasant interface it presents to the user.(3)

The Arena Web browser from World Wide Web Organization (W3O)

Mosaic and its commercial clones, such as Netscape from Netscape Communications, offer end users a view of a compound document containing many types of data: images, sounds, video, etc. (See Section 3.4.1 Compound Document in Chapter 3 Points of View.) Many items in the document contain links to other documents. These hypertext links allow the user to browse an entire collection of related documents easily. The documents are distributed and accessed throughout the Internet via the protocols supported by the Web. The net effect (pun intended) is the ability to read compound documents containing images and sounds, with the real information sources distributed over the Internet. Web browsers have become the front end to the Internet.

Several key features make the Web extremely powerful.

•    It sits on top of the Internet's existing infrastructure.

•    The Web protocol unites many different Internet protocols, such as ftp, telnet, gopher, mail, and news.

•    It is based on open systems, so it runs on many computing platforms.

•    It is physically and logically distributed, and thus scalable.

•    Web browsers provide a convenient user interface, rich enough to be interesting yet simple enough to promote exploration.
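Because a URL names the access protocol in its scheme (the part before the first colon), a single browser can act as a front end to all of these services. The following is a minimal Python sketch of that dispatch step, using the standard urllib.parse module; the handler table and its descriptions are hypothetical, not taken from any real browser.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from URL scheme to the protocol handler a browser might use.
HANDLERS = {
    "http": "HTTP client",
    "ftp": "FTP client",
    "gopher": "Gopher client",
    "news": "NNTP news reader",
    "mailto": "mail composer",
    "telnet": "terminal session",
}

def handler_for(url):
    """Pick a protocol handler based only on the URL's scheme."""
    scheme = urlsplit(url).scheme.lower()  # urlsplit already lowercases; kept for clarity
    return HANDLERS.get(scheme, "unsupported")

for url in ["http://info.cern.ch/", "ftp://ftp.example.com/pub/readme.txt",
            "news:comp.infosystems.www", "gopher://gopher.example.org/"]:
    print(url, "->", handler_for(url))
```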

Tim Berners-Lee(4) is the acknowledged "father" of the Web. Originally at CERN, he is now at the World Wide Web Organization (W3O). The following summary comes from his overview of the Web:

World Wide Web - Summary

The WWW (World Wide Web) project merges the techniques of networked information and hypertext to make an easy but powerful global information system.

The project represents any information accessible over the network as part of a seamless hypertext information space.

W3 was originally developed to allow information sharing within internationally dispersed teams, and the dissemination of information by support groups. Originally aimed at the High Energy Physics community, it has spread to other areas and attracted much interest in user support, resource discovery and collaborative work areas. It is currently the most advanced information system deployed on the Internet, and embraces within its data model most information in previous networked information systems.

In fact, the web is an architecture which will also embrace any future advances in technology, including new networks, protocols, object types and data formats.

Clients and servers for many platforms exist and are under continual development. Much more information about all aspects of the web is available on-line, so skip to "Getting started" if you have an internet connection.

Reader view

The WWW world consists of documents, and links. Indexes are special documents which, rather than being read, may be searched. The result of such a search is another ("virtual") document containing links to the documents found. A simple protocol ("HTTP") is used to allow a browser program to request a keyword search by a remote information server.

The web contains documents in many formats. Those documents which are hypertext (real or virtual) contain links to other documents, or places within documents. All documents, whether real, virtual or indexes, look similar to the reader and are contained within the same addressing scheme.

To follow a link, a reader clicks with a mouse (or types in a number if he or she has no mouse). To search an index, a reader gives keywords (or other search criteria). These are the only operations necessary to access the entire world of data.

Information provider view

The WWW browsers can access many existing data systems via existing protocols (FTP, NNTP) or via HTTP and a gateway. In this way, the critical mass of data is quickly exceeded, and the increasing use of the system by readers and information suppliers encourage each other.

Providing information is as simple as running the W3 server and pointing it at an existing directory structure. The server automatically generates a hypertext view of your files to guide the user around.

To personalize it, you can write a few SGML hypertext files to give an even more friendly view. Also, any file available by anonymous FTP, or any internet newsgroup can be immediately linked into the web. The very small start-up effort is designed to allow small contributions. At the other end of the scale, large information providers may provide an HTTP server with full text or keyword indexing. This may allow access to a large existing database without changing the way that database is managed. Such gateways have already been made into Oracle(tm), WAIS, and Digital's VMS/Help systems, to name but a few.

The WWW model gets over the frustrating incompatibilities of data format between suppliers and reader by allowing negotiation of format between a smart browser and a smart server. This should provide a basis for extension into multimedia, and allow those who share application standards to make full use of them across the web.

This summary does not describe the many exciting possibilities opened up by the WWW project, such as efficient document caching, the reduction of redundant out-of-date copies, and the use of knowledge daemons. There is more information in the on-line project documentation, including some background on hypertext and many technical notes.

Getting Started

If you have nothing but an Internet connection, telnet to info.cern.ch (no user or password). This very simple interface works with any terminal but in fact gives you access to anything on the web. It starts you at a special beginner's entry point. Use it to find up-to-date information on the WWW client program you need to run on your computer, with details of how to get it. This is the crudest interface to the web; do not judge the web by this. Just use it to find the best client for your machine.

You can also find pointers to all documentation, including manuals, tutorials and papers. Tim BL
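The format negotiation mentioned in the summary survives in HTTP as the Accept header: the browser lists the media types it can handle, optionally weighted with a quality value q between 0 and 1, and the server picks the best match among the formats it holds. The Python below is a simplified sketch of the server's choice, not code from the W3 project; it ignores media-type parameters other than q, and the function names are illustrative.

```python
def parse_accept(header):
    """Parse an Accept header such as 'image/gif, image/jpeg;q=0.5' into {type: quality}."""
    preferences = {}
    for item in header.split(","):
        fields = [field.strip() for field in item.split(";")]
        media_type, quality = fields[0].lower(), 1.0   # quality defaults to 1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                quality = float(value)
        if media_type:
            preferences[media_type] = quality
    return preferences

def negotiate(accept_header, available):
    """Pick the available media type the client rates highest; None if nothing is acceptable."""
    preferences = parse_accept(accept_header)

    def quality(media_type):
        major = media_type.split("/")[0]
        # The most specific matching entry wins: exact type, then 'image/*', then '*/*'.
        for pattern in (media_type, major + "/*", "*/*"):
            if pattern in preferences:
                return preferences[pattern]
        return 0.0

    best = max(available, key=quality, default=None)
    return best if best is not None and quality(best) > 0 else None

# A browser that prefers GIF, accepts JPEG, and will take anything as a last resort.
print(negotiate("image/gif, image/jpeg;q=0.5, */*;q=0.1", ["image/jpeg", "image/gif"]))  # image/gif
```

Checking the most specific matching entry first means an explicit rating for a type overrides a wildcard such as image/*.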


© Prentice-Hall, Inc.

A Simon & Schuster Company Upper Saddle River, New Jersey 07458



1 . 2 Browsing the Web

"Surfing the Web," a phrase that until recently meant something only to computer geeks, has now entered the popular culture. This is one of the surest signs of the impact of the Web.

According to the WWW FAQ (Frequently Asked Questions) maintained by Thomas Boutell:

What are WWW, hypertext and hypermedia?

WWW stands for "World Wide Web." The WWW project, started by CERN (the European Laboratory for Particle Physics), seeks to build a distributed hypermedia system.

The advantage of hypertext is that in a hypertext document, if you want more information about a particular subject mentioned, you can usually "just click on it" to read further detail. In fact, documents can be and often are linked to other documents by completely different authors much like footnoting, but you can get the referenced document instantly!

To access the web, you run a browser program. The browser reads documents, and can fetch documents from other sources. Information providers set up hypermedia servers from which browsers can get documents.

The browsers can, in addition, access files by FTP, NNTP (the Internet news protocol), gopher and an ever-increasing range of other methods. On top of these, if the server has search capabilities, the browsers will permit searches of documents and databases.

The documents that the browsers display are hypertext documents. Hypertext is text with pointers to other text. The browsers let you deal with the pointers in a transparent way: select the pointer, and you are presented with the text that is pointed to.

Hypermedia is a superset of hypertext: it is any medium with pointers to other media. This means that browsers might not display a text file, but might display images or sound or animations.

The compound document a user manipulates is "authored" using the HyperText Markup Language (HTML), which is a specific Document Type Definition (DTD) of the Standard Generalized Markup Language (SGML). In short, the WWW designers wisely chose not to invent yet another language technology and instead chose an existing standardized language.

Initially, HTML was designed simply as a convenient way to mark up text. Shortly after its creation, however, the folks at CERN got wind of SGML, and the two have been struggling to stay together ever since. HTML and SGML serve different needs and communities. HTML is geared more toward the look of Web pages, and SGML more toward a document's structure than how it looks. HTML has benefited greatly from the technology provided by SGML, and SGML has benefited greatly from the popularity of HTML and the Web. They have a symbiotic relationship.
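Because HTML marks each hypertext pointer explicitly, with the anchor (A) element and its HREF attribute, software can find a document's links mechanically. The following is a minimal sketch using Python's standard html.parser module, which is a forgiving HTML parser rather than a validating SGML parser; the sample page is made up.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the HREF targets of <A> elements in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = """<HTML><BODY>
<H1>Physics Links</H1>
<P>See the <A HREF="http://info.cern.ch/">CERN server</A> and our
<A HREF="papers/index.html">papers</A>.
<A NAME="bottom">an anchor with no link</A>
</BODY></HTML>"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['http://info.cern.ch/', 'papers/index.html']
```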

The developers of Mosaic used the rich foundation of WWW as a starting point. These collaborations are what make an open Internet such a valuable resource.

Web browsers all have the same basic features. They let you jump from link to link. They display some graphics. They have mechanisms to call other applications for specific media types. Web browser vendors are starting to differentiate themselves by introducing new HTML tags and features. Each vendor hopes its feature set is compelling enough to become the de facto standard for authors. This is a dangerous game and bad for the end user, because documents become tied to the specific Web browsers that support the new tags. Standardization and conformance testing offer the only hope in this situation.

Navigator/browser feature comparison

Browsers rated: Cello, NCSA X Mosaic, NCSA Mosaic (Win), Netscape, Spyglass, Air Mosaic, InternetWorks, WinWeb, Tapestry (Win), and WebExplorer (OS/2).

Features rated: compliance (proxy, extended HTML); performance (multithreading, dynamic linking, deferred image loading, multiple panes/windows); configurability (kiosk mode, external players); integration (drag and drop to clipboard, spawnable players, search engine); navigation aids (annotation, automatic time/date stamp).

Each feature was marked as supported in some form, as supported only weakly by current standards, or as not supported (or not functioning properly in the reviewers' tests).

© 1996 Association for Computing Machinery. Reprinted by permission from "The Client Side of the World Wide Web" by Hal Berghel, CACM Vol. 39, No. 1, Jan. 1996


1 . 3 Web Maintenance

As the Web of HTML documents grows, maintenance of links in the documents becomes increasingly difficult. It is frustrating to select a link only to have the browser return an error message that the document doesn't exist.

New tools are helping manage and maintain Web sites. The Webtest tool suite from EIT (5) is a freely available utility. It contains a Verify Web Link tool, which starts from a URL, traverses outward subject to a search profile, and reports the results.
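The traversal a link verifier performs is essentially a breadth-first walk over the site. Below is a minimal Python sketch, not EIT's code. It makes two assumptions. First, the page-fetching function is passed in; here a dictionary stands in for a real site. Second, the "search profile" is taken to mean only "stay on the starting host, go at most max_depth links out."

```python
from collections import deque
from urllib.parse import urljoin, urlsplit

def verify_links(start_url, fetch_links, max_depth=3):
    """Breadth-first link check starting at start_url.

    fetch_links(url) returns the list of hrefs on that page, or None if the page
    does not exist. Returns {dangling_url: page_that_links_to_it}.
    """
    host = urlsplit(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0, None)])      # (url, depth, referring page)
    broken = {}
    while queue:
        url, depth, referrer = queue.popleft()
        links = fetch_links(url)
        if links is None:
            broken[url] = referrer             # dangling link (or a missing start page)
            continue
        if depth >= max_depth:
            continue                           # profile limit: look no further out
        for href in links:
            target = urljoin(url, href)        # resolve relative links against this page
            if target not in seen and urlsplit(target).netloc == host:
                seen.add(target)               # off-site links are not followed here
                queue.append((target, depth + 1, url))
    return broken

# A tiny hypothetical site: each page maps to the links it contains.
SITE = {
    "http://www.example.com/": ["about.html", "docs/intro.html", "http://elsewhere.example.net/"],
    "http://www.example.com/about.html": ["missing.html"],
    "http://www.example.com/docs/intro.html": ["../about.html", "old.html"],
}
print(verify_links("http://www.example.com/", SITE.get))
```

Recording the referring page alongside each dangling URL is what lets a report say where the bad link must be fixed.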

Results of Link Verification test with EIT utility

As the Web matures, vendors are catching up to the demand for Web site management products. One product by Adobe is called SiteMill. SiteMill is a WYSIWYG site manager. It provides users with drag and drop controls and tools to manage links, resource usage, and error handling.

SiteMill's external URL reference list and error controls

SiteMill's visually oriented tools help track down references to external URLs and locate dangling links. In the Error window, a user can drag the correct file onto the missing file's icon, and all references in the site are updated.

Another product in this new line of Web management software is Interleaf's CyberLeaf. This system is not an authoring tool; instead, it incorporates Web pages authored with whatever tool you like. Integration with the entire enterprise is another feature authoring systems are starting to support. Interleaf uses the term "Web Lifecycle" to describe the process of updating and maintaining a Web. Web authoring systems are introducing templates coupled with tools to help set up the Web site. These are similar in concept to Microsoft Wizards, which lead people through the creation of complex documents. Interleaf's long history of document processing and management systems, primarily for large organizations, is clearly evident here.

Template usage and link management dialogs from Interleaf's CyberLeaf

Web browsers are applications that run on the user's client machine. The client operating system and the particular configuration of the client software and networking all play a role in how the browser operates and behaves. The availability of ancillary applications and properly configured system-wide protocols determines, in part, whether the final document is portable.

One important issue associated with wide distribution of HTML documents results from the Web browser's loose coupling with various applications, commonly known as helper applications. Web browsers sometimes launch helper applications when the user encounters an image file(6). The application launched depends on the data's MIME type (see Section 110 6 MIME); it often depends on the extension used in the file name as well. If, for example, the HTML document points to a JPEG-formatted image, the client machine must have an application capable of displaying JPEG images, and the Web browser must be configured to launch that application for links to JPEG images. The same scenario applies to sound and video files.
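The lookup chain described above runs from file extension to MIME type, and from MIME type to helper application. It can be sketched with Python's standard mimetypes module. The helper table is a hypothetical browser configuration, and the program names in it are only examples.

```python
import mimetypes

# Hypothetical browser configuration: MIME type -> helper application command.
HELPERS = {
    "image/jpeg": "xv",
    "audio/basic": "showaudio",
    "video/mpeg": "mpeg_play",
}

def helper_for(filename):
    """Guess the MIME type from the file extension, then look up a configured helper."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type, HELPERS.get(mime_type)  # helper is None if nothing is configured

print(helper_for("cern.jpg"))          # ('image/jpeg', 'xv')
print(helper_for("notes.unknownext"))  # (None, None)
```

When a server sends a Content-Type header, a browser would normally use that type instead of guessing it from the name; the extension is the fallback.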

Naming links is another issue related to system dependencies. When authoring, there is a trade-off in how to name a link. Absolute URLs (Uniform Resource Locators) are more reliable but much more painful when you have to relocate the Web documents to another directory structure or Web server. If you know that your documents will be moving, you or the authors should be careful to use only links with relative address names. Doing this will make it easy to move the documents to other locations on the same server.
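How a browser resolves each kind of link shows why relative names survive a move. Here is a small Python sketch using the standard urllib.parse.urljoin; the host names and paths are invented.

```python
from urllib.parse import urljoin

# The same image link, written once as a relative and once as an absolute URL.
RELATIVE_LINK = "images/logo.gif"
ABSOLUTE_LINK = "http://old.example.com/docs/images/logo.gif"

# The page before and after the documents are moved to a new server and directory.
for page in ["http://old.example.com/docs/index.html",
             "http://new.example.com/archive/docs/index.html"]:
    print(page)
    print("  relative ->", urljoin(page, RELATIVE_LINK))
    print("  absolute ->", urljoin(page, ABSOLUTE_LINK))
```

After the move, the relative link follows the page to its new location, while the absolute link still points at the old server.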

This becomes important if you think you may want to encapsulate the Web for CD-ROM distribution, an increasingly popular option. Webs of documents can be distributed on CDs, with the portion that must be updated obtained from the on-line Web when needed. In this way, the entire hierarchy of HTML files can be moved as a unit without renaming file path names inside the documents. In addition, relative names often must refer only to directories below the current location. This is a security feature of the server program.
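A server enforcing the rule just described has to normalize a relative name before checking it, because a name like images/../../secret climbs upward even though it starts by going down. Here is a minimal Python sketch of such a check. The function name is hypothetical, and real servers differ in detail.

```python
import posixpath

def resolve_downward(base_dir, relative_name):
    """Resolve relative_name against base_dir, allowing only results at or below base_dir.

    Returns the normalized path, or None if the name is absolute or climbs
    above base_dir (for example through '..').
    """
    if relative_name.startswith("/"):
        return None                                   # absolute names are refused outright
    base = posixpath.normpath(base_dir)
    candidate = posixpath.normpath(posixpath.join(base, relative_name))
    # Compare against base + '/' so that '/web/docs-private' does not pass for '/web/docs'.
    prefix = base if base.endswith("/") else base + "/"
    if candidate == base or candidate.startswith(prefix):
        return candidate
    return None

print(resolve_downward("/web/docs", "images/logo.gif"))   # /web/docs/images/logo.gif
print(resolve_downward("/web/docs", "../../etc/passwd"))  # None
```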

Of course, after you author your Web pages you must have them placed onto a Web server. Thousands of companies now seem to be willing to host Web pages. They offer virtually any type of service you can imagine, albeit at a price. One particularly intriguing approach, offered by AccessAbility Internet Services(7), is the concept of a self-service Web site. They provide the Web server and host, but you, the author, can do all the maintenance and updates through a controlled process. It's a kind of self-serve copy shop for the '90s.

Self-service Web site administration at AccessAbility
