3

Describing documents

Introduction

This chapter looks at document representations and document surrogates: records that identify and characterize documents, and often serve as keys for retrieving them. You will learn the principles and problems of document representations and surrogates.

This chapter is concerned with document representation, not with document access, which is dealt with in Chapters 4, 5, 6, 7, 8 and 9. Metadata is also a form of document representation, but one which is invariably associated with the document itself, and for that reason it was described in Chapter 2.

Characteristics and Problems of Document Representation

In many information retrieval situations, we are unable to work with documents themselves, but have to rely on representations, or document ‘surrogates’. It is only in recent years that it has become technically and economically possible to directly retrieve documents. Previously, we constructed indexes consisting of brief records of documents, and retrieved those surrogates in advance of retrieving the actual documents. Library catalogues, citations, abstracts and bibliographies are typical examples. Hypertext links, classmarks, keywords and abstracts are other forms of surrogate, but are usually embedded in larger records. Identification keys - ISBNs, International Standard Serial Numbers (ISSNs) and URLs - are another type of document surrogate. All these document surrogates represent individual documents in an information retrieval system.

Document representation has a long and complex history, with many distinct strands, including:

  1. Cataloguing: the compilation of lists of books in a collection. In its most basic form, this was an inventory for administrative purposes. It also became a finding list, to enable users to find items in the collection.

  2. Bibliography, historical and descriptive. Historical bibliography is a tool of literary research, by which individual copies of books (particularly handprinted books) are meticulously described, in order to establish the best text of an author. With descriptive (or systematic) bibliography the emphasis is not on the individual copy but on the definitive listing of works having some defining characteristic, often author, place of printing or subject.

  3. Citation: the practice in research and scholarship of writers to list the works they have cited, thereby acknowledging the work of their predecessors.

  4. Indexing and abstracting services, which are used to identify the documents - often journal articles - that are required to meet a specific subject request.

  5. Records management systems, through which records managers and archivists maintain an orderly collection of the records of an organization, in print or electronic format.

  6. Metadata: data added to a networked electronic resource as a mechanism to enable it to be adequately described and located.

Some common and persistent characteristics and problems of document representation are:

  1. Defining the document, as was discussed in Chapter 2.

  2. Identification: any representation of a document or resource must be able to identify it uniquely.

  3. Granularity, or breadth of bibliographic unit. At what level should a document be described and indexed? Should, for example, individual contributions to a journal be indexed separately? Similar questions arise in respect of (for example) papers in a set of conference proceedings; two or more musical works on one CD; a school resource kit containing pupils’ workbooks, wall charts, a teacher’s book, etc.; or pages other than the home page of a Web site. Some considerations influencing the granularity of description include:

    1. Is the more detailed description available within the document? For example, many books have contents pages and back-of-book indexes, or large Web sites may have an internal search engine.

    2. Are there external sources that perform the same task? Most libraries for example do not record journal articles or contributions to published conference proceedings in their catalogues, as there are specialist indexing and abstracting services that perform this function.

    3. Will the use justify the extra cost? Sometimes detailed description for rapid retrieval is vital, as with a television news service’s film archive, or a fire service’s database of information on hazardous materials. In other cases it may be cheaper to carry out time-consuming sequential searching for seldom required items, e.g. in archives of local history materials.

    4. Will the extra detail merely lead to added complexity and near duplication? - as with some Web search engines when they record numerous hits, mostly to different parts of the same site.

  4. Selection. The description must always uniquely identify the document it represents. Many representations go further, by indicating related documents, or by including information which characterizes the document. An important function of document representation is to act as a selection filter, enabling users to decide whether or not they wish to obtain the actual document.

  5. Search keys. Document representations for manually searched databases need search keys: headings under which they are filed for manual searching. In the case of networked resources, ‘semantic interoperability’ requires such devices as metadata and the Z39.50 search and retrieve protocol.

  6. Location and accessibility. A document representation loses much of its purpose if users are not given enough information to enable them to locate the document itself. In the case of books and journals, bibliographic control is well enough established for the conventional publication details to suffice for this purpose. In the case of non-print materials:

    1. Bibliographic control is often less well organized than with printed materials

    2. The extent of the item may not be immediately obvious in the way that it can be judged by counting or estimating the number of pages in a book or journal article

    3. In many cases the material can only be used via some mechanical, optical or electronic device.

    Networked resources may have specific hardware (e.g. free disc space) and/or software requirements (e.g. Adobe Acrobat). Locations may well be remote, unlike library catalogues, which typically list only locations within the institution. A networked resource may be available to all, or access may be restricted, e.g. to members of an organization or on payment of a fee.

Records

The representation of an item comprises a record. A record contains the information relating to and describing one document. Other similar documents will also be represented by records. A database is a collection of similar records. Records are composed of a number of fields. The types of fields used, their length and the number of fields in a record must be chosen in accordance with a specific application.

There are two types of field: fixed length and variable length. A fixed-length field is one that contains the same number of characters in each record. Since field lengths are predictable, it is not necessary to signal to the computer where each field begins and ends. Fixed-length fields are economical to store and records using fixed-length fields are quick and easy to code. However, fixed-length fields may not adequately accommodate variable-length data. Fixed-length fields are ideal for codes, such as ISBNs, reader codes, product codes, bank account numbers, dates, and language codes where the length of the information will be the same in each record. With variable-length data variable-length fields are necessary. A variable-length field will consist of different lengths in different records. Here, the computer cannot recognize when one field ends and another starts, so it becomes necessary to flag the beginning and end of fields. In addition objects, such as pictures and video clips, may be stored as separate files linked to records that contain primarily fixed or variable length fields. Within fields, individual data elements or units of information may be designated as subfields. Subfields need to be flagged so that they can be identified. The discussion of the MARC record format, in the section on pages 85-9, has examples of the two types of field and subfields.
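The difference between the two field types can be sketched in a few lines of Python. This is an illustration only: the field layout, the field names and the two flag characters are assumptions made for the example, not part of any format described in this chapter.

```python
# Hypothetical fixed-length layout: (field name, width in characters).
FIXED_LAYOUT = [("isbn", 10), ("year", 4), ("lang", 3)]

FIELD_END = "\x1e"  # assumed flag marking the end of a variable-length field
SUBFIELD = "\x1f"   # assumed flag marking the start of a subfield


def pack_fixed(values):
    """Concatenate values; each must exactly fill its fixed width."""
    parts = []
    for name, width in FIXED_LAYOUT:
        value = values[name]
        if len(value) != width:
            raise ValueError(f"{name} must be {width} characters")
        parts.append(value)
    return "".join(parts)


def unpack_fixed(record):
    """Slice by position: no flags are needed because widths are known."""
    out, pos = {}, 0
    for name, width in FIXED_LAYOUT:
        out[name] = record[pos:pos + width]
        pos += width
    return out


def pack_variable(fields):
    """Each field is a list of (subfield code, text); flags mark boundaries."""
    return "".join(
        "".join(SUBFIELD + code + text for code, text in subfields) + FIELD_END
        for subfields in fields
    )


def unpack_variable(record):
    """Split on the flags to recover the fields and their subfields."""
    fields = []
    for raw in record.split(FIELD_END)[:-1]:
        subfields = [(chunk[0], chunk[1:]) for chunk in raw.split(SUBFIELD)[1:]]
        fields.append(subfields)
    return fields
```

The fixed-length record carries no flags at all, which is why it is compact and quick to process, whereas the variable-length record pays for its flexibility with one flag per field and one per subfield.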

Citations

We start with citations, as these have an honourable and familiar place in the academic world: the mechanism by which scholars acknowledge the work of their predecessors, and students ward off accusations of plagiarism in essays. Citations are a relatively uncomplicated form of document representation and one that every student, irrespective of discipline, is required to create. This section discusses the format of citations, and when to make them.

Citations form the basis for structuring the records used in indexing and abstracting services. The purpose and making of abstracts are described in the following section, and the records used in indexing and abstracting services are described in ‘Bibliographic record formats’ later in this chapter. Citation indexes, a method of retrieval based on the lists of references at the end of scholarly papers, are discussed in Chapter 2.

There are a number of published standards for ensuring uniformity, notably British Standards 5605:1990 Recommendations for Citing and Referencing Published Material and 1629:1989 Recommendations for References to Published Material; the Chicago Manual of Style; and K. L. Turabian’s A Manual for Writers of Term Papers, Theses and Dissertations. In spite of the existence of standards, publishers of primary journals continue to maintain a wide range of house styles for citations and bibliographies.

For each reference it is essential to record sufficient information to identify precisely the source cited. There are a number of published standards for ensuring this, but there is at present no standard for citations to electronic documents, though one is in preparation. Two separate methods of referencing documents are permitted: the Harvard (name and date) system, and the Numeric (Vancouver) system (Figures 3.1 and 3.2). The Harvard system is generally easier to apply, and is used in the majority of scholarly journals in the natural and social sciences. The Numeric system is more likely to be found in the arts and humanities.


Figure 3.1 Citation: The Harvard (name and date) system


Figure 3.2 Citation: The Numeric (Vancouver) system

Compiling Citation Lists

Obtain data for citations from the following sources, in order of preference: (1) the title page, or a substitute (cover, caption, masthead, etc.); (2) any other source which is part of the item; (3) any other source which accompanies the item and was issued by the publisher (e.g. a container, a printed insert).

The citation elements are:

  1. Primary responsibility (author, editor, etc.).

  2. Year (the position of this element varies according to the display style chosen. This is the position if using the Harvard style).

  3. Title.

  4. Type of medium (if needed).

  5. Publication details: place, publisher.

  6. Year (the position of this element varies according to the display style chosen. This is the position if using the Numeric style).

  7. Series - normally only needed for reports, or where there might be confusion between the title proper and the series title (as with some kinds of audio and visual material).

  8. Numeration within the item (if the item is in more than one part, or if part of an item is cited).

  9. Location of item (if unique, rare or otherwise difficult to locate).
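As a rough illustration of how the position of the year distinguishes the two styles, the following Python sketch arranges the core elements either way. The function name, the dictionary keys and the exact punctuation are assumptions made for the example; real house styles vary, and the optional elements (medium, series, numeration, location) are left out for brevity.

```python
def format_citation(elements, style="harvard"):
    """Arrange citation elements; only the position of the year changes."""
    author = elements["author"]
    year = elements["year"]
    title = elements["title"]
    publication = f'{elements["place"]}: {elements["publisher"]}'
    if style == "harvard":
        # Year follows primary responsibility (element 2 in the list above).
        return f"{author} ({year}). {title}. {publication}."
    if style == "numeric":
        # Year follows the publication details (element 6 in the list above).
        return f"{author}. {title}. {publication}, {year}."
    raise ValueError(f"unknown style: {style}")


# A made-up item, used only to show the two arrangements.
book = {
    "author": "Smith, Jane",
    "year": 1999,
    "title": "An Example Title",
    "place": "London",
    "publisher": "Example Press",
}
print(format_citation(book, "harvard"))
print(format_citation(book, "numeric"))
```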

Non-Book Materials

This category covers non-book materials, including computer software that can be handled like any other library material (e.g. CD-ROMs). The information given for printed media will need to be supplemented by as many of the following as are appropriate:

  • the medium (e.g. filmstrip, video, compact disc (CD), CD-ROM)

  • how accessed. With audio and visual materials, this information need in many cases only be given if a non-standard system is required (e.g. for Betamax videos)

  • duration of films, videos, etc., if easily established

  • frequency of update (e.g. for CD-ROM databases)

Most materials that comprise a single intellectual unit can be treated broadly as books. Citations for databases in CD-ROM formats are based on citations of whole serials.

Electronic Documents

Electronic documents include electronic monographs, databases and computer programs, electronic serials, electronic bulletin boards, Web documents, and e-mail. As they exist only in electronic format, it is vital to show how the item can be accessed. In the case of Internet resources, the ‘generic’ location description is the URL. References are cited in the text in the usual way. If using the Harvard style, the list of references then follows the general form:

Author (year, date). Title (version) [medium]. Location. Place of publication: Publisher.

  • (Year, Date). Adding Date (month, day, even time of day) to the scheme overcomes the problem of transient or dynamically updated sources.

  • Version is the online equivalent of edition. The Date may be optional if a particular version number is identified.

  • [Medium] will be given as [Online] in the case of sources referenced over a telecommunications link. For non-networked formats use [CD-ROM], [Laserdisc], [Videodisc], [Disc], etc., as appropriate.

  • Page numbers are not usually a feature of electronic documents, as page layout is dependent on the viewing method.

  • Location refers to the URL in the case of Internet resources; otherwise a generic online location. ISO 690-2 (ISO, 1999) recommends the style: Available from source: <location>, e.g. Available from World Wide Web: <http://www.nlc-bnc.ca/iso/tc46sc9/standard/690-2e.htm>.

  • On the Internet, anyone can be a publisher, and Publisher may be omitted where the author has self-published. However, electronic publishing online is becoming an industry in itself, and the position of publisher in the scheme needs to be retained.
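The general form given above can be sketched as a small Python function. It is a minimal illustration under stated assumptions: the function and parameter names are invented for the example, the sample author, title and URL are made up, and the punctuation simply follows the pattern "Author (year, date). Title (version) [medium]. Location. Place of publication: Publisher."

```python
def iso_690_2_location(source, url):
    """Location statement in the style 'Available from source: <location>'."""
    return f"Available from {source}: <{url}>"


def format_electronic_reference(author, year, title, location, medium="Online",
                                date=None, version=None, place=None,
                                publisher=None):
    """Assemble a Harvard-style reference to an electronic document."""
    # Date (month, day) is added to the year for transient sources.
    when = f"{year}, {date}" if date else str(year)
    ref = f"{author} ({when}). {title}"
    if version:
        ref += f" ({version})"
    ref += f" [{medium}]. {location}."
    if publisher:
        # Publisher may be omitted where the author has self-published.
        ref += f" {place}: {publisher}." if place else f" {publisher}."
    return ref


print(format_electronic_reference(
    "Smith, Jane", 1999, "An Example Page",
    iso_690_2_location("World Wide Web", "http://www.example.org/page.htm"),
    date="12 March",
))
```

Keeping the date and version optional reflects the guidance above: either one may be enough to pin down which state of a changing source was consulted.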

Citation Practices

The following rules of thumb are offered as a general guide to good citation practice for students:

  • Always acknowledge your sources.

  • Keep a record of the bibliographical details of those sources as you consult them.

  • If you have a significant number of sources, maintain a card or simple electronic listing of the sources you have used.

  • Record all bibliographical details as indicated in this book.

  • Be particularly attentive in recording details of electronic sources such as home pages - these can be surprisingly difficult to locate on subsequent occasions without the URL.

  • Arrange your bibliography in alphabetical order by the author’s surname or family name, irrespective of the form of document.

  • Any items you have read, but not cited, may optionally be added under a heading such as ‘Other sources’.

Many universities and departments have their own guidelines. Follow any local guidelines that may be issued.

Personal Bibliographic File Management Software

A number of bibliographic file management programs are available for managing personal file collections such as those developed by researchers and other authors. Examples include Biblio-Link, End-Note Plus, File Maker Pro, Papyrus, Pro-Cite and Reference Manager. These packages are not primarily intended for information professionals, so it is important that they should be easy to use. Other desirable features include: predefined fields; the ability to import citations from external sources, so that online search results can be downloaded directly into the correct fields; the ability to add extra fields for personal use, such as keywords and annotations; field or term search capability; duplicate detection; global editing; and predefined output formats for generating bibliographies in a variety of journal formats.
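Duplicate detection, one of the features listed above, can be illustrated with a short Python sketch. It does not describe how any of the named packages work: the record structure, the field names and the matching rule (same author, year and title after ignoring case and punctuation) are assumptions chosen for the example.

```python
import re


def citation_key(record):
    """Normalize author, year and title so trivially different copies match."""
    def norm(text):
        # Lower-case the text and collapse punctuation and spacing.
        return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return (norm(record["author"]), str(record["year"]), norm(record["title"]))


def find_duplicates(records):
    """Return (first_index, duplicate_index) pairs for records sharing a key."""
    seen, duplicates = {}, []
    for index, record in enumerate(records):
        key = citation_key(record)
        if key in seen:
            duplicates.append((seen[key], index))
        else:
            seen[key] = index
    return duplicates
```

A rule this simple will miss duplicates whose author names are abbreviated differently, so a practical tool would need looser matching and would leave the final decision to the user.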

Notice that, while ‘bibliographic’ is properly confined to databases containing surrogate records of books, the word is also used more loosely in relation to surrogate records of all kinds of print materials and their databases.

Abstracts

Abstracts are used both by readers of the primary literature and by users of secondary services. Within the primary literature, an abstract normally appears at the front of the item, usually immediately preceding the text. In this way, readers are able

to identify the basic content of a document quickly and accurately, to determine its relevance to their interests, and thus to decide whether they need to read the document in its entirety. If the document is of fringe interest, reading the abstract may make it unnecessary to read the whole document.

(ISO 214: 1976E)

Abstracts are recommended to accompany journal articles; also any other material in journals that has a substantial technical or scholarly content (e.g. discussions and reviews). It is normal for the writers of journal articles and similar primary material to include an abstract when submitting material for publication. Abstracts should also accompany reports (whether published or unpublished) and theses, monographs and conference proceedings (including chapter abstracts if each chapter covers different topics), and patent applications and specifications.

The other major use of abstracts is in secondary services (e.g. abstracting journals and their associated online and CD-ROM bibliographical databases). These services often use the original (author’s) abstracts - either as they stand, or amended. Where these are lacking or considered unsuitable, an abstract has to be written from scratch, adding to the cost and often reducing the currency of the service.

There are many types of abstract (see Figure 3.3), according to the requirements of particular applications, as influenced by the language, length and readership of the document; the intended audience of the abstract; and the resources of the abstracting agency. For texts describing experimental work and documents devoted to a single theme, an informative abstract is recommended. This type of abstract presents as much as possible of the quantitative and/or qualitative information contained in the document. This includes in particular a note of the results and conclusions of any experimental work. Such an abstract can be a substitute for the full document when only a superficial knowledge is required. Informative abstracts can extend to 500 words or more, though 100-250 words is the norm.


Figure 3.3 Examples of different types of abstract

An indicative abstract is usually much shorter: merely an indication of the type of document, the principal subjects covered, and the way the facts are treated. This type of abstract is often applied to opinion papers and papers generally which do not report research, or where the text is discursive or lengthy, such as broad overviews, review papers and entire monographs. A short abstract comprises only one or two sentences supplementing the title, and may be valuable in current awareness services where speed is essential.

In an indicative-informative abstract the primary elements of the document are written in an informative way, while the less significant aspects have indicative statements only.

A slanted abstract is one which concentrates on those topics within a document that are of interest to the abstracting service’s user community. A development of this is the critical abstract: one that evaluates the abstracted item. Both types are expensive to produce, as the abstractor requires detailed knowledge of the subject and the user community as well as abstracting skills; so they are uncommon, and very seldom found in published abstracting services.

The location of abstracts is:

The skills needed in an abstractor are essentially:

The tendency is for as much use as possible to be made of author-abstractors. Secondary abstractors often have formal qualifications in both information science and the subject field in which they are working. The larger abstracting services employ full-time abstractors, but many abstractors combine these tasks with other information-related duties - particularly information officers in industry and commerce, producing in-house information bulletins.

Writing Informative Abstracts

  • Most documents describing experimental work conform to the sequence Purpose - Methodology - Results - Conclusions. Readers in many disciplines are accustomed to this pattern.

  • Begin the abstract with a topic sentence that is a central statement of the document’s major theme, unless this is already well stated in the document’s title or can be derived from the remainder of the abstract.

  • Give only a brief statement of methodology, unless a technique is new. Results and conclusions however should be clearly presented.

  • If the findings are too numerous for all to be included, give the most important. Any findings or information incidental to the main purpose of the document but of value outside its main subject area may be included, so long as their relative importance is not exaggerated.

  • Abstracts must be self-contained and retain the basic information and tone of the document. They must be clear and concise, and must not include information or claims not contained in the document itself.

  • Unless the abstract is a long one, write it as a single paragraph. Write in complete sentences, and use transitional words and phrases for coherence.

  • Use verbs in the active voice and third person whenever possible. Use significant words from the text. Avoid unfamiliar terms, acronyms, abbreviations or symbols, or define them the first time they occur.

  • Include short tables, equations, structural formulas and diagrams only when necessary for brevity and clarity and when no acceptable alternative exists.

Other Kinds of Document Summaries

An annotation usually appears as a note after the bibliographic citation of a document. It is a brief comment or explanation about a document or its contents, or even a very brief description.

An extract comprises one or more portions of a document selected to represent the whole - often a sentence or two indicating the results, conclusions or recommendations of a study. Extracts are usually shorter than abstracts, and require less effort to produce.

A summary is a brief restatement of a document’s salient findings and conclusions. It occurs within a document, usually at the end, less frequently at the beginning. Summaries are most often found in reports, where they are mainly intended for busy people who do not have time to do more than skim through the full text; and increasingly in the chapters of textbooks as an aid to orientation.

Other forms of text reduction, such as reviews, synopses, abridgements, digests, précis and paraphrases, have applications that are outside the scope of the present work.

Record Formats in Abstracting and Indexing Services

For the large public databases, there has been little pressure to accept a standard format, and each database producer has in general chosen a record format to suit the particular database. The nearest applicable standard is the UNISIST Reference Manual (UNESCO, 1986). Even one database may emerge in different record formats according to the online search service on which it is mounted. Individual decisions are made concerning the fields to be included and the subject indexing made available. Yet another variable factor is the presence of full-text, and more recently multimedia, databases which demand a somewhat different record format from bibliographic records if the information is to be appropriately displayed.

Some examples of record formats in online search services are given in Figures 3.4 and 3.5.

Bibliographic Record Formats

All records in one file have a standard format. In order to facilitate exchange of records between different computer systems, there have been attempts to develop some standard record formats. Such formats were seen to be particularly beneficial in cataloguing applications, where a standard format, which also embodies an agreement on the elements of a bibliographic record, has been particularly attractive in allowing the exchange of cataloguing records. This exchange has minimized the need for local cataloguing, as libraries can make use of records that others have created. Accordingly, one of the fields in which a standard record format is best established is in the creation of cataloguing records.


Figure 3.4 Citation with abstract


Figure 3.5 Full-text record

The exchange of machine-readable records has necessitated the standardization of bibliographic record formats. There is an International Standard Bibliographic Description (ISBD) for most categories of material. These include ISBD(M) monographs, ISBD(S) serials, ISBD(PM) printed music, ISBD(CM) cartographic materials, and ISBD(CF) computer files - this is not by any means a complete list. All follow the general framework of ISBD(G), which recommends:

The programme of ISBDs has also brought about the reconciliation of two earlier sets of standards for bibliographical description: AACR, originally published in 1967 and extensively revised in 1978 (AACR2), and MARC, first implemented in 1968. Machine-Readable Cataloguing has proliferated into a range of formats. The UKMARC format is standard in Britain, but is being reconciled with the US format USMARC, which is now recognized as the standard format for the English-speaking world.

AACR2 is organized into two parts, the one entitled ‘Description’ and the other ‘Headings, uniform titles and references’. These indicate AACR2’s two distinct functions of document representation and document access. According to the plan of the present book, only the first of these is under detailed consideration in the present chapter. The MARC format however also incorporates AACR2’s mechanisms for document access (which are discussed in Chapter 10). To further complicate matters, AACR2 excludes subject access whereas the MARC format makes provision for it. Three subject access systems are included in the USMARC record format: the Dewey Decimal Classification (DDC), the Library of Congress Classification (LCC), and Library of Congress Subject Headings (LCSH). These are described in Chapters 7 and 8.

Bibliographic Description

The description of a document as part of a catalogue entry acts as a document surrogate. The word ‘bibliographic’ denotes the large degree of overlap between catalogues and bibliographies. The catalogues of major national libraries are often effectively major bibliographies in their own right, and the libraries themselves may be national agencies for preparing catalogue copy for distribution to subscribers.

The traditional functions of description are to:

  • describe each document as a document - that is, to identify it

  • distinguish it from other items

  • show relationships with other items.

Again, notice that considerations of document access are excluded. In preparing the description of a document it is necessary to make certain preliminary decisions if different cataloguers are to produce identical records from the same document. These considerations include:

  1. The source of the information for the description. A ‘chief source of information’ is designated, to ensure consistency among different cataloguers (for example, in the treatment of books whose title-page title differs from that found on the cover or spine). In order of preference, information is taken from: the item itself; its container; other accompanying material; other external sources. According to the material, a source of information may be unitary (a title page) or collective (the sequence of credits on a film or video). Specific sources of information are prescribed for different parts (areas) of the description. So for books, the prescribed source of information for the title and statement of responsibility is the title page, but for the physical description the whole publication is examined.

  2. Organization of the description. The description is organized into eight areas, based on the layout of a catalogue entry as it has evolved over a century and a half. The areas are:

    1. Title and statement of responsibility

    2. Edition

    3. Material (or type of publication) specific details

    4. Publication, distribution, etc.

    5. Physical description

    6. Series

    7. Note

    8. Standard number and terms of availability.

    The sequence of the areas is as shown. If an area is not applicable to an item, it is simply omitted. The areas are described in detail below. The organization of the MARC record format follows this sequence.

  3. Punctuation. Consistent punctuation aids the recognition and rapid scanning of the various areas of the description in manually searched indexes, and is particularly important for the international exchange of records. In MARC records, the prescribed punctuation for each area of the description is built into the subfield structure.

  4. Levels of detail in the description. Different applications may demand different degrees of detail in the description. AACR2 identifies three levels of detail (see Figures 3.6 and 3.7). In a small general library simple records may be adequate, whereas a large research collection may require rather more detail. National bibliographic agencies may apply different levels of description to different categories of material, with, for example, fiction and books for children being catalogued at the simplest level.
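The rule that the eight areas appear in a fixed sequence, with inapplicable areas simply omitted, can be sketched in Python as follows. The dictionary keys, the sample record and the typed form of the separator between areas (a point, space, double hyphen, space) are assumptions made for the example; the punctuation within each area is taken as already supplied by the cataloguer.

```python
# The eight areas of the description, in their prescribed sequence.
AREAS = [
    "title_and_statement_of_responsibility",
    "edition",
    "material_specific_details",
    "publication_distribution",
    "physical_description",
    "series",
    "note",
    "standard_number",
]

AREA_SEPARATOR = ". -- "  # assumed typed form of the separator between areas


def build_description(record):
    """Join the areas present in `record`, in sequence, skipping absent ones."""
    parts = [record[area] for area in AREAS if record.get(area)]
    return AREA_SEPARATOR.join(parts)


# A made-up record: areas may be supplied in any order, and some are absent.
record = {
    "standard_number": "ISBN 0-00-000000-0",
    "publication_distribution": "London : Example Press, 1999",
    "title_and_statement_of_responsibility": "An example title / Jane Smith",
    "edition": "2nd ed",
}
print(build_description(record))
```

Because the sequence is held in one place, every description built this way presents its areas in the same order, which is what makes the records easy to scan and to exchange.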


Figure 3.6 Descriptive cataloguing examples: Level 2


Figure 3.7 Descriptive cataloguing examples: Level 1

Components of the Description

A bibliographic description compiled in accordance with ISBD and AACR2 is divided into a number of areas. The MARC bibliographic record format has corresponding groups of fields. These areas are discussed below.

Title and statement of responsibility

These form one area instead of two because one or the other may be lacking (as with anonymous works, or a book of reproductions of art works which has only the artist’s name on the title page), or the two may be grammatically inseparable (e.g. Poems of William Wordsworth). In many cases, Author + Title proper + Date adequately identify an item, fulfilling the first purpose of description. Early cataloguers, who had to physically type or write each catalogue card, soon realized that time and card space could be saved by omitting the author’s name from the description where it was recognizably the same as the author heading appearing immediately above the description (that is, in the great majority of cases), and this interdependence of description and heading has been built into the MARC record format.

The following elements are distinguished:

  • title proper; this is transcribed exactly as found

  • optionally, a general material designation: a word or short phrase from a prescribed list, indicating the type of material (e.g. [text], [music])

  • parallel title: where the title appears in more than one language

  • other title information: usually a subtitle

  • statements of responsibility.

Optionally, a uniform title - a cataloguer’s filing title, preceding the title proper - may be assigned in cases where different editions of the same work may appear under different titles (see Chapter 10).

The statement of responsibility is given as found within the chief source of information. Its purpose is to describe; it is not intended to serve as an access point. Access points use headings derived from the statement of responsibility according to a complex set of rules. These are described in Chapter 10. A heading is often permanently associated with a description, and in such cases Level 1 description permits the omission of the statement of responsibility when it is recognizably the same as the main entry heading.

The principles of description were laid down before online access became an everyday reality. Title keywords are now a significant access mechanism in OPACs and other computerized search systems. Subtitles are not required in a Level 1 description, but in view of their usefulness in keyword access it would be sensible to include them even in the most abbreviated formats.
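The effect of subtitles on keyword access can be illustrated with a minimal sketch. The titles below are invented for illustration, and the function names are assumptions rather than those of any OPAC software; real systems also apply stop-word lists and other refinements that are not shown here.

```python
import re
from collections import defaultdict


def title_keywords(text):
    """Lower-case the text and split it into its distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_keyword_index(records):
    """Map each title keyword to the set of record numbers that contain it."""
    index = defaultdict(set)
    for number, title in records.items():
        for word in title_keywords(title):
            index[word].add(number)
    return index


# Hypothetical record: the same item with and without its subtitle.
without_subtitle = {1: "Hidden depths"}
with_subtitle = {1: "Hidden depths : an introduction to oceanography"}

index_without = build_keyword_index(without_subtitle)
index_with = build_keyword_index(with_subtitle)

# Only the fuller description can be found by its subject-bearing word.
print(index_without.get("oceanography", set()))
print(index_with.get("oceanography", set()))
```

In this invented case the title proper alone gives no clue to the subject, so a keyword search on the subject term finds the item only when the subtitle has been transcribed.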

Edition

The principal elements are:

  • the edition statement as found in the document, except that abbreviations (e.g. Rev. ed.) may be used, and a statement of first edition is by convention omitted

  • statements of responsibility relating to the edition.

The concept of the edition derives from the printed book in the days when type was set by hand. A new edition implied a resetting of the type, which usually implied some revision of the content. A valid distinction could thus be made between an edition and a reprint, a new printing from a photographic or mechanical copy of the text, with no change in the content. Today, many kinds of documents are produced from a machine-held file that can be updated instantly. For books, the idea of the edition still has some validity but, in general, edition statements are becoming more difficult to apply.

Material (or type of publication) specific area

This is used only to record:

  • the scale and projection of maps and atlases

  • optionally, the type of presentation of a piece of music, e.g. Miniature score

  • computer file characteristics (e.g. number of records)

  • the volume and part numbers and dates of issue of a serial.

Publication, distribution etc., area

The principal elements are:

  • place of publication

  • name of publisher, distributor etc.

  • year of publication.

If the description is to be used for current bibliography - that is, as a selection tool - this area should give enough information to identify and locate the source from which the item may be obtained. For books, the publisher and distributor are usually one and the same, and directories of publishers are readily available, making it unnecessary to give more than the publisher’s name for trade publishers. In other cases the full postal address may be required. Year of publication refers to the edition, and so if an item was published in 1894 and a library’s copy is of a reprint dated 1912, AACR2 still regards its date as being 1894 - a view to which an antiquarian bookseller might not subscribe.

Physical description area

For books at any rate this is an uncomfortable area, in that the detail prescribed is more than is required for the general library purpose of conveying some idea of the extent of an item, but insufficient for the requirements of descriptive bibliography. For audio-visual and electronic materials on the other hand, this is a very important area, as these materials cannot be browsed like a book, so it is important to indicate accurately not only the extent of the item, but also what kind of equipment may be needed to use it. The principal elements are:

  • extent of item, e.g. number of volumes or items in the bibliographic unit, pagination

  • other physical details, e.g. illustrations

  • dimensions.

Series area

The principal elements are:

  • title proper of series; in many cases this is all that is required

  • statement of responsibility only if necessary to identify the series

  • numbering within the series.

The series statement helps to identify an item and to characterise it by giving some idea of its status and subject. Series can be problematical, in that it is not always easy to distinguish title proper from series title when both appear on the chief source of information. So one work might be catalogued as The Buildings of England: Suffolk, and another as Essex. -… (The Buildings of England). On many search systems a title or title keyword search will also retrieve series titles.

Notes area

Notes contain information considered necessary to fulfil the purposes of description but which cannot conveniently be given in one of the earlier, more formal, areas of the description. Notes may be taken from any available source: for all previous areas any information not derived directly from the item’s prescribed chief source of information must be enclosed within square brackets. These are some of the commoner categories of notes:

  • notes citing other editions and works

  • notes describing the nature, scope, or artistic form of the work; or a list or summary of its contents

  • notes expanding on the information given in the formal description

  • notes on the particular copy being described, or on a library’s holdings, or restrictions in its use.

Standard number and terms of availability area

In practical terms this means ISBN for books and ISSN for serials, as there are at present no standard numbering systems for other types of material. Terms of availability is an optional addition, serving the needs of current national bibliography rather than of library catalogues. Normally the price is shown. A standard number provides a check on an item’s identity, provided one bears in mind some of the limitations of ISBNs in that different editions of a work will normally carry the same ISBN or, conversely, the same work may bear two or more ISBNs if it comes in hardback and paperback formats or is published jointly by two or more publishers.
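The sense in which a standard number 'provides a check' can be made concrete. The check-digit rules are not given in this chapter; the sketch below assumes the usual modulus-11 schemes for ten-character ISBNs and eight-character ISSNs, in which the final character (a digit or X, standing for 10) is chosen so that a weighted sum of all the characters is divisible by 11. The function names are illustrative only.

```python
def _mod11_ok(number, length):
    """Return True if the weighted sum of the characters is divisible by 11."""
    chars = [c for c in number.upper() if c in "0123456789X"]
    # X (value 10) is only permitted as the final, check character.
    if len(chars) != length or "X" in chars[:-1]:
        return False
    values = [10 if c == "X" else int(c) for c in chars]
    weights = range(length, 0, -1)  # e.g. 10, 9, ..., 1 for an ISBN
    return sum(w * v for w, v in zip(weights, values)) % 11 == 0


def isbn10_is_valid(isbn):
    """Check a ten-character ISBN; hyphens and spaces are ignored."""
    return _mod11_ok(isbn, 10)


def issn_is_valid(issn):
    """Check an eight-character ISSN; the hyphen is ignored."""
    return _mod11_ok(issn, 8)


print(isbn10_is_valid("0-12345678-9"))  # the specimen number used in the checklist
print(issn_is_valid("0961-4559"))       # the ISSN in the serials example
```

A check of this kind only detects errors of transcription. It cannot show that a correctly formed number belongs to the item in hand, which is where the limitations noted above come into play.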

Descriptive Cataloguing Checklist

The full AACR2 provides for three levels of description; the Concise AACR2 approximates to Level 2 description. In the full AACR2, chapter 1 gives rules for describing materials generally, and these are expanded in chapters 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12 for specific types of material, with the rules numbered in parallel with chapter 1.

UKMARC records currently use either Level 2 or Level 1 descriptions. Only the more common patterns are shown here: this checklist does not attempt to cover every eventuality. Level 3 description is identical to Level 2 for the majority of items. Only rarely will items be found where Level 3 prescribes more detail. Examples include second and subsequent place of publication and/or publisher, and parallel series title and/or other series title information.

Sources of information for the various areas (fields) of a description are rigidly prescribed. Any information that is taken from outside a prescribed source is enclosed within square brackets. The sources given below apply to books: consult AACR2 for other materials.

The areas (fields) follow an invariable sequence, as do the subfields within them. Fields or subfields not needed to describe a given item are simply omitted. Punctuation conforms to a rigid pattern (and, in UKMARC records, is generated by the system); for the most part it introduces the information that follows:

  • Overall layout (1.0D; Concise 1C, 1D)

    Title and statement of responsibility. - Edition. - Material (etc.) specific details. - Publication etc. - Physical description. - Series. - Note. - Standard number

  • Title and statement of responsibility area (1.1; Concise 2)

    Source: Title page.

    Level 2: Title proper: other title information / first statement of responsibility; each subsequent statement of responsibility.

    Level 1: Title proper / first statement of responsibility only if different from main entry heading.

  • Edition area (1.2; Concise 3)

    Source: Title page, or a formal statement made by the publisher elsewhere in the item.

    Level 1: .-Edition

    Level 2: .-Edition / statement of responsibility for the edition

  • Material (or type of publication) specific area. (1.3; Concise 4)

    Used as described above. Consult AACR2 for full instructions.

  • Publication, distribution etc area (1.4; Concise 5)

    Source: Title page, or a formal statement made by the publisher elsewhere in the item.

    Level 1: .- First named publisher, year of edition

    Level 2: .- City of publication: first named publisher, year of edition

  • Physical description area (1.5; Concise 6)

    Source: Anywhere in the item.

    Level 1: .- Extent of item

    Level 2: .- Extent of item: other physical details; dimensions. For books this mostly means:

    . - Last numbered page: illustrations; height in cm.

  • Series area (1.6; Concise 7)

    Source: Anywhere in the item.

    Level 1: Not required at Level 1.

    Level 2: .- (Title proper of series / statement of responsibility only if necessary to identify the series; numbering within the series).

  • Notes area (1.7; Concise 8)

    Source: Any available source.

    Level 1/2: .- Note. - Note (Repeat as needed)

  • Standard number (1.8; Concise 9)

    Source: Any suitable source.

    Level 1/2: .- ISBN 0-12345678-9
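The checklist can be read as a small procedure for assembling a description from its elements. The following is a minimal sketch for books only, assuming a simple dictionary of elements with hypothetical key names. It follows the punctuation as printed in the checklist, omits the Material specific area, and treats 'recognizably the same as the main entry heading' as exact equality, which is a considerable simplification. The title, subtitle and author are invented.

```python
def describe(item, level=2, heading=None):
    """Assemble a book description at Level 1 or Level 2 of the checklist."""
    areas = []

    # Title and statement of responsibility area.
    title = item["title"]
    statements = item.get("responsibility", [])
    if level == 2:
        if item.get("other_title"):
            title += ": " + item["other_title"]
        if statements:
            title += " / " + "; ".join(statements)
    elif statements and statements[0] != heading:
        # Level 1: first statement only, and only if it differs from the heading.
        title += " / " + statements[0]
    areas.append(title)

    # Edition area.
    if item.get("edition"):
        edition = item["edition"]
        if level == 2 and item.get("edition_responsibility"):
            edition += " / " + item["edition_responsibility"]
        areas.append(edition)

    # Publication, distribution etc. area: Level 1 drops the place.
    imprint = f'{item["publisher"]}, {item["year"]}'
    if level == 2 and item.get("place"):
        imprint = f'{item["place"]}: {imprint}'
    areas.append(imprint)

    # Physical description area: Level 1 gives the extent only.
    if item.get("extent"):
        physical = item["extent"]
        if level == 2:
            if item.get("illustrations"):
                physical += ": " + item["illustrations"]
            if item.get("dimensions"):
                physical += "; " + item["dimensions"]
        areas.append(physical)

    # Series area: not required at Level 1.
    if level == 2 and item.get("series"):
        areas.append("(" + item["series"] + ")")

    # Notes and standard number.
    areas.extend(item.get("notes", []))
    if item.get("isbn"):
        areas.append("ISBN " + item["isbn"])

    # An area ending in an abbreviation's full stop does not get a second one.
    return ". - ".join(area.rstrip(".") for area in areas)


# Invented item; the imprint and specimen ISBN are those used in this chapter.
item = {
    "title": "An example title",
    "other_title": "an invented subtitle",
    "responsibility": ["A. N. Author"],
    "edition": "2nd ed.",
    "place": "London",
    "publisher": "Pitman",
    "year": "1996",
    "extent": "300 p.",
    "illustrations": "ill.",
    "dimensions": "24 cm",
    "isbn": "0-12345678-9",
}

print(describe(item, level=2))
print(describe(item, level=1, heading="A. N. Author"))
```

Comparing the two lines of output shows what Level 1 gives up: the subtitle, the statement of responsibility where it repeats the heading, the place of publication, and all physical detail beyond the extent.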

Special Problems with Non-Book Media

It was common in the past for libraries to maintain two catalogues, one for books and one for non-book media, or even separate sequences for each type of media. Practice today has moved towards integration, which AACR2 is designed to facilitate. Non-book media still present their own special problems, however. Problems of description or representation are on the whole more acute than problems of access. The following list identifies a number of recurring problems, which vary from material to material.

  • Many media cannot be browsed like a book. Equipment may be needed to view or play the item, which can be slow and calls for special expertise in the media and its equipment. The description may need to include a summary of the contents.

  • Granularity can be difficult to establish, and a fine judgement is needed in deciding on the level at which to describe a composite item. Typical examples include a school teaching pack containing a range of separate items including audio-visual and printed, or a music CD with works by different composers or players on different tracks. Another aspect of this problem is that some materials are distributed in a complex pattern of series and subseries.

  • Responsibility for the creation, production and distribution of non-book media can be complex and diffuse.

  • There is far less standardization of presentation than with monographic materials. Titles may be difficult to establish, and other necessary information, for example date, may be difficult or impossible to establish. External sources of information - for example bibliographies and distributors’ lists - may have to be consulted.

  • There is no standard numbering system, and consequently less likelihood of finding a centrally produced record. Local indexing or abstracting is more likely to be required.

  • Descriptions can be complex, with much necessary information that may not fit easily into the formal areas of description. Extensive notes are therefore often needed.

The following notes briefly characterize some of the problems specific to categories of media. The categories are those used by AACR2.

Cartographic materials

This category includes any representation of the whole or part of the earth or any celestial body, and extends to aerial photographs and three-dimensional maps and plans. These have a Mathematical data area, corresponding to the Material (or type of publication) specific area, to record scale and projection. Some examples are:

  • Scale 1:50,000.

  • Scale 1:23,000,000; Azimuthal equal-area proj.

  • Not drawn to scale.

The physical description merits special attention. This will include: the number of physical units of an item, e.g. 1 map on 4 sheets; other physical details, to include: the number of maps in an atlas, the use of colour, the material from which it is made, if significant, and any mounting: e.g. 1 globe : col., wood, on brass stand. Finally, the dimensions of maps often mean height x width, e.g. 1 map: col., 35 x 50 cm.

Manuscripts

AACR2 gives a chapter to these, but as a manuscript is by definition unique and often of incalculable value, handling them demands specialist training, which places them outside the scope of the present work.

Music

This covers published music only; books about music and musicians are treated like any other book. Sound recordings are treated separately, but share some of the problems of music scores. Problems are more of access than of representation, for two reasons: (1) music publishing is highly international, and the music cataloguer is likely to be handling a disproportionate number of foreign language documents; and (2) classical music in particular may require a uniform title - see Chapter 10 for a brief description of these.

Sound recordings

Sound recordings share with printed music many problems of description and access. Physical description is a particular problem, because of the wide and changing range of formats. AACR2 has detailed instructions on such matters as: playing time; type of recording (analogue, digital, optical, magnetic etc.); the playing speed of discs and film; whether mono, stereo, or quad; and so on.

Motion pictures and video recordings

These even more than sound recordings are subject to rapid technological change. The physical description is along the same lines as sound recordings.

Graphic materials

AACR2 defines 20 types of graphic materials, from activity cards to wall charts, and taking in such categories as art reproductions, filmstrips, radiographs, and slides. Sources of information may be incomplete, and even titles may need to be supplied from external sources, or made up by the cataloguer. As always, the rules for physical description should be studied carefully; many of them are highly medium-specific.

Computer files

The AACR2 has incorporated recent extensions to these rules. The rules cover data and program files. They do not cover programs residing in the permanent memory of a computer, or firmware, or electronic devices such as calculators and little furry animals that die noisily if not looked after. (These are three-dimensional artefacts.) The chief source of information is the title screen, which means that the file has to be run in order to be catalogued. The edition statement is likely to use such words as version or release. Computer files require a Material (or type of publication) specific area, here called the File characteristics area. It indicates whether the file is a data or a program file, together with some indication of its extent. The Notes area provides for information on (among other things) the intended audience, and the need to provide a summary of the purpose and content of the item, together with a contents list and the nature and scope of the file. Other notes cover the system requirements; and the mode of access for files held remotely. This last implies the Internet, as some libraries are now incorporating selected Internet resources into their catalogues.

Three-dimensional artefacts and realia

Nine types are listed generically in the physical description area (art original, art reproduction, Braille cassette, diorama, exhibit, game, microscope slide, mock-up, model); otherwise the cataloguer has to state the specific name of the item. To this is added information on the extent of the item, its material, colour, dimensions and any accompanying material. The items are tangible enough, but chief sources of information may be lacking.

Microforms

Microforms are often reissues of ordinary full-sized materials. This gives two possibilities: to catalogue them as material in their own right (AACR2’s implied preference), or to prepare a description based on the original, with a note indicating a microform reproduction, as is the Library of Congress’s practice in USMARC records.

Serials

Serials differ from other forms of publication in that publication is intended to be continued indefinitely. If the title proper of a serial changes, a new description is made using the new title. Statements of responsibility tend to involve corporate bodies rather than personal authors, and exclude editors of serials. Serials have a Material (or type of publication) specific area, to record the chronological or numeric designation of the first issue. Here, as in the date and the ‘extent of item’ part of the physical description, an ‘open’ entry is normally made. A note records the frequency. The following example shows the general pattern:

Jewellery international. - Oct/Nov. 1991- .- London: Jewellery Research and Publishing, 1991- .- v.: ill. - Six issues yearly. - ISSN 0961-4559

The bibliographic recording of serials is the responsibility of agencies reporting to the International Serials Data System (ISDS) in Paris. The British ISDS centre is the British Library’s National Serials Data Centre (NSDC); the American centre is known as CONSER, having developed out of a Conversion of Serials project begun in 1973. Among their bibliographic control functions, centres assign ISSNs.

The MARC Record Format

The MARC record format was designed in the late 1960s as a standard format for representing bibliographic information, so that libraries could store, communicate and reformat bibliographic information in machine-readable form. It was first implemented in the USA by the Library of Congress in 1968 and in Great Britain by the British National Bibliography in 1971. The format was to be hospitable to all kinds of library materials, and is flexible enough to be used in a variety of applications not only in libraries and bibliographic agencies, but within the book industry and the information community at large. As more countries exploited MARC, variations in practices spawned deviations from the original format. The UNIMARC format was developed for international exchange of MARC records. National organizations creating MARC records have used national standards within the country and reformatted records to UNIMARC for international exchange. A generic MARC record conversion program called USEMARCON has been developed to assist this. Recently, however, a number of major suppliers of MARC records have agreed to use the USMARC format. The Canadian national format has already been fully harmonized, but more work remains to be done to harmonize the UKMARC (see Figure 3.8) and USMARC formats.

Figure 3.8 Record in UKMARC format

As well as the format for Bibliographic Data, there are USMARC formats for Community Information, Holdings Data, Classification Data and Authority Data. UKMARC also has a name authorities format. A joint authority file, the Anglo-American Authority File (AAAF), has been established.

The MARC record format complies with ISO 2709:1996 Information and Documentation: Format for Information Exchange (ISO, 1996), and with ISO 1001:1986 Information Processing: File Structure and Labelling of Magnetic Tapes for Information Interchange (ISO, 1986). The components of the format are:

The following elements, called field enumerators, define the data content of each field:

  1. Tag: a three-digit number within the range 000-945. The tags have a mnemonic structure in that they follow the order of a catalogue record, and the tags for added entries mirror those for main entry headings. The variable fields are grouped in blocks according to the first character of the tag:

    1xx Main entries

    2xx Titles and title paragraph (title, edition, imprint)

    3xx Physical description, etc.

    4xx Series statement

    5xx Notes

    6xx Subject access fields

    7xx Added entries other than subject or series

    8xx Series added entries

    9xx Local data.

    Tags for specific fields are created by entering digits in the final two places, for example:

    100 Personal author main entry heading

    110 Corporate name main entry heading

    240 Uniform title

    245 Title and statement of responsibility

    250 Edition and statement of edition author, editor, etc.

    260 Imprint

    A personal author’s name generally has ‘00’ in the second and third positions, so that:

    100 is used for a main entry personal author heading

    600 is used for a personal author subject heading

    700 is used for a personal author added entry heading.

  2. Indicators: two characters (normally digits) which follow the tag, and introduce the variable length fields that contain bibliographic data. Indicators are unique to the field to which they are assigned, and are used for such purposes as: to distinguish between different types of information entered in the same field; to provide for title-added entries; and to indicate the number of characters to be dropped in filing titles. For instance, in the field for main entry personal author heading, the following indicators are used in conjunction with the 100 tag, for the name of a person entered under:

    100.00 A given name

    100.10 A surname or single title of nobility

    100.20 A compound surname or title, or one with a separate prefix, or an element of the name other than the first or last.

  3. Level: a single digit introduced by a colon, indicating whether a separate entry has been made for a work contained in another publication (for example, individual plays in a collection).

  4. Repeat: introduced by a slash, differentiating between two fields with the same tag, for example, if a work belongs to more than one series.

  5. Subfields: indicating smaller distinct units within a field, which may require separate manipulation. Typical subfields in the imprint area are place of publication, publisher and date of publication. Subfields are preceded by a subfield code, which consists of a single non-alphanumeric symbol (e.g. ‘£’) and a single letter. The imprint might be coded as: 260.00 £aLondon £bPitman £c1996. Subfield codes control such factors as appearance and (in UKMARC but not in USMARC records) punctuation. So, for example, the subfield coding just shown for the 260 field would generate the statement London: Pitman, 1996 in a Level 2 or Level 3 description, or Pitman, 1996 for a Level 1 description. Subfield codes are defined in the context of the field in which they are used, but similar codes are used in parallel situations. For example, the subfield codes for a person’s name are constant, regardless of whether the name is main, or additional, author, or subject entry heading.

  6. Field mark: a hash (#) representing the end of a field. This is necessary when variable length fields are used.
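The field enumerators just listed can be illustrated by taking a coded field apart and regenerating its punctuation. The following is a minimal sketch, assuming the written form used in the imprint example above (tag, full stop, indicators, then subfields introduced by '£' and closed by an optional '#'). The level and repeat enumerators are not handled, since their full written form is not shown here, and the function names are illustrative rather than part of any MARC software.

```python
# Tag blocks, keyed by the first character of the tag, as listed above.
BLOCKS = {
    "1": "Main entries",
    "2": "Titles and title paragraph (title, edition, imprint)",
    "3": "Physical description, etc.",
    "4": "Series statement",
    "5": "Notes",
    "6": "Subject access fields",
    "7": "Added entries other than subject or series",
    "8": "Series added entries",
    "9": "Local data",
}


def block_for(tag):
    """Return the name of the block to which a three-digit tag belongs."""
    return BLOCKS.get(tag[0], "Unknown block")


def parse_field(line, delimiter="£"):
    """Split a coded field into tag, indicators and (code, data) subfields."""
    line = line.strip().rstrip("#")          # drop the field mark if present
    head, _, data = line.partition(" ")      # "260.00" and the subfield data
    tag, _, indicators = head.partition(".")
    subfields = []
    for chunk in data.split(delimiter)[1:]:  # text before the first code is ignored
        if chunk:
            subfields.append((chunk[0], chunk[1:].strip()))
    return tag, indicators, subfields


def display_imprint(subfields, level=2):
    """Generate the imprint statement: the place is dropped at Level 1."""
    parts = dict(subfields)
    statement = f'{parts["b"]}, {parts["c"]}'
    if level >= 2 and "a" in parts:
        statement = f'{parts["a"]}: {statement}'
    return statement


tag, indicators, subfields = parse_field("260.00 £aLondon £bPitman £c1996#")
print(tag, indicators, block_for(tag))
print(display_imprint(subfields, level=2))
print(display_imprint(subfields, level=1))
```

Because the punctuation is generated from the subfield codes rather than keyed in, the same stored field can yield either the fuller or the briefer imprint statement.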

MARC records can be used in the following kinds of application:

  1. Information retrieval. Most of the fields and subfields can be searched on, and together provide an exceptionally wide range of access points. In practice, different applications make their own selection of search keys from those available.

  2. Displaying citations. Records are rarely displayed in their ‘raw’ MARC format, except for cataloguers. For most applications, the tags are either suppressed or replaced with appropriate verbal descriptions (for example, Imprint: in place of 260), and unnecessary fields and subfields suppressed. Many applications allow the data to be displayed or printed in more than one format.

  3. Cataloguing. Cataloguers can call up MARC records using control numbers, or by the search keys available for information retrieval, or by acronym searches. Records may be selected online from a central database or from a CD-ROM, or by offline selection. Here, the user creates a request file of control numbers, which can be input by file transfer, or by e-mail, or sent on disc or tape to a processing agency. The Internet file transfer protocol (FTP) is increasingly being used to distribute the British Library’s weekly BNBMARC file.

  4. Identifying new publications. The major national bibliographic agencies operate Cataloguing-in-Publication (CIP) programmes. Arrangements are made with individual publishers to supply advance copies so that a skeleton MARC record can be made available in advance of publication. A full MARC record is made after publication and legal deposit, replacing the CIP record.

  5. Resource sharing. The MARC format was designed from the start to facilitate the exchange of bibliographic data. Many library cooperatives have a central database in MARC format to which members can contribute records, and from which they copy records for local use. The Library of Congress is developing the MARC DTD project which aims to create standard SGML DTDs to support the two-way conversion of cataloguing data between the MARC data structure and SGML without loss of data. The MARC data structure is an international standard (ISO 2709), but is dauntingly specialized. SGML is widely used in publishing, and the project hopes to make MARC more attractive for use in less specialized environments.
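The display application described in item 2 above, in which tags are replaced by verbal labels and unwanted fields suppressed, can be sketched in a few lines. The label Imprint for 260 is taken from the text; the other labels, the function name and the record itself are assumptions made for illustration.

```python
# Tags chosen for display and the labels that replace them.
LABELS = {"100": "Author", "245": "Title", "250": "Edition", "260": "Imprint"}


def labelled_display(fields, labels=LABELS):
    """Replace tags with verbal labels; fields without a label are suppressed."""
    lines = []
    for tag, text in fields:
        if tag in labels:
            lines.append(f"{labels[tag]}: {text}")
    return "\n".join(lines)


# Hypothetical record, already reduced to (tag, display text) pairs.
record = [
    ("008", "coded data, not intended for display"),
    ("100", "Smith, Jane"),
    ("245", "An example title"),
    ("260", "London: Pitman, 1996"),
]

print(labelled_display(record))
```

Supplying a different table of labels is all that is needed to produce a second display format from the same record.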

How far will MARC DTD go towards providing a user-friendly MARC format? Like anything else, MARC is a product of its age, and its age was the 1960s - the age of the catalogue card. MARC’s designers took a structuralist approach to bibliographic record format design: to try to think of every possible functionality that could ever be required of a catalogue card and devise a separate mechanism for each. It is instructive to compare the complexity of the MARC format with the minimalist approach of the Dublin Core, and to reflect that much of the input into the design philosophy of the Dublin Core was made by cataloguers.

Nevertheless, to the cataloguer, MARC’s problems are not so much its complexity as its straitjacketing effect. Some problems are problems of detail - for example, Festschriften can be specified in the 008 field, but not large print books. Or there is the problem of redundancy, where identical or similar data can appear in different parts of the record. A perennial problem (which we discuss in Chapter 9) is that the format ties everybody to the main entry concept, which many would like to see buried. Others criticize its paucity of links with other records; or the difficulty of adapting it to a multi-tiered record structure - the list is endless. On the other hand, database vendors and others with large MARC databases to manage blanch at the thought of the slightest change to the format. By dint of everyday familiarity, cataloguers live with MARC’s obsolete and inefficient features, just as we all live with similar features in QWERTY keyboards and English spelling.

The Common Communications Format

There are many formats for bibliographic records. Rarely are two national formats sufficiently similar that they can be handled by the same computer programs. The bibliographic descriptions carried by these formats differ widely depending on their source. Abstracting and indexing services use different rules of bibliographic description to those followed in library cataloguing. The MARC format, which is used as an exchange format by major libraries, assumes the ISBD to be the standard. On the other hand, abstracting and indexing services may (but do not all) acknowledge the UNISIST Reference Manual, which prescribes its own content designators for the bibliographic descriptions of various types of materials. These two major formats define, organize and identify data elements in different ways and rely upon different sets of codes. Thus, it has been difficult or virtually impossible to mix in a single file bibliographic records from different sources. The CCF was thus designed with the aim of facilitating the communication of bibliographic data among the sectors of the information community.

In common with MARC, the CCF constitutes a specific implementation of ISO 2709. The CCF, then,

This last provision shows one difference with MARC. The CCF has been designed from the outset to link records at different bibliographic levels (e.g. Series - Monograph - Analytic), as this has always been an important feature of indexing services. Another difference is the CCF’s simplicity and permissiveness: rather than users adapt to the format, the format was designed to be adaptable to a range of practices.
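The idea of linking records at different bibliographic levels can be pictured with a minimal sketch. This is not the CCF's actual record layout or linking mechanism, which is not described here; the class and field names are assumptions, and the analytic title is invented. The series and monograph titles are those of the Buildings of England example used earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """A bibliographic record with an optional link to the next level up."""
    title: str
    level: str  # "series", "monograph" or "analytic"
    parent: Optional["Record"] = None


def chain(record):
    """Follow the links upwards, listing each level and title on the way."""
    entries = []
    while record is not None:
        entries.append(f"{record.level}: {record.title}")
        record = record.parent
    return entries


series = Record("The Buildings of England", "series")
monograph = Record("Suffolk", "monograph", parent=series)
analytic = Record("A hypothetical chapter", "analytic", parent=monograph)

for entry in chain(analytic):
    print(entry)
```

An indexing service that records individual articles or chapters needs links of this kind to lead the user from the analytic entry to the host item and its series.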

Record Formats in Local Systems

Most of the centralized and shared cataloguing projects take account of and probably use the MARC record format. This degree of standardization is not the pattern outside this specific area of application. Essentially, there are two different categories of systems that may be encountered: publicly available databases, and local systems supported by software packages.

For the large public databases, there has been little pressure to accept a standard format, and each database producer has in general chosen a record format to suit the particular database. Even one database may emerge in different record formats according to the host on which it is mounted. Individual decisions are made concerning the fields to be included and the indexing to be made available. Yet another variable factor is the presence of full-text and, more recently, multimedia databases, which demand a somewhat different record format from bibliographical records if the information is to be appropriately displayed.

The record formats to be encountered in local systems that are supported by software packages are many and various. Most of these software packages offer cataloguing systems which will work in a MARC record format, or which produce records which are compatible with the MARC record format. Others do not offer such an option. Virtually all software packages offer the purchaser the opportunity to evolve a record format that suits a specific application. Thus, in local systems there may well be great variability in record format, as designs are implemented within the parameters set by the various software packages.

Summary

This chapter has ranged widely across the formats for document representation. While representation and access are distinct topics, it must be remembered that the concept of access is built into most formats. Citations have an explicit filing element. More generally, most kinds of document representation can be entered into a retrieval system in such a way that the words they contain can be used as search keys in mechanized retrieval systems. Also, the record formats used in abstracting and indexing services (including MARC) are structured around access keys: titles, authors, classification codes and the rest.

This chapter has also considered records individually, whereas records are usually stored in databases along with other records of the same general type. We go on in Chapter 4 to look at databases, after which access in all its ramifications will be discussed.

References and Further Reading

Abstracts and Abstracting

Cremmins, E. T. (1982) The Art of Abstracting. Philadelphia: ISI Press.

International Standards Organization (ISO) (1976) Documentation: Abstracts for Publication and Documentation. ISO 214:1976E. Geneva: ISO.

Jizba, L. (1997) Reflections on summarizing and abstracting: implications for Internet Web documents, and standardized library databases. Journal of Internet Cataloging, 1 (2), 15–39.

Lancaster, F. W. (1998) Indexing and Abstracting in Theory and Practice, 2nd edn. London: Library Association.

Rowley, J. E. (1988) Indexing and Abstracting, 2nd edn. London: Library Association.

Wheatley, A. and Armstrong, C. J. (1997) Metadata, recall, and abstracts: can abstracts ever be reliable indicators of document value? Aslib Proceedings, 49 (8), September, 206–213.

Bibliographic Records. MARC

Anglo-American Cataloguing Rules (AACR2) (1998) 2nd edn, revd. London: Library Association.

Attig, J. C. (1983) The concept of a MARC format. Information Technology and Libraries, 2, 7–17.

Avram, H. D. (1975) MARC, its History and Implications. Washington, DC: Library of Congress.

Burke, M. A. (1999) Organization of Multimedia Resources: Principles and Practice of Information Retrieval. Aldershot: Gower.

Byrne, D. J. (1988) MARC Manual: Understanding and Using MARC Records, 2nd edn. Englewood, CO: Libraries Unlimited.

Crawford, W. (1989) MARC for Library Use. Boston, MA: G.K. Hall.

Fecko, M. B. (1993) Cataloging Nonbook Resources: A How-To-Do-It Manual for Librarians. New York: Neal-Schuman.

Fritz, D. A. (1998) Cataloging with AACR2R and USMARC: For Books, Computer Files, Serials, Sound Recordings, Video Recordings. Chicago: American Library Association (ALA).

Furrie, B. (1998) Understanding MARC: Machine Readable Cataloging, 5th edn. Washington, DC: Library of Congress, Cataloging Distribution Service.

Gorman, M. (1989) Yesterday’s heresy - today’s orthodoxy: an essay on the changing role of descriptive cataloging. College and Research Libraries, 50 (6), 626–634.

Gredley, E. and Hopkinson, A. (1990) Exchanging Bibliographic Data: MARC and Other International Formats. London: Library Association.

Hagler, R. (1997) The Bibliographic Record and Information Technology, 3rd edn. Chicago: American Library Association.

Hill, J. S. (1996) The elephant in the catalog: cataloging animals you can’t see or touch. Cataloging and Classification Quarterly, 23 (1), 5–25.

International Standards Organization (1986) Information Processing - File Structure and Labelling of Magnetic Tapes for Information Exchange. ISO 1001:1986. Geneva: ISO.

International Standards Organization (1996) Information and Documentation - Format for Information Exchange. ISO 2709:1996. Geneva: ISO.

ISBD(G) (1977): General International Standard Bibliographic Description: Annotated Text. London: IFLA.

ISBD(M) (1987): International Standard Bibliographic Description for Monographic Publications. Rev. edn. London: IFLA.

Lipow, A. G. (1991) Teach online catalog users the MARC format? Are you kidding? Journal of Academic Librarianship, 17 (2), 80–85.

McRae, L. and White, L. S. (eds) (1998) ArtMARC Sourcebook: Cataloging Art, Architecture, and their Visual Images. Chicago: American Library Association.

Spicher, K. M. (1986) The development of the MARC format. Cataloging and Classification Quarterly, 21 (3/4), 75–80.

UNESCO (1986) UNISIST Reference Manual for Machine-readable Bibliographic Descriptions, 3rd edn, compiled and edited by H. Dierickx and A. Hopkinson. Paris: UNESCO.

Citation Practices

British Standard 5605:1990 (1990) Recommendations for Citing and Referencing Published Material. London: British Standards Institution.

British Standard 1629:1989 (1989) Recommendations for References to Published Material. London: British Standards Institution.

Chicago manual (1993) The Chicago Manual of Style: For Authors, Editors and Copywriters, 14th edn. Chicago: University of Chicago Press.

International Standards Organization (ISO) (1999) Information and Documentation - Bibliographic References - Part 2: Electronic Documents or Parts Thereof. ISO 690-2. Geneva: ISO. Selections available from World Wide Web: <http://www.nlc-bnc.ca/iso/tc46sc9/standard/690-2e.htm>.

Turabian, K. L. (1996) A Manual for Writers of Term Papers, Theses, and Dissertations, 6th edn. Chicago: University of Chicago Press.