Chapter 12

Using Spreadsheets in a Historical Study

Spreadsheets can be used to record data as an alternative to a database. They cannot replace specialist databases such as Gramps, which family historians use to build their family tree. However, if the researcher needs a good view of their data, or if the study will involve statistical analyses, a spreadsheet may be the best choice.

Spreadsheets can be a useful tool for historical investigations even if the study does not involve numerical data. Items in a spreadsheet can be sorted alphabetically, making it possible to look for patterns and relationships in names, occupations, places of residence or any other factor. If a piece of data is factual, it may be feasible to record it by name. If it expresses a situation, it may be necessary to devise and use a code.

A spreadsheet is a very flexible method for data recording because it can be designed to capture whatever information is needed. It is wise to make full use of its capabilities by splitting information into plenty of categories and recording each aspect in its own column. This aids analysis and makes it easy to see if something has not been recorded.

When setting up a spreadsheet from newspaper sources, three fields should always be included: source, date and additional information.

Source

This is important as it authenticates the rest of the record and enables it to be located easily. Usually the name of the newspaper and date of publication are sufficient. To prevent this taking up too much screen space, the cell storing source information can be kept small and the text size reduced, as it rarely needs to be read. If necessary, the information can easily be viewed by highlighting the cell, which makes the entry appear full size in the input box.

Date

The date should include at least the year and the month. Date information makes it possible to sort entries into chronological order, and focus on a shorter period within the data set if necessary. The date is crucial for identifying when there may be gaps or imbalance in a sample that covers a period of time. It also makes it possible to study how something develops over the years. When working with a spreadsheet it is helpful to store items so that they appear chronologically on screen, so sort them into order at the end of an input session, or use the insert row option to get them into the right order whilst inputting.
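
The same ideas can be sketched outside a spreadsheet. The short Python example below sorts a few invented entries into chronological order and lists any years that have no entries at all. The entries and the field names `year`, `month` and `name` are assumptions made for illustration only.

```python
# Hypothetical entries: year and month are stored as numbers so they sort correctly.
entries = [
    {"year": 1872, "month": 3, "name": "Case C"},
    {"year": 1869, "month": 11, "name": "Case A"},
    {"year": 1872, "month": 1, "name": "Case B"},
]

def chronological(rows):
    """Return the rows ordered by year, then by month."""
    return sorted(rows, key=lambda r: (r["year"], r["month"]))

def missing_years(rows):
    """List the years between the first and last entry that have no entries."""
    years = {r["year"] for r in rows}
    return [y for y in range(min(years), max(years) + 1) if y not in years]

for row in chronological(entries):
    print(row["year"], row["month"], row["name"])
print("Years with no entries:", missing_years(entries))
```

A year with no entries is not necessarily a fault in the sample, but it is a prompt to check whether reports for that period have been overlooked.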

Additional Information

It is inevitable that there will be unusual features about some items that need to be recorded. This could be as varied as an assumption about the data, a note that a report offers good insight into a topic, that the case might be a good one to include as an example, or that further investigation is required. Storing this as additional information keeps the research in one place rather than becoming scattered in other notes. If the same point is regularly being included as Additional Information, decide whether it should be incorporated as a separate field as soon as this becomes apparent.

Devising the Spreadsheet

At the start of an investigation it may not be obvious which will be the most productive aspects and it is worth taking a little time to read through a few newspaper reports before creating a spreadsheet. As the concerns of the past are not necessarily the concerns of the present, it may become apparent that certain information needed for a study was not recorded, or perhaps an unexpected angle could present itself. Although it is possible to insert extra fields at any point, backtracking to collect detail whose significance was not initially appreciated can waste time. Bear in mind that, unless the number of items in the study is small, it will not be feasible to check every one before settling on the data to be collected. At some stage, usually early on in the study, reasoned decisions about what to record will have to be taken.

When a field on a spreadsheet is sorted, items will be listed in numerical or alphabetical order. Thinking carefully about this aspect when designing a spreadsheet can avoid problems when analysing it.

Name

A ‘name’ identifies an item and needs a field of its own. It can be the name of a person, a company or perhaps the plaintiff and defendant in a court case. When using the name of a person, it is better to use surname followed by first name rather than vice versa. Data in surname order may be relevant but an alphabetical list of forenames is not likely to be needed. If data might need to be sorted by forename and surname, use two columns.
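
As a minimal sketch of why two columns are more flexible, the Python below sorts a few invented records first by surname and then by forename. The names and the field names `surname` and `forename` are assumptions made for illustration.

```python
# Hypothetical records: surname and forename are held in separate fields.
records = [
    {"surname": "Smith", "forename": "Mary"},
    {"surname": "Brown", "forename": "Thomas"},
    {"surname": "Smith", "forename": "Ann"},
]

def sort_by_surname(rows):
    """Return rows ordered by surname, then forename."""
    return sorted(rows, key=lambda r: (r["surname"].lower(), r["forename"].lower()))

def sort_by_forename(rows):
    """Return rows ordered by forename, then surname."""
    return sorted(rows, key=lambda r: (r["forename"].lower(), r["surname"].lower()))

for row in sort_by_surname(records):
    print(row["surname"] + ", " + row["forename"])
```

If the two parts of the name had been typed into one field as 'Mary Smith', only the forename ordering would be available without further editing.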

Address

If the address of any premises is included, consider how to deal with the number. A study is more likely to want to analyse a whole road rather than all premises whose number is twelve. Using one field for the premises number (or name) and the next field for other address details would allow the data to be grouped by road or number.
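
The benefit of separating the premises from the road can be sketched in Python as follows. The addresses and the field names `premises` and `road` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical entries: premises number (or name) and road are held in separate fields.
addresses = [
    {"premises": "12", "road": "High Street"},
    {"premises": "3", "road": "Mill Lane"},
    {"premises": "14", "road": "High Street"},
    {"premises": "Rose Cottage", "road": "Mill Lane"},
]

def group_by_road(rows):
    """Collect the premises recorded for each road."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["road"]].append(row["premises"])
    return dict(grouped)

by_road = group_by_road(addresses)
for road in sorted(by_road):
    print(road, by_road[road])
```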

Age

Old newspapers did not routinely report a person’s age and when they did, the report sometimes included an opinion based on the individual’s appearance rather than an accurate figure. Even at the beginning of the twentieth century older people could be very hazy about their age. There are also instances of an individual being described as a few years older or younger than someone else.

As it can be difficult to establish a person’s age beyond reasonable doubt, always decide how accurate this information needs to be. In many cases it will be possible to use an age band such as 51-60 in the study. This eliminates any difficulties with a person who is described as ‘about 55’ and a sister who is ‘a couple of years younger’.
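
A banding rule of this kind is simple to express. The sketch below assumes ten-year bands that start at 1, 11, 21 and so on, which matches the 51-60 example; the function name `age_band` is hypothetical and other band widths may suit a particular study better.

```python
def age_band(age, width=10):
    """Return a band label such as '51-60' for a whole-number age of 1 or more."""
    if age < 1:
        raise ValueError("age must be 1 or more")
    lower = ((age - 1) // width) * width + 1
    return f"{lower}-{lower + width - 1}"

# A person described as 'about 55' and a sister 'a couple of years younger'.
print(age_band(55))
print(age_band(53))
```

Both ages fall into the same band, so the uncertainty in the newspaper report does not affect the analysis. An estimate that sits close to a band boundary, such as 'about 60', still needs a note in Additional Information.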

It is sensible to record any ‘best estimate’ data in a different style on the spreadsheet such as in italics, underlined, or in a coloured font so that it can be spotted easily and taken into account when interpreting any results if this seems necessary. Best estimate data will not be acceptable if absolute precision is required.

Occupation

A person’s occupation is a very useful piece of information. Not only does it show how the person was earning their living, it can be used as a proxy for other data, such as a person’s likely social class and position, how they may have been regarded by others, values they may have held and their financial standing and spending power.

Occupation can be a challenging category to record and analyse because there were so many trades, crafts and professions a person could earn their living from and a number of levels at which they could be working. Unless fine detail is necessary, standardise terms as much as possible. A maid, housemaid, parlourmaid, lady’s maid, scullery maid, kitchen maid and cook could all be described as servants, as could a butler, footman, coachman, valet and gentleman’s gentleman.

The generic term shopkeeper can be used to describe a butcher, draper, grocer, greengrocer, ironmonger and tobacconist whilst craftsman might reasonably include plumbers, carpenters, cordwainers and smiths.

Sometimes, however, the breadth of an occupation can be so wide it can be almost meaningless. A farmer, for example, was someone who made a living from the land, irrespective of whether he owned thousands of fertile acres and employed an army of men to cultivate them, or rented a tiny holding from which he scraped a living. When dealing with farmers, it is always worth using FarmerL, FarmerM and FarmerS to distinguish between the large, moderate and small. It may also be useful to record whether someone is self-employed or an employee, perhaps by a term such as CraftsmanS or CraftsmanE.

Women, and young unmarried men, may have been described with reference to the head of household’s occupation rather than their own (if they had one). When recording occupation, decide which is more relevant to the research if the head of household’s is also given. The unmarried daughter of a self-employed man may have worked in the same type of occupation as the daughter of one of her father’s employees, but would they have shared the same values or spending power?

This spreadsheet records data about breach of promise claims. Do not worry about the variable appearance of content in some columns. Perfect appearance is not essential in a working document. So long as information is recorded clearly and consistently, feel free to use abbreviations or customise columns in different ways. Note also the icons across the top of the spreadsheet which are used for customising the spreadsheet and also for carrying out some types of analysis. Tabs at the bottom allow a variety of different sheets to be created.

It can be helpful to use two adjacent columns to record occupation data. Listing a person’s actual job preserves variety and can reveal how the economy or social structure functioned. Listing a simplified version, or even devising and entering a code, makes analysis much more straightforward and can prevent general themes and patterns becoming lost in a welter of detail.
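
The two-column approach can be sketched in Python as a lookup table. The groupings below follow the examples given earlier in this chapter; the label 'Unclassified' and the field names are assumptions introduced for illustration, and a real study would extend the table as new occupations are met.

```python
from collections import Counter

# Groupings follow the examples in this chapter; extend the table as new jobs appear.
GROUPS = {
    "Servant": ["maid", "housemaid", "parlourmaid", "lady's maid", "scullery maid",
                "kitchen maid", "cook", "butler", "footman", "coachman", "valet"],
    "Shopkeeper": ["butcher", "draper", "grocer", "greengrocer", "ironmonger",
                   "tobacconist"],
    "Craftsman": ["plumber", "carpenter", "cordwainer", "smith"],
}
LOOKUP = {job: group for group, jobs in GROUPS.items() for job in jobs}

def simplify(occupation):
    """Return the simplified group, or 'Unclassified' (a placeholder) if unmapped."""
    return LOOKUP.get(occupation.strip().lower(), "Unclassified")

# Hypothetical rows: the actual job is kept and the group goes in the adjacent column.
rows = [{"occupation": job} for job in ["Housemaid", "Grocer", "Footman", "Lamplighter"]]
for row in rows:
    row["group"] = simplify(row["occupation"])

print(Counter(row["group"] for row in rows))
```

Keeping the original wording in the first column means that a grouping decision can be revisited later without returning to the newspapers.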

Place

This is another useful category as it can pinpoint differences between regions or between the urban and the rural. A point to remember is that a village, town or county may have changed its name or boundaries over the years. This can be relevant when carrying out comparisons over time. Place is important in identifying any gaps or imbalance in a sample that purports to cover the whole country.

Financial Data

The value of money alters unpredictably with time and there can also be step changes, such as in 1971 when Britain decimalised its currency. If any pre-decimal financial data is included, it is advisable to convert it to the decimal equivalent, as spreadsheets cannot readily perform calculations on sums expressed in pounds, shillings and pence. Details of how to convert currency are included in Appendix 4. A similar point applies to old weights and measures.
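
As a rough sketch of the arithmetic, there were 12 pence to the shilling and 20 shillings to the pound before 1971, so 240 old pence made one pound. The Python below applies those ratios; the function name `to_decimal_pounds` is hypothetical, and the method set out in Appendix 4 should be preferred wherever it differs.

```python
from decimal import Decimal

def to_decimal_pounds(pounds=0, shillings=0, pence=0):
    """Convert pounds, shillings and pence to decimal pounds (20s = £1, 12d = 1s)."""
    return Decimal(pounds) + Decimal(shillings) / 20 + Decimal(pence) / 240

# £2 10s 6d expressed as a decimal number of pounds.
print(to_decimal_pounds(2, 10, 6))
```

Here £2 10s 6d becomes 2.525. The conversion only changes the notation so that sums can be added and averaged; it does not adjust for changes in the value of money over time.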

How Many Aspects Should be Investigated?

This can only be answered on a case by case basis, as it depends on what the researcher hopes to discover about their subject. General studies, or studies that span a long time period, may need to consider more elements than a study which sets out to investigate a specific question or a short time frame. Researchers who do not wish to study a large number of variables, or whose time is limited, may decide to restrict the number of aspects they have to deal with by keeping their project specific. A useful approach is to devise a project which could be progressively developed so that it builds up into something much bigger. The findings of a study that was limited to a single town or year can be tested by repeating the investigation for other localities or looking at the same town a few years earlier or later.

As a substantial amount of time and effort can be spent locating and reading newspaper reports, it makes sense to capture any important facts that are mentioned regularly, even if the need for them is not readily apparent. There would probably have been a good reason at the time for the identical fact being reported so often. Be discriminating about peripheral detail. A woman’s age might correlate with other features and provide valuable learning. It is unlikely that the colour of her hair would be relevant to many topics.

If an element of the subject has already been studied widely and there is consensus about it, collecting another set of detailed data may be a poor use of time, unless the researcher wishes to challenge previous findings or to use the information in a different manner.

When studies cover a wide time period, be prepared for the amount and type of information to alter. The issue being investigated could have been affected by an external factor, such as a change in the law. Social attitudes are also relevant and if a topic became more fashionable or less interesting to newspaper readers, the frequency and content of reports may have altered to reflect this.

Data Completeness

At an early stage, a researcher will have to decide what information is needed before an item can be included in a study. While it is tempting to think ‘everything’, in reality this is unlikely to be achieved because some newspaper reports were less detailed than others. Being too rigid about data completeness could render a study impossible because there are insufficient cases in the sample. There are statistical techniques which can determine the degree of inaccuracy arising from missing data and it may be possible to draw reasonable conclusions from data sets that have a few items missing. This is another reason for using more than the bare minimum of cases when conducting a historical study, as it helps to address the issues that arise when data is incomplete.

Do not reject any item before the analysis stage of an investigation. It is possible that the missing data was included in another newspaper. Always keep a note of any item rejected because of insufficient data. Incomplete items are part of the total number and the fact that newspapers did not think certain detail worth recording may suggest something about attitudes of the time.

Locating Missing Data

It is unusual for a single newspaper report to contain every piece of data needed in a study. When a historian has finalised a sample and extracted the information required, it is likely that some gaps or discrepancies will remain, even if more than one newspaper has been checked. At this point, the researcher has to decide whether to chase detail that might never have been recorded or proceed without it. Always ensure that the most relevant newspapers have been consulted before accepting that data is not available.

•   If the initial source was a national newspaper, a local one may have further information.

•   The most likely newspaper to carry a detailed account is one that is local to the event in question.

•   A well-structured query on the search page of a provider that has several newspapers might identify one that contains the missing details.

There are occasions when trying to fill in gaps is not a good use of time.

•   If the same piece of data is missing from many entries it may never have been recorded.

•   If it is already clear that no useful learning will result from that aspect of the investigation.

•   If resolving a minor discrepancy will not improve the analysis in any way.

Inference

Some pieces of data can be found by inference. If the weekly wage is known, annual earnings can be worked out. If the weekly rent is known, it may be feasible to form a view about the type of housing a person lived in, based on knowledge of the period and place in question.

The dividing line between inference and guesswork can be a fine one. It is always better to leave a gap than to guess, as the latter will lead to false conclusions. Gaps do not reflect badly on a conscientious researcher who has made sufficient effort to find a piece of data and concluded that it is not available.

Reasonableness of Data

When as much data as possible has been located, review it for reasonableness before analysing it. Take care with numbers, because figures in scanned newspapers are not always reproduced reliably. If any number seems out of line, check the information, preferably with a different newspaper.

Analysis Involving Incomplete Data

If a large number of items are being investigated, valid learning can arise when there are gaps in data, so long as there are not too many of them. If there is only a small number of items, forming conclusions is more problematic. Any investigation that involves incomplete data must have regard for what is missing and only draw conclusions that can be substantiated.

Example

If a group of 20 prisoners was made up of 9 men, 5 women and 6 whose gender is unknown, it would be valid to state that it comprised between 45 per cent and 75 per cent men depending on the unknown data. It would also be valid to state that it comprised between 25 per cent and 55 per cent women depending on the unknown data. This is a very wide range and it is frustrating not to be able to get more accuracy.

Trying to achieve this by assigning a gender to the unknowns in some arbitrary fashion would be misleading, as would ignoring the 6 unknowns and calculating figures based on a sample size of 14.

If the group comprised 200 people and the gender of 6 was not known, it would be possible to form valid conclusions based on the 194 pieces of data that were known.
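
The calculation in this example can be sketched in Python. The function name `share_bounds` is hypothetical. The figures for the group of 20 are those given above; the split of the group of 200 into 120 men and 74 women is invented purely to show how the range narrows.

```python
def share_bounds(known_in_group, known_other, unknown):
    """Return the (lowest, highest) possible percentage for the group of interest."""
    total = known_in_group + known_other + unknown
    lowest = 100 * known_in_group / total
    highest = 100 * (known_in_group + unknown) / total
    return lowest, highest

print(share_bounds(9, 5, 6))     # men among the 20 prisoners
print(share_bounds(5, 9, 6))     # women among the 20 prisoners
print(share_bounds(120, 74, 6))  # hypothetical split of a group of 200
```

With 6 unknowns out of 20 the proportion of men could lie anywhere between 45 and 75 per cent. With 6 unknowns out of 200, the invented figures give a range of only 60 to 63 per cent, which is narrow enough to support a conclusion.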

Good historical investigations take the evidence that is available and respect it. Even when it only provides a low level of insight, it may be an advance on the previous position. Trying to obtain better accuracy by adjusting for unknowns, without having a valid basis for doing so, is the hallmark of a mediocre historian. Attempting to take advantage of gaps in evidence to construct results that ‘prove’ an outcome desired by the researcher is the sign of an unethical approach.

Interpreting Results

Although computers are excellent at crunching numbers, the answers they generate are not an end in themselves, and are rarely informative without further work by a researcher to interpret them.

Newspapers themselves are a good place to start as they might contain editorials or readers’ letters that put some or all of the findings into context. If editorials and letters appear to contradict the research findings, consider why. It may be that people at that time had a wrong perception of an issue, perhaps because they did not have the more unbiased information that a researcher is now able to locate and study.

Good research inevitably leads to further questions, so do not settle for one set of analyses. Be prepared to select subsets of the data and investigate them further. If 75 per cent of cases have very similar features, consider performing two separate investigations rather than blurring distinctions with averages.

Analyses that cover many years may be insensitive to any legal, social and economic changes that were affecting the issue. It may be necessary to look at the data on a decade by decade basis, use a moving average or adjust all monetary values for inflation in order to obtain good insight.
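
Two of these techniques can be sketched briefly in Python. The yearly counts are invented, the function names are hypothetical, and the moving average shown is a simple trailing average over three years, which is only one of several possible choices. Adjusting for inflation is not shown, as it depends on a price index chosen by the researcher.

```python
from collections import defaultdict

def by_decade(yearly_counts):
    """Total the counts for each decade, keyed by the first year of the decade."""
    totals = defaultdict(int)
    for year, count in yearly_counts.items():
        totals[(year // 10) * 10] += count
    return dict(totals)

def moving_average(values, window=3):
    """Return simple trailing averages over `window` consecutive values."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical number of cases found for each year.
cases = {1868: 4, 1869: 6, 1870: 5, 1871: 9, 1872: 7}

print(by_decade(cases))
print(moving_average([cases[year] for year in sorted(cases)]))
```

A moving average smooths out a single unusual year, whereas decade totals make it easier to compare the periods before and after a known change, such as a new law.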

Presenting Results

When presenting results, always disclose any assumptions that have been made and any gaps or other limitations in the data. It is better to be open on this point from the outset, rather than risk someone else undermining the research by pointing out flaws. It is not always possible to obtain a perfect set of data and it does not denote poor research, so long as appropriate effort has been made to obtain the information.

If findings are illustrated by case studies, the most interesting examples, or the ones where most information is available in newspapers, may be atypical. These should not be ignored if they reveal something about the issue or society of the time, but the study should also include some examples that are typical. If a study of legal cases showed that the most frequent sum given as damages was £50 and it was usually to a young married man, including one case that contained these elements would be appropriate, even if it was not intrinsically interesting.

This is part of a spreadsheet used solely by the author to plan and to monitor progress when writing this book. Column F, which represents how complete the author considered each chapter, was left unheaded to save screen space. When a spreadsheet is to be shared, ensure that everything is clear and always use headings. Also consider the positioning of columns. As the percentage is not calculated from the numbers that precede it, perhaps the layout should be changed to avoid inadvertent confusion?

Further Uses of Spreadsheets

This chapter has concentrated on using a spreadsheet to collect and analyse data. Spreadsheets can be used for many other purposes, such as planning research, recording what has been done, or analysing some items in further detail.

Modern spreadsheets have a row of tabs at the bottom which open further sheets. Always make full use of this facility. It is an excellent way of keeping all aspects of a piece of research together and makes switching between the different records very easy.

Conclusion

The purpose of this chapter is to give ideas about how to make use of the database technique with spreadsheets, rather than to prescribe how data should be collected. Each individual researcher must decide what data to collect and how to record it.

For anyone who would like to consolidate the introduction to historical investigations given in Chapters 9-12, a worked example is included at the end of the main text. Anyone who would like to understand the full range of statistical techniques that are available to a researcher should consult a suitable textbook.