Chapter 1

Introducing SPSS

IN THIS CHAPTER

Considering the quality of your data

Communicating with SPSS

Seeing how SPSS works

Finding help when you’re stuck

A statistic is a number, but it's a special kind of number. A statistic is a measurement of some sort. It’s fundamentally a count of something — occurrences, speed, amount, or whatever. A statistic is calculated using a sample. In a sense, a sample is the keyhole you have to peer through to see the population, which is what you’re trying to understand. The value at the population level — the average height of an American male, for instance — is called a parameter. Unless you’ve got all the data there is, and you’ve collected a census of the population, you have to make do with the data in your sample. The job of SPSS is to calculate. Your job is to provide a good sample. Together you try to understand the population even though all you have is a sample.
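To make the difference between a statistic and a parameter concrete, here is a minimal Python sketch. The ten heights are made up for illustration, and the names `population_heights` and `sample_heights` are hypothetical. In real work you rarely have the whole population, and the sample should be drawn at random rather than sliced out as it is here.

```python
from statistics import mean

# Hypothetical population: heights in inches for ten people (made-up values).
population_heights = [66, 67, 68, 69, 69, 70, 70, 71, 72, 74]

# The sample is the part of the population you actually measured.
sample_heights = population_heights[2:7]  # 68, 69, 69, 70, 70

parameter = mean(population_heights)  # the value at the population level
statistic = mean(sample_heights)      # the value calculated from the sample

print(f"parameter (population mean): {parameter}")
print(f"statistic (sample mean):     {statistic}")
```

The two numbers are close but not equal. That gap is what you live with whenever all you have is a sample.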

In this chapter, we discuss the importance of having accurate, reliable data, and some of the implications when this is not the case. We talk also about how best to organize your data in SPSS and the different kinds of files that SPSS creates. We take a trip down memory lane and discuss the origins of SPSS so you can understand all of its many names. We discuss what can be done in the program and the different ways of communicating with the software. Finally, we spend some time discussing different ways in which you can get help when navigating SPSS.

SPSS’s Job, Our Job, and Your Job

It's important to have appropriate expectations. In this section, we discuss the various roles that all parties play with regard to learning to use SPSS.

SPSS’s job

After you give SPSS data and instructions, it will perform the calculations for you. The data and the metadata — the information about the data — have to be correct. (We have a lot to say about metadata.) The instructions have to be correct as well. Correct data processed with the wrong technique won’t give you the results you need.

SPSS won’t make math errors. That’s not the kind of errors computers make. They always do exactly what we tell them to do, and sometimes that's the problem. SPSS’s job is to take data that has been declared correctly and produce statistical results in the form of tables and charts that allow you to draw conclusions about your data — if you know how to interpret those results.

Our job

Your authors, Jesus and Keith, have spent so many hours using SPSS that we’ve lost count. You may have heard of the somewhat controversial 10,000-hour rule, which states that you need that many hours of “deliberate practice” to truly master a complex subject. Well, you’ll be pleased to know that Jesus and Keith each have more than 10,000 classroom hours teaching SPSS in addition to decades of using it on our own projects.

As the authors, we are primarily responsible for the following:

Making the book easy to read: We know where to focus your attention when you're getting started because we’ve helped thousands of SPSS beginners get started.
Walking you through how to set up everything properly: Parts 1 and 2 focus on getting started with SPSS as well as how to define data files, variables, and their attributes.
Acclimating you regarding how SPSS works: Getting you familiar with the software is our number one focus and mission. We hope to get you to a point where you stop worrying about the software so you can concentrate on your analysis.
Explaining how to tell SPSS to do basic tasks: This edition has a greatly expanded section in Parts 5 and 6 on understanding the basic theory, choosing the right technique, and interpreting results. We realize that if you're confused about SPSS, it may be because you're confused about statistics. However, we spend the rest of the book helping you understand the software.
Starting you on your SPSS Statistics journey: You won't master statistics when you reach the end of this book, but you will be much more comfortable with SPSS software. No book of this length could cover all statistical techniques, but we selected the most important procedures for new users of SPSS.

Your job

Your number one job is to relax. You might be reading this book because you're up against a deadline or something in SPSS is causing you stress. However, if you take a little time now to understand how to work in SPSS efficiently, it will pay off in the long run. You have some other responsibilities:

Know your data. SPSS can’t put your data into context — only you can. SPSS will trust everything you give it. It will never second-guess the data you give it, and it will never ask you about it. Only you can ensure that the data is trustworthy.
Declare your data and set it up properly. Declaring and setting up your data is a critical responsibility that we cover thoroughly, especially in Chapters 3 and 4. Setting up data is not just about declaring the metadata but also about other data management tasks that we cover in Part 3.
Choose the correct statistical technique. The toughest task for you may be to choose the correct statistical technique. Dozens of techniques are available, but we know which ones are the most important to learn. We’ve given you a life preserver. If you feel stuck and don’t know which chapter to refer to, check out Chapter 22, where we provide an overview for analyzing data and a larger context of where in the research process various statistical techniques are typically used.
Know how to interpret the results. Finally, Parts 5 and 6 have many examples of how to interpret statistical output.

Garbage In, Garbage Out: Recognizing the Importance of Good Data

SPSS doesn’t warn you when there is something wrong with your sample. Its job is to work on the data you give it. If what you give SPSS is incomplete or biased, or if there is data that doesn’t belong in there, the resulting calculations won’t reflect the population very well. Not much in the SPSS output will signal to anyone that there is a problem. So, if you’re not careful, you can conclude just about anything from your data and your calculations.

Consider the data in Table 1-1. What if you calculated the survival rate of Titanic passengers based on this small sample? What if you calculated what fraction of the passengers were in each class of service? You can easily see that you’d be in real trouble.

TABLE 1-1 Sample of Titanic Passengers

Survived or Died	Class	Name	Sex	Age	Fare Paid	Cabin	Embarkation
Died	1	Andrews, Mr. Thomas, Jr.	Male	39	0.00	A36	Southampton
Died	1	Parr, Mr. William Henry Marsh	Male		0.00		Southampton
Died	1	Fry, Mr. Richard	Male		0.00	B102	Southampton
Died	1	Harrison, Mr. William	Male	40	0.00	B94	Southampton
Died	1	Reuchlin, Mr. John George	Male	38	0.00		Southampton
Died	2	Parkes, Mr. Francis “Frank”	Male		0.00		Southampton
Died	2	Cunningham, Mr. Alfred Fleming	Male		0.00		Southampton
Died	2	Campbell, Mr. William	Male		0.00		Southampton
Died	2	Frost, Mr. Anthony Wood “Archie”	Male		0.00		Southampton
Died	2	Knight, Mr. Robert J.	Male		0.00		Southampton
Died	2	Watson, Mr. Ennis Hastings	Male		0.00		Southampton
Died	3	Leonard, Mr. Lionel	Male	36	0.00		Southampton
Died	3	Tornquist, Mr. William Henry	Male	25	0.00		Southampton
Died	3	Johnson, Mr. William Cahoone, Jr.	Male	19	0.00		Southampton
Died	3	Johnson, Mr. Alfred	Male	49	0.00		Southampton

However, consider this: Would you be tempted to drop these cases from your analysis because their fare information appears to be missing? What if fare information were provided for all the other passengers? You might drop the cases in Table 1-1 but use everyone else. You’d be dropping only a handful of passengers out of hundreds, so that would be okay, right? The answer is no, it would not be okay. As it turns out, there is a good reason that each of these passengers didn’t pay a fare (for example, Mr. Thomas Andrews, Jr., designed the ship), and if this was your data, your job would be to know that.
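The calculations suggested for Table 1-1 can be sketched in a few lines of Python. The outcome, class, and fare values come straight from the table; the variable names are hypothetical, and this is only an illustration of what any software would report for this sample, not SPSS output.

```python
from collections import Counter

# (outcome, class, fare paid) for the 15 passengers listed in Table 1-1.
table_1_1 = (
    [("Died", 1, 0.00)] * 5
    + [("Died", 2, 0.00)] * 6
    + [("Died", 3, 0.00)] * 4
)

n = len(table_1_1)

# Survival rate based only on this sample.
survival_rate = sum(1 for outcome, _, _ in table_1_1 if outcome == "Survived") / n

# Fraction of passengers in each class of service.
class_counts = Counter(cls for _, cls, _ in table_1_1)
class_share = {cls: count / n for cls, count in class_counts.items()}

# Passengers whose fare is recorded as zero.
zero_fares = sum(1 for _, _, fare in table_1_1 if fare == 0.0)

print(survival_rate, class_share, zero_fares)
```

The arithmetic runs without complaint and reports that nobody survived and that every fare was zero. Nothing in the result hints that the sample is unrepresentative, or that the zero fares are real values rather than missing ones.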

Sampling is a big topic, but here’s the quick version:

The data points in your sample should be drawn at random from the population.
There should be enough data points.
You should be able to justify the removal of any data points.
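As a rough illustration of the first and third points, the following Python sketch draws a simple random sample and refuses to remove a case unless a reason is recorded. Everything here is hypothetical: the population of 1,000 ID numbers, the sample size of 100 (chosen arbitrarily, not as a rule for how many points are enough), and the helper `remove_case`.

```python
import random

# Hypothetical population: 1,000 record IDs standing in for real cases.
population_ids = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the same draw can be repeated
sample_ids = rng.sample(population_ids, k=100)  # random draw, no repeats

# Written justification for every case removed from the sample.
removal_log = {}  # case ID -> reason


def remove_case(case_id, reason):
    """Remove a case from the sample only if a non-empty reason is given."""
    if not reason.strip():
        raise ValueError("Every removal needs a justification.")
    sample_ids.remove(case_id)
    removal_log[case_id] = reason


remove_case(sample_ids[0], "duplicate record")  # illustrative reason only
print(len(sample_ids), removal_log)
```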

THE ORIGIN OF SPSS

In 2018, IBM SPSS Statistics turned 50. That makes it older than Windows and older than the first Apple computer, so in the early days SPSS was run on mainframe computers using punch cards.

At Stanford University in the late 1960s, Norman H. Nie, C. Hadlai (Tex) Hull, and Dale H. Bent developed the original software system named Statistical Package for the Social Sciences (SPSS). They needed to analyze a large volume of social science data, so they wrote software to do it. The software package caught on with other folks at universities, and, consistent with the open-source tradition of the day, the software spread through universities across the country.

The three men produced a manual in the 1970s, and the software’s popularity took off. A version of SPSS existed for each of the different kinds of mainframe computers in existence at the time. Its popularity spread from universities into the public sector, and it began to leak into the private sector as well.

In the 1980s, a version of the software was moved to the personal computer. In 1992, Jack Noonan became CEO of SPSS, Inc. (replacing Nie) and a period of acquisition of smaller software companies began. Many of those products are still part of the SPSS family, such as IBM SPSS AMOS (for structural equation modeling) and IBM SPSS Modeler (originally called Clementine). In 2009, SPSS, Inc. was acquired by IBM, and the name of the product became IBM SPSS Statistics to differentiate it from the other products.

The official name of the software today is still IBM SPSS Statistics, and it's available in several formats and versions. We discuss these different options in Chapter 2.

This book is not about the accuracy, correctness, or completeness of the input data. Your data is up to you. This book shows you how to take the numbers you already have, put them into SPSS, crunch them, and display the results in a way that makes sense. Gathering valid data and figuring out which cases to use is up to you.

Your data is your most valuable possession. If you're the only one in the world with your data, be sure to back it up before you start working with it. Make sure you have multiple copies, ideally with one copy in the cloud. At key milestones in your analysis and data modifications, remember to save it again. The last thing you want is to lose your data.

Talking to SPSS: Can You Hear Me Now?

More than one way exists for you to command SPSS to do your bidding. You can use any of three approaches to perform any of the SPSS functions, and we cover them all in this section. The method you should choose depends not only on which interface you prefer, but also (to an extent) on the task you want performed.

The graphical user interface

SPSS has a window interface. You can issue commands by using the mouse to make menu selections that cause dialog boxes to appear. This is a fill-in-the-blanks approach to statistical analysis that guides you through the process of making choices and selecting values. The advantage of the graphical user interface (GUI) approach is that, at each step, SPSS makes sure you enter everything necessary before you can proceed to the next step. This interface is preferred for those just starting out — and if you don’t go into depth with SPSS, this may be the only interface you ever use.

Syntax

Syntax is the internal language used to command actions from SPSS. It’s the command syntax of SPSS (hence, its name). Syntax is often referred to as the “command language.” You can use the Syntax command language to enter instructions into SPSS and have it do anything it’s capable of doing. In fact, when you select from menus and dialog boxes to command SPSS, you’re actually generating Syntax commands internally that do your bidding. In other words, the GUI is nothing more than the front end of a Syntax command-writing utility.

Writing (and saving) command-language programs is a good way to create processes that you expect to repeat. You can even grab a copy of the Syntax commands generated from the menu and save them to be repeated later.
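The idea that menu choices turn into Syntax text can be pictured with a small Python sketch. This is not how SPSS is implemented; `build_frequencies_syntax` is a hypothetical helper, and the variable names `age` and `gender` are made up. FREQUENCIES is a real SPSS command, but the exact text SPSS generates from a dialog box may differ from this simplified version.

```python
def build_frequencies_syntax(variables):
    """Return the text of a FREQUENCIES command for the chosen variable names."""
    if not variables:
        raise ValueError("Select at least one variable.")
    return f"FREQUENCIES VARIABLES={' '.join(variables)}\n  /ORDER=ANALYSIS."


# Pretend the user picked two variables in a dialog box.
syntax = build_frequencies_syntax(["age", "gender"])
print(syntax)
```

Because the command is just text, it can be saved to a file and run again later on new data.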

Programmability

Programmability refers to the myriad ways of customizing SPSS with extensions. These extensions are new capabilities that the user community adds to SPSS using the programming languages Python and R. Learning to write these powerful new features is beyond the scope of this book, but you should know that they exist. (SPSS has an entire menu called Extensions.) If you allow it, a number of these extensions are installed along with SPSS itself.

How SPSS works

The developers of SPSS have made every effort to make the software easy to use. SPSS prevents you from making mistakes or even forgetting something. That’s not to say it’s impossible to do something wrong in SPSS, but SPSS software works hard to keep you from running into the ditch. To foul things up, you almost have to work at figuring out a way of doing something wrong.

You always begin by defining a set of variables; then you enter data for the variables to create a number of cases. For example, if you’re doing an analysis of automobiles, each car in your study would be a case. The variables that define the cases could be things such as the year of manufacture, horsepower, and cubic inches of displacement. Each car in the study is defined as a single case, and each case is defined as a set of values assigned to the collection of variables. Every case has a value for each variable. (Well, you can have a missing value, but that’s a special situation described later.)
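One way to picture cases and variables is as rows and columns. In the Python sketch below, each dictionary is one case (one car) and each key is a variable. The values are made up for illustration, and `None` stands in for a missing value; SPSS stores its data differently, so treat this only as a mental model.

```python
# Each dictionary is one case; the keys are the variables.
cars = [
    {"year": 1970, "horsepower": 130, "displacement": 307},
    {"year": 1976, "horsepower": 95, "displacement": 232},
    {"year": 1982, "horsepower": None, "displacement": 151},  # missing value
]

variables = list(cars[0].keys())

# Every case has an entry for every variable, even if that entry is missing.
missing_counts = {
    name: sum(1 for case in cars if case[name] is None) for name in variables
}

print(variables)
print(missing_counts)
```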

There are different types of variables. These types describe how the data is stored — for example, as letters (strings), as numbers, as dates, or as currency (see Chapter 4 for more information on data types). Each variable is defined as containing a certain kind of number, so you also have to define the variable’s level of measurement. For example, a scale variable is a numeric measurement, such as weight or miles per gallon. A categorical variable contains values that define a category; for example, a variable named gender could be a categorical variable defined to contain only values 1 for female and 2 for male. Things that make sense for one type of variable don't necessarily make sense for another. For example, it makes sense to calculate the average miles per gallon, but not the average gender.
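The miles-per-gallon and gender example can be worked through in Python. The data values below are made up; the coding of 1 for female and 2 for male follows the text. The sketch shows that a mean of category codes can be computed but tells you nothing, whereas counts per category are the meaningful summary.

```python
from collections import Counter
from statistics import mean

mpg = [18.0, 24.5, 31.0, 22.5]  # scale variable (made-up values)
gender = [1, 2, 2, 1]           # categorical variable stored as codes
gender_labels = {1: "female", 2: "male"}

average_mpg = mean(mpg)  # a meaningful summary of a scale variable

# For a categorical variable, count the cases in each category instead.
gender_counts = Counter(gender_labels[code] for code in gender)

# The arithmetic works on the codes, but the result has no interpretation.
average_gender = mean(gender)

print(average_mpg, dict(gender_counts), average_gender)
```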

After your data is entered into SPSS — your cases are all defined by values stored in the variables — you can easily run an analysis. You’ve already finished the hard part. Running an analysis on the data is simple compared to entering the data. To run an analysis, you select the analysis you want to run from the menu, select the appropriate variables, and click OK. SPSS reads through all your cases, performs the analysis, and presents you with the output as tables or graphs. Of course, you have to know which analysis to choose. For that information, see Parts 5 and 6.

You can instruct SPSS to draw graphs and charts directly from your data the same way you instruct it to do an analysis. You select the desired graph from the menu, assign variables to it, and click OK.

When you’re preparing SPSS to run an analysis or draw a graph, the OK button is unavailable until you’ve made all the choices necessary to produce output. Not only does SPSS require that you select a sufficient number of variables to produce output, but it also requires you to choose the right kinds of variables. If a categorical variable is required for a certain slot, SPSS won’t allow you to choose any other kind of variable. Whether the output makes sense is up to you and your data, but SPSS makes sure that the choices you make can be used to produce some kind of result.
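The rule behind the OK button can be sketched as a simple check. This Python function is a hypothetical illustration of the logic described above, not SPSS's actual code; the slot names `dependent` and `factor` and the variable names are assumptions made up for the example.

```python
def ok_enabled(required_slots, choices):
    """Return True only when every slot holds a variable of the required kind.

    required_slots: slot name -> required kind ("scale" or "categorical")
    choices:        slot name -> (variable name, kind) for slots filled so far
    """
    for slot, required_kind in required_slots.items():
        picked = choices.get(slot)
        if picked is None:
            return False  # a required slot is still empty
        _name, kind = picked
        if kind != required_kind:
            return False  # wrong kind of variable for this slot
    return True


# Hypothetical dialog with one scale slot and one categorical slot.
slots = {"dependent": "scale", "factor": "categorical"}

nothing_chosen = ok_enabled(slots, {})
wrong_kind = ok_enabled(
    slots, {"dependent": ("mpg", "scale"), "factor": ("weight", "scale")}
)
all_good = ok_enabled(
    slots, {"dependent": ("mpg", "scale"), "factor": ("gender", "categorical")}
)

print(nothing_chosen, wrong_kind, all_good)
```

Passing this check means only that output can be produced. Whether that output means anything is still your call.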

NUMBERS NOT WORDS

SPSS works best with numbers. Whenever possible, try to have your SPSS data in the form of numbers. If you give SPSS names and descriptions, it’ll seem like they’re being processed by SPSS, but that’s because each name has been assigned a number. (Sneaky.) That’s why survey questions are written like this:

How do you feel about rhubarb? Select one answer:
A. I love it!
B. It’s okay.
C. I can take it or leave it.
D. I don’t care for it.
E. I hate it!

A number is assigned to each of the possible answers, and these numbers are fed through the statistical process. SPSS uses the numbers — not the words — so be careful about keeping all your words and numbers straight. We cover this subject in some detail in Chapter 4.
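Here is a minimal Python sketch of that coding step. The scheme of 1 through 5 in the order the answers are listed is an assumption, as are the four responses; the point is only that the analysis sees the numbers, while a lookup table keeps the words attached.

```python
# Assumed coding scheme: 1 through 5 in the order the answers are listed.
answer_codes = {
    "I love it!": 1,
    "It's okay.": 2,
    "I can take it or leave it.": 3,
    "I don't care for it.": 4,
    "I hate it!": 5,
}

# Reverse lookup so a code can always be turned back into its wording.
code_labels = {code: text for text, code in answer_codes.items()}

# Made-up responses from four survey takers.
responses = ["I love it!", "I hate it!", "It's okay.", "I love it!"]

coded = [answer_codes[answer] for answer in responses]
decoded = [code_labels[code] for code in coded]

print(coded)
```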

Remember: Keep accurate records describing your data, how you got the data, and what it means. SPSS can do all the calculations for you, but only you can decipher what it means. In The Hitchhiker’s Guide to the Galaxy, a computer the size of a planet crunched on a problem for generations and finally came out with the answer, 42. But the people tending the machine had no idea what the answer meant because they didn’t remember the question. They hadn’t kept track of their input. You must keep careful track of your data or you may later discover, for example, that what you’ve interpreted to be a simple increase is actually an increase in your rate of decrease. Oops!

MAKING SENSE OF ALL THOSE SPSS FILES

Input data and statistics are stored in files — different kinds of files. Some files contain numbers and definitions of numbers. Some files contain graphics. Some files contain both. Data files are easy to spot because they end with the .sav extension. Output files end with the .spv extension. Command Syntax files, with the optional programming language commands, end with .sps.
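If you ever need to sort a folder of SPSS files, the three extensions above are enough to tell them apart. The Python sketch below is a hypothetical helper (`describe_spss_file` is not part of SPSS), and the file names are made up.

```python
from pathlib import Path

# The three extensions named in this section.
SPSS_FILE_KINDS = {
    ".sav": "data file",
    ".spv": "output file",
    ".sps": "command Syntax file",
}


def describe_spss_file(filename):
    """Return the kind of SPSS file based on its extension."""
    suffix = Path(filename).suffix.lower()
    return SPSS_FILE_KINDS.get(suffix, "not one of the file types listed here")


for name in ["survey.sav", "results.spv", "weekly_report.sps", "notes.txt"]:
    print(name, "->", describe_spss_file(name))
```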

The examples in this book require the use of files that contain data configured to demonstrate capabilities of SPSS. Most of the files are in the same directory you used to install SPSS (installing SPSS also installs a number of data files ready to be loaded into SPSS and used for analysis). A few of the files used in the examples can be downloaded from the book’s companion website (www.dummies.com/go/spss).

All output from SPSS goes to the same place: a window named SPSS Statistics Viewer. This window displays the results of whatever you’ve done. After you’ve produced output, if you perform some action that produces more output, the new output is displayed in the same window. And almost anything you do produces output. Of course, you need to know how to interpret the output. SPSS will help you, and so will this book.

Getting Help When You Need It

You’re not alone. Some immediate help comes directly from the SPSS software package. More help can be found online. If you find yourself stumped, you can look for help in several places:

Topics: Choosing Help ⇒ Topics from the main window of the SPSS application is your gateway to immediate help. The help is somewhat terse, but it usually provides exactly the information you need. The information is in one large Help document, presented one page at a time. Choose Contents to select a heading from an extensive table of contents, choose Index to search for a heading by entering its name, or choose Search to enter a search string inside the body of the Help text.

In the Help directory, the titles in all uppercase are descriptions of Syntax language commands.
SPSS Support: Choose Help ⇒ Support to open a browser window for the support page at IBM. This area is primarily to report potential bugs or to check if anyone else has encountered the same bug. It's not the best option if you're struggling with a task on the first try.
SPSS Support Forums: Choose Help ⇒ SPSS Forums to open a browser showing the various support forums. IBM is putting a lot of resources into SPSS communities, which might have more activity over time than these forums.
PDF Documentation: Choose Help ⇒ Documentation in PDF Format if you want to access the many user’s guides for SPSS. These guides are online, but you can download them all to a folder on your machine if you want offline access to them.
Command Syntax Reference: Choose Help ⇒ Command Syntax Reference to display more than 2,000 pages of references to the Syntax language in your PDF viewer. The regular help topics, mentioned previously, provide a brief overview of each topic, but this document is more detailed.
Compatibility Report Tool: Choose Help ⇒ Compatibility Report Tool to answer a series of queries online to determine the compatibility of your software and hardware. If you're having trouble getting SPSS to install, access this information at www.ibm.com/software/reports/compatibility/clarity/index.html.
SPSS Statistics Community: Choose Help ⇒ IBM SPSS Predictive Analytics Community to visit a huge collection of IBM blogs and forums for every need. It will take a little time to get registered and settled in, but it's designed to be your free, go-to resource for the latest news and a chance to interact with other users. Be sure to sign up for the SPSS Stats group, in the IBM Data Science community. Hundreds of thousands of people are in this community, so it should be your first stop, before the support forums.