Chapter 8: Sources of Data

There are many ways in which nodes may be linked together. A person may have a set of individuals they consider friends in a friendship network. There may be other people they go to for advice. Others may be relatives. Some people may be affiliated through common membership in organizations. There are six categories of relationships that we may wish to measure in a network.

Individual Evaluations: These are relations defined by the judgment of individual actors, such as friendship, trust, and respect. These relationships are most common in social network analysis.

Transaction: This relationship involves the transfer of some material resource, such as lending money and buying. Once the resource is passed from one actor to the next, the original actor no longer has that resource. This type of link is used primarily as a proxy for some other relationship. In order for one actor to give a resource to another actor, they must have some other type of relationship ranging from acquaintance to stronger affinity.

Transfer: This relationship involves the transfer of nonmaterial resources. Unlike a transaction, both agents may possess the resource after the transfer. Examples include communication, knowledge, and sexually transmitted disease.

Affiliation: Some agents may share affiliation with the same organization, live in the same house, or attend the same school with each other.

Formal: This relationship is formally established and does not require the opinion of the node. For example, a chain of command follows this relationship.

Kinship: Similar to the formal relationship, but with more defined meaning (e.g., sibling, parental, tribal).

All relationships can tell us something about the network and how the people within it associate or show common ties. Relationships may be binary (0 or 1), negative or positive (−1 or +1), or carry a specific value (an exact number such as dollars, speed, or distance). The absence of a link may also be significant in some networks, such as linking evidence to suspects in a criminal network. The social network researcher must determine the most appropriate way to design and measure the value of a relationship and how it should be defined, based on the research question.
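The three kinds of tie values described above can be stored in the same simple structure. The following is a minimal sketch, assuming dyads are stored as (sender, receiver) pairs in a Python dictionary. The names, the dollar amounts, and the helper functions `dichotomize` and `split_by_sign` are hypothetical and serve only as illustration.

```python
# Hypothetical dyads; all names and numbers are illustrative, not study data.
binary = {("Ann", "Bob"): 1, ("Ann", "Cat"): 0}          # tie present or absent
signed = {("Ann", "Bob"): +1, ("Bob", "Cat"): -1}        # like or dislike
valued = {("Ann", "Bob"): 250.0, ("Bob", "Cat"): 20.0}   # e.g., dollars lent


def dichotomize(valued_ties, threshold):
    """Collapse valued ties to binary: 1 if the value meets the threshold."""
    return {pair: int(value >= threshold) for pair, value in valued_ties.items()}


def split_by_sign(signed_ties):
    """Separate a signed network into its positive and negative tie sets."""
    positive = {pair for pair, sign in signed_ties.items() if sign > 0}
    negative = {pair for pair, sign in signed_ties.items() if sign < 0}
    return positive, negative
```

The threshold passed to `dichotomize` is itself a design decision. A different cutoff produces a different binary network from the same valued data, so the choice should follow from the research question.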

There are important conceptual and theoretical differences between positive, negative and null ties. In a positive tie network, a node high in betweenness centrality is in a position to broker information and resources between actors in the network. This allows that highly central node to hold informal power. This is not necessarily true for a null network or negative tie network. Two nodes in the positive tie network with no link between them may represent a null tie, where there is no relationship between the actors. It could also represent a negative tie, where the actors dislike each other. In a negative tie network, betweenness centrality (and other node level measures) loses its conventional interpretation.
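To make the brokerage idea concrete, the sketch below computes unnormalized betweenness centrality for a small, hypothetical positive tie network in which node C is the only connection between two pairs of friends. It uses a standard shortest-path counting procedure (Brandes' algorithm) for undirected, unweighted graphs. The network and the function name `betweenness` are assumptions made for illustration.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths to each node.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical friendship (positive tie) network: C bridges {A, B} and {D, E}.
friends = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D", "E"],
    "D": ["C", "E"],
    "E": ["C", "D"],
}
scores = betweenness(friends)
```

Here C lies on every shortest path between the two pairs, so it receives all of the betweenness. The same arithmetic can be run on a network of dislike ties, but the resulting number would not carry the brokerage interpretation.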

Example 8.1

Consider a network where the links represent dislike instead of liking. A node on the path of dislike derives no informal power, positional advantage, or positional disadvantage. Keep in mind that in the negative tie network, two nodes that do not have a dyadic link may either like each other or have no relationship at all.

Treatment of negative and null ties is an area of active research. For our purpose, we present the issue as a data collection concern. When collecting social network data, it is important to keep in mind that there are many ways in which two nodes might be related. Sometimes, the absence of a relationship (null tie) is the interesting network to explore. Other times it is the positive tie network, the negative tie network, or another type of relation entirely. Two nodes that consider themselves friends do not necessarily trust one another. Two nodes that communicate about work-related issues may or may not engage in gossip or discussions about their personal lives. Data collection design must carefully consider the relationships of interest and determine the value that network centrality measures may or may not provide in light of those relationships.

There are several ways to collect data regarding relationships in social networks. The key methods we discuss are questionnaires, interviews, observation, archival records, and e-mail.

Questionnaires are the most common method of collecting social network data. Subjects may be asked about many different relationships. There are several key experimental design considerations for questionnaires.

The first consideration is roster versus free recall. The roster design provides the subject with a list of possible choices. The free recall method, on the other hand, requires subjects to remember who their friends are without any prompting from the questionnaire. For example, if you ask subjects to name their close friends, the roster method may give them a list of people in their workplace from which to choose. In this method, they cannot select individuals who are not on the list. Research has shown that people often forget close friends if they are not prompted, and that free recall is affected more by how recently a subject has interacted with others than by the closeness of the relationship. Humans also tend to categorize acquaintances, so they are more likely to recall certain subgroups and leave out others based on their individual categorization. Some feel, however, that the roster method limits the potential relationships that a subject can name and therefore biases the study.

Example 8.2

McCulloh and Geraci (n.p.) conducted a study of post-traumatic stress disorder (PTSD) which included social network data on a US Army Infantry Brigade consisting of almost 1000 soldiers. Data was collected prior to the brigade's deployment to Afghanistan, 2 months into their deployment, and after they returned from deployment. The first questionnaire asked respondents to “list their close friends within the [military] unit.” Example responses were “Bubba,” “My wife,” “Big Joe,” “Smitty,” and other unusable responses. They were able to identify usable data from only 2 out of 12 companies in the brigade.

There were two challenges that the researchers faced. The first was the impracticality of providing respondents with a roster of 1000 names. The second was the concern associated with prompting the respondent. Qualitatively, they determined that most relationships between soldiers were within the company level of the organization. Companies consisted of between 50 and 170 soldiers, depending upon the manning and purpose of the company. During the second iteration of data collection, respondents were provided company rosters. They were asked to place a check next to the individuals they considered to be a close friend. Usable data was collected on all companies surveyed. However, some respondents who completed surveys were not included on the roster, because they had recently transferred into the unit after the roster was made. This provided another source of error in the study. The third iteration of data collection used the same protocol as the second iteration.

Roster and free recall both have strengths and limitations. A design decision must weigh considerations such as feasibility, the size of the network, and the accuracy of the roster, among other issues.
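Roster responses of the kind described in Example 8.2 convert directly into a directed edge list. The following is a minimal sketch, assuming each completed survey is recorded as the respondent's name together with the names they checked. The function name `roster_to_edges` and the sample names are hypothetical. It also flags respondents who are missing from the roster, which is the source of error noted in the example.

```python
def roster_to_edges(roster, responses):
    """Turn checked-name survey responses into directed friendship ties.

    roster:    list of names printed on the survey form
    responses: dict mapping each respondent to the names they checked
    Returns (edges, off_roster), where off_roster lists respondents who
    completed a survey but do not appear on the roster.
    """
    on_roster = set(roster)
    edges, off_roster = [], []
    for respondent, checked in responses.items():
        if respondent not in on_roster:
            # e.g., someone who joined the unit after the roster was made
            off_roster.append(respondent)
        for name in checked:
            # Keep only valid nominations and ignore self-nominations.
            if name in on_roster and name != respondent:
                edges.append((respondent, name))
    return edges, off_roster


# Hypothetical company roster and responses.
roster = ["Adams", "Baker", "Clark"]
responses = {
    "Adams": ["Baker"],
    "Baker": ["Adams", "Clark"],
    "Dixon": ["Adams"],   # completed a survey but is not on the roster
}
edges, off_roster = roster_to_edges(roster, responses)
```

Respondents listed in `off_roster` can send ties but could never have been chosen by anyone else, so their in-degree is artificially zero. How to treat them is a judgment the researcher must make and report.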

Another consideration is free versus fixed choice. Free choice allows the subject to select as many others to link to as they wish. The fixed choice design, on the other hand, limits the number of others to some defined number. If you wanted to know close friends, the free design would allow the subject to name 3 or 30. The fixed choice design sets a number, for example five, and then the subject must name exactly five friends. If they have only three close friends, they must arbitrarily choose two more. If they have six close friends, they must leave one out.
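The distortion introduced by a fixed choice design can be counted directly. The sketch below assumes we somehow knew each subject's true list of close friends, ordered by closeness, which is never the case in practice. The function name `apply_fixed_choice` and the sample names are hypothetical.

```python
def apply_fixed_choice(true_friends, k):
    """Show what a fixed choice of k names does to a true friend list.

    Returns (reported, omitted, padding):
      reported: the true friends that fit within the k slots
      omitted:  number of true friends the subject must leave out
      padding:  number of arbitrary extra names the subject must add
    """
    reported = true_friends[:k]
    omitted = max(0, len(true_friends) - k)
    padding = max(0, k - len(true_friends))
    return reported, omitted, padding


three_friends = ["Ann", "Bob", "Cat"]
six_friends = ["Ann", "Bob", "Cat", "Dee", "Eli", "Fay"]

few = apply_fixed_choice(three_friends, 5)    # must invent two more names
many = apply_fixed_choice(six_friends, 5)     # must leave one friend out
```

One consequence is that under fixed choice every respondent has the same out-degree, so out-degree no longer distinguishes between subjects.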

Yet another consideration is ratings versus rankings. Rankings require subjects to order their ties by strength. For example, if you were investigating friendship with the ranking approach, you would ask subjects to rank their friends from one to however many friends they have. The ratings approach measures the strength of each relationship on a scale and allows for ties. For example, if you investigate friendship with the rating approach, you would ask subjects to rate each friendship on a scale of 1 to 10. Some argue that you should obtain rankings rather than ratings whenever possible, because rankings offer higher resolution. However, if an individual has a few close friends, more regular friends, and some acquaintances, the ranking difference between a close friend and a regular friend would appear the same as the difference between two close friends. Ratings, on the other hand, may not align with the culturally defined categories intended for collection.
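The loss of spacing information under rankings can be seen by converting a set of ratings into ranks. This is a minimal sketch with hypothetical ratings on a 1 to 10 scale. The helper `ratings_to_ranks` and the rule that breaks tied ratings alphabetically are assumptions made for illustration.

```python
def ratings_to_ranks(ratings):
    """Convert ratings to forced ranks (1 = strongest tie).

    Tied ratings are broken alphabetically by name, an arbitrary choice
    made only because a ranking cannot contain ties.
    """
    ordered = sorted(ratings, key=lambda name: (-ratings[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


# Hypothetical ratings: two close friends, one regular friend, one acquaintance.
ratings = {"close_1": 10, "close_2": 9, "regular": 5, "acquaintance": 2}
ranks = ratings_to_ranks(ratings)

# Gap between the two close friends versus the gap from a close friend
# to the regular friend, measured both ways.
rating_gaps = (ratings["close_1"] - ratings["close_2"],
               ratings["close_2"] - ratings["regular"])
rank_gaps = (ranks["close_2"] - ranks["close_1"],
             ranks["regular"] - ranks["close_2"])
```

In the ratings, the step from a close friend to the regular friend is four times the step between the two close friends. In the ranks, both steps are exactly one position.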

Usually when we use a ratings approach, we prefer a Likert-style scale, which is a five-point (0–4 or 1–5) or seven-point (0–6 or 1–7) scale. This scale provides a midpoint for exact ambivalence and clear extremes, but requires the respondent to lean a little more toward either the middle or an extreme for other ratings. Research has found that 3- and 10-point scales can be more biased by the people completing the survey than five- or seven-point scales. However, it is often wise to use a scale that has been established in the literature in order to establish credibility in your research.

It is also very important to note that people can assess friendship differently. A friendship strength of 6 to one person may seem like a 5 to someone else. It is wise to ask objective questions where possible for ratings. For example, instead of asking subjects to rate a friendship on a scale of 1–7, where 7 is a close friend and 1 is an enemy, it may be wiser to ask: Do you avoid this person? Are you acquainted with this person? Do you like this person? Do you associate outside of work? Do you go to each other's houses to socialize? Do you go on vacation together? Are you in an intimate relationship with this person? This makes the assessment of the relationship more objective and consistent between nodes and dyads.
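Yes or no answers to questions like these can be combined into a single tie strength. The sketch below shows one possible scoring rule, and the rule is an assumption of this example rather than a method given in the text. It treats the items as ordered from weakest to strongest tie and scores the dyad by the highest level endorsed, with avoidance scored as a negative tie. The item labels are hypothetical shorthand for the questions above.

```python
# Items ordered from weakest to strongest tie. Both the ordering and the
# scoring rule are assumptions made for this sketch.
ITEMS = [
    "acquainted",
    "like",
    "associate_outside_work",
    "visit_homes",
    "vacation_together",
    "intimate",
]


def tie_strength(answers):
    """Score a dyad from yes/no answers (missing items count as 'no').

    Returns -1 if the respondent avoids the other person, otherwise the
    highest level answered 'yes' (0 if none).
    """
    if answers.get("avoid", False):
        return -1
    strength = 0
    for level, item in enumerate(ITEMS, start=1):
        if answers.get(item, False):
            strength = level
    return strength


coworker = tie_strength({"acquainted": True, "like": True,
                         "associate_outside_work": True})
rival = tie_strength({"acquainted": True, "avoid": True})
stranger = tie_strength({})
```

Because each item asks about an observable behavior, two respondents who give the same answers receive the same score, which is the consistency the paragraph above is after.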

An alternative to questionnaires is the interview. Interviews are used when questionnaires are impractical or when the investigator lacks enough knowledge about the subject group to design appropriate questions. In some cases, interviews are used on the front end to provide insight into the nature and type of relationships and composition data that are important for the group under study. Interviewing is probably the most challenging method of data collection for the researcher. Effective interviewing requires the interviewer to gain rapport, convince people to open up, record accurate notes, and begin and end the interview well. These skills must be developed with practice.

Another data collection approach is direct observation. Observers need to be precise and consistent in their identification of relationships. One study at a military training event required the observer to record the number of statements/commands sent between members of a platoon chain of command during convoy operations. Early missions had very little communication, with density less than 0.3. Later missions had high communication, with density greater than 0.8. The density of this network was highly correlated with the notional casualties incurred in the mission.
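Density is the number of ties present divided by the number of ties possible. The following is a minimal sketch, assuming the observer's log is a list of (sender, receiver) pairs and that the network is treated as directed, so that n members allow n(n − 1) possible ties. The function name `directed_density`, the member labels, and the log entries are hypothetical.

```python
def directed_density(members, messages):
    """Density of a directed communication network.

    members:  list of everyone who could have communicated
    messages: observed (sender, receiver) pairs, repeats allowed
    A tie exists if at least one message went from sender to receiver.
    """
    n = len(members)
    if n < 2:
        return 0.0
    present = set(members)
    ties = {(s, r) for s, r in messages
            if s != r and s in present and r in present}
    return len(ties) / (n * (n - 1))


# Hypothetical observation log for a four-person chain of command.
members = ["PL", "PSG", "SL1", "SL2"]
messages = [("PL", "PSG"), ("PL", "PSG"), ("PSG", "SL1"), ("SL1", "PSG")]
density = directed_density(members, messages)
```

Repeated messages between the same pair do not raise the density in this sketch. If message volume matters, the counts can be kept as tie values instead.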

Archival records are another source of data. Many relationships can be defined for a set of records. A classic example is the coauthorship network, which establishes a link between individuals who have coauthored papers together. One famous mathematician, Paul Erdos, published a large number of academic papers. Mathematicians like to track how many coauthorship links separate them from Paul Erdos. This is called an Erdos number. The lower the Erdos number, the more prestigious the mathematician.
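An Erdos number is a shortest-path distance in the coauthorship network, so it can be computed with a breadth-first search. The sketch below assumes the archival records are available as a list of author lists, one per paper. The function name `collaboration_distances` and the author labels are hypothetical.

```python
from collections import deque


def collaboration_distances(papers, source):
    """Shortest coauthorship distance from source to every reachable author."""
    # Build the coauthorship network: link every pair of authors on a paper.
    coauthors = {}
    for authors in papers:
        for a in authors:
            coauthors.setdefault(a, set()).update(x for x in authors if x != a)

    # Breadth-first search outward from the source author.
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for other in coauthors.get(current, ()):
            if other not in distance:
                distance[other] = distance[current] + 1
                queue.append(other)
    return distance


# Hypothetical publication records, one list of authors per paper.
papers = [
    ["Erdos", "Author A"],
    ["Author A", "Author B"],
    ["Author B", "Author C"],
    ["Author D", "Author E"],   # no chain of coauthors leads to Erdos
]
erdos_numbers = collaboration_distances(papers, "Erdos")
```

Authors with no chain of coauthors leading to the source do not appear in the result. By convention their distance is treated as infinite.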

A specific form of archival records is e-mail data. Most e-mail exchange servers maintain header information from e-mail traffic. This data consists of the TO, FROM, CC, BCC, SUBJECT, E-MAIL MESSAGE ID, and the date and time stamp. The header information provides the data necessary to construct an e-mail communication network. E-mail communication does not necessarily reveal friendship, trust, or advice seeking. The volume of e-mail activity is usually more a function of an individual's role within the organization and their personal e-mail habits. Some people will e-mail the person sitting next to them. Others will walk to another room down a hallway to speak to someone face-to-face. However, the presence of an e-mail link between people demonstrates that there is at least some level of relationship between the actors. Attribute-defined subgroup analysis can help average actor e-mail behavior across groups and provide subgroup-to-subgroup communication behavior.
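Header records of this kind translate into a weighted, directed network with very little processing. The following is a minimal sketch, assuming each header has already been parsed into a dictionary with a FROM address and lists of TO and CC addresses. The record layout, the addresses, the group labels, and the function names `email_edges` and `group_edges` are all hypothetical.

```python
from collections import Counter


def email_edges(headers):
    """Count messages from each sender to each TO or CC recipient."""
    edges = Counter()
    for header in headers:
        sender = header["FROM"]
        for recipient in header.get("TO", []) + header.get("CC", []):
            if recipient != sender:          # ignore messages to oneself
                edges[(sender, recipient)] += 1
    return edges


def group_edges(edges, group_of):
    """Aggregate actor-level counts into subgroup-to-subgroup counts.

    group_of maps every address to an attribute such as a department.
    """
    totals = Counter()
    for (sender, recipient), count in edges.items():
        totals[(group_of[sender], group_of[recipient])] += count
    return totals


# Hypothetical parsed headers and subgroup attributes.
headers = [
    {"FROM": "a@org.example", "TO": ["b@org.example"], "CC": ["c@org.example"]},
    {"FROM": "a@org.example", "TO": ["b@org.example"]},
    {"FROM": "c@org.example", "TO": ["a@org.example"]},
]
group_of = {"a@org.example": "ops", "b@org.example": "ops", "c@org.example": "intel"}

edges = email_edges(headers)
by_group = group_edges(edges, group_of)
```

The subgroup totals here are raw sums. Dividing each total by the number of actors in the sending group would give an average per actor, which dampens the effect of individual e-mail habits.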

8.1 Network Sampling

8.2 Measuring Links

8.3 Data Quality

8.4 Additional Ethnographic Data Collection Methods

8.5 Anonymity Issues

8.6 Summary

Exercises

References