Now that you have provided your readers with a thorough rendering of the mechanics of your study and the results of your statistical analyses, it is time to tell them what you think it all means. The Journal Article Reporting Standards (JARS; Appelbaum et al., 2018) recommends including several important elements. First, you must summarize what you found by telling readers which results were most important and which were secondary. You do not need to label them in this way (e.g., by having subsections called Primary Findings and Secondary Findings), although it would be OK to do so. You make the importance of your different results clear by when, how, and in what depth you choose to discuss them. In your fragrance labeling study, if your interest was primarily theoretical, you might spend considerable time detailing and interpreting what you found on the measures related to the fragrance’s pleasantness, but you might mention only briefly how the label affected buying intentions. If you had more applied motives for doing the study, the two types of measures might get equal attention. Some secondary findings (reported in the Results section) might hardly be mentioned, however compelled you may feel to explain everything.
Also, in the Discussion section you interpret your findings in light of the issues that motivated you to do the study in the first place, as described in your introduction. State how you think your study advances knowledge on these issues. For example, in the fragrance labeling study, what do you know about labeling effects that you did not know before? If you cannot answer this question, your study may not be of great interest to others.
Finally, cast an evaluative eye on your own work, taking a step back and looking at both its strengths and weaknesses. This permits you to propose the next steps that will advance the field.
It is typical for the Discussion section to begin with a recap of the rationale for the study. You should be able to return to the introduction, find the sentences you wrote there, and rephrase these at the beginning of the Discussion. For example, O’Neill, Vandenberg, DeJoy, and Wilson (2009) began their Discussion section with two simple sentences meant to remind readers of the broad agenda that motivated their research:
The goal of the current study was to extend organizational support theory through an examination of anger in the workplace. We first examined whether anger was an antecedent or a consequence of employees’ perceptions of conditions or events in the workplace. (p. 330)
Adank, Evans, Stuart-Smith, and Scott (2009) did the same:
The purpose of the present study was to determine the relative processing cost of comprehending speech in an unfamiliar native accent under adverse listening conditions. As this processing cost could not always be reliably estimated in quiet listening conditions . . ., we investigated the interaction between adverse listening conditions and sentences in an unfamiliar native accent in two experiments. (p. 527)
A quick summary serves as a simple reminder of your purpose. After the detailed Method and Results sections, it is good to draw your readers’ attention back to the big issues. That said, you should realize that readers will have different levels of interest in your article and will read different sections with different levels of care and attention. Do not feel insulted if I suggest that some readers will skip the Method and Results sections entirely (after you worked on them so hard!) or just skim them quickly and move directly from reading your introduction to reading your Discussion section. For these readers, the brief summary at the beginning of the Discussion section will sound a lot like what they read at the end of your introduction. That is OK. If you consider that different readers will read different chunks of your article, it makes sense to help keep them all on track.
In your report, your Discussion section should include a statement of support or nonsupport for all your original hypotheses, distinguished by primary and secondary hypotheses, along with the implications of any exploratory analyses.
After the restatement of the goals for your research, JARS recommends that you provide a summary statement of your principal findings. Not every finding should be included, just the ones that support or refute your major hypotheses. These findings should be the focus of the discussion that follows. Later in the Discussion section, you can return to findings that help explain your principal findings or that are of secondary interest. For example, O’Neill et al. (2009) immediately followed their two-sentence restatement of goals with a two-sentence recap of what they considered to be their major finding:
Although a negative reciprocal relationship between anger and POS [perceived organizational support] was hypothesized, only the negative relationship from POS to anger was supported. The relationship of POS and anger lends support to the social exchange perspective; anger seems to result from employees’ negative perceptions of workplace conditions rather than anger serving as a filter for negative perceptions of the organization, as suggested by affect-as-information and mood congruency theories. (p. 330)
Note that O’Neill et al. pointed out that they had hypothesized a reciprocal relationship but did not find it. They also said that their finding supported one theoretical interpretation of the data but not two others. This is a perfect way to set up the more detailed discussion that follows.
Fagan, Palkovitz, Roy, and Farrie (2009) were also relatively brief in summing up their findings:
The findings of this study are consistent with our hypothesis suggesting that risk and resilience factors of low-income, urban fathers who are unmarried and not residing with the mother and baby at birth, later nonresidence with the child, and mother–father relationship quality are significant components of an ecological model of paternal engagement with young children. (pp. 1398–1399)
This simple categorical statement of support is then followed by three pages of summary and interpretation of the findings.
Risen and Gilovich (2008) also began with a succinct summary of their findings:
Despite explicit knowledge that tempting fate does not change the likelihood of a broad range of negative outcomes, participants gave responses that reflected the intuitive belief that it does. Thus, even if they rationally recognized that there is no mechanism to make rain more likely when they leave behind an umbrella . . . participants reported that they thought these particular negative outcomes were indeed more likely following such actions. (p. 303)
However, two sentences are not the limit for how long the summary should be. Goldinger, He, and Papesh (2009) went into a bit more detail about their findings regarding other-race bias (ORB) in their opening summation:
Having observed the ORB in recognition, we, as our primary goal, sought to examine information-gathering behavior during learning. Considering first eye movements, widespread differences emerged between own- and cross-race face processing. In quantitative terms, when participants studied own-race faces, their eye fixations were brief and plentiful. Relative to cross-race trials, own-race trials elicited more fixations to facial features, briefer gaze times per fixation, more fixations to unique features, and fewer regressions. All these findings were reflected in an index of total distance traveled by the eyes during encoding, which we used for most analyses. The differences in eye movements were not an artifact of recognition accuracy: The same patterns were observed in subsets of learning trials leading only to eventual hits. In qualitative terms, participants favored different features across races (see Figures 2 and 8). (p. 1120)
Goldinger et al. (2009) chose to mention each of their primary measures in the summary, whereas authors of the earlier examples chose more general statements of support and nonsupport. Which approach you take should be determined by how consistent findings are across measures (can you quickly mention them in groups, or will you lose readers in the details?) and how much attention you intend to pay to each outcome measure separately.
These four examples present the happy news that the authors’ primary hypotheses largely were confirmed by the data. But the opening summary is not always a bed of roses; sometimes the manure is evident. Evers, Brouwers, and Tomic (2006) had to contend with a deeper level of nonsupport for their primary hypotheses. First, though, they also began with a brief restatement of purpose and research design:
In the present article, we examined the question of whether management coaching might be effective. To this end, we conducted a quasi-experiment in which we compared an experimental group of managers with a control group at Time 1 and Time 2. We measured hypothesized outcome expectations and self-efficacy beliefs on three domains of behavior, for example, acting in a balanced way, setting one’s own goals, and mindful living and working. (Evers et al., 2006, p. 179)
However, things did not turn out exactly as they had hoped:
We found a significant difference between the experimental and the control group on only outcome expectations and not on self-efficacy beliefs regarding the domain “acting in a balanced way.” (Evers et al., 2006, p. 179)
Thus, Evers et al. did not get exactly the results they had expected. Still, they provided a reasonable post hoc explanation for why this might have been the case; the intervention simply did not have time to effect all the changes they had expected:
These objectives clearly show that improving existing skills and developing new ones precede new beliefs, convictions, and judgments, which may explain the nonsignificance of differences between the experimental and the control groups with respect to the variable “to act in a balanced way.” In the short time between measuring the variable at Time 1 and Time 2, self-efficacy beliefs with respect to “acting in a balanced way” may not have developed yet. It may also be that managers have come to the conviction that some specific type of behavior will be advantageous but that they still experience some inner feelings of resistance toward getting rid of their old behavior. (Evers et al., 2006, p. 180)
Adank et al. (2009) delved a bit deeper into the nuances of their listening comprehension findings and the relationship of these findings to past research. Of their primary hypotheses, they wrote the following:
The results for the GE [Glasgow English] listener group in Experiment 1 showed that they made an equal number of errors and responded equally fast for both accents. The finding that the performance of the GE listeners was not affected by the accent of the speaker confirms that the processing delay for the GE sentences by the SE [Standard English] listener group was due to the relative unfamiliarity of the SE listeners with the Glaswegian accent. SE listeners thus benefited from their relative familiarity with SE. (p. 527)
In summary, your Discussion section should commence with a restatement of your goals and a summary of your findings that relate to those goals. You do not have to list every finding. However, JARS recommends that you present a complete picture of the findings that relate to your primary hypotheses, regardless of whether they were supportive. If your results did not support your hypotheses, try to explain why.
Finally, the introductory paragraphs to your Discussion section can also briefly mention the results of any exploratory analyses you might have conducted. It would be OK for you to use these results to flesh out any theoretical explanations you have for why you obtained the results you did. Of course, because these were exploratory analyses, ones for which you had no explicit predictions, these results need to be labeled as such and readers alerted to interpret them with caution; they are more likely to have appeared by chance. It is also not unusual for authors to return to the results of exploratory analyses when they discuss directions for future research (discussed below) and suggest that the findings need replication and more precise testing in the next round of research.
In your Discussion section, delineate similarities and differences between your results and the work of others.
Adank et al. (2009) highlighted the importance of an interaction between listening comprehension and amount of background noise. This led them directly into a discussion of how their work compared with the work of others:
No effects were found for processing the unfamiliar native accent in quiet. This result shows again that the cognitive processing cost cannot easily be estimated in quiet conditions (cf. Floccia et al., 2006). However, in both experiments, an interaction was found between the unfamiliar accent and moderately poor SNRs [signal-to-noise ratios] . . . listeners slow down considerably for these SNRs for the unfamiliar accent. A similar interaction has been found in experiments comparing the processing speed for synthetic versus natural speech (e.g., Pisoni et al., 1985). In conclusion, it seems justified to assume that processing an unfamiliar native accent in noise is delayed compared with processing a familiar native accent in noise. (Adank et al., 2009, p. 527)
Note that Adank et al. compared their results with the work of others in two different ways, both involving findings consistent with their own. Their first reference to another study suggests that others’ research confirms their finding; essentially, Adank et al. replicated this earlier result. The second reference also suggests a replication, but here the earlier study differed from theirs in an important way: It compared processing speed for natural versus synthetic speech rather than for two accents of the same language.
Of course, your comparison with past research will not always indicate congruency between your results and the interpretations of other studies. Indeed, you might have undertaken your study to demonstrate that a past finding was fallacious or cannot be obtained under certain circumstances. Adank et al. (2009) pointed out an inconsistency of the former type between their results and past findings:
On the basis of Evans and Iverson’s results, one could hypothesize that familiarity with a native accent does not come from being exposed to it through the media alone but that interaction with speakers of that accent (or even adapting one’s own speech to that accent) is also required. However, our results do not provide support for this hypothesis, as GE listeners were equally fast for GE and SE. The GE listeners had been born and raised in Glasgow, and although they were highly familiar with SE through the media, they had had little experience of interacting with SE speakers on a regular basis. (p. 527)
Risen and Gilovich (2008) also pointed out how their findings contradicted some explanations for superstitious behavior:
Although most traditional accounts of superstition maintain that such beliefs exist because people lack certain cognitive capacities (Frazer, 1922; Levy-Bruhl, 1926; Piaget, 1929; Tylor, 1873), the work presented here adds to accumulating evidence of magical thinking on the part of people who, according to traditional accounts, should not hold such beliefs. (p. 303)
In summary, JARS recommends that you place your work in the context of earlier work. You can cite work that your study replicates and extends, but you should also include work with results or predictions at odds with your own. In this regard, if your fragrance labeling study showed that when the fragrance was labeled rose it was rated as more pleasant than when labeled manure, it would be perfectly appropriate for you to point out that this finding was in conflict with the assertion by Juliet Capulet.1 When you do this, you should propose reasons why the contradictions may have occurred.
In your Discussion section, your interpretation of the results should take into account sources of potential bias and other threats to internal validity, the imprecision of your measures, and other limitations of your research design and analyses.
JARS focuses its prescription for what to include in the interpretation of results on aspects of the research design and analyses that limit your ability to draw confident conclusions from your study. This is not because your Discussion section should focus on only the study’s limitations but because you may be tempted to disregard these and to promote your research by discussing only its strengths. Typically, this strategy does not work. Your manuscript will get a careful reading once it has been submitted for peer review. If you do not point out your study’s weaknesses, the peer reviewers will. Those who will read your work know every study has flaws; it is impossible to conduct a flawless study. By being transparent about what you know was not perfect, you convey to the reader a scientific posture that puts your article in a better light. By turning a critical eye on your own work, you instill confidence in your readers that you know what you are doing.
That said, even though JARS focuses on weaknesses, do not forget to point out your study’s strengths. For example, Taylor and James (2009) began their discussion of biomarkers for substance dependence (SD) with a positive assertion about their findings:
SD is a common and costly disorder, and efforts to uncover its etiology are under way on several fronts. . . . Other work has shown promise for ERM [electrodermal response modulation] as an independent marker for SD, and the present study provides initial evidence of the possible specificity of ERM as a putative biomarker for SD. This could enhance the search for underlying genetic factors and neural pathways that are associated not with externalizing disorders generally but with SD more specifically. (p. 496)
Moller, Forbes-Jones, and Hightower (2008), who studied the effects of the age composition of classrooms on preschoolers’ cognitive, motor, and social skills, began their General Discussion section with a strong statement of what was best about their work:
This investigation represents a unique and important contribution to the literature on preschool classroom age composition in a number of respects. First, the study included a sample far larger than that in any previously conducted research. . . . Second, this research is among the first to use a well-validated assessment of early childhood development (i.e., the COR [Child Observation Record]) in a variety of domains (social, motor, and cognitive) and to include assessments at two time points (spaced approximately 6 months apart). (p. 748)
So “This study was the first . . . the biggest . . . the best” are all good ways to think about your study’s strengths. But neither Taylor and James (2009) nor Moller et al. (2008) stopped there. They also described some of the less positive aspects of their work.
If your study drew its inspiration from theories or problems that posited causal relationships among variables but your research design had some limitations in allowing such inferences, JARS recommends that this be acknowledged. For example, Moller et al. (2008) included the following acknowledgment in their discussion:
Another limitation of this research involves the correlational nature of these data. Empirical investigations that manipulate the age composition of preschool classrooms, with random assignment to condition, are warranted. (p. 750)
Fagan et al. (2009) put this same concern about internal validity in proximity to their strongest interpretation of their study:
Our findings suggest that as time passes, risk and resilience factors continue to play a significant role in relation to paternal engagement. Furthermore, our findings reveal that the patterns of interrelatedness between risk, resilience, and engagement (direct and mediating effects) are the same when the child is 3 years old as they are when the child is 1. Although causal relationships cannot be inferred from our analyses, our approach to measuring risk and resilience in fathers is an improvement to previous research. (p. 1399)
O’Neill et al. (2009) acknowledged a similar weakness in internal validity that was due to their research design:
The design of the study precludes drawing causal inferences at the individual level of analysis. Hence, a stronger test of these relationships is needed, particularly in light of potential reciprocity between POS and anger. (p. 330)
If your research design did not allow for strong causal inferences, it is also critical that you avoid the use of causal language in the interpretation of your results. For example, if you correlated the positivity of people’s reactions to perfume names with their evaluation of the fragrance itself, avoid using terms such as caused, produced, or affected that suggest your study uncovered a causal connection between labels and fragrance evaluations. It is easy to slip up on this matter because we all use causal language in everyday conversation without being consciously aware of doing so.
In addition to their caution about internal validity, O’Neill et al. (2009) alerted their readers to some concerns about their measurements:
A final limitation is that our anger measure did not capture feelings of anger specifically directed toward the organization or its members. In this way, our conceptual model is not perfectly aligned with the operational definitions of the variables. (p. 331)
In your Discussion section, your interpretation of your results should discuss the generalizability (external validity) of the findings, taking into account the target population your sample was meant to represent and other contextual features of the study, such as its settings, manipulations, and outcome measures.
JARS addresses limitations related to internal validity, measurement, and statistics in one section. It gives a separate treatment to concerns about the generalization of findings. Evaluating the generalizability of your study’s findings involves you in at least three different assessments.
First, you need to consider the people or other units involved in the study in comparison with the larger target population they are meant to represent—the validity of the sample. I discussed the importance of circumscribing these boundaries in Chapter 3. Now it is time to address the issue head on and attempt to answer the questions I set out: Are the people in the study in some way a restricted subsample of the target population? If they are, do the restrictions suggest that the results pertain to some but not all members of the target population?
Second, generalization across people is only one domain you must consider. As noted in JARS, other contextual variations should be considered. If your study involved an experimental manipulation, you need to ask how the way the manipulation was operationalized in your study might be different from how it would be experienced in a natural setting. Was there something unique about the settings of your study that suggests similar or dissimilar results might be obtained in other settings? Could being in a psychology lab while choosing between two fragrances with different labels lead to a greater focus on the fragrance name than would making the same choice while standing at a perfume counter with dozens of fragrances in a large department store?
A third assessment involves considering whether the outcome variables you used in your study are a good representation of all the outcomes that might be of interest. For example, if you measured only subjects’ preference for a fragrance, does this preference generalize to buying intentions?
Our example studies provide instances in which the researchers grappled with each of these types of generalizations. Taylor and James (2009) provided the following cautions regarding the generalization of their findings across people:
Although the present study holds promise in helping move research into biological factors associated with SD forward, it had limitations that warrant mention. First, the PD [personality disorder]-only group was difficult to fill given the high comorbidity of PD with SD, and the results for that relatively small group should not be overinterpreted. Second, the sample comprised college students, and it is possible that a clinical sample with more extreme presentations of PD and SD could produce different results. (p. 497)
Here, Taylor and James expressed two concerns about generalization across people. First, they wanted readers to know that people with PD but without SD were rare: The sample was small, so be careful in drawing conclusions about them. Second, they pointed out that the study used only college students, so even the people who were identified as having SD or PD were probably less extreme on these characteristics than were people seeking clinical treatment; thus, generalizations to more extreme populations should be made with caution.
Amir et al. (2009) addressed the issues of generalization of their attention training intervention to combat social phobia across people and settings:
The finding that similar treatment outcomes were obtained within the current study at separate sites with differing demographic profiles, as well as in an independent laboratory . . ., supports the generalizability of the attention modification program across settings. (p. 969)
However, the stimuli used in their attention training also led them to appraise the generalizability of results:
Although the training stimuli used in the current study included faces conveying signs of disgust, there is evidence to suggest that disgust-relevant stimuli activate brain regions also implicated in the processing of other emotional stimuli such as fear. (Amir et al., 2009, p. 969)
Even Killeen, Sanabria, and Dolgov (2009), whose study focused on as basic a process as response conditioning and extinction and used pigeons as subjects, had to grapple with the issue of generalization, in this case across behaviors:
A limitation of the current analysis is its focus on one well-prepared response, appetitive key pecking in the pigeon. The relative importance of operant and respondent control will vary substantially depending on the response system studied. (p. 467)
As a final example, Fagan et al. (2009) pointed out a limitation of their study related to generalization across time:
Although the . . . data provide one of the most comprehensive views of this population over time, the operationalized measures provide snapshots of fathers at the time of measurement, and we are attempting to understand the processes and development of father–child relationships across time. (p. 1403)
The overall message to be taken from my discussion of the JARS recommendations about treatment of the limitations of your study is that you should not be afraid to state what these limitations are. Here, I have only sampled from the example studies; there were many more mentions of flaws and limits that I did not reproduce. Still, all of these studies got published in top journals in their fields. Again, adopting a critical posture toward your own work speaks well of you and instills confidence in readers that you understood what you were doing and that the advancement of knowledge was your first priority.
In your Discussion section, your interpretation of the findings should discuss your study’s implications for future research, program development and implementation, and policy making.
Your final task in the Discussion section involves detailing what you think are the implications of your findings for theory, practice, policy, and future research. Which of these areas is emphasized most depends on the purposes of your study. However, it is not impossible that all four will deserve mention. For example, Moller et al. (2008) included a Theoretical Implications subsection in their Discussion section:
The findings from the present investigation strongly support the theory-based predictions offered by Piaget (1932) and others, who argued that interacting with peers who are close in age and ability will result in optimal learning. At the same time, these findings are not entirely inconsistent with predictions offered by Vygotsky (1930/1978) and others, who argued for mixed-age interaction principally on the basis of the merits implicit for younger children in these contexts. (p. 749)
Moller et al. (2008) were equally if not more interested in the practical implications of their work. They began their discussion by stating the strong message of their study for how classroom age grouping should be carried out:
We consistently observed a significant main effect at the classroom level for classroom age composition, which suggested that a wide range in children’s ages within a classroom (and high standard deviations in terms of age) was negatively related to development. . . . In this context, the present research strongly suggests that reconsideration of the issue of classroom age composition in early childhood education is warranted. (Moller et al., 2008, p. 749)
O’Neill et al. (2009) thought their findings had an important lesson for organizations’ policies and practices:
The good news is that if an organization successfully influences employees’ perceived organizational support, anger, withdrawal behaviors, accidents, and high-risk behaviors will decline. Anger reduction is particularly important for the organization, as highlighted by the costs in terms of employee turnover and loss of inventory. (p. 331)
Researchers often joke that concluding a study with the call for more research is a requirement, lest the public lose sight of the value of their enterprise and the need to keep researchers employed. In fact, a study that solves a problem, whether theoretical or practical, once and for all is a rare occurrence indeed. The call for new research is always justified, and now you can cite JARS as giving you license to do so.
The limitations of a study often lead to suggestions for new research with improvements in design; thus, as noted earlier, Moller et al. (2008), whose study on classroom age grouping was correlational, called for future experimental research. Alternatively, the results of a study may suggest new questions that need answering. For example, Fagan et al. (2009) stated the agenda for future research in the form of questions in a subsection titled Future Research:
The study raises questions regarding the importance of early adaptations of fathers during the transition to fatherhood. Why do some men experience impending fatherhood as a wake-up call to improve their lives by reducing risk and increasing developmental resources, whereas others seem to eschew the development of personal resources that would position them to be more engaged fathers? What are the specific meanings of fathering for men in challenging circumstances, and what are the processes and conditions that allow some men to make positive adjustments to their lives and become involved fathers? What is the role of birth mothers in facilitating and discouraging men’s transitions within fathering? Are there interventions or policies that would increase the proportion of men who reduce risk and increase resilience during various transitions within fathering? (pp. 1403–1404)
Finally, you might consider ending your Discussion section with yet another recap of the study. For example, Tsaousides et al. (2009) finished their discussion with a conclusion section, which reads in its entirety as follows:
The present findings highlight the importance of domain-specific and general self-efficacy in perceptions of QoL [quality of life]. Both study hypotheses were supported, as both employment-related and general self-efficacy were associated with perceptions of QoL and need attainment, and both were better predictors than traditionally important contributors such as income and employment. These findings were consistent with Cicerone and Azulay (2007) in terms of the importance of self-efficacy on well-being, and with Levack et al. (2004), Opperman (2004), and Tsaousides et al. (2008) in terms of the importance of subjective self-appraisals of employment in evaluating quality of life post-TBI [traumatic brain injury]. The clinical implications for professionals working in the field of rehabilitation are that increasing confidence in work-related abilities and enhancing self-efficacy among individuals with TBI may facilitate return to work and will certainly have an impact on perceptions of well-being. (pp. 304–305)
This statement hits on almost all of the recommendations included in JARS, and it serves as a nice complement to the study’s abstract for readers who do not wish to delve too deeply into the details.
If your study involved an experimental manipulation, your Discussion section should include a discussion of the mechanisms by which the manipulation was intended to produce its effects, the generalizability of the findings given the characteristics of the intervention and how outcomes were measured, and the clinical or practical significance of the results.
If your study involved an experimental manipulation, JARS makes some other recommendations regarding issues that should be addressed in the Discussion section. First, JARS suggests you discuss the mechanisms that mediate the relationship between cause and effect. For example, how is it that different labels lead to different ratings of the same fragrance? Does the label trigger positive or negative memories? Does it alter the length of time subjects inhale?
Amir et al. (2009) found evidence for the following causal mechanisms in their study of attention training:
Although accumulating evidence suggests that computerized attention training procedures are efficacious in reducing symptoms of anxiety in treatment-seeking samples, little is known about the attentional mechanisms underlying clinical improvement. . . . The results suggested that the AMP [attention modification program] facilitated participants’ ability to disengage their attention from social threat cues from pre- to posttraining. (p. 970)
Thus, they suggested the AMP training made it easier for subjects to ignore cues of social threats (the mediating mechanism), which in turn led to reduced anxiety. However, they were careful to point out the limitations of their proposed explanation:
The results of the mediation analysis, however, should be interpreted with caution, given that change in the putative mediator (attention bias) and change in social anxiety symptoms were assessed at the same time, and temporal precedence was therefore not established. Thus, although causal inferences can be made about change in attention resulting from the AMP, we cannot make such claims about the relation between change in attention and symptom change. (Amir et al., 2009, p. 970)
In addition, they used this limitation to call for future research:
In future research, investigators should administer assessments of attention at multiple points during the course of treatment to better address these issues. (Amir et al., 2009, p. 970)
When your study involves the evaluation of an intervention, there can be unique aspects of the study related to the generalizability of the findings that need to be addressed in the Discussion section. For example, Amir et al. (2009) highlighted their use of a 4-month follow-up:
Assessments completed approximately 4 months after completion of the postassessment revealed that participants maintained symptom reduction after completing the training, suggesting that the beneficial effects of the AMP were enduring (see also Schmidt et al., 2009). However, follow-up data should be interpreted with caution because assessors and participants were no longer blind to participant condition. Future research should investigate the long-term impact of the attention training procedure, including an assessment of symptoms as well as attention bias. (p. 969)
Vadasy and Sanders (2008) provided a discussion of the limitations of their study that almost directly paralleled the recommendations of JARS. First, the summary:
The present study evaluated the direct and indirect effects of a supplemental, paraeducator-implemented repeated reading intervention (Quick Reads) with incidental word-level instruction for second and third graders with low fluency skill. Results show clearly that students benefited from this intervention in terms of word reading and fluency gains. Specifically, our models that tested for direct treatment effects indicated that tutored students had significantly higher pretest–posttest gains in word reading accuracy and fluency. (Vadasy & Sanders, 2008, pp. 281, 286)
Then, they addressed how their findings related to the work of others. Note how this excerpt also addresses some of the JARS recommendations regarding discussions of generalization (e.g., multiple outcomes, characteristics of the intervention including treatment fidelity), but here the authors pointed out the strengths of their study in this regard:
This study specifically addressed limitations in previous research on repeated reading interventions. First, students were randomly assigned to conditions. (Vadasy & Sanders, 2008, p. 287)
They used multiple outcomes:
Second, we considered multiple outcomes, including word reading accuracy and efficiency as well as fluency rate and comprehension outcomes. (Vadasy & Sanders, 2008, p. 287)
Their treatment was implemented with high fidelity, using the types of professionals likely to be used in real life and in the real-life setting:
Third, because this was an efficacy trial, the intervention was implemented with a high degree of fidelity by paraeducators who were potential typical end users, and in school settings that reflected routine practice conditions. (Vadasy & Sanders, 2008, p. 287)2
Their treatment was well specified:
Fourth, the 15-week intervention was considerably more intense than the repeated reading interventions described in many previous studies. Fifth, the particular repeated reading intervention, Quick Reads, is unusually well specified in terms of text features and reading procedures often hypothesized to influence fluency outcomes. (Vadasy & Sanders, 2008, p. 287)
Finally, they discussed some limitations. First, for theoretical interpretation, the intervention might not have involved reading instruction only:
Findings from this study should be considered in light of several limitations. First, although the intervention used in this study was primarily characterized as repeated reading, a small portion (up to 5 min) of each of the tutoring sessions included incidental alphabetic instruction and word-level scaffolding. (Vadasy & Sanders, 2008, p. 287)
The characteristics of students may have been unique:
Second, students entered this study with a wide range of pretest fluency levels that reflected teacher referral patterns; nevertheless, students ranged from the 10th to the 60th percentiles on PRF [passage reading fluency] performance, similar to students served in repeated reading programs. (Vadasy & Sanders, 2008, p. 287)
Note that Vadasy and Sanders here raised a possible shortcoming of their study that they then provided evidence to refute. This can be an important strategy for you to use: Think of concerns that might occur to your readers, raise them yourself, and provide your assessment of whether they are legitimate and why.
The classroom observations may have limited the researchers’ ability to describe the causal mediating mechanisms:
Third, we observed classroom instruction only twice during the intervention. As others have demonstrated . . . dimensions of classroom instruction that our coding system did not capture, such as individual student engagement or quality of instruction . . ., may have influenced student outcomes. (Vadasy & Sanders, 2008, p. 287)
Some teachers refused to participate, so some results may apply only to teachers who were open to full participation:
Fourth, our findings on classroom literacy instruction are based on data excluding six teachers (and their students). It is possible that these teachers’ refusal to participate reflects a systematic difference in their literacy instruction; however, outcomes of students within these classrooms did not reliably differ from outcomes of students whose teachers were observed. (Vadasy & Sanders, 2008, p. 287)
Finally, some important variables may have gone unmeasured:
A final limitation of this study is that many variables expected to contribute to comprehension gains were not accounted for in this study, including vocabulary knowledge, strategy skills, and general language skills. (Vadasy & Sanders, 2008, p. 287)
It is especially important when reporting the results of an evaluation of an intervention to delve into the clinical or practical significance of the findings. In the case of Amir et al.’s (2009) evaluation, the practical implications were most evident because of the short duration of the treatment:
These findings speak to the utility of the AMP, given the brevity of the intervention (eight sessions over 4 weeks, 20 min each) and absence of therapist contact. Although empirically supported treatments for SP [social phobia] already exist, many people do not have access to therapists trained in CBT [cognitive–behavioral therapy], and others opt not to take medication for their symptoms. . . . The ease of delivery of the current intervention suggests that the AMP may serve as a transportable and widely accessible treatment for individuals with SP who are unable to or choose not to access existing treatments. (p. 970)
Likewise, Vadasy and Sanders (2008) drew some clear practice implications for reading instruction:
Findings support clear benefits from the opportunities students had to engage in oral reading practice during the classroom reading block. When students read aloud, teachers have opportunities to detect student difficulties, including poor prosody, decoding errors, and limited comprehension reflected in dysfluent reading. Teachers can use this information to adjust instruction for individual students and provide effective corrections and scaffolding. (p. 287)
If your study contained no experimental manipulation, your Discussion section should describe the potential limitations of the study. As relevant, describe the potential for misclassification of participants, the possibility of unmeasured confounding, and any changes in the criteria for eligibility over the course of the study.
Each of the examples of study reports with nonexperimental designs includes a discussion of potential limitations of the results. In fact, three of them have a subsection in the Discussion section titled Limitations. Although the three potential limitations mentioned in JARS are just examples of limitations that might be described, they are broad issues that should generally be considered when interpreting the results of a nonexperimental study. In addition to these, it is good practice to mention in the Discussion that causal interpretations of nonexperimental data must always be made with caution, if at all. The STROBE guidelines for the Discussion section of an observational study make this point nicely and more completely:
The heart of the discussion section is the interpretation of a study’s results. Over-interpretation is common and human: even when we try hard to give an objective assessment, reviewers often rightly point out that we went too far in some respects. When interpreting results, authors should consider the nature of the study on the discovery to verification continuum and potential sources of bias, including loss to follow-up and non-participation. . . . Due consideration should be given to confounding . . ., the results of relevant sensitivity analyses, and to the issue of multiplicity and subgroup analyses. . . . Authors should also consider residual confounding due to unmeasured variables or imprecise measurement of confounders. (Vandenbroucke et al., 2014, p. 1519)
As I mentioned when I discussed the reporting of methods of measurement, misclassification of people into the wrong group is a problem with nonexperimental designs that deserves careful attention and should be returned to in the Discussion. If the process of assignment to conditions was accomplished through a procedure other than random assignment, you should consider in the Discussion (a) how the people in different conditions might have differed in ways other than your classification variable (which you identified in the methods), (b) whether participants switched groups in the course of the study (called unintended crossover), and (c) whether these differences might have been related to the outcome variable of interest. In the hypothetical example of workplace support I used in the discussion of methods, following the JARS recommendation, I suggested you would need to take special care to determine that absenteeism, turnover, and perceived organizational support were measured in comparable ways in the two intact groups (the retail stores). In the Discussion section, you would need to revisit any differences you discussed in the Method section and consider whether these might be plausible rival hypotheses to the conclusion you want to draw. Otherwise, readers will not be able to assess the importance of any potential sources of bias in the measures.
Another example of discussing the possibility of misclassification was provided in Fagan et al.’s (2009) study of risk and resilience in nonresident fathers’ engagement with their child. In their Limitations section, they noted differences in how risk and resilience were measured:
We also note measurement issues in regard to the risk and resilience variables reflecting temporality and centrality. Specifically, the risk index taps items that reflect ongoing or past behaviors that deleteriously influence men’s ability to actively engage as fathers, such as incarceration, substance use, and mental health problems. In contrast, more of the items on the resilience scale represent factors with the potential to positively position men for fathering. (p. 1403)
Although the researchers do not say so explicitly, the concern is that the trustworthiness of the two measures might be different and that this could lead to more misclassification of fathers for one construct of interest than for the other.
You also need to discuss the possibility that your measured variables were confounded, or confused, with other variables. For example, Fagan et al. (2009) wrote,
The prenatal involvement measure is limited to five overlapping items pooled across mother and father reports of three very broad indicators of paternal involvement. In actuality, it is not clear whether the prenatal measure is an indicator of father engagement with the child, the paternal role, the mother, or a combination of these. (p. 1403)
In this way the researchers cautioned that their measure may have confounded multiple aspects of parenting (i.e., the father’s engagement, the father’s role, and characteristics of the mother) into a single score and that it was not clear which might be the specific aspect of parenting that was related to the other variables in the study.
Finally, JARS suggests you address whether the criteria for eligibility in the study changed over time. For example, the Taylor and James (2009) study of differences in electrodermal response modulation to aversive stimuli in people who are and are not substance dependent used subjects from two studies conducted between 2001 and 2006. Although the measure of substance disorder was the same in both studies, the measure of personality disorder was different:
Trained clinical graduate students administered the Structured Clinical Interview for DSM–IV Axis I Disorders (SCID–I; First, Spitzer, Gibbon, & Williams, 1995) to assess lifetime occurrence of substance use disorders for alcohol, cannabis, sedatives, stimulants, cocaine, opioids, hallucinogens, and “other” substances (e.g., inhalants). Antisocial PD and borderline PD were assessed with the Structured Interview for DSM–IV Personality (SIDP–IV; Pfohl, Blum, & Zimmerman, 1994) in the earlier study and the Structured Clinical Interview for DSM–IV Axis II Personality Disorders (SCID–II; First et al., 1997) in the later study. These interviews are functionally equivalent and assess the same criteria. (Taylor & James, 2009, p. 494)
Thus, the researchers assured us that although the criteria used to assess whether participants had personality disorders (clinical interviews) were different in the two studies, they were “functionally equivalent.” This is important because changes in diagnostic criteria (e.g., duration of symptoms, number of symptoms present) could change the types of people diagnosed with psychiatric disorders. In experimental studies, subjects are assigned to different groups at the same time; subsequent changes in diagnostic criteria could affect external validity (generalizability) but not internal validity (i.e., bias). In nonexperimental studies, differences over time could affect both external validity and internal validity. Taylor and James (2009) addressed this issue in their Method section, but it could also have been addressed in the Discussion.
If you are interested in examples of all of the items that should be included in reports of nonexperimental designs, these can be found in the article by Vandenbroucke et al. (2014), which contains examples and explanations of the STROBE guidelines.
In your Discussion section, if your research includes the use of a structural equation model, your discussion should address the explanatory value of the models you tested, why you favored one or more models over others, and any modifications you made to the models in light of the data.
If you used a structural equation model in your study, you need to discuss the explanatory value of the model or models you tested, how they compared to one another, and why you favored one or more models over others. If you altered models because of what the data revealed, you need to say so and explain what the implications of the changes might be.
If you are reporting a clinical trial, your report should describe how the study advances knowledge about the intervention, clinical problem, and population.
As described in Chapter 6, a clinical trial is a study that evaluates the effects of one or more health-related interventions on health outcomes. It involves assigning individuals or groups to different conditions. Thus, the Vadasy and Sanders (2008) study on reading instruction would be considered a controlled trial that compares different approaches to reading instruction, the Amir et al. (2009) study a controlled trial of attention training, the Norman et al. (2008) study an evaluation of a program for smoking prevention in adolescents, and the Vinnars et al. (2009) study an examination of different psychotherapies for personality disorders. All would be considered clinical trials.
For example, Norman et al.’s (2008) smoking intervention proved to be generally successful:
This study demonstrates that an intervention designed around a website supported by additional motivational components can be integrated into schools to support smoking cessation and prevention in an engaging manner. Through the use of multiple learning channels, the Smoking Zine was able to significantly reduce the likelihood that an adolescent would take up smoking over 6 months when compared with similar students in the control condition, especially with regard to adoption of heavy smoking. (p. 807)
However, the way the researchers chose to introduce the intervention might have placed limits on their ability to generalize to other means of implementation:
School-based trials typically randomize classes; we chose to randomize participants at the individual level because of the personalized nature of the intervention. Doing so introduced the possibility that students would share lessons learned with their peers. . . . Integrating the intervention into regular classroom activities potentially reduced its novelty and the likelihood of extramural discussion. (Norman et al., 2008, p. 807)
Norman et al.’s (2008) first concern, often referred to as treatment diffusion, would reduce the estimated effect of the intervention because students in the treatment condition may have shared what they learned with students in the control condition. Their second concern would limit the impact of the treatment itself. In addition, Norman et al. pointed out a concern about compliance rates:
Another area worthy of consideration is the fact that fewer smokers completed the 6-month trial compared with nonsmokers. A potential reason could be attributed to complications arising from increased engagement in risk behaviors among the smokers. . . . These risk behaviors may have contributed to an increased absenteeism rate at school, making follow-up more difficult. (pp. 807–808)
Perhaps one of the most interesting discussions of limitations was by Vinnars et al. (2009). They found that their experimental treatment for people with personality disorder did not outperform the control treatment:
This study explored the extent to which manualized psychodynamic psychotherapy was superior to ordinary clinical treatment as conducted in the community in Scandinavia in improving maladaptive personality functioning in patients with any DSM–IV [Diagnostic and Statistical Manual of Mental Disorders, fourth edition; American Psychiatric Association, 1994] PD diagnosis. We did not find any significant difference between the two treatments, suggesting perhaps that the format of treatment (i.e., whether it was manualized psychodynamic or ordinary clinical treatment) did not seem to differentially affect change in personality. The one exception was for neuroticism, and this was only during follow-up. (p. 370)
So what do we learn from this study?
In summary, the results from these studies indicate that improvement in interpersonal problems is possible. However, it is hard to predict in advance what specific treatment, what treatment length, or for what specific sample of patients these interpersonal problems will improve. (Vinnars et al., 2009, p. 372)
One of the more difficult chores I had in writing this book was taking apart the Discussion sections of the example reports so that I could illustrate the items in JARS; it was difficult because the various elements of each Discussion were intertwined with one another. Other than the placement of the restatement of goals and summary of findings, what elements of a Discussion section should go where is a matter of the individual authors’ preferences and the unique demands of the topic and findings. Therefore, do not get hung up on preparing your Discussion in the same sequence I have here. Just be sure to include discussion of the elements called for in JARS. Where in the Discussion section you cover each and in what depth should be dictated by what order will help your readers make sense of your findings and make your message clear. Clarity and completeness rule the day.
There is one more table to be considered. This table involves the reporting of research syntheses and meta-analyses, which I discuss in Chapter 8.