THIS CHAPTER EXAMINES the influence of international monitors on the quality of individual elections. Using quantitative data to examine the quality of elections provides a far greater breadth of analysis than case studies alone can accomplish. However, using quantitative data to explore the effects of monitors on a given election is complicated. As discussed in Chapter 2, whether an election is monitored depends both on the organizations’ interest in observing an election and on domestic willingness to host observers. Both of these factors are likely to be related to the expected quality of an election. This is the classic problem with analyzing data on any form of nonrandom intervention. If the anticipated quality of an election influences whether monitors are present, then monitors may not influence quality at all, but merely respond to it. That is, monitors may simply go to elections that are more likely to improve. Conversely, if elections do not improve, it may be because monitors go to particularly difficult countries that are less likely to improve.
This chapter begins with a discussion of the measures used to evaluate election quality. It then uses a mix of approaches to explore the data. First, it presents some descriptive overviews. It then applies recently developed statistical techniques to reduce the bias introduced by the selection problem discussed above and to identify the effect of monitors on election quality. The chapter ends by discussing the results in greater depth. Appendix D contains substantial supplementary data and discussion of the statistical analysis.
The analysis in Chapter 4 relied on the assessments of individual election observation missions. However, the present analysis cannot use this measure for several reasons. First, some elections have assessments from multiple organizations, making it unclear which assessment to use. Second, nonmonitored elections have no assessment at all. Third, as shown in Chapter 4, factors other than election quality may bias the monitors’ assessment. The measure of election quality for this analysis should not consider how an organization chooses to represent its findings to the press or others in the immediate aftermath of the election. Rather, the goal is to find as accurate a measure as possible of how good the election actually was. For this analysis, an election quality measure was therefore derived from the annual U.S. State Department Reports of Human Rights Practices, which discuss the quality of elections as part of the consideration of the rights of citizens to choose their governments. These data are available for 1,204 elections.
The measure captures whether the State Department report, notwithstanding the level of problems, considered the election acceptable. Thus it is possible for an election to have a moderate level of problems that raise considerable concerns, yet for the State Department to conclude that overall the election was nonetheless acceptable, or, conversely, for the election itself to proceed with few problems, yet for the State Department to consider it unacceptable, perhaps because of major flaws in the legal framework. Appendix A gives a fuller description of the measure, and Table 7.1 shows the distribution of the variable.
The second measure uses the same source, but considers not only whether the State Department considered the election acceptable, but also the level of election problems discussed in the report. The variable is thus a combination of acceptability and level of problems. Table 7.2 shows the different levels and the distribution of the variable.
Both the measure of overall election quality and the problems measure correlate well enough with other standard democracy measures to raise confidence in their reliability, yet they differ enough to suggest that they capture something other than broad democracy scores.1
Because these measures are based on reports produced by a U.S. agency, they may contain some political bias, as discussed further in Appendix A. However, research has found that U.S. State Department reports have gained considerable independence over the years and that criticism of U.S. allies is quite common.2 Thus, rather than political bias, a bigger concern is whether some cheating is systematically overlooked. Such oversight, however, is more likely when monitors are absent than when they are present. If anything, then, monitored elections are likely to be perceived as more problematic, making it harder, not easier, to show a positive relationship between monitors and election quality.
TABLE 7.1
Distribution of election quality
TABLE 7.2
Coding and distribution of the “Problems” variable
* Some elections were left as “missing” because their order in the ranking system was unclear.
To provide an alternative check on the election quality measure, the study relied on the simple proposition that politicians who cheat less should keep power less often. Thus, a measure was created to capture whether the incumbent party keeps power in an election. The rules and sources for creating the variable are detailed in Appendix A.
Of the 1,324 elections in the data, the variable is missing in 41, or about 3 percent, of cases. Turnover occurs in 336, or about 25 percent, of all elections, which means, of course, that incumbents retain power in nearly three-quarters of all elections.
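These shares can be checked directly from the counts just given. The short Python sketch below uses only those counts; it shows that "nearly three-quarters" holds whether the 41 uncoded elections are kept in the denominator or dropped.

```python
# Counts reported in the text for the turnover variable.
total = 1324      # elections in the data
missing = 41      # elections where turnover could not be coded
turnover = 336    # elections in which the incumbent party lost power

retained = total - missing - turnover  # elections where the incumbent kept power


def pct(part, whole):
    """Share of `whole` as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(pct(missing, total))             # 3.1  (share missing)
print(pct(turnover, total))            # 25.4 (turnover, all elections)
print(pct(retained, total))            # 71.5 (retention, all elections)
print(pct(retained, total - missing))  # 73.8 (retention, coded elections only)
```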
Using turnover as a measure of election quality has the benefit of objectivity, but it may miss many reductions in fraud. Turnover is only an indirect measure of fraud: because fraud can decline without power changing hands, decreases in fraud could well go undetected by this measure. In other words, whereas incumbents almost never lose power after a fraudulent election (the data contain only five such cases, including, for example, the Philippines in 1986), incumbents may well keep power in a clean election. Failure to find statistically significant effects on turnover is therefore quite possible even if monitors do succeed in reducing fraud. Conversely, finding a statistical relationship provides quite strong evidence that monitors reduce fraud.
Does a cursory examination of the data suggest that the presence of monitors deters cheating and leads to a better quality of elections? For simplicity, the following section focuses just on the overall election quality measure and turnover. Figures 7.1 and 7.2 display the data for five different samples that get progressively more restrictive.
The results from the full sample show that the distribution of overall election quality is very similar for monitored and nonmonitored elections. About 67 percent of monitored elections were acceptable, compared with about 65 percent of nonmonitored elections. For turnover the difference is more discernible: incumbents in monitored elections were about 10 percentage points less likely to keep power than incumbents in nonmonitored elections.
However, the full sample really compares apples and oranges. As noted in the previous chapter, monitors should not have any effect on elections in single-party states or in fully established democracies, nor do they tend to go to these elections. The second set of columns in each figure, which excludes single-party elections, shows that nonmonitored elections do tend to be acceptable more often, although their turnover rates remain lower because incumbents are reelected relatively often in established democracies. The third set of columns also excludes democratic countries. In this sample, monitored elections are acceptable and produce turnover more often. Indeed, when formal single-party states and fully established democracies are excluded, incumbents in monitored elections lost power twice as often as incumbents in nonmonitored elections.
Figure 7.1: Percent of acceptable elections for monitored and nonmonitored subsamples
Note that although this figure displays only the percent of elections that are acceptable, the variable has three outcomes: acceptable, ambiguous, or unacceptable.
Still, the comparison groups may be hiding important factors. For example, the previous chapter also noted that monitors might be even more likely to influence the quality of elections in transition states. If, however, the differences between monitored and nonmonitored elections arise entirely because monitors attend such special elections, then the apparent improvement may be fully explained by the nature of these special elections rather than by the presence of monitors. Thus, it may be enlightening to narrow the subset of compared elections even further. The fourth set of columns in Figures 7.1 and 7.2 excludes all special elections (that is, the first election after a coup or after a conflict, and any first multiparty election3). Excluding these special types of elections, which monitors are more likely to attend and where changes in the conduct of elections may be explained by numerous factors, makes the groups of monitored and nonmonitored elections more comparable and reduces selection bias. The columns show that monitored elections were still acceptable more often, and that politicians in monitored elections lost power more often.
Figure 7.2: Turnover rates for monitored and nonmonitored subsamples
The really interesting question, of course, is whether the presence of monitors increases turnover or improves election quality. Therefore, it may be useful to look only at elections in countries where the prior election was bad or produced no turnover. To do so, the last set of columns in Figures 7.1 and 7.2 excludes elections where the previous election was acceptable (Figure 7.1) or produced a turnover (Figure 7.2). These columns therefore show how often elections improved in terms of quality or turnover. Again, the figures show more frequent improvements, in both the acceptability of the elections and turnover, when monitors were present.
Table 7.3 displays the underlying data for the fifth set of columns in Figure 7.1, showing not only the acceptable elections but also the ambiguous and unacceptable ones.
Perhaps surprisingly, the differences between the third and fourth sets of columns in Figures 7.1 and 7.2 are actually small. This suggests that discarding the special elections does not have much effect. Thus, whether an election is the first after a coup or conflict or whether it is a first multiparty election may not influence the quality of the election or the influence of monitors greatly.
TABLE 7.3
Distribution of elections in terms of quality and monitoring*
Note: L = Legislative, P = Presidential, B = Both.
*Excluding single-party states and elections in countries rated “Free” by Freedom House in the year before the election, post-conflict elections, post-coup elections and first multiparty elections, and elections where the prior election was acceptable.
**Note that although the U.S. State Department Reports noted them as acceptable, some elections, such as the Nepal 1986 legislative election, were not very competitive. See Appendix A for more discussion of the QED data.
This question deserves a little further exploration, given that countries in transition were expected to be more responsive to the presence of monitors. Figure 7.3 compares turnover rates in four different types of elections: first multiparty elections, post-coup elections, post-conflict elections, and other elections. The second light-gray column from the left shows that monitored first multiparty elections produce a turnover 41 percent of the time, which is significantly higher than the 15 percent rate for nonmonitored first multiparty elections. This improvement in the turnover rate associated with monitors is also larger than in other elections, represented by the first set of columns. However, post-coup elections and post-conflict elections, although generally displaying higher turnover, do not seem to benefit as much from the presence of monitors. One should be cautious, though, about drawing conclusions from the smaller numbers of cases in those categories. That said, the figure suggests that monitors may not be more effective in post-conflict or post-coup elections than in other elections, but that they may be more effective in first multiparty elections. It is important to recall, however, that these simple correlations do not establish causality: the higher turnover in monitored first multiparty elections suggests, but does not prove, that monitors increase the turnover rate.
Figure 7.3: Turnover in different types of elections
Excluding post-conflict elections and post-coup elections, as well as elections in single-party states or countries rated “Free” by Freedom House in the year before the election.
The previous chapter also argued that more credible monitoring organizations should be more effective at improving elections. A final way of looking at the data, therefore, is to consider differences between monitoring organizations. As Chapters 3 and 4 showed, some organizations are more likely to voice disapproval when there are problems. If monitoring deters cheating, then more critical monitors should have a greater deterrent effect. In Chapter 3, Figure 3.2 showed how often different organizations criticized elections that were highly problematic either in the view of other monitoring organizations, or in the view of the U.S. State Department. Based on this, an indicator of monitor quality was created for organizations that criticized highly problematic elections at least 50 percent of the time. This coding rule is arbitrary, but clear. These organizations were the Carter Center (CC), the National Democratic Institute (NDI), the Asian Network for Free Elections (ANFREL), the International Republican Institute (IRI), the Organization for Security and Co-operation in Europe (OSCE), the European Parliament (EP), the EU Commission, and the Organization of American States (OAS). Consistent with the claim that credible organizations should be more effective, Figure 7.4 shows that elections monitored by the most credible organizations are indeed better and have greater turnover.
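To make the coding rule concrete, here is a minimal Python sketch of the 50 percent threshold. The organization names and criticism rates are invented placeholders, not the values behind Figure 3.2; the sketch only shows that "at least 50 percent" includes an organization sitting exactly at the cutoff.

```python
# Hypothetical criticism rates: the share of highly problematic elections that
# each organization criticized. Names and numbers are invented for illustration.
criticism_rate = {
    "Org A": 0.80,
    "Org B": 0.50,
    "Org C": 0.35,
    "Org D": 0.10,
}

THRESHOLD = 0.50  # criticized highly problematic elections at least 50% of the time


def is_high_quality(rate: float, threshold: float = THRESHOLD) -> bool:
    """Return True if an organization meets the credibility cutoff (inclusive)."""
    return rate >= threshold


high_quality = sorted(org for org, rate in criticism_rate.items() if is_high_quality(rate))
print(high_quality)  # ['Org A', 'Org B']
```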
Figure 7.4: Monitor types, election quality,* and turnover
Excluding single-party states and countries rated “Free” by Freedom House in the year before the election, post-conflict elections, post-coup elections, and first multiparty elections.
*Note that although this figure displays only the percent of elections that are acceptable, the variable has three outcomes: acceptable, ambiguous, or unacceptable.
The discussions and data exploration above suggest that monitored elections are more likely to be acceptable and produce turnover more often, that first multiparty elections may be more likely to respond to monitors, and that the quality of the monitoring organizations matters. Incidentally, these patterns change little if the analysis is further restricted to elections after 1990.
The above analysis is based purely on a descriptive examination of the data. That is useful because it shows the real frequencies of various events in the data and reflects actual experiences throughout the years. Narrowing the sample reduces some problems inherent in basic comparisons, but it still does not fully address the selection problem. Furthermore, descriptive analysis cannot take multiple factors into consideration simultaneously. Multivariate analysis provides a way to do this, and to examine the contribution made by an individual variable, such as monitoring, to an observed outcome conditional on the values of other variables. The rest of this chapter presents a multivariate analysis of international election monitoring and discusses the findings.
Chapter 2 discussed a series of factors that influence which elections are monitored. In general, monitors go to countries with more corruption and a history of problematic and fraudulent elections, but they rarely waste their resources on single-party states. Rather, monitors visit many first multiparty elections and go to countries with less stable governments. Because many of the factors that determine where monitors go also influence the quality of elections, generalized linear models may produce biased estimates. Analysts sometimes attempt to correct this selection problem by including control variables, but in some cases this may actually increase bias in the coefficient estimate for the treatment effect,4 and the results can be highly sensitive to model specification.5 Another common solution is to use standard selection models. However, these are not ideal either, because they add distributional assumptions about the covariance structure between the process that selects an observation into treatment and the process that determines the outcome.6 In practice these assumptions are very difficult to satisfy and verify. More recent methodological research has therefore focused on ways to reduce distributional assumptions when dealing with selection bias, rendering estimates less model dependent. This has led to a great deal of interest in matching techniques, which are employed for this analysis and discussed in greater detail in Appendix D.
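The selection problem can be made concrete with a small simulation. In the hypothetical Python sketch below, all rates are invented, and a single binary confounder (a history of troubled elections) stands in for the many factors discussed in Chapter 2. Monitors are assumed to raise the chance of an acceptable election by 10 percentage points, yet because they go disproportionately to troubled countries, the naive monitored-versus-nonmonitored comparison comes out near -0.20, making monitors look harmful. Comparing like with like within each stratum of the confounder recovers the assumed effect; matching extends this idea to many, partly continuous covariates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical confounder: True if the country has a history of fraudulent elections.
troubled = rng.random(n) < 0.5

# Selection: monitors attend troubled countries far more often (assumed rates).
monitored = rng.random(n) < np.where(troubled, 0.8, 0.2)

# Outcome: a troubled history lowers the chance of an acceptable election, and
# monitoring raises it by an assumed true effect of 10 percentage points.
TRUE_EFFECT = 0.10
p_acceptable = np.where(troubled, 0.30, 0.80) + TRUE_EFFECT * monitored
acceptable = rng.random(n) < p_acceptable

# Naive comparison: mixes the effect of monitoring with the effect of selection.
naive = acceptable[monitored].mean() - acceptable[~monitored].mean()

# Stratified comparison: compare like with like, then average over strata
# (a simple average works here because the strata are equally common by design).
within = [
    acceptable[monitored & (troubled == s)].mean()
    - acceptable[~monitored & (troubled == s)].mean()
    for s in (False, True)
]
adjusted = float(np.mean(within))

print(f"naive: {naive:+.3f}  adjusted: {adjusted:+.3f}  true: {TRUE_EFFECT:+.3f}")
```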
The analysis in this chapter uses a genetic matching procedure to select treatment and control observations such that the two groups are observationally similar in terms of control variable values.7 Genetic matching has advantages over other commonly used matching techniques, such as nearest-neighbor matching based on propensity scores, because it automates the search for the best possible balance between the treatment and control groups. The matching used all of the available control variables that had an effect on election outcomes conditional on the treatment state.8 That is, variables that predicted election monitoring were included in the matching only if they also predicted election quality or turnover. Accordingly, the matched variables, which are discussed further in Appendix A,9 were:
1. the level of corruption in the year before the election
2. the level of democracy in the year before the election
3. whether the country was under a democracy-related sanction in the year of the election
4. whether the election was a first multiparty election
5. the natural log of the level of foreign aid to the country in the year before the election
6. whether the election was the first after a coup
7. the government’s stability the year before the election (for the analysis of turnover)
8. the quality of the previous election (for the analysis of election quality and problems)
In addition, year was used as a control variable to address time trends in the data, but it was not used for the matching itself, as this would be overly limiting. Several other variables correlated with monitoring but not with outcomes; they were therefore omitted to avoid biasing the estimates.
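The logic of the matching step can be sketched in a few lines. The Python code below is a much-simplified stand-in for genetic matching, not the procedure used in this chapter (dedicated implementations exist, for example `GenMatch` in the R package Matching). It assumes hypothetical data with two covariates, matches each treated unit to its nearest control under a weighted distance, and runs a tiny evolutionary search over the covariate weights to minimize the worst absolute standardized mean difference (SMD), one common balance statistic. Real genetic matching uses a generalized Mahalanobis distance and a richer balance criterion.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: two standardized covariates (think corruption and
# democracy scores) and a "monitored" flag that is more likely when they are high.
n = 400
X = rng.normal(size=(n, 2))
treated = rng.random(n) < 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1])))


def smd(Xt, Xc):
    """Standardized mean difference per covariate, scaled by the treated-group SD."""
    return (Xt.mean(axis=0) - Xc.mean(axis=0)) / Xt.std(axis=0, ddof=1)


def match(weights):
    """1:1 nearest-neighbor matching with replacement under a weighted squared
    Euclidean distance on covariates scaled by their full-sample SD.
    Returns, for each treated unit, the index of its match in the control group."""
    Z = X / X.std(axis=0)
    Zt, Zc = Z[treated], Z[~treated]
    d = (((Zt[:, None, :] - Zc[None, :, :]) ** 2) * weights).sum(axis=2)
    return d.argmin(axis=1)


def loss(weights):
    """Balance criterion to minimize: the worst absolute SMD after matching."""
    matched_controls = X[~treated][match(weights)]
    return np.abs(smd(X[treated], matched_controls)).max()


# Tiny evolutionary search over covariate weights: score the population, keep
# the best half unchanged (elitism), add mutated copies of it, and repeat.
# Equal weights start in the population, so the result can only match or beat
# plain nearest-neighbor matching on this criterion.
pop = np.vstack([np.ones(2), rng.uniform(0.1, 10.0, size=(19, 2))])
for _ in range(15):
    scores = np.array([loss(w) for w in pop])
    elite = pop[np.argsort(scores)[:10]]
    children = elite * rng.lognormal(0.0, 0.3, size=elite.shape)
    pop = np.vstack([elite, children])

best = min(pop, key=loss)
print("unmatched max |SMD|:", round(float(np.abs(smd(X[treated], X[~treated])).max()), 3))
print("equal weights      :", round(float(loss(np.ones(2))), 3))
print("searched weights   :", round(float(loss(best)), 3), "weights", best.round(2))
```

Elitism is the design choice that makes the search safe: the best weights found so far are never discarded, so more generations can only improve (or keep) balance on the chosen criterion.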
The matching was subjected to several tests, as discussed in Appendix D. Furthermore, to decrease confirmation bias (the tendency of investigators to analyze data selectively to confirm their hypotheses), an outside consultant assisted in the analysis.10 The matching results were very good: the genetic matching procedure produced samples of monitored and nonmonitored elections that were very similar, as is evident in the low standardized balance scores reported in the tables below. After matching, standard binary or ordered logistic regression analysis was used to make inferences about the effect of monitoring on election outcomes, as described below.
When estimating the effect of monitoring on election outcomes, it is advisable to first remove the cases where theory does not predict a monitoring effect. The analysis therefore excludes single-party states and countries considered “free” by Freedom House.11 Relying on the Freedom House data permits inclusion of more countries, as the variable contains less missing data for smaller countries in particular.
The post-matching estimation method is logistic regression analysis, which models the probability that an event will occur. For binary outcome variables such as turnover, regular logit models are used. When the outcome variables are ordered, as are the variables capturing election quality and problems, ordered logit models are used.12
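How an ordered logit turns a linear predictor into category probabilities can be sketched as follows. This minimal Python example assumes the standard proportional-odds parameterization, P(Y ≤ k) = logistic(c_k − xb); the cutpoints and the monitoring coefficient are invented for illustration and are not estimates from Table D.1. With a single cutpoint, the same function reduces to an ordinary binary logit such as the turnover model.

```python
import math


def logistic(x: float) -> float:
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def ordered_logit_probs(xb: float, cutpoints: list[float]) -> list[float]:
    """Category probabilities under an ordered (proportional-odds) logit.

    With increasing cutpoints c_1 < ... < c_{K-1}, P(Y <= k) = logistic(c_k - xb).
    Each category's probability is the difference between adjacent cumulative
    probabilities. With a single cutpoint this reduces to a binary logit.
    """
    cumulative = [logistic(c - xb) for c in cutpoints] + [1.0]
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))]


# Hypothetical values for illustration only (not estimates from Table D.1).
LABELS = ("unacceptable", "ambiguous", "acceptable")
CUTPOINTS = [0.5, 1.5]
BETA_MONITORED = 1.0
OTHER_XB = 0.0  # contribution of all other covariates, held fixed

for monitored in (0, 1):
    probs = ordered_logit_probs(OTHER_XB + BETA_MONITORED * monitored, CUTPOINTS)
    print(monitored, {label: round(p, 3) for label, p in zip(LABELS, probs)})
```

Raising the linear predictor moves probability out of the lowest category and into the highest, which is the pattern the predicted-probability figures display.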
Table D.1 in Appendix D presents the results of the multivariate analysis performed after the matching. These results align well with the patterns revealed by the earlier descriptive analysis. The presence of monitors is associated with better election quality, fewer problems, and more turnover. This means that when monitors are present, the models predict that it is more likely that the election will be considered acceptable, that it will have fewer problems, and that it will produce a turnover in power.
Although the matching process reduces the likelihood of biased estimates, the analysis still does not prove definitively that the presence of monitors causes elections to improve. However, this positive association across all the different measures of election quality does provide considerable support for the hypothesis that monitors improve election quality and increase turnover. Appendix D provides additional tests using different democracy variables and different ways to limit the subset of observations used in the analysis to examine how robust the results are. These additional tests show that the results are fairly consistent, but that there are some subset specifications where the monitoring variable is not significant. Appendix D discusses this further.
Only a few other variables in the models are statistically significant, and these make sense. As expected, the lagged dependent variables are significant. This simply means that the quality of the previous election is likely to play a role in the quality of an election. Furthermore, first multiparty elections are more likely to be acceptable and have fewer problems, but it is not clear that they produce a turnover in power more often. The significance of the Freedom House democracy variable shows that, as expected, elections in partly free countries are also better than those in countries that are not free.
The coefficients in logit models are difficult to interpret, so it is helpful to look at the effect that a change in monitoring has on the predicted probability of the outcome. For illustrative purposes, the predictions are based on the scenario values shown in the table below Figure 7.5. The figure shows the probability that an election will be acceptable, ambiguous, or unacceptable. Importantly, the estimates are based on elections where the last election was unacceptable, thus in essence showing the predicted probability of improvement depending on the presence of monitors. Given that these are all elections where the prior election was unacceptable, it is no surprise that the most likely outcome is an election that is once again unacceptable. However, monitored elections differ considerably. The set of columns on the left in Figure 7.5 shows that in countries that are not free, the predicted probability of an acceptable election is only about 11 percent without monitoring, but about 26 percent with monitoring. Similarly, the fourth set of columns shows that for partly free countries where the last election was bad, the predicted probability of an acceptable election is about 21 percent with no monitoring, but 43 percent with monitoring. In both cases, monitored elections are thus about twice as likely to be acceptable. Furthermore, the figure shows that little of the change involves the ambiguous category; that is, the improvement in the predicted probability mostly represents movement from the unacceptable to the acceptable category.
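The two "about twice as likely" results can be related to the log-odds scale on which the model works. For the top category of an ordered logit (or for a binary logit), the log-odds of the outcome equal the linear predictor minus a cutpoint, so if monitoring enters additively and does not interact with other covariates, switching monitoring on shifts those log-odds by the monitoring coefficient in every scenario. The Python sketch below applies this to the rounded probabilities reported above. Both scenarios imply a shift of about 1.04, which is consistent with a single coefficient; because the inputs are rounded, this is only a rough check, and the estimate itself is in Model 1 of Table D.1.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


# Rounded predicted probabilities of an acceptable election reported in the
# text (prior election unacceptable), as (without monitors, with monitors).
reported = {
    "not free": (0.11, 0.26),
    "partly free": (0.21, 0.43),
}

for scenario, (p_without, p_with) in reported.items():
    ratio = p_with / p_without                  # relative change in probability
    shift = logit(p_with) - logit(p_without)    # change on the log-odds scale
    print(f"{scenario:<12} ratio {ratio:.2f}  log-odds shift {shift:.2f}")
```

This prints a ratio of 2.36 and a shift of 1.04 for not free countries, and a ratio of 2.05 and a shift of 1.04 for partly free countries.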
Figure 7.5: Predicted overall assessment of election quality
Value scenario for predicted probabilities
Variable | Value
Quality of last election | Unacceptable
Democracy-related sanctions | No
First multiparty election | No
Post-coup election | No
Foreign aid | Mean
Corruption | Mean
Year | Center value
Source: Based on Model 1 in Table D.1.
Figure 7.6: Predicted overall probability of turnover
Value scenario for predicted probabilities
Variable | Value
Turnover in last election | No
Democracy-related sanctions | No
First multiparty election | No
Post-coup election | No
Foreign aid | Mean
Corruption | Mean
Year | Center value
Source: Based on Model 3 in Table D.1.
The predicted probabilities for the “Problems” variable are more challenging to discuss, because the variable is unevenly distributed across categories. However, the takeaway message from further investigation is that the presence of monitors is associated with improvements mostly for elections in the “middle” range of the level of problems. Elections that would be really terrible without monitors are likely to remain so with monitors, just as elections that would be great without monitors will remain so with monitors. It is in the middle range that monitors can make a difference. Table D.2 in Appendix D displays predicted probabilities for a country whose last election was unacceptable and had moderate problems.
The dichotomous turnover variable is more straightforward to interpret. The predictions in Figure 7.6 are once again based on the scenario described in the table below the figure and are for the most recalcitrant cases: elections in countries where the incumbent kept power in the last election. For these elections, in countries that Freedom House rated “partly free” the year before the election, the predicted probability of turnover is only 13 percent when monitors are not present, but 36 percent when they are present. Thus the likelihood of turnover nearly triples when monitors are present. In countries that Freedom House rated “not free” the year before the election, the predicted probability of turnover is only 2 percent without monitors, but 10 percent with them present.
These figures are of course not meant as predictions about future outcomes. They are based on historical data and simply help put the magnitude of the relationship in perspective by illustrating it for the specified scenario. They indicate that the relationship between monitors and election quality is not only statistically significant but also substantively meaningful in size. That is, international election monitors are associated with sizable improvements in election quality and increases in turnover.
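Absolute and relative changes tell different stories about the turnover predictions. The Python sketch below uses only the rounded probabilities reported for Figure 7.6 and computes the change in percentage points, the ratio of probabilities (the sense in which turnover "nearly triples"), and the odds ratio. The not free scenario has the larger relative change but the much smaller absolute gain.

```python
def odds(p: float) -> float:
    """Odds corresponding to a probability."""
    return p / (1.0 - p)


# Rounded predicted probabilities of turnover reported in the text for elections
# where the incumbent kept power last time: (without monitors, with monitors).
reported = {
    "partly free": (0.13, 0.36),
    "not free": (0.02, 0.10),
}

for scenario, (p_without, p_with) in reported.items():
    points = 100 * (p_with - p_without)            # absolute change, percentage points
    risk_ratio = p_with / p_without                # relative change in probability
    odds_ratio = odds(p_with) / odds(p_without)    # relative change in odds
    print(f"{scenario:<12} +{points:.0f} points  RR {risk_ratio:.2f}  OR {odds_ratio:.2f}")
```

This prints +23 points, RR 2.77, OR 3.76 for partly free countries and +8 points, RR 5.00, OR 5.44 for not free countries.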
The above models assume that all monitoring is equal. However, as discussed earlier, high-quality monitoring should be more likely to improve election quality than low-quality monitoring. To test this proposition, high- and low-quality monitoring were coded as two separate treatment levels and compared to the control of no monitoring after matching. The coding of quality monitoring follows the discussion earlier in the chapter.
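The three treatment levels can be sketched as a simple coding function. The chapter does not spell out how an election observed by both high- and low-quality organizations was assigned, so the Python sketch below assumes that the presence of any organization from the high-quality list named earlier in the chapter makes the election HIGH; "Org X" is a hypothetical placeholder for any other organization.

```python
from enum import Enum

# Organizations coded as high quality earlier in the chapter.
HIGH_QUALITY = frozenset({"CC", "NDI", "ANFREL", "IRI", "OSCE", "EP", "EU Commission", "OAS"})


class Treatment(Enum):
    NONE = 0   # control: no international monitors
    LOW = 1    # only organizations outside the high-quality list
    HIGH = 2   # at least one high-quality organization (assumed rule)


def treatment_level(monitors: set[str]) -> Treatment:
    """Map the set of organizations observing an election to a treatment level.

    Assumption: an election observed by any high-quality organization counts as
    HIGH even if low-quality organizations were also present.
    """
    if not monitors:
        return Treatment.NONE
    if monitors & HIGH_QUALITY:
        return Treatment.HIGH
    return Treatment.LOW


# Hypothetical elections.
print(treatment_level(set()))              # Treatment.NONE
print(treatment_level({"Org X"}))          # Treatment.LOW
print(treatment_level({"OSCE", "Org X"}))  # Treatment.HIGH
```

In a post-matching regression, such a variable would usually enter as two indicators (LOW and HIGH) with NONE as the reference category.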
The analysis confirms the findings of the earlier descriptive analysis that the quality of monitors matters. Compared with no monitoring, high-quality monitoring is associated with improved election quality and turnover (see Table D.3 in Appendix D). High-quality monitoring is always statistically significant, whereas low-quality monitoring never is. The lack of significance could be due to the smaller sample size for some of the models, but it is also worth noting that the coefficients on high-quality monitors are always larger than the coefficients on low-quality monitoring, suggesting a stronger relationship between high-quality monitors and election quality than between low-quality monitors and election quality.13
Again, Appendix D provides several robustness checks that vary the democracy variable used to delimit the sample and used for matching. The findings are very similar to those discussed above.
Figure 7.7: Direction of changes in democracy scores in monitored elections
Changes in democracy scores are based on the Polity IV democracy scale ranging from −10 to 10.
Is it possible that the findings are driven purely by the post–Cold War emergence of regimes whose leaders sought democratic legitimacy through monitoring? Chapter 2 argued that there was a surge of monitored elections in regimes seeking to demonstrate their honest transition to democracy at the end of the Cold War. Figure 2.4 showed that monitored elections around that time demonstrated strong average gains in their democracy scores, but that this effect declined by the mid-1990s. So is this surge of honest elections driving the findings above? Is election monitoring actually facing declining returns over time?
Closer examination shows that this does not appear to be happening. It has not become less common for election monitoring to be associated with progress. However, the mix of states that invite monitors has changed. More and more countries, with both better and worse regimes, invite monitors. This is illustrated by the rise in cases with no improvements or deterioration in their democracy score as shown in Figure 7.7.
The dishonest states that have joined the monitoring regime on false pretenses are having more of their cheating exposed, as illustrated by the line showing the number of cases where the polity score deteriorates, and both they and the stable cases are therefore dragging down the average democracy gain seen in Figure 2.4. But if we look at the improvements, that is, the elections where the polity score was higher in the current election than in the previous election, we see that gains continue to occur in absolute numbers. The peak in 1992 is not much of an outlier given the relatively small numbers, and therefore cannot drive the findings above. Thus it is not simply the case that monitors were effective at the end of the Cold War; they continue to be associated with progress. What has changed in later years is the mix of monitored countries. Cases of gains therefore make up a smaller share of all monitored elections, but they persist.
Are monitored elections better? Yes: Both the descriptive and the statistical analyses show that in multiparty elections in countries that are less than free by Freedom House standards the presence of monitors is associated with improved election quality and more frequent turnover. This is remarkable given that monitors have relatively few resources and tend to go to problematic elections where politicians have strong incentives to do everything they can to hold on to power.
The finding is fairly robust. The statistical analysis addressed the selection problem by using genetic matching techniques to generate a sample where the monitored and nonmonitored elections resemble each other on the most important variables that influence both monitoring and election quality.
Still, as noted earlier, even such careful matching cannot completely eliminate the problems of inference. Most important, as with all other statistical analysis, omitted variable bias remains an issue. For example, no variable measured whether a country has recently undertaken serious electoral reforms or whether the incumbent party had recently undergone an important leadership change. Both of these factors could improve elections, and perhaps monitoring organizations are able to gauge these as an election approaches and simply place themselves at the right elections to receive some credit for the improvements. Or maybe they attend such elections to help assure that the reforms are implemented correctly or to help facilitate a possible leadership change, which of course itself could be a way that monitors then end up making a difference. Yet it is worth recalling that monitoring organizations often need to make decisions about monitoring well in advance of the actual election dates. Because of logistical and funding constraints, such decisions are often made many months in advance, when it is harder for monitors to know whether an election is likely to present progress. And, as Chapter 2 noted, many of the factors that influence the decision to monitor make it hard for monitoring organizations to be opportunistic. For example, there may be donor pressure to monitor foreign aid recipients, or institutional precedent for monitoring new member states or for returning to previously monitored countries.
All in all, the evidence in this chapter supports the argument that the presence of international monitors is associated with improved election quality and increased turnover in countries in the middle of the democracy range—those countries that are no longer staunch single-party autocracies but are on the way to democracy. The positive relationship between monitoring and election quality does not prove causation, but it provides quite substantial evidence that a monitored election is more likely to be acceptable and to produce a turnover of power and resources. This positive finding suggests that, even if the previous chapters have criticized several weaknesses in monitoring operations, international election monitoring may have significant merit in improving the quality of elections and turnover. This is clearly important to the overall discussion about what the international community at large should do about international election monitoring—a discussion the conclusion revisits.
This chapter has also supported some of the propositions in the previous chapter about conditions that might modify the influence of monitors. Importantly, high-quality monitors do appear to have greater influence than low-quality monitors. As Figure 7.3 also showed, the monitor-related gains in quality or turnover may be slightly greater for first multiparty elections than for other elections. However, the limited number of observations and the observational nature of the data make it hard to use the data to pinpoint more precise conditions for when international election monitoring is effective and when it is a waste of time. Nor can the analysis of individual elections say anything about the lasting effects of international monitoring on elections. The following chapter seeks to address these weaknesses through studies of a series of elections in several countries over time.