Appendix: A Technical Report on StrengthsFinder
“What research underpins the StrengthsFinder Profile, and what research is planned to refine the instrument?”
By Theodore L. Hayes, Ph.D., Senior Research Director, The Gallup Organization
There are many technical issues that must be considered when evaluating an instrument such as StrengthsFinder. One set of issues revolves around information technology and the expanding possibilities that Web-based applications offer for those who study human nature. Another set of issues involves what is known as psychometrics, which is the scientific study of human behavior through measurement. There are many American and international standards for psychometrics applied to test development that StrengthsFinder is required to meet (such as AERA/APA/NCME, 1999). The present report deals with some questions that emerge from those standards as well as technical questions that a leader may have about StrengthsFinder’s use in his or her organization.
A few technical references have been cited for readers who wish to review primary source material. These technical materials may be found in local university libraries or on the Internet. The reader is encouraged to contact Gallup for further discussion or review the sources cited at the end of the report.
WHAT IS STRENGTHSFINDER?
StrengthsFinder is a Web-based assessment of normal personality from the perspective of positive psychology. It is the first assessment instrument developed expressly for the Internet. There are 180 items in StrengthsFinder, presented to the user over a secure connection. Each item lists a pair of potential self-descriptors, such as “I read instructions carefully” and “I like to jump right into things.” The descriptors are placed as if anchoring polar ends of a continuum. The participant is then asked to choose which statement in the pair best describes him or her, and also to what extent that chosen option is descriptive. The participant is given twenty seconds to respond to a given item before the system moves on to the next item. (StrengthsFinder developmental research showed that the twenty-second limit resulted in a negligible item noncompletion rate.) The item pairs are grouped into thirty-four themes.
WHAT PERSONALITY THEORY IS STRENGTHSFINDER BASED ON?
StrengthsFinder is based on a general model of positive psychology. It captures personal motivation (Striving), interpersonal skills (Relating), self-presentation (Impacting), and learning style (Thinking).
WHAT IS POSITIVE PSYCHOLOGY?
Positive psychology is a framework, or a paradigm, that encompasses an approach to psychology from the perspective of healthy, successful life functioning. Topics include optimism, positive emotions, spirituality, happiness, satisfaction, personal development, and well-being. These topics (and similar ones) may be studied at the individual level or in a work group, family, or community. Some who study positive psychology are therapists, but the usual distinction is that therapists focus on removing dysfunction, whereas positive psychologists focus on maintaining or enhancing successful function. A recent special issue of the journal American Psychologist (2000) gave an overview of positive psychology by some of its most distinguished academic researchers.
IS STRENGTHSFINDER SUPPOSED TO BE A WORK-RELATED INVENTORY, A CLINICAL INVENTORY, BOTH, OR NEITHER?
StrengthsFinder is an omnibus assessment based on positive psychology. Its main application has been in the work domain, but it has also been used to understand individuals in a variety of other contexts, including families, executive teams, and personal development. It is not intended for clinical assessment or diagnosis of psychiatric disorders.
WHY ISN’T STRENGTHSFINDER BASED ON THE “BIG FIVE” FACTORS OF PERSONALITY THAT HAVE BEEN WELL-ESTABLISHED IN RESEARCH JOURNALS FOR OVER TWENTY YEARS?
The “big five” factors of personality are neuroticism (which reflects emotional stability), extroversion (seeking the company of others), openness (interest in new experiences, ideas, and so forth), agreeableness (likability, harmoniousness), and conscientiousness (rule abidance, discipline, integrity). A substantial amount of scientific research has demonstrated that human personality functioning can be summarized in terms of these five dimensions. This research has been conducted across cultures and languages (for example, McCrae and Costa, 1987; McCrae, Costa, Lima, et al., 1999; McCrae, Costa, Ostendorf, et al., 2000).
The major reason that StrengthsFinder is not based on the big five is that the big five is a measurement model rather than a conceptual one. It was derived from factor analysis. No theory underpinned it. It consists of the most generally agreed upon minimal number of personality factors, but conceptually it is no more correct than a model with four or six factors (Block, 1995; Hogan, Hogan, and Roberts, 1996). StrengthsFinder could be boiled down to the big five but nothing would be gained from doing so. In fact, reducing the respondent’s StrengthsFinder score to five dimensions would produce less information than is produced by any current measure of the big five since those measures also report subscores in addition to the five major dimensions.
WHY DOES STRENGTHSFINDER USE THESE 180 ITEM PAIRS AND NOT OTHERS?
These pairs reflect Gallup’s research over three decades of studying successful people in a systematic, structured manner. They were derived from a quantitative review of item functioning and from a content review of the representativeness of themes and of items within themes, with an eye toward the construct validity of the entire assessment. Given the breadth of human performance we wish to assess, the pool of items is large and diverse. Well-known personality assessments range from 150 to upward of 400 items.
ARE THE STRENGTHSFINDER ITEMS IPSATIVELY SCORED, AND IF SO, DOES THIS LIMIT SCORING OF THE ITEMS?
Ipsativity is a mathematical term that refers to an aspect of a data matrix, such as a set of scores. A data matrix is said to be ipsative when the sum of the scores for each respondent is a constant. More generally, ipsativity refers to a set of scores that define a person in particular but are comparable between persons only in a very limited way. For example, if you rank-ordered your favorite colors and someone else rank-ordered theirs, the intensity of preference for any particular color could not be compared because the rankings are ipsative; only the rank orders could be compared. Of the 180 StrengthsFinder items, fewer than 30 percent are ipsatively scored. These items are distributed over the range of StrengthsFinder themes, and no one theme contains more than one item scored in a way that would produce an ipsative data matrix (Plake, 1999).
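The definition of an ipsative data matrix can be checked mechanically. The following minimal Python sketch uses made-up color rankings and ratings, not StrengthsFinder data, and the function name is_ipsative is hypothetical. It simply tests whether every respondent's scores sum to the same constant.

```python
def is_ipsative(score_matrix, tolerance=1e-9):
    """Return True when every respondent's (row's) scores sum to the same constant."""
    row_sums = [sum(row) for row in score_matrix]
    return all(abs(total - row_sums[0]) <= tolerance for total in row_sums)


# Illustrative data only: each row is one respondent's ranking
# (1 = favorite) of the same four colors.
rankings = [
    [1, 2, 3, 4],
    [4, 1, 2, 3],
    [2, 4, 1, 3],
]

# Illustrative data only: each row rates the same four colors
# independently on a 1-5 scale, so row totals are free to differ.
ratings = [
    [5, 4, 2, 1],
    [3, 3, 3, 2],
    [1, 5, 4, 4],
]

print(is_ipsative(rankings))  # every row sums to 10
print(is_ipsative(ratings))   # row sums are 12, 11, and 14
```

Because each ranking must use the values 1 through 4 exactly once, every row total is fixed in advance, which is why only the order, and not the intensity of preference, can be compared between respondents.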
HOW ARE THEME SCORES CALCULATED ON STRENGTHSFINDER?
Scores are calculated based on the mean of the intensity of self-description. The respondent is given three response options for each self-description: strongly agree, agree, and neutral. A proprietary formula assigns a value to each response category. Values for items in the theme are averaged to derive a theme score. Scores can be reported as a mean, as a standard score, or as a percentile.
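The general shape of this calculation can be sketched in Python. Because the actual formula is proprietary, the response values below are purely hypothetical, as are the function names and the reference-group scores; the sketch only illustrates averaging item values into a theme score and then expressing that score as a standard score or a percentile.

```python
from statistics import mean, pstdev

# HYPOTHETICAL values: the real formula is proprietary and is not
# described in this report.
RESPONSE_VALUES = {"strongly agree": 2.0, "agree": 1.0, "neutral": 0.0}


def theme_mean(responses):
    """Average the assigned values of the responses to one theme's items."""
    return mean(RESPONSE_VALUES[response] for response in responses)


def standard_score(score, reference_scores):
    """Express a score as a z-score relative to a reference group."""
    return (score - mean(reference_scores)) / pstdev(reference_scores)


def percentile(score, reference_scores):
    """Percentage of reference-group scores at or below the given score."""
    at_or_below = sum(1 for s in reference_scores if s <= score)
    return 100.0 * at_or_below / len(reference_scores)


# Made-up responses to a four-item theme and made-up reference theme means.
responses = ["strongly agree", "agree", "agree", "neutral"]
reference = [0.5, 0.75, 1.0, 1.0, 1.25, 1.5]

score = theme_mean(responses)
print(score, standard_score(score, reference), percentile(score, reference))
```

The three reporting formats carry the same information about a respondent; the standard score and the percentile simply locate the theme mean within whatever reference group is chosen.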
WAS MODERN TEST SCORE THEORY (FOR EXAMPLE, IRT) USED TO DEVELOP STRENGTHSFINDER?
StrengthsFinder was developed to capitalize on the accumulated knowledge and experience of Gallup’s talent-based strengths practice. Thus, initially items were chosen based on traditional validity evidence (construct, content, criterion). This is a universally accepted method for developing assessments. Methods to apply IRT to assessments that are both heterogeneous and homogeneous are only now being explored (for example, Waller, Thompson, and Wenk, 2000). Further iterations of StrengthsFinder may well use IRT methods to refine the instrument.
WHAT CONSTRUCT VALIDITY WORK LINKS STRENGTHSFINDER TO MEASURES OF NORMAL PERSONALITY, ABNORMAL PERSONALITY, VOCATIONAL INTEREST, AND INTELLIGENCE?
StrengthsFinder is an omnibus assessment of interpersonal talents based on positive psychology. Therefore, it will undoubtedly have correlational linkages to these measures to about the same extent that personality measures link to other measures in general. Ultimately, this is an empirical question to be explored in future research.
CAN STRENGTHSFINDER SCORES CHANGE?
This is an important question for which there are both technical and conceptual answers.
Technical answers: The talents measured by StrengthsFinder are expected to demonstrate a property called reliability. Reliability has several definitions. One definition of reliability, technically known as internal consistency, is the proportion of the score that is due to the aspects of the theme itself and not to irrelevant influences such as mood, fatigue, and so forth. High internal consistency shows that a theme’s items provide a consistent read with each other and do not reflect other influences. Gallup researchers recently investigated the internal consistency of StrengthsFinder themes using data from more than fifty thousand respondents. Because the number of items per StrengthsFinder theme varies (there are between four and fifteen items per theme), the average inter-item correlation for each theme was adjusted to reflect the internal consistency for a fifteen-item theme. This analysis showed that the average internal consistency was .785. The maximum possible internal consistency is 1, and a rule-of-thumb target for reliability is .80. Thus, StrengthsFinder themes show acceptable internal consistency.
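One standard way to project internal consistency to a common test length from an average inter-item correlation is the standardized alpha (Spearman-Brown) formula. This report does not state which formula Gallup applied, so the Python sketch below is an assumption, and the correlation value in it is illustrative rather than a Gallup statistic.

```python
def standardized_alpha(mean_inter_item_r, n_items):
    """Internal consistency of an n_items scale whose items have the given
    average inter-item correlation (standardized alpha / Spearman-Brown)."""
    return (n_items * mean_inter_item_r) / (1 + (n_items - 1) * mean_inter_item_r)


# Illustrative average inter-item correlation, not a Gallup figure.
r_bar = 0.20

# The same average correlation yields higher internal consistency as the
# number of items grows, which is why themes of different lengths are put
# on a common fifteen-item footing before they are compared.
for n_items in (4, 8, 15):
    print(n_items, round(standardized_alpha(r_bar, n_items), 3))
```

Under this formula a short theme is not penalized merely for having few items: its items' average correlation is evaluated as if the theme contained fifteen of them.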
A second definition of reliability, technically known as test-retest, is the extent to which scores are stable over time. Almost all StrengthsFinder themes have a test-retest reliability over a six-month interval between .60 and .80; a maximum test-retest reliability score of 1 would indicate that all StrengthsFinder respondents received exactly the same score over two assessments.
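A common index of test-retest reliability is the Pearson correlation between scores from the two administrations. The report does not specify which coefficient Gallup computed, so the following Python sketch is an assumption, and the scores in it are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(variation_x * variation_y)


# Made-up theme scores for five respondents at the first assessment.
time1 = [1.0, 2.0, 3.0, 4.0, 5.0]

# Second assessment, scenario A: everyone's score rises by the same amount.
time2_shifted = [score + 0.5 for score in time1]

# Second assessment, scenario B: scores move by differing amounts.
time2_noisy = [1.5, 1.8, 3.4, 3.6, 5.2]

print(pearson_r(time1, time2_shifted))
print(pearson_r(time1, time2_noisy))
```

With the Pearson coefficient specifically, a uniform shift in every respondent's score still produces a correlation of 1, whereas uneven changes across respondents pull the coefficient below 1.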
Conceptual answers: While an evaluation of the full extent of this stability is, of course, an empirical question, the conceptual origins of a person’s talents are also relevant. Gallup has studied the life themes of high performers in a large series of research studies combining qualitative and quantitative investigations over many years. Participants have included youths in their early teens to adults in their mid-seventies. In each of these studies the focal point was the identification of long-standing patterns of thought, feeling, and behavior associated with success. The lines of interview questioning used were both prospective and retrospective, such as “What do you want to be doing ten years from now?” and “At what age did you make your first sale?” In other words, the time frame of interest in our original studies of excellence in job performance was long term, not short term. Many of the items developed provided useful predictions of job stability, thereby suggesting that the measured attributes were of a persistent nature. Tracking studies of job performance over two- to three-year time spans added to the Gallup understanding of what it takes for a job incumbent to be consistently effective, rather than just achieving impressive short-term gains. The prominence of dimensions and items relating to motivation and to values in much of the original life themes research also informed the design of a StrengthsFinder instrument that can identify those enduring human qualities.
At this early stage in the application of StrengthsFinder, it is not yet clear how long an individual’s salient features, so measured, will endure. In general, however, it is likely to be years rather than months. We may perhaps project a minimum of five years and upper ranges of thirty to forty years and longer. There is growing evidence (for example, Judge, Higgins, Thoresen, and Barrick, 1999) that some aspects of personality are predictive throughout many decades of the life span. Some StrengthsFinder themes may turn out to be more enduring than others. Cross-sectional studies of different age groups will provide the earliest insights into possible age-related changes in normative patterns of behaviors. The first explanations for apparent changes in themes, as measured, should therefore be sought in the direction of measurement error rather than as indications of a true change in the underlying trait, emotion, or cognition. The respondents themselves should also be invited to offer an explanation for any apparent discrepancies.
DO STRENGTHSFINDER THEME SCORES VARY ACCORDING TO RACE, SEX, OR AGE?
Gallup has studied StrengthsFinder themes in the general population. These studies aim to reflect all possible respondents in general, not applicants for or incumbents in a particular position. Score differences between major demographic groups tend to average under .04 points (i.e., four hundredths of a point) at this worldwide theme database level.
Practically speaking, these score differences are trivial. There is also no consistent pattern to the score differences. For example, one of the most important sales-related themes might be Achiever. For Achiever, males score higher than females by .031 points; nonwhite (minority group) individuals score higher than white (majority group) individuals by .048 points; and people under forty years of age score higher than those forty and over by .033 points. An important theme for managers might be Arranger. For this theme females score higher than males by .021 points; white (majority group) individuals score higher than nonwhite (minority group) individuals by .016 points; and people under forty years of age score lower than those forty and over by .053 points. Finally, many people believe that Empathy is an important theme for teaching, in particular, and human relations, in general. For this theme females score higher than males by .248 points; white (majority group) individuals score higher than nonwhite (minority group) individuals by .030 points; and people under forty score higher than those forty and over by .014 points.
Statistically speaking, with more than fifty thousand respondents in the current StrengthsFinder database, even some of these very small score differences may be deemed “statistically significant.” This is simply a function of sample size. It is critical to note that the average effect size difference, expressed in units referred to as “d-prime,” between men and women over all themes is .099 (that is, the average correlation between theme difference and group membership is under .05); the average d-prime effect size difference between whites and nonwhites is .133 (the average correlation equivalent is under .07); and the average d-prime effect size difference between those under forty years of age and those at least forty is .050 (the average correlation equivalent is under .03). Also, many of these small differences are favorable for what one might consider “protected” groups: nonwhites, women, and those forty and over. Finally, even significant differences do not indicate that one group has a “better” theme score than another, only that at the database level we might expect to see trends in scores for particular groups.
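The correlation equivalents quoted above are consistent with the standard conversion from a standardized mean difference d to a correlation r, namely r = d / sqrt(d^2 + 4). That conversion assumes two groups of equal size, and the report does not say which conversion was used, so the Python sketch below is offered under those assumptions. The function name d_to_r and the group labels are illustrative.

```python
from math import sqrt


def d_to_r(d):
    """Convert a standardized mean difference to a correlation,
    assuming two groups of equal size."""
    return d / sqrt(d * d + 4)


# Average effect sizes reported in the text for three group comparisons.
reported_effect_sizes = {
    "men vs. women": 0.099,
    "whites vs. nonwhites": 0.133,
    "under forty vs. forty and over": 0.050,
}

for comparison, d in reported_effect_sizes.items():
    print(comparison, round(d_to_r(d), 3))
```

For effect sizes this small the correlation is very nearly half of d, which is why differences of about a tenth of a standard deviation correspond to correlations of about .05.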
In reviewing these results, three conclusions seem clear to Gallup researchers. First, the average differences between theme scores for protected versus majority groups are very small, typically under .04 points, which translates to a d-prime difference score under .10. Thus, there is no obvious or measurement-level bias in score distributions between these groups. There is 98 to 100 percent overlap between score distributions for comparable groups.
Second, score differences are extremely small, and only a few are statistically significant. Where significance does occur, it reflects the fact that more than fifty thousand respondents have completed StrengthsFinder, a sample large enough to magnify almost any score difference. Even when there are significant differences, the protected group is typically favored.
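The dependence of statistical significance on sample size can be illustrated with a large-sample z approximation for a difference between two equal-sized groups. In the Python sketch below the effect size and the group sizes are assumptions chosen for illustration; they are not figures from the StrengthsFinder database.

```python
from math import erfc, sqrt


def two_sample_z(d, n_per_group):
    """Approximate z statistic for a standardized mean difference d
    between two groups of n_per_group respondents each."""
    return d * sqrt(n_per_group / 2)


def two_sided_p(z):
    """Two-sided p-value for a z statistic under the normal distribution."""
    return erfc(abs(z) / sqrt(2))


# The same small effect size, tested at two assumed sample sizes.
small_effect = 0.05

for n_per_group in (100, 25_000):
    z = two_sample_z(small_effect, n_per_group)
    print(n_per_group, round(z, 2), two_sided_p(z))
```

An effect of one-twentieth of a standard deviation is nowhere near significance with one hundred respondents per group, yet it is highly significant with twenty-five thousand per group, even though its practical size has not changed.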
Third, no one theme is better than another. They simply represent the potential for different kinds of strengths. Strength building is not a zero-sum game.
In summary, trivially small differences at the worldwide database level do not translate into important practical differences at the individual level.
HOW CAN STRENGTHSFINDER BE ADMINISTERED, SCORED, AND REPORTED FOR INDIVIDUALS WHO ARE UNABLE TO USE THE INTERNET EITHER BECAUSE OF DISABILITY OR ECONOMIC STATUS?
In regard to economic status (the so-called digital divide), possible solutions include accessing the Internet from a library or school. It should be noted that some organizations that Gallup works with do not have universal Internet access. In these cases, as with respondents from disadvantaged backgrounds, the solution generally has involved special access from a few central locations.
In regard to disability, a range of accommodations is available. Generally, the most effective is for the participant to turn off the timer that governs the pace of StrengthsFinder administration. Beyond this, accommodations would need to be arranged with Gallup on a case-by-case basis in advance of taking StrengthsFinder.
WHAT IS THE READING LEVEL FOR STRENGTHSFINDER? WHAT ALTERNATIVES ARE AVAILABLE FOR THOSE WHO DO NOT MEET THAT LEVEL?
StrengthsFinder is designed for completion by those with at least an eighth- to tenth-grade reading level (that is, by most fourteen-year-olds). Trials of StrengthsFinder in our youth leadership studies have demonstrated neither significant nor consistent problems in completion of StrengthsFinder among teens. Possible alternatives or accommodations include turning off the timer feature to allow time for checking a dictionary or asking about the meaning of a word.
IS STRENGTHSFINDER APPROPRIATE FOR NON-ENGLISH SPEAKERS?
There is overwhelming evidence from both Gallup and other research organizations that personality dimensions such as those measured by StrengthsFinder are the same across cultures. What changes is the level of the score, not the nature of the theme. StrengthsFinder is currently available in seven languages, and translation into other languages will be completed in 2001. Databases for expected scores by language are under development.
WHAT FEEDBACK DOES A CANDIDATE GET FROM STRENGTHSFINDER?
Feedback varies depending on the reason the person completes the StrengthsFinder Profile. Sometimes the respondent receives only a report listing his or her top five themes, that is, those on which the person scored the highest. In other situations the person may also review the remaining twenty-nine themes, along with action suggestions for each theme, in a personal feedback session with a Gallup consultant or in a supervised team-building session with his or her colleagues.
References
The following references are provided for those readers interested in particular details of this technical report. This reference list is not meant to be exhaustive, and although many use advanced statistical techniques, the reader should not be deterred from reviewing them.
American Educational Research Association, American Psychological Association, National Council on Measurement in Education (AERA/APA/NCME). 1999. Standards for educational and psychological testing. Washington, D.C.: American Educational Research Association.
American Psychologist. Positive psychology [special issue]. 2000. Washington, D.C.: American Psychological Association.
Block, J. 1995. A contrarian view of the five-factor approach to personality description. Psychological Bulletin 117:187-215.
Hogan, R., J. Hogan, and B. W. Roberts. 1996. Personality measurement and employment decisions: Questions and answers. American Psychologist 51:469-77.
Hunter, J. E., and F. L. Schmidt. 1990. Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Judge, T. A., C. A. Higgins, C. J. Thoresen, and M. R. Barrick. 1999. The big five personality traits, general mental ability, and career success across the life span. Personnel Psychology 52:621-52.
Lipsey, M. W., and D. B. Wilson. 1993. The efficacy of psychological, educational, and behavioral treatment. American Psychologist 48:1181-1209.
McCrae, R. R., and P. T. Costa. 1987. Validation of the five-factor model of personality across instruments and observers. Journal of Personality and Social Psychology 52:81-90.
McCrae, R. R., P. T. Costa, M. P. de Lima, et al. 1999. Age differences in personality across the adult life span: Parallels in five cultures. Developmental Psychology 35:466-77.
McCrae, R. R., P. T. Costa, F. Ostendorf, et al. 2000. Nature over nurture: Temperament, personality, and life span development. Journal of Personality and Social Psychology 78:173-86.
Plake, B. 1999. An investigation of ipsativity and multicollinearity properties of the StrengthsFinder Instrument [technical report]. Lincoln, NE: The Gallup Organization.
Waller, N. G., J. S. Thompson, and E. Wenk. 2000. Using IRT to separate measurement bias from true group differences on homogeneous and heterogeneous scales: An illustration with the MMPI. Psychological Methods 5:125-46.