Although the Seattle Longitudinal Study (SLS) was designed to focus on cognitive changes in normal community-dwelling populations, it is inevitable that a prospective study of aging will eventually include in its successive follow-up cycles individuals who are beginning to show cognitive impairment and eventually may develop full-blown symptoms of dementia. The early detection of excess risk for the eventual development of dementia may have significant value in planning the deployment of prophylactic pharmaceutical and behavioral interventions as such techniques are emerging from the laboratory.
In this chapter, we describe our studies involving the neuropsychological assessment of a community-dwelling sample of older adults who have not previously been identified as suffering from cognitive impairment. Results are also given for a series of four 3-year follow-up assessments. We then describe the extension analyses that link the clinical measures with our psychometric battery for the study of normal aging. Finally, we report analyses of studies that obtain postdicted estimates of earlier performance on the neuropsychological measures and speak to the possibility of early detection of risk for cognitive impairment.
Specific measurement systems that are utilized to assess cognitive status, cognitive change across age, and the detection of cognitive deficits differ markedly depending on whether the investigators’ interest is focused on the study of normal aging or on the detection and diagnostic definition of neuropathology.
To study normal aging, it has generally been necessary to construct assessment batteries suitable for measurement across the entire adult life span, hence requiring stimulus material across a wide range of difficulty. Measures typically used for this purpose come from derivatives of L. L. Thurstone’s (1938) work on defining primary mental abilities for the detailed study of normal intelligence (e.g., Ekstrom, French, Harman, & Derman, 1976; Horn, 1982; Schaie, 1985) or from the various forms of the Wechsler-Bellevue scales and derivatives (Kaufman, Kaufman, McLean, & Reynolds, 1991; Matarazzo, 1972).
Measures used by neuropsychologists (with the exception of the Wechsler scales, for which use overlaps both camps), however, are typically designed to have relatively low ceilings and floors because they are used to chart deficit from the point in time when it was first noticed to the end point of death or total inability to respond to psychological measures. A neuropsychological battery commonly used in North America for the diagnosis of dementia was developed by the Consortium to Establish a Registry for Alzheimer’s Disease (CERAD; Morris et al., 1989, 1993).
In this section, I report findings from a subsample of community-dwelling participants in the SLS to whom we administered the CERAD battery. In the following sections, I then report the projection of this battery developed for the detection of dementia into the normal mental ability factor space. Regression equations are then provided that allow postdiction of indicators of possible risk of dementia by considering study participants’ longitudinal psychometric data at an age when neuropsychological assessment would not have been feasible or productive. The effectiveness of utilizing the longitudinal psychometric data and the longitudinal change on the estimated neuropsychological data are then evaluated against the criterion of dementia ratings made by neuropsychologists (also see Schaie, Caskie, Revell, Willis, Kaszniak, & Teri, 2005).
The subsample consisted of 499 adults (211 men, 288 women) who were part of the SLS seventh wave data collection in 1997–1998 and who ranged in age from 60 to 97 years (M = 73.07; SD = 8.30) at the time of their neuropsychological assessment. For the age/cohort group comparisons, we subdivided the sample into a young-old group (age range 60–69 years, n = 180; 73 males, 107 females; M = 64.23; SD = 3.54), an old-old group (age range 70–79 years, n = 205; 90 males, 115 females; M = 74.61; SD = 2.85), and a very old group (age range 80–95 years, n = 114; 48 males, 66 females; M = 84.26; SD = 3.76). Educational levels of the sample ranged from 7 to 20 years (M = 15.04; SD = 2.77). Participants were included in the neuropsychology studies only if they had been tested on the Primary Mental Abilities (PMA) battery on at least one previous occasion (7 years earlier).
The variables included in the following analysis were the 20 subtests and six factor scores of the SLS cognitive battery, the CERAD battery, selected tests from the Wechsler Adult Intelligence Scale–Revised (WAIS-R) and the Wechsler Memory Scale–Revised (WMS-R), and some other commonly used neuropsychological assessment instruments (see chapter 3 for descriptions of the individual measures).
Because the SLS studies community-dwelling samples, no psychiatric examinations or clinical dementia ratings were available. Instead, we relied on a research protocol involving a two-step procedure for rating of neuropsychological functional status. First, participants were evaluated against a screening algorithm to determine whether there were characteristics that might result in a rating of cognitive impairment in a neuropsychological case conference. The screening algorithm utilized cutoff scores based on previous research on the association of cutoff criteria and cognitive dysfunction (Crum, Anthony, Bassett, & Folstein, 1993; LaRue, 1992; Spreen & Strauss, 1991). The cutoff criteria for the selected tests were as follows:
1. Mini-Mental State Examination (MMSE) score below 27
2. Mattis Dementia Rating Scale score below 130
3. Trail B score time longer than 180 seconds
4. An age-adjusted scaled score less than 7 for any of the following: WAIS-R Vocabulary, WAIS-R Comprehension, WAIS-R Block Design, and WAIS-R Digit Symbol
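The four cutoff criteria above can be sketched as a simple screening function. This is only an illustration: the record layout, the function names, and the rule that meeting any single criterion sends a record to the case conference are assumptions made for the sketch, not details reported for the study's algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List

# The four WAIS-R subtests named in criterion 4.
WAIS_R_SUBTESTS = ("Vocabulary", "Comprehension", "Block Design", "Digit Symbol")


@dataclass
class ScreeningRecord:
    """Hypothetical container for the scores used by the screening step."""
    mmse: float                      # Mini-Mental State Examination total
    mattis_total: float              # Mattis Dementia Rating Scale total
    trails_b_seconds: float          # Trail B completion time in seconds
    wais_r_scaled: Dict[str, float]  # age-adjusted scaled scores by subtest


def screening_flags(rec: ScreeningRecord) -> List[str]:
    """Return a description of every cutoff criterion the record meets."""
    flags = []
    if rec.mmse < 27:
        flags.append("MMSE below 27")
    if rec.mattis_total < 130:
        flags.append("Mattis below 130")
    if rec.trails_b_seconds > 180:
        flags.append("Trail B longer than 180 seconds")
    for name in WAIS_R_SUBTESTS:
        score = rec.wais_r_scaled.get(name)
        if score is not None and score < 7:
            flags.append(f"WAIS-R {name} scaled score below 7")
    return flags


def needs_case_conference(rec: ScreeningRecord) -> bool:
    """Assumption: any single flag is enough to trigger detailed review."""
    return bool(screening_flags(rec))
```

Returning the list of flags rather than a bare yes/no keeps the reason for referral visible to the raters in the second step.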
As a second step, those records that met the algorithm’s screening criteria were then examined in greater detail by two neuropsychology consultants. In the consensus conferences, scores on the neuropsychological tests and tester’s report of observed sensory limitations and current or previous health problems were considered. Participants received one of the following ratings:
1. The participant is normal.
2. The participant is not demented at this time, but has one or more characteristics that suggest further monitoring is indicated.
3. The participant is probably demented.
4. The participant is definitely demented.
The neuropsychological ratings identified 354 participants (70.9%) as normal, 111 participants (22.2%) who required monitoring, 22 participants (4.4%) who were probably demented, and 12 (2.4%) who were definitely demented. There were no significant gender differences in the proportions of individuals assigned to the different rating classifications. As expected, there were significant age differences between rating groups. The group requiring monitoring was approximately 4 years older than the normal group, and those in the demented categories were 8 years older than those in the normal group. There were no educational differences between the normal and demented groups, but the “monitor” group had approximately 1 year less education on average than both the normal and demented groups. Mean Center for Epidemiologic Studies–Depression Scale (CES-D) scores for the four groups were 7.26, 8.92, 11.41, and 12.02, respectively, in order of declining function. Reported mean instrumental activities of daily living complaints were 0.83, 1.01, 2.05, and 3.00, respectively, in order of declining function.
The data analysis plan involved, first, the confirmation of the factor structure for the PMA measures. Second, an extension analysis was conducted to determine the relation of the neuropsychology measures to the primary mental abilities. Third, the regressions of the primary mental abilities on the neuropsychology measures were used to estimate neuropsychology measures for prior SLS occasions. Fourth, change scores for the primary mental abilities and the estimated neuropsychology scores were computed from 1984 to 1991 and from 1991 to 1998, and participants were classified as to whether they had experienced reliable decline or not.
For ease of comparisons, all raw data were transformed to T-scores with a mean of 50 and a standard deviation of 10. Four neuropsychological variables that had skewness greater than 2.00 were normalized using a McCall transformation (Garrett, 1966). The normalized variables were Fuld Retrieval, the MMSE, Mattis grand total, and Trails A. Also, values above 300 seconds on Trails B were trimmed to a value of 300 before T-score transformation.
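The scaling steps in this paragraph can be illustrated with a short sketch. It assumes that the McCall transformation is the familiar area (percentile-rank) normalization to T-scores and that the linear T-scores use the sample's own mean and standard deviation. The function names are hypothetical, and the exact computational details used in the study are not given in the text.

```python
from statistics import NormalDist, mean, pstdev


def trim_ceiling(values, ceiling=300.0):
    """Cap values at a ceiling (e.g., Trails B times above 300 seconds)."""
    return [min(v, ceiling) for v in values]


def linear_t_scores(raw):
    """Linear T-scores: mean 50, SD 10, shape of the distribution unchanged."""
    m, s = mean(raw), pstdev(raw)
    if s == 0:
        raise ValueError("T-scores are undefined when all scores are equal")
    return [50.0 + 10.0 * (x - m) / s for x in raw]


def normalized_t_scores(raw):
    """Area-normalized T-scores (assumed form of the McCall transformation).

    Each raw score is replaced by its midpoint percentile rank, the rank is
    converted to a standard normal deviate, and the deviate is rescaled to
    mean 50 and SD 10. Skewness is removed because only rank order is used.
    """
    n = len(raw)
    std_normal = NormalDist()
    result = []
    for x in raw:
        below = sum(1 for y in raw if y < x)
        ties = sum(1 for y in raw if y == x)   # includes x itself
        pct = (below + 0.5 * ties) / n         # strictly between 0 and 1
        result.append(50.0 + 10.0 * std_normal.inv_cdf(pct))
    return result
```

With a strongly skewed variable, the linear version lets a single extreme score dominate the scale, whereas the normalized version places that score according to its rank alone.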
An analysis of variance of the neuropsychology measures determined that there were statistically significant overall differences for gender, age group, and the Age × Gender interaction (p < .001). Univariate follow-up analyses showed significant age group differences for all 17 variables. As expected, higher scores were consistently observed for the young-old as compared with the two older groups and for the old-old compared with the very old. Gender differences were significant at or beyond the 1% level of confidence for the Fuld Retrieval test, the Fuld Rapid Verbal Retrieval test, the Word List Recall, and the Mattis total score; all differences favored women. Age × Gender interactions were significant for the WAIS-R Vocabulary test and the Mattis total score. For both variables, there were significant age group differences for the men, but not for the women.
Descriptive data for the CERAD variables included in this study are provided in table 20.1. Because information on a community-dwelling sample on this extensive database may be of broader interest, means and standard deviations are reported in raw score form by gender and age/cohort group as well as for the total sample. The intercorrelations among the 17 primary mental ability measures and the 17 neuropsychology measures are provided in the appendix (table A-20.1).
We have completed a 3-year follow-up of the original administration of the neuropsychological test battery and provide here a preliminary report on the retest stability of the battery as well as information on average change over 3 years.
The 3-year follow-up sample consisted of 286 adults (114 men, 172 women) who were part of the SLS seventh wave data collection in 1997–1998 and who ranged in age from 60 to 98 years (M = 74.30; SD = 7.94) at the time of their neuropsychological assessment. For the age/cohort group comparisons, we subdivided the sample into a young-old group (age range 60–69 years, n = 86; 34 males, 52 females; M = 64.84; SD = 2.51), an old-old group (age range 70–79 years, n = 120; 49 males, 71 females; M = 74.63; SD = 2.89), and a very old group (age range 80–98 years, n = 80; 31 males, 49 females; M = 83.96; SD = 3.98). Educational level of the sample ranged from 8 to 20 years (M = 15.21; SD = 2.75).
All follow-up participants were assessed within 1 month of the third anniversary of their original test administration. Testing was conducted in the participants’ homes in the same manner as for the original test administration.
The 3-year stability of the neuropsychology battery ranged from modest to quite satisfactory, although it was generally lower than the retest stabilities reported for the SLS cognitive abilities battery. Stability coefficients ranged from a low of .481 for the Mattis total score to a high of .883 for the WAIS Digit Symbol test. Stability coefficients for all 17 measures are provided in table 17.3.
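For readers who wish to compute comparable coefficients on their own data, the sketch below treats a stability coefficient as the Pearson correlation between scores at the original assessment and at the 3-year retest. That reading is an assumption, since the text does not state the formula, and the function name is hypothetical.

```python
from math import sqrt


def stability_coefficient(time1, time2):
    """Pearson correlation between paired scores from two test occasions."""
    if len(time1) != len(time2) or len(time1) < 2:
        raise ValueError("need two equally long lists of at least two scores")
    n = len(time1)
    m1 = sum(time1) / n
    m2 = sum(time2) / n
    # Sums of cross-products and squares of deviations from each mean.
    sxy = sum((a - m1) * (b - m2) for a, b in zip(time1, time2))
    sxx = sum((a - m1) ** 2 for a in time1)
    syy = sum((b - m2) ** 2 for b in time2)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)
```

Only participants with scores on both occasions can enter such a computation, so the coefficient describes the retested subsample rather than the full baseline sample.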
A 2 (gender) × 3 (age/cohort level) multivariate analysis of variance was conducted for the follow-up data set. Significant main effects were found for age group, gender, and test occasions as well as for the Age Group × Occasion interaction (Rao’s R, p < .001). However, the Age Group × Gender, Occasion × Gender, and the triple interaction were not statistically significant. It is therefore the Age Group × Occasion interaction that is of particular interest because it reflects age/cohort differences in change over time. Table 20.3 provides information from the univariate follow-up tests to indicate average changes in raw score units over the 3-year interval. The univariate effects indicated significant change over time for all of the WAIS subtests, Trails B, the Mattis, and Verbal Recall. Differential change by age level was significant at or below the 5% level of confidence for all neuropsychology measures except for the Mattis, WAIS Block Design, Verbal Fluency, and Word Recall measures. Table 20.2 provides the raw score differences for the total sample and by age group.
In summary, the neuropsychology battery had moderate-to-good stability over 3 years. Significant age changes were in a positive direction for the young-old. For the old-old, there were still some positive changes (Word List Recall, WAIS Vocabulary, WAIS Comprehension, Mattis total), but significant negative changes were found for the WAIS Digit Symbol test. Even the very old gained on the WAIS Vocabulary and Comprehension tests, but declined significantly on Boston Naming, WAIS Digit Symbol, Trails B, Fuld Retrieval, and Fuld Verbal Retrieval tests.
An important application of confirmatory factor analysis, as described in chapter 2, is the method of extension analysis (Dwyer, 1937; Tucker, 1971). To conduct an optimal extension analysis, it is necessary to have a sample for whom data are concurrently available both on a set of measures with dimensionality (i.e., latent constructs) that has been well established and on the other measures with relation to these constructs that are to be studied. For our purposes, we began with the psychometric abilities battery that has been employed in the SLS since 1983. We then added the CERAD as well as other neuropsychological measures we wished to relate to the psychometric ability dimensions.
The fit of the six-factor structure for the 20 primary mental ability tests employed in the SLS (Schaie, Dutta, & Willis, 1991) was assessed for the subsample used in the following analyses. All factor models were estimated using the full information maximum likelihood procedure implemented in Amos 4.0 (Arbuckle & Wothke, 1999). This procedure estimates the model parameters from the raw data matrix rather than from a covariance or correlation matrix.
It was necessary to remove the Perceptual Speed factor and related observed measures from these analyses because of the speeded nature of all other tests included, to avoid pulling off excessive individual difference variances on the speed factor. The PMA battery minus the three perceptual speed tests was therefore recomputed for the remaining 17 variables and five factors based on the sample used in the present study. The fit for the reduced five-factor solution was χ² (df = 108, N = 499) = 536.08, p < .001, comparative fit index (CFI) = .99, root-mean-square error of approximation (RMSEA) = .09, Tucker-Lewis index (TLI) = .98. Standardized factor loadings were significant for all salient values reported by Schaie et al. (1991).
In the extension analysis, factor loadings were constrained to the unstandardized values from the confirmatory factor analysis solution for the cognitive variables for this sample. Factor loadings for the neuropsychological measures were then freely estimated, providing information on the projection of these measures into the previously established five-factor cognitive factor structure. Because multiple scores from several of the neuropsychology tests were used, three residual covariances were estimated: Trails A with Trails B, Fuld Retrieval with Fuld Rapid Verbal Retrieval, and WMS-R Immediate with WMS-R Delayed. Factor variances for the five latent cognitive factors were fixed to unity. Error variances for the 34 observed variables were freely estimated.
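The constraints just described can be written compactly. The notation below is a conventional LISREL-style sketch and is an assumption of this illustration, not notation taken from the study: x holds the 17 PMA tests, y the 17 neuropsychology measures, and ξ the five latent ability factors, with errors assumed uncorrelated with the factors.

```latex
\begin{aligned}
\mathbf{x} &= \Lambda_x \boldsymbol{\xi} + \boldsymbol{\delta},
  &&\Lambda_x \text{ fixed at the CFA estimates} \\
\mathbf{y} &= \Lambda_y \boldsymbol{\xi} + \boldsymbol{\varepsilon},
  &&\Lambda_y \text{ freely estimated} \\
\operatorname{Cov}(\boldsymbol{\xi}) &= \Phi,
  &&\operatorname{diag}(\Phi) = \mathbf{1} \\
\operatorname{Cov}(\mathbf{y}, \mathbf{x}) &= \Lambda_y \Phi \Lambda_x^{\top}
  &&\text{(assuming } \operatorname{Cov}(\boldsymbol{\varepsilon}, \boldsymbol{\delta}) = \mathbf{0}\text{)}
\end{aligned}
```

In this form the error covariance matrix of y is diagonal except for the three residual covariances named above, and the rows of Λ_y are the extension loadings that project each neuropsychology measure into the ability factor space.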
The neuropsychological assessment measures, when extended into the psychometric abilities factor structure, as might be expected, generally spread over two or more of the psychometric ability domains (see table 20.3). All measures, except for the WAIS-R Digit Span, Vocabulary, Comprehension, and Block Design scales, had significant loadings on the Verbal Memory factor. Of the last scales, Digit Span, Vocabulary, and Comprehension had their largest extensions into the Verbal Comprehension factor; Block Design extended most prominently into the Spatial Ability factor.
Most measures also had a secondary loading on the Spatial Ability factor, except for the Wechsler Memory Immediate Recall, the WAIS-R Digit Span scale, and the MMSE. Several measures also had secondary and/or tertiary loadings on the Inductive Reasoning and Numeric Ability factors. The negative loadings found for Trails were expected because, for that measure, a large score (time to completion) is in the unfavorable direction.
To determine risk for the occurrence of dementia in old age prior to the occurrence of clinically diagnosable symptoms, it is necessary to find a way of estimating what the study participants’ scores would have been at earlier ages if these tests could have been administered. This requires an exercise in postdiction.
We first estimated T-scores on the neuropsychology measures from the PMA factor scores for the concurrent occasion using factor weights obtained by orthonormal transformation of the values in table 20.3. We then obtained information on the relation between the estimated and the actually observed T-scores. Table 20.4 reports the correlations between the observed and estimated neuropsychology test scores as well as the multiple correlations between the concurrent PMA tests and the neuropsychology tests, both with and without including age and education as predictors. As can be seen, the values from the extension analyses are somewhat more conservative because they attenuate for error of measurement.
We conclude that we can validly estimate scores on the neuropsychology tests from scores on the five PMA factors on the basis of the following considerations: First, all correlations between estimated and observed neuropsychology scores are significant at the .001 confidence level. Second, the correlations between the observed and estimated scores for the neuropsychology measures approach the reliable variance of the tests (see chapter 3). Third, the correlations between observed and estimated scores are also within the first decimal for alternate ordinary least squares regression estimates for most measures. However, the extension analysis–derived estimates are preferred because they adjust for error of measurement (Tucker, 1971). Hence, it seemed reasonable to attempt backward prediction (postdiction) to estimate what our participants’ earlier scores on the neuropsychology battery might have been if we had had the opportunity to measure them 7 and 14 years earlier.
We next used the factor weights from the extension analyses to estimate (postdict) T-scores for the neuropsychology tests for our data collections that occurred 7 (1991) and 14 years (1984) prior to the direct measurement on the neuropsychology tests.
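The logic of postdiction can be sketched as applying one fixed set of weights to a participant's factor scores from each earlier wave. The weights, factor scores, and function name below are hypothetical. The sketch also assumes a simple linear combination of factor T-scores centered at 50, without the orthonormal transformation or any rescaling that the actual procedure may involve.

```python
def estimate_t_score(factor_t_scores, weights):
    """Estimate a neuropsychology T-score from PMA factor T-scores.

    factor_t_scores: factor name -> observed factor T-score at one wave
    weights:         factor name -> weight for one neuropsychology measure
    Assumption: deviations from the T-score mean of 50 are combined linearly.
    """
    return 50.0 + sum(
        w * (factor_t_scores[name] - 50.0) for name, w in weights.items()
    )


# Hypothetical weights for one measure (not values from table 20.3).
example_weights = {"Verbal Memory": 0.6, "Spatial Ability": 0.2}

# Hypothetical factor T-scores for one participant at the three waves.
waves = {
    1984: {"Verbal Memory": 60.0, "Spatial Ability": 55.0},
    1991: {"Verbal Memory": 58.0, "Spatial Ability": 50.0},
    1998: {"Verbal Memory": 50.0, "Spatial Ability": 45.0},
}

# The same weights are applied at every wave, so differences between the
# estimates reflect change in the underlying factor scores only.
estimates = {year: estimate_t_score(f, example_weights) for year, f in waves.items()}
```

Because the weights are held constant, the 1984 and 1991 values are interpretable only to the extent that the relation between the abilities and the neuropsychology measures is stable over time.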
Mean values by age group (young-old, old-old, very old) are provided in table 20.5 for the three estimated data points. Age declines significant at the .01 level of confidence are observed in the young-old group over a 14-year interval (1984–1998) for all measures except the Mattis. However, significant decline on the WAIS-R Vocabulary scale is observed only over the second 7-year interval, from 1991 to 1998.
In the old-old group, significant change over 7 years from 1984 to 1991 is found for WAIS-R Digit Symbol, Praxis Delayed Total, Trails A, and Trails B. Significant 14-year changes (1984–1998) occur for all measures except the Mattis. These can be attributed primarily to the many significant declines occurring during the period from 1991 to 1998. Finally, in the very old group, significant 7-year changes from 1984 to 1991 are found for the Boston Naming Test, WAIS-R Block Design, and Digit Symbol scales as well as Praxis Delayed Total, Trails A, and Trails B. Significant 14-year changes are found for all measures.
The next step in this analysis was to examine the effectiveness of utilizing longitudinal change data on the primary mental abilities and estimated neuropsychology data in predicting the ratings made by our neuropsychologists.
We first examine change over the most proximal 7 years from 1991 to 1998. Then we reach back another 7 years and examine changes occurring from 1984 to 1991. Longitudinal change is considered both for the PMA factor scores (computed from the actual observations) and for the estimated neuropsychology measures. In each instance, we first contrast all participants rated as having some suspicious characteristics against the normal participants (Rating 1 vs. combined Ratings 2, 3, and 4). We then contrast only those individuals who were identified as probably or definitely demented against the normal group (Ratings 1 vs. Ratings 3 and 4).
In tables 20.6 through 20.9, we consequently distinguish between normal (Rating 1), suspect (Ratings 2, 3, and 4), and demented (Ratings 3 and 4) individuals. Because we found no significant sex-by-rating-category or age-group-by-rating-category interactions, data are reported only for the total sample. In each case, mean longitudinal change is reported in T-score points. Perhaps of greater practical interest, however, are the proportions of individuals who show reliable decline (defined as a drop of more than 1 SE below the T1 score) and the odds ratios between the normal and diagnosed groups.
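The two summary statistics used in these tables can be illustrated as follows. The sketch assumes that reliable decline means a drop exceeding 1 SE from the first score, and that the odds ratio puts the odds of decline in the rated group in the numerator and those in the normal group in the denominator. The function names and all numbers in the usage note are hypothetical.

```python
def reliable_decline(t1, t2, se):
    """True if the score dropped by more than one standard error from T1."""
    return (t1 - t2) > se


def odds_ratio(declined_rated, n_rated, declined_normal, n_normal):
    """Odds of reliable decline in a rated group relative to the normal group.

    Built from the 2 x 2 table of group (rated vs. normal) by outcome
    (declined vs. stable).
    """
    stable_rated = n_rated - declined_rated
    stable_normal = n_normal - declined_normal
    if stable_rated == 0 or declined_normal == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (declined_rated * stable_normal) / (stable_rated * declined_normal)
```

For example, if 20 of 40 rated participants and 10 of 60 normal participants showed reliable decline, `odds_ratio(20, 40, 10, 60)` gives 5.0, meaning the odds of decline were five times higher in the rated group.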
Table 20.6 shows average declines in T-score points, proportions of the rating groups who declined significantly over 7 years, and the odds ratios of these proportions contrasting the normal and rating groups. Because all of the participants in this analysis were over 60 years of age (mean age 73 years at the time they were rated), it is not surprising that significant average age changes were observed on all of the factor scores. There is a significant interaction between magnitude of 7-year change and rating group for all factor scores except Inductive Reasoning. As expected, greater change is observed for the groups rated as other than normal. When all individuals with some suspicious characteristics are contrasted with normal individuals, significant odds ratios are obtained only for the Verbal Comprehension and the Verbal Recall factors. However, when only those rated as probably or definitely demented are contrasted with the normal individuals, significant odds ratios are found for all factor scores.
Data for the estimated neuropsychology scores may be found in table 20.7. Again, significant interactions are found between magnitude of 7-year change and rating groups, with greater change for both suspect and demented categories. Odds ratios are statistically significant for the suspect group for all neuropsychology scores except the Boston Naming Test and Word List Recall. All odds ratios are significant for the demented group. It is noteworthy that the odds ratios for the estimated neuropsychology measures are substantially larger than those for the psychometric factor scores.
Having established that we can provide meaningful estimates over the most proximal 7 years (1991–1998) prior to the actual neuropsychological assessment of our study participants, we then reached further back to determine the effectiveness of this procedure in identifying individuals at risk at an earlier point in time by studying the predictive effectiveness of change over the preceding 7-year period.
Table 20.8 shows data on change in the PMA factor scores from 1984 to 1991 (the end point is now 7 years prior to the actual administration of the neuropsychology tests). Participants at T1 = 1984 in this analysis were in their late 50s. Hence, decline over 7 years was not significant for any PMA factor for the normal and suspect groups. However, significant odds ratios were found for the demented group (p < .05) for Numeric Facility and Verbal Comprehension.
Results for the estimated neuropsychology scores are given in table 20.9. Significant interactions between the magnitude of 7-year decline and rating group were found for all measures except the Boston Naming Test, WAIS-R Digit Span, WAIS-R Block Design, Praxis Delayed total, and part A of the Trail-Making Test. Significant odds ratios when contrasting the suspect with the normal group were found for the estimated scores of the Fuld Retrieval, the Delayed Wechsler Memory, the MMSE, and the Word List Recall. Significant odds ratios contrasting the demented with the normal group were obtained for all measures except Boston Naming, WAIS-R Digit Span, WAIS-R Block Design, Praxis Delayed total, and part A of the Trail-Making Test.
Our findings suggest that it is possible to obtain useful estimates of what individuals’ status on neuropsychological measures might have been at earlier points in time had we been able to measure them directly. Findings indicate that, for this community-dwelling sample, age-related declines would have been found over 14 years in all age groups (except for the Mattis) and for a few neuropsychological measures over 7 years in the old-old and very old age groups.
A major criterion for the utility of the analyses presented here is, of course, whether the backward estimation of neuropsychological measures can contribute to the detection of potential risk of dementia at an earlier point in time when the direct identification by a neuropsychological battery would not be practical because of expected ceiling effects. The criterion used by us for this purpose was the well-established procedure of consensus ratings by neuropsychologists.
Our results suggest first that significant individual change on PMA test performance over the 7 years preceding neuropsychological evaluation has predictive value for identifying individuals who will be rated by neuropsychologists as mentally impaired. More important, although there is some predictability directly from the psychometric test battery, there is better prediction from the estimated neuropsychology test scores. Further, we can also successfully predict current diagnostic status from change in the estimated neuropsychology measures from 14 years to 7 years prior to the actual administration of the neuropsychology battery.
Another interesting advantage of the approach taken here is that, in contrast to the actual neuropsychology tests, the estimated scores have no ceiling because they are scaled from the midpoint of the total population. Removing the ceiling limitation for the estimated neuropsychology tests may well be a major reason for the greater outcome efficacy of the estimated neuropsychology scores over the direct measures of change in the primary mental abilities.
Efforts to develop programs for the prevention or arrest of dementia at early stages will depend heavily on the early identification of those at risk before clinical symptoms begin to appear. This chapter presents a novel approach that takes advantage of existing longitudinal data to identify individuals at risk by postdicting performance on neuropsychological tests 7 and 14 years prior to neuropsychological assessment.
This chapter described studies that involve the neuropsychological assessment of a community-dwelling sample of older adults who have not previously been identified as suffering from cognitive impairment. Higher performance levels were found for the young-old as compared with the old-old and very old groups. Significant gender differences in favor of women were found for the Fuld Retrieval test, the Fuld Rapid Verbal Retrieval test, the Word List Recall, and the Mattis total score. Preliminary results are also given for a 3-year follow-up assessment. Moderate-to-good stabilities were obtained for the neuropsychology battery. The observed 3-year change was positive for the young-old and partially positive for the old-old, but significant decrements were found for the very old subgroup.
Extension analyses were then described that link the clinical measures with our psychometric battery for the study of normal aging. Most measures had significant loadings on the Verbal Memory factor. The exceptions were the WAIS-R Digit Span, Vocabulary, and Comprehension, with primary loadings on the Verbal Comprehension factor; and Block Design, which loaded most prominently on the Spatial Ability factor. Several measures also had secondary and/or tertiary loadings on Spatial Ability, Inductive Reasoning, and Numeric Ability.
Finally, analyses were reported of studies that use the results of the extension analyses to obtain postdicted estimates of earlier performance on the neuropsychological measures. The postdicted neuropsychology scores allow tests of the possibility of early identification of risk for an eventual clinical diagnosis of dementia. Results of these studies suggest that data from psychometric tests suitable for a normal community-dwelling population can predict high risk for dementia 7 and 14 years prior to the identification of cognitive impairment by neuropsychologists.