
L1 sign language tests and assessment procedures

Tobias Haug, Wolfgang Mann, Joanna Hoskin, and Hilary Dumbrill

Introduction

Sign language tests have been developed for different purposes (Haug, 2005), ranging from monitoring deaf children's sign language development to assessing the sign language skills of adults who learn a sign language as a second or foreign language. The term "sign language test for L1 learners" frequently refers to signing deaf children who acquire a sign language at home, as distinct from adult learners, that is, deaf and hearing adults who learn a sign language as a second or foreign language (e.g., Woll, 2013). However, since some adults can also be considered L1 learners, such as children of deaf adults (CODAs) and deaf adults, we use the terms "young learners" or "early learners" to refer to the linguistically diverse group of deaf and hearing signing children who acquire a sign language from birth or during childhood up to 6 years old. For the purpose of this chapter, we focus only on L1 signing children. Smith, Davis, and Hoffman in Chapter 18 of this volume review tests and assessment procedures for L2/Ln learners. Readers interested in a more detailed discussion of sign language tests are referred to Enns et al. (2016) or Haug (2005).

Theoretical perspectives

Compared to spoken language assessments, the number of sign language tests that are (commercially) available is relatively small. Apart from the fact that sign language research is still a very young field, which only started in the 1960s, this shortage may result from specific challenges related to the development and evaluation of sign language tests. One such challenge is the incomplete and limited state of research on the structure and acquisition of many sign languages.

For test developers, the difficulty arises not only from the lack of documentation of a particular sign language, but also from the absence of important resources such as a reference grammar (Palfreyman, Sagara, & Zeshan, 2015) or a sign language corpus (e.g., Haug, 2017). For instance, if a corpus on the acquisition of a sign language is available, these data can be used to inform test development, for example in the form of frequency lists of signs as a foundation for developing a vocabulary test.

The incomplete state of research is only one possible challenge. Another is the small size and heterogeneity of the deaf population. The small number of deaf children makes it difficult to obtain samples that are large enough for norming purposes, that is, to generate average performance scores for different ages. Although this issue might not be a big problem in larger countries such as the United States or some European countries such as Germany, it certainly poses a problem for small countries as well as for countries with more than one sign language, e.g., Switzerland with three sign languages (Boyes Braem, Haug, & Shores, 2012) or Belgium with two (Van Herreweghe & Vermeerbergen, 2009).

Another issue is the heterogeneity of deaf children as a group in terms of their language acquisition. Deaf children who do not have access to a sign language during the most critical early years of their lives (up to 6 years old; e.g., Mayberry, Lock, & Kazmi, 2002; Newport, 2002) are the main target group for sign language evaluation and intervention (Haug, 2011). The reference group, however, should be deaf and hearing (near-)native signing children, most of whom come from deaf families. These children serve as models against which the performance of children with late exposure to sign (most deaf children with hearing parents) can be measured to allow for standardization (Herman, 2002; Herman, Holmes, & Woll, 1998).

Different publications (e.g., Mann & Haug, 2014) and guidelines (e.g., Haug et al., 2016) deal with the development and evaluation of sign language tests for deaf children. These publications can serve as a basis for test developers to design tests in a local context.

Evaluation of sign language tests

Once a test has been developed, piloted, and revised, and a main study with a larger sample has been conducted, the test needs to be evaluated according to specific psychometric properties, which include validity and reliability. There are a number of ways this can be accomplished, some of which will be discussed next.

In order for any (language) test measurements to be trusted, evidence needs to demonstrate that the test is valid and reliable. These psychometric properties are important because they make it possible to interpret and generalize the underlying construct that a test measures. The need to report psychometric values is particularly apparent for new assessments that have not yet been standardized. The following section describes the concepts of validity and reliability with concrete examples from existing sign language tests. Since most studies on sign language tests are framed within classical test theory, our examples will be presented within this framework. In addition, we will briefly cover recent approaches to validation.

Validity

Validity of a test is the understanding that a test truly measures what it is supposed to measure (Kline, 2000). This is notably different from reliability, which determines consistency of a test. There are several types of validity, each of which is briefly described below.

Content validity

Content validity refers to the degree to which the instrument covers the content that it is supposed to measure (Bush, 1985). It also refers to the adequacy of the sampling of the content that should be measured. The content of existing sign language tests has been developed through collaboration with native deaf signers or practitioners working with deaf children. For instance, to develop the items of the American Sign Language (ASL) Vocabulary Test (ASL-VT), Mann and colleagues (2016) worked closely with a panel of deaf and hearing experts. These experts provided feedback on the multiple-choice format of the test, specifically the target and distractor items. Additional feedback on the quality and clarity of the test images was gathered from a group of hearing undergraduate learners. Mann and colleagues also used teacher ratings of the test items to evaluate the type, or combination of types, of information used by children to acquire these items. Similarly, the developers of the ASL Assessment Instrument (ASLAI; Hoffmeister, 1999) worked closely with a team of native signers with expertise in language development who advised on the content of each task. In addition, each task was field-tested on a group of ten deaf adults. Only items that showed at least 90% agreement among the deaf respondents were retained in the item pool (Hoffmeister, 1999); a minimal sketch of this kind of screening step is shown below. Evidence for the content validity of a vocabulary test for German Sign Language (Deutsche Gebärdensprache, DGS; Bizer & Karl, 2002a) was provided by using word frequency lists for spoken German as a foundation for item selection within the targeted age range, that is, children in the third through fifth grades. Finally, Haug (2011) reviewed existing research on the linguistic structures of DGS corresponding to those represented in a British Sign Language (BSL) test in order to establish content validity while adapting the BSL Receptive Skills Test (Herman, Holmes, & Woll, 1999) to DGS.
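To make the screening step concrete, the following minimal sketch (illustrative only, not the ASLAI developers' actual procedure) retains only those items on which at least 90% of a rater panel agree:

```python
# Illustrative sketch of a 90%-agreement item screen (hypothetical data).
# `ratings` maps each candidate item to the panel's accept/reject judgments.
ratings = {
    "item_01": [True, True, True, True, True, True, True, True, True, True],
    "item_02": [True, True, False, True, True, True, False, True, True, True],
}

def retained_items(ratings, threshold=0.90):
    """Keep items on which at least `threshold` of raters agree (accept)."""
    return [item for item, votes in ratings.items()
            if sum(votes) / len(votes) >= threshold]

print(retained_items(ratings))  # item_01 passes (100%); item_02 fails (80%)
```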

Construct validity

A second type of validity is construct validity. This type of validity is needed when a test measures a specific attribute or quality for which there is no operational definition (Cronbach & Meehl, 1955). As a first step, it requires a clear definition of the construct, for example, intelligence, to be measured (Bechtoldt, 1951). The question as to which domain or construct should be measured can be determined through a review of the relevant literature, focus groups, and/or interviews (Yaghmaie, 2003). For instance, the underlying construct of the web-based ASL-VT was the assumption that two or more learners may have different knowledge about the same word or sign (Mann, Roy, & Morgan, 2016). This construct, referred to as the strength of form-meaning mappings, is illustrated in Table 8.1. It consists of four levels of mapping, each of which represents one task in the ASL-VT. The levels range from 1 for the weakest mapping (meaning recognition) to 4 for the strongest mapping (meaning recall).

Table 8.1 Construct of strength of form-meaning mappings in ASL

Type of mapping           Task description
4. Meaning Recall         Produce three ASL responses to a sign prompt
3. Form Recall            Produce the target ASL sign for a picture prompt
2. Form Recognition       Match a picture prompt with one of four ASL signs
1. Meaning Recognition    Match a prompt in ASL with one of four pictures

Source: Mann, Roy, & Morgan, 2016.

In order to provide a developmental picture of vocabulary growth in ASL, test takers’ performances on the different tasks were correlated with age, followed by a comparison of their performances across tasks.

In the case of the vocabulary test for DGS (Perlesko: Prüfverfahren zur Erfassung Lexikalisch-Semantischer Kompetenz; Bizer & Karl, 2002a), construct validity was established by correlating intra-individual factors with the test results. These factors included (1) grade attended in school, (2) educational policy of the school (oral, sign, or bilingual), (3) hearing status of the parents, (4) gender, and (5) chronological age. The results showed that the construct "knowledge of receptive vocabulary" was represented in the Perlesko. In comparison, the developers of the assessment instrument for Sign Language of the Netherlands (Nederlandse Gebarentaal, NGT; Hermans, Knoors, & Verhoeven, 2010) carried out three types of correlation analyses to investigate construct validity: correlations between test takers' ages and test performance, between gender and test performance, and between parental hearing status and test performance. Similarly, the developers of the American Sign Language Proficiency Assessment (ASL-PA; Maller et al., 1999), which measures expressive ASL skills in non-native learners, divided the test takers into three groups based on their linguistic experience and found that the groups performed significantly differently on the test.

An alternative approach to presenting evidence in support of construct validity was used for the Language Proficiency Profile (LPP; Bebko & McKinnon, 1993), a test that evaluates children’s overall linguistic and communicative skills, independently of any specific language or modality of expression. Each item was printed separately on a card and presented to three experts who were psycholinguists or language pathologists. They were asked to sort these items into developmental order within each subscale of the test. Results showed high agreement (84%) between raters’ ordering and the original ordering for all subscales.

Criterion-related validity

There are two types of criterion-related validity: concurrent and predictive validity. Concurrent validity is studied when "one test is proposed as a substitute for another, or a test is shown to correlate with some contemporary criterion (e.g., psychiatric diagnosis)" (Cronbach & Meehl, 1955: 282). For instance, to collect evidence for concurrent validity, the developers of the ASL-VT compared test takers' performance on the four vocabulary tasks with their scores on the ASL Receptive Skills Test (ASL-RST; Enns & Herman, 2011). Neither repeated sets of bivariate correlations nor partial correlations controlling for age showed any significant differences. A closer inspection of the distribution of scores on the vocabulary tasks and the ASL-RST revealed similarities between the ASL-VT and ASL-RST in their profiles across age bands (see Mann, Roy, & Morgan, 2016 for details).
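As an illustration of this kind of analysis, the sketch below computes a first-order partial correlation between two sets of test scores while controlling for age; the data and variable names are entirely hypothetical, not those of the ASL-VT study:

```python
# Illustrative sketch: partial correlation between two test scores,
# controlling for age, via the first-order partial correlation formula.
import numpy as np
from scipy.stats import pearsonr

def partial_corr(x, y, z):
    """Correlation of x and y with the linear effect of z removed."""
    r_xy = pearsonr(x, y)[0]
    r_xz = pearsonr(x, z)[0]
    r_yz = pearsonr(y, z)[0]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

vocab = np.array([12, 18, 25, 31, 40, 44])  # hypothetical vocabulary-task scores
rst   = np.array([10, 15, 22, 28, 35, 41])  # hypothetical receptive-test scores
age   = np.array([5, 6, 7, 8, 9, 10])       # ages in years

print(partial_corr(vocab, rst, age))
```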

In comparison, the developers of the Perlesko (Bizer & Karl, 2002a) used teacher ratings of test takers' vocabulary knowledge as an external criterion, separately for each of the three language sections of the test. These ratings were correlated with children's test performances, and significant, strong correlations were found between the test and the criterion measures. Haug (2011) used a similar approach: he correlated teachers' ratings of deaf learners' DGS skills with their raw scores on the DGS Receptive Skills Test and found a strong correlation. In the case of the ASL-PA (Maller et al., 1999), test takers' performance scores were compared with their scores on two subtests of the Test Battery for ASL Morphology and Syntax (Supalla et al., 1995), namely Verbs of Motion Production and Sign Order Comprehension, and high positive correlations were found. Bebko, Calderon, and Treder (2003) compared test takers' performance on the Language Proficiency Profile with different tests depending on the test takers' ages: the Expressive Communication subscale of the Vineland Adaptive Behavior Scales for younger children (Sparrow, Balla, & Cicchetti, 1984) and the Bankson Language Screening Test for older children (Bankson, 1977). They found positive correlations between the scores from these tests. Hoffmeister (2000) also found positive correlations between children's scores on the ASLAI and their scores on the Stanford Achievement Test (SAT-HI) and the Rhode Island Test of Language Structure (RITLS; Engen & Engen, 1983).

Predictive validity

Predictive validity concerns the extent to which one or more known variables predict test outcomes, such as performance on standardized achievement tests used in schools, for example the Stanford Achievement Test (SAT-10; Pearson, 2014) and the Wechsler Individual Achievement Test (WIAT-II; Wechsler, 2005). For instance, children's scores on the BSL Receptive Skills Test (Herman, Holmes, & Woll, 1999) were predicted by their years of exposure to BSL. In the youngest age groups, children from deaf families performed better than children from hearing families. For the older age groups, there was no significant difference between native signers and deaf children from hearing families in bilingual programs; however, both of these groups achieved significantly higher scores than deaf children from hearing families in Total Communication programs. In the latter group, children with deaf siblings or other deaf relatives achieved higher scores than those without deaf relatives. Similarly, to gather evidence for predictive validity, the developers of the DGS Receptive Skills Test used variables such as the age of first exposure to DGS, parental hearing status, and chronological age to provide additional information that could help predict and explain performance differences (Haug, 2011).
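A minimal sketch of how such a prediction can be examined is a simple linear regression of raw scores on years of exposure; the data below are entirely hypothetical and serve only to illustrate the analysis:

```python
# Illustrative sketch: regressing test raw scores on years of sign language
# exposure (hypothetical data, not from the BSL study).
from scipy.stats import linregress

years_exposure = [1, 2, 3, 5, 7, 9, 11]
raw_scores     = [8, 12, 15, 21, 26, 30, 33]

result = linregress(years_exposure, raw_scores)
print(f"slope={result.slope:.2f}, r={result.rvalue:.2f}, p={result.pvalue:.4f}")
```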

Reliability

Reliability refers to the consistency of a test's measurements, that is, whether the test produces stable and repeatable scores (e.g., Rust & Golombok, 2000). Reliability is measured in different ways; the most commonly known are (1) stability over time and (2) internal consistency. The stability of a test over time is known as test-retest reliability (Kline, 2000), for which subjects' scores obtained on two different occasions are correlated: the higher the correlation, the more reliable the test. The internal consistency of a test refers to "the degree to which scores on individual items or group of items on a test correlate with one another" (Davies et al., 1999: 86). Measures of internal consistency include statistical procedures such as Cronbach's alpha, for which a minimum value of .70 is generally considered acceptable (Nunnally, 1978).
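For reference, Cronbach's alpha for a test of k items is defined as

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_{Y_i}^{2}}{\sigma_{X}^{2}}\right)

where \sigma_{Y_i}^{2} is the variance of scores on item i and \sigma_{X}^{2} is the variance of the total test scores.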

Additional measures of reliability include inter-rater and intra-rater reliability. Inter-rater reliability refers to the level of agreement between two or more raters on a participant's performance (Davies et al., 1999), for example, when a child's language production is video-recorded and the scoring of specific grammatical structures by two different raters is compared. Intra-rater reliability refers "to the extent to which a particular rater is consistent in using a proficiency scale" (Davies et al., 1999: 91) on different occasions. It can be established by comparing the scores given by the same raters to candidates tested on two occasions, for example a month apart (Davies et al., 1999).

Hoffmeister et al. (1989) established inter-rater reliability for a narrative production test, which was part of the ASLAI, by using trained raters who evaluated the signed narratives of deaf children. Inter-rater reliability was high (.90) for both deaf and hearing raters (Hoffmeister, 1999). A similar approach was used for the Test of American Sign Language (TASL; Strong, Prinz, & Kuntze, 1994). Inter-rater reliability was investigated for each subtest "that required subjective decisions by having raters score the same set of 10 protocols, reviewing and resolving disagreements, and then scoring a second set of 10 protocols. Eventual agreement was better than 96% in all cases" (Strong & Prinz, 1997: 40). Hermans, Knoors, and Verhoeven (2010) also established inter-rater reliability for the productive measures of the NGT assessment instrument. For the five productive tasks, "13 test administers scored a randomly selected group of children within a particular age-group for the second time but now from videotape" (ibid.: 113). Applying Spearman's rank correlation coefficient (rho), the correlations between the raters ranged from .78 to .92, which is considered high. To our knowledge, no study has focused specifically on intra-rater reliability.
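To illustrate, the following sketch computes inter-rater reliability as Spearman's rho between two raters' scores of the same set of signed productions; the scores are hypothetical:

```python
# Illustrative sketch: inter-rater reliability as Spearman's rho between
# two raters scoring the same eight signed productions (hypothetical data).
from scipy.stats import spearmanr

rater_1 = [3, 5, 2, 4, 4, 1, 5, 3]
rater_2 = [3, 4, 2, 4, 5, 1, 5, 2]

rho, p = spearmanr(rater_1, rater_2)
print(f"Spearman's rho = {rho:.2f} (p = {p:.4f})")
```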

Modern approaches to test validation

In recent decades, the argument-based approach has become the standard for validating language tests. Within this framework, validity is not divided into different kinds, such as content, criterion-related, and construct validity, but is viewed as a unified concept, that is, construct validity (Kane, 1992). The core of validation is not to validate the test itself but the inferences made on the basis of test score interpretation and use (Messick, 1990). As Messick (1990) argues, "test validation is empirical evaluation of the meaning and consequences of measurement" (ibid.: 1487). Whereas reliability was traditionally viewed as a "distinct form and a necessary condition for validity" (Chapelle, 1999: 258), in more recent views on validation it is seen as one type of validity evidence (Chapelle, 1999). The process of validation includes different rationales and types of evidence that need to be collected and used to build a validity argument.

Within an argument-based framework, five basic concepts are of importance: (1) claim is "the conclusion of arguments that we seek to justify" (Fulcher & Davidson, 2007: 164); (2) grounds are the available evidence for the claim; (3) warrant is the link between the evidence (grounds) and the claims; (4) backing is additional support for the warrant (Fulcher & Davidson, 2007) and includes, for example, previous research or experience, or comes from theory (e.g., Bachman, 2005); and (5) rebuttal is "a counter-claim that the warrant does not justify the step from the grounds to the claim" (Fulcher & Davidson, 2007: 165). An example of an argument-based framework for sign languages is the German Sign Language Receptive Skills Test for children (Haug, 2011). The argument framework for multiple-choice items assessing the acquisition (comprehension) of morphological constructions in German Sign Language (DGS) in deaf children 4–11 years old (from Haug, 2016) is shown in the following:

(1) Claim: Responses to the items will lend support to claims about the acquisition process and the influence of parents' hearing status (deaf vs. hearing) on the acquisition of DGS.

(2) Grounds: (a) Item statistics, and (b) correlations of raw scores with external variables such as children's chronological age, age of acquisition, and parental hearing status.

(3) Warrant: Sign language acquisition research has shown that (a) certain morphological constructions across sign languages are mastered, or acquired, when children are 10–12 years old, (b) other constructions are mastered by age 6, and (c) deaf children of deaf native signing parents acquire a sign language as their first language, whereas deaf children of hearing parents might learn a sign language later.

(4) Backing: A review of research studies that focus on morphological constructions in DGS and the acquisition (from emergence to mastery) of these constructions in deaf children.

(5) Rebuttal: (a) Younger children achieve higher raw scores than older children; (b) deaf children of hearing parents achieve higher raw scores than deaf children of deaf parents; (c) test items do not represent DGS constructions.

In addition, alternative statistical methods provide new standards for investigating reliability within language testing. Multi-facet Rasch measurement (Linacre, 1994) is one example. Rasch measurement "is an attempt to model the relationship between various facets of the test situation" (McNamara, 1996: 154). It can be used to investigate interactions between different facets, such as items and raters.
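In one common formulation for a situation with persons, items, and raters scoring on a rating scale (after Linacre, 1994), the model can be written as

\log\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k

where P_{nijk} is the probability of person n receiving rating k on item i from rater j, B_n is the ability of person n, D_i the difficulty of item i, C_j the severity of rater j, and F_k the difficulty of rating step k relative to step k-1.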

Pedagogical applications

This section offers an overview of tests for different sign languages that are used by practitioners within the school context. Each test is examined on the following criteria: (1) assessment target, such as vocabulary and grammar; (2) type of assessment, that is, receptive and/or productive skills test; (3) target group, such as babies, toddlers, and children; (4) the language in which the test was originally developed; and (5) the sign language(s) for which the test has been adapted. We acknowledge that there may be other sign language tests that are not mentioned in this chapter. The reasons for not including these tests are that they may not have been published in a language accessible to the authors, may focus on a different target population such as adults, and/or may not be available to practitioners.

Assessing vocabulary in deaf children

The MacArthur-Bates Communicative Development Inventory (CDI)

Assessment target: Language development in monolingual hearing children. The CDI examines comprehension, word production, and early phases of grammar.

Format: Standardized parental checklist. Parents complete the checklist at regular intervals by ticking off any words or signs that their child can understand and/or produce.

Target population: Hearing children aged 8–36 months.

Developed for: American English (CDI; Fenson et al., 1994).

Adapted for: American Sign Language (ASL; Anderson & Reilly, 2002), British Sign Language (BSL; Woolfe et al., 2010).

Perlesko: Vocabulary test for German Sign Language

Assessment target: Language development in signing children, specifically receptive vocabulary.

Format: The Perlesko uses a multiple-choice format, which requires children either to match a signed or spoken word to one of four picture choices or to match a picture to one of four words in written German. In addition to German Sign Language (DGS), it can also be used to assess children's comprehension skills in spoken and written German.

Target population: Deaf children aged 7–13 years.

Developed for: German Sign Language (Bizer & Karl, 2002b).

Adapted for: N/A.

British Sign Language Vocabulary Test

Assessment target: Vocabulary knowledge, operationalized as the strength of form-meaning mappings described in Table 8.1.

Format: Web-based test consisting of four tasks (meaning recognition, form recognition, form recall, and meaning recall).

Target population: Deaf children.

Developed for: BSL (Mann & Marshall, 2012).

Adapted for: ASL (Mann, Roy, & Morgan, 2016).

Assessing grammatical aspects in deaf children

British Sign Language Receptive Skills Test

Assessment target: Language development, specifically receptive knowledge of the following BSL syntactic and morphological structures: (1) spatial verb morphology, (2) number and distribution, (3) negation, (4) size/shape specifiers, (5) noun-verb distinction, and (6) handling classifiers.

Format: Vocabulary check followed by a video-based receptive skills test. Prior to the receptive skills test, children confirm their knowledge of the 22 vocabulary items used in the main test through a simple picture-naming task based on signs taken from the receptive skills test. There are two versions of this task, one for the North and one for the South of the UK.

Target population: Deaf children aged 3–11 years.

Developed for: BSL (Herman, Holmes, & Woll, 1999; Herman, Rowley, & Woll, 2015).

Adapted for: ASL (Enns & Herman, 2011), German Sign Language (Haug, 2011), Finnish Sign Language (Kanto, in progress), Polish Sign Language (Enns et al., 2016), and Spanish Sign Language (Valmaseda et al., 2013).

American Sign Language Assessment Instrument

Assessment target: Conversational abilities, academic language knowledge, language comprehension, analogical reasoning, and metalinguistic skills.

Format: The American Sign Language Assessment Instrument (ASLAI) is a web-based test consisting of 12 tasks. Items are presented in multiple-choice format. Tasks in the ASLAI use one of six formats: (1) picture to sign, (2) sign to sign, (3) picture to picture, (4) drag-and-drop sorting, (5) response-only (grammaticality judgment), and (6) video event to sign.

Target population: Deaf children aged 4–18 years.

Developed for: ASL (Hoffmeister et al., 2014).

Adapted for: N/A

Assessment Instrument for Sign Language of the Netherlands

Assessment target: Phonology, morpho-syntax, and narrative skills (both receptive and productive).

Format: This assessment instrument is a computer-based test consisting of nine tasks. Formats include multiple-choice (e.g., a receptive morpho-syntactic task) or retelling a picture story shown on screen.

Target population: Deaf children aged 4–12 years.

Developed for: Sign Language of the Netherlands (NGT, Hermans, Knoors, & Verhoeven, 2010).

Adapted for: N/A

British Sign Language Productive Skills Test (BSL-PST)

Nonsense Sign Repetition Task (NSRT)

Assessment target: Phonological development in sign language.

Format: The NSRT is a web-based test in which participants repeat pre-recorded nonsense signs of differing phonetic (i.e., handshape and movement) complexity.

Target population: Deaf children aged 3–11 years.

Developed for: BSL (Mann et al., 2010).

Adapted for: Icelandic Sign Language (Ivanova, 2012).

Test of American Sign Language (TASL)

Assessment target: Morpho-syntactic skills (both receptive and productive).

Format: The TASL is a video-based test consisting of six tasks. Formats include multiple-choice (e.g., classifier comprehension, map marker task) or retelling a picture story from a book without text.

Target population: Deaf children aged 8–15 years.

Developed for: ASL (Strong, Prinz, & Kuntze, 1994).

Adapted for: Swedish Sign Language (Schönström, Simper-Allen, & Svartholm, 2003); French Sign Language used in Switzerland (Prinz et al., 2005).

Instrumento de avaliação da língua de sinais brasileira (IALS)

Assessment target: Morpho-syntactic skills (both receptive and productive).

Format: In the comprehension task, each level comprises a set of pictures combined with a video of a short story told in Libras. The stories vary in vocabulary, use of space, and the introduction of referents. The participant watches the video, chooses the pictures related to what was signed, and orders them according to the story told. The stories increase in complexity to match levels of language development. The second part involves production: the participant watches a short story on video and then retells it to someone else. This signed production is evaluated using a chart of criteria that considers the number of events, use of space, classifiers, and vocabulary.

Target population: Children aged 4–9 years and late learners.

Developed for: Libras (Brazilian Sign Language) (Quadros & Cruz, 2011).

Adapted for: N/A.

Use of sign language assessments in practice

Whilst many assessments for sign languages have been developed over the course of the past decade, their use in everyday education and health settings raises some issues, including accessibility, purpose, training, and intervention planning. One issue that may affect practitioners' use of sign language assessments is that many existing sign language tests have been developed as part of research projects. As a consequence, they tend to measure aspects of language for a purpose linked to a very specific research question. Many tests are not commercially available, meaning that interested users must contact the developers. (A list of existing tests is available at https://signlang-assessment.info.) This website is useful for researchers and academics, but provides limited information on test accessibility and usability for practitioners in clinical or educational settings. In addition, information about language assessment tools is readily available in academic papers and at conferences; however, these are not always the most effective means of informing practitioners working in schools and clinics. Even if teachers, therapists, and assistants access this information, it is not always easy to convert a research tool described in academic terms into a functional procedure for use in the classroom (Hoskin, 2017).

Unfortunately, there is very limited literature on practitioners' use of sign language assessments. In order to encourage awareness and use of the assessments, and thereby enable children to learn language as effectively as possible, these issues need to be addressed. For many practitioners, one key role of tests is to guide intervention planning. In some educational settings, the assessment tools that have been developed are used regularly to monitor children's progress; in other settings, the tools are used when there are concerns about a child's language learning abilities. Without appropriate knowledge of language development and disorder, practitioners may have difficulty both in assessing accurately and in translating the results into intervention targets and strategies. Understanding how and when to use these tests with children developing sign languages can be challenging for practitioners. One example of this is a paper on "narragrams", in which cartoon stories from a television show are broken into mini-events for assessing children's narrative skills (Erber et al., 2016); however, the paper does not contain the story titles or mini-event lists, requiring the practitioner to seek these from the authors. There are also settings where practitioners report that they do not use the tools even though they have access to them, because they are not sure how to interpret the results or how the results would be useful in developing an intervention.

As described above, researchers are continuing to develop sign language assessments, and practitioners are striving to improve their assessment of, and intervention in, children's sign language by using these tools. The two groups can collaborate in creating educational assessment instruments. One way to do this could be in the form of online resources, including webinars that demonstrate test use, similar to those offered for many spoken language assessments such as the Clinical Evaluation of Language Fundamentals (CELF-5; Wiig, Semel, & Secord, 2013). Another way forward is for researchers and clinicians to work together to translate a research tool or test into a functional procedure for use in the classroom.

Technology and sign language testing

The Deafness Cognition and Language Research Centre (DCAL) in the UK has addressed the above concerns about using sign language testing in practice. It has made assessment tools available on a website (www.dcalportal.org) to enable practitioners to easily access the assessments that are relevant for their work with signing deaf individuals. This raises the need for local safeguarding protocols, as practitioners are asked to enter confidential details, including names, dates of birth, and schools; schools and clinics need to ensure that their staff are not breaking confidentiality rules. The process is not without problems, as illustrated in the following comment from a UK-based practitioner:

The confidentiality issue has been the main barrier to me using the test. ... the language of the confidentiality agreement needs to be formal and detailed but I find that 'off putting', and sometimes inaccessible, to parents/guardians.

Dumbrill, 2018

As more assessments for spoken languages become accessible online (for example, the CELF-5) and practitioners become accustomed to using online tools, some of the issues discussed above may be resolved. As practitioners grow more accustomed to incorporating online tools into their assessment routine, this is likely to lead to a broader understanding of how to use the online format and how to deal with issues around confidentiality. In turn, this will raise managers' and supervisors' awareness of issues related to training, funding, and supervision in the use of online assessment tools.

Still another issue in sign language testing is test delivery. The use of computer- or web-based formats for test delivery is an obvious benefit for sign language tests (e.g., Haug, 2015). Web-based tests are particularly useful, as they allow automatic scoring of multiple-choice tests and are easily accessible from anywhere in the world where high-speed Internet is available (Haug, 2015). The shift from the traditional paper-and-pencil format towards web-based formats for test delivery has reached the field of sign language assessment, as demonstrated, for example, by the BSL Vocabulary Test (Mann & Marshall, 2012) and the BSL Receptive Skills Test (Herman, Holmes, & Woll, 1999). Both tests are part of a web-based assessment portal set up by the Deafness Cognition and Language Research Centre at University College London. Another example of a web-based format for sign language assessment is the narrative comprehension test for Swiss German Sign Language (Deutschschweizerische Gebärdensprache, DSGS; Haug & Perrollaz, 2016), which was developed within the frame of the EU project SignMET. Similar to the BSL-VT and BSL-RST, this test is integrated into a purpose-built portal for sign language tests. One of the biggest disadvantages of web-based tests, according to Haug (2015), is difficulty with the technical infrastructure, such as old hardware and software, and server connectivity. Moreover, test takers are not always familiar with the use of a computer or mobile device, and a test taker's level of computer familiarity might have an impact on the test results (for a detailed discussion of web- and mobile-based testing formats, see Haug, 2015).
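As a simple illustration of the automatic scoring that web-based delivery affords, the sketch below scores a set of multiple-choice responses against an answer key; the item identifiers and key are hypothetical, and this is not the code of any portal mentioned above:

```python
# Illustrative sketch of automatic multiple-choice scoring in a web-based test.
# Item IDs and the answer key are hypothetical.
ANSWER_KEY = {"item_01": "B", "item_02": "D", "item_03": "A"}

def score_responses(responses: dict) -> dict:
    """Return per-item correctness and the total raw score."""
    per_item = {item: responses.get(item) == correct
                for item, correct in ANSWER_KEY.items()}
    return {"per_item": per_item, "raw_score": sum(per_item.values())}

print(score_responses({"item_01": "B", "item_02": "C", "item_03": "A"}))
# item_02 is wrong, so the raw score is 2
```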

There are other technologies with potential uses for sign language assessment. One is automatic sign language recognition. A recent Swiss National Science Foundation project, SMILE (Ebling et al., 2018), makes use of this technology. One of the goals of the SMILE project is to develop an automatic sign language recognition system to be used in the context of vocabulary assessment for adult L2 learners of DSGS. In such a testing scenario, a test taker is asked to produce a lexical sign in response to a prompt (e.g., a gloss, or a written German word to be translated into DSGS) delivered on a computer screen. The produced sign is captured by a camera and recognized by a sign language recognition system; the test taker's production is compared with the "correct" or "acceptable" form of the sign, and feedback is provided to the learner.
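The following heavily simplified sketch outlines the assessment loop just described; all names are hypothetical placeholders rather than the SMILE project's actual components, and a real recognizer would involve a trained model rather than a stub:

```python
# Hypothetical, simplified sketch of an automatic sign-recognition
# assessment loop. All names are placeholders, not the SMILE project's API.

def recognize_sign(video_frames):
    """Placeholder recognizer: a real system would run a trained model here."""
    return "HAUS"  # pretend the recognizer produced this gloss

def assess_vocabulary_item(prompt_word, reference_gloss, video_frames):
    """Compare the recognized sign with the expected gloss and give feedback."""
    recognized = recognize_sign(video_frames)
    is_acceptable = recognized == reference_gloss
    print(f"Prompt '{prompt_word}': {'accepted' if is_acceptable else 'try again'}")
    return is_acceptable

# Hypothetical usage: the prompt is the written German word "Haus".
assess_vocabulary_item("Haus", "HAUS", video_frames=None)
```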

Future trends

Future research studies

We focus on two strands of future research. The first is the development and evaluation of rating scales for productive sign language tests. Some of the sign language tests reviewed above include rating scales for production (for example, Strong & Prinz, 1997), and issues like inter-rater reliability have been investigated. However, studies are needed to ascertain the processes by which raters come to a mutual understanding of the rating criteria and the underlying construct of the rating scale, and how they resolve disagreements in their ratings. Findings from such studies may help strengthen the validity of the rating scales. The second strand comprises studies on the use of new technologies in sign language testing. Future studies should look into new sign language technologies, such as the sign language recognition (SLR) systems used in the SMILE project, and explore how they could be applied in sign language assessment.

Future pedagogical applications

As mentioned earlier, only a few sign language tests for children are commercially available, although the need for such tests in schools has been pointed out in different studies (e.g., Haug & Hintermair, 2003). One area of need is training current and future teachers of the deaf, as well as sign language teachers, tutors, and practitioners, in how to use the tests with regard to administration, score interpretation, and pedagogical implications. Finally, despite the value of tests for professionals working with deaf individuals, existing sign language tests focus on the child's learning outcome rather than the learning process. This calls for alternative methods of testing. Dynamic assessment, which enables practitioners to make assumptions about children's response to a particular type of intervention, is one example. The research done in this area, while limited (Mann, 2017; Mann, Peña, & Morgan, 2014, 2015), looks promising.

References

Anderson, D., & Reilly, J. (2002). The MacArthur communicative development inventory: Normative data for American Sign Language. Journal of Deaf Studies and Deaf Education, 7 (2), 83–106.

Bankson, N.W. (1977). Bankson Language Screening Test. Baltimore, MD: University Park Press.

Bebko, J.M., Calderon, R., & Treder, R. (2003). The Language Proficiency Profile-2: Assessment of the global communication skills of deaf children across languages and modalities of expression. Journal of Deaf Studies and Deaf Education, 8 (4), 438–51.

Bebko, J.M., & McKinnon, E.E. (1993). The Language Proficiency Profile-2. Unpublished assessment tool. York University, Toronto, Canada.

Bachman, L.F. (2005). Building and supporting a case for test use. Language Assessment Quarterly, 2 (1), 1–34.

Bechtoldt, H.P. (1951). Selection. In S.S. Stevens (ed.), Handbook of Experimental Psychology (pp. 1237–67). New York: Wiley.

Bizer, S., & Karl, A.-K. (2002a). Entwicklung eines Wortschatztests für gehörlose Kinder im Grundschulalter in Gebärden, Schrift- und Lautsprache. Unpublished doctoral dissertation. Fachbereich Erziehungswissenschaften, Universität Hamburg.

Bizer, S., & Karl, A.-K. (2002b). Perlesko: Prüfverfahren zur Erfassung lexikalisch-semantischer Kompetenz gehörloser Kinder im Grundschulalter. Unpublished test.

Boyes Braem, P., Haug, T., & Shores, P. (2012). Gebärdenspracharbeit in der Schweiz: Rückblick und Ausblick. Das Zeichen, 90, 58–74.

Bush, C.T. (1985). Nursing Research. Virginia: Reston Publishing Company.

Chapelle, C.A. (1999). Validity in language assessment. Annual Review of Applied Linguistics, 19, 254–72.

Cronbach, L.J., & Meehl, P.E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52 (4), 281–302.

Davies, A., Brown, A., Elder, C., Hill, K., Lumley, T., & McNamara, T. (1999). Dictionary of Language Testing. Studies in Language Testing 7. Cambridge: Cambridge University Press.

Dumbrill, H. (2018). Personal communication. September 2, 2018.

Ebling, S., Camgöz, N.C., Boyes Braem, P., Tissi, K., Sidler-Miserez, S., Stoll, S., & Magimai-Doss, M. (2018). SMILE Swiss German Sign Language data set. In Proceedings of the 11th Language Resources and Evaluation Conference (LREC 2018) (pp. 4221–9).

Engen, E., & Engen, T. (1983). Rhode Island Test of Language Structure. Austin, TX: Pro-Ed, Incorporated.

Enns, C., Boudreault, P., Zimmer, K., Broszeit, C., & Goertz, D. (2014). Assessing Children’s Expressive Skills in American Sign Language. Paper presented at the Annual Meeting of the Association of College Educators – Deaf/Hard of Hearing. Washington, DC.

Enns, C., Haug, T., Herman, R., Hoffmeister, R.J., Mann, W., & Mcquarrie, L. (2016). Exploring signed language assessment tools in Europe and North America. In M. Marschark, V. Lampropoulou, & E.K. Skordilis (eds.), Diversity in Deaf Education (pp. 171–218). Oxford: Oxford University Press.

Enns, C., & Herman, R. (2011). Adapting the assessing British Sign Language development: Receptive Skills Test into American Sign Language. Journal of Deaf Studies and Deaf Education, 16 (3), 362–74.

Erber, N.P., Grant, L.M., Leigh, K., & Kenfield, S. (2016). The narragram: A new way to describe children’s recall of stories. Deafness & Education International, 18 (3), 120–3.

Fenson, L., Dale, P.S., Reznick, J.S., Bates, E., Thal, D.J., Pethick, S.J., & Stiles, J. (1994). Variability in early communicative development. Monographs of the Society for Research in Child Development, i-185.

Fulcher, G., & Davidson, F. (2007). Language Testing and Assessment: An Advanced Resource Book. New York: Routledge.

Haug, T. (2005). Review of sign language assessment instruments. Sign Language & Linguistics, 8 (1/2), 61–98.

Haug, T. (2011). Adaptation and Evaluation of a German Sign Language Test – A Computer-based Receptive Skills Test for Deaf Children Ages 4–8 Years Old. Hamburg: Hamburg University Press. Retrieved June 17, 2018 from http://hup.sub.uni-hamburg.de/purl/HamburgUP_Haug_Adaption

Haug, T. (2015). Use of information and communication technologies in sign language test development: Results of an international survey. Deafness & Education International, 17 (1), 33–48.

Haug, T. (2016). Example of an Argument Structure for Multiple-choice Items to Assess the Acquisition (Comprehension) of Morphological Constructions in German Sign Language (DGS) in Deaf Children 4–11 Years Old. Unpublished material.

Haug, T. (2017). Web-based Sign Language Assessment: Challenges and Innovations. Paper presented at the ALTE 6th International Conference: Learning and Assessment: Making the Connection, Bologna.

Haug, T., & Hintermair, M. (2003). Ermittlung des Bedarfs von Gebärdensprachtests für gehörlose Kinder – Ergebnisse einer Pilotstudie. Das Zeichen, 64, 220–9.

Haug, T., Mann, W., Boers-Visker, E., Contreras, J., Enns, C., & Rowley, K. (2016). Guidelines for sign language test development, evaluation, and use. Retrieved on July 25, 2017 from www.signlang-assessment.info/tl_files/signlanguage/guidelines_sign_language_tests_haugetal_v1_2016-11-10.pdf

Haug, T., & Perrollaz, R. (2016). EU-Projekt SignMET: Gebärdensprachtests im Test. Sonos, 3, 8–10.

Herman, R. (2002). Assessment of BSL Development. Unpublished doctoral dissertation. City University London.

Herman, R., Holmes, S., & Woll, B. (1998). Design and Standardisation of an Assessment of British Sign Language Development for Use with Young Deaf Children: Final Report, 1998. Unpublished manuscript, City University London, UK.

Herman, R., Holmes, S., & Woll, B. (1999). Assessing BSL Development – Receptive Skills Test. Coleford, UK: The Forest Bookshop.

Herman, R., Rowley, K., & Woll, B. (2015). BSL Receptive Skills Test (RST) online. Retrieved on July 25, 2017 from https://dcalportal.org/tests

Hermans, D., Knoors, H., & Verhoeven, L. (2010). Assessment of sign language development: The case of deaf children in the Netherlands. Journal of Deaf Studies and Deaf Education, 15 (2), 107–19.

Hodge, G., Schembri, A., & Rogers, I. (2014). The Auslan (Australian Sign Language) Production Skills Test: Responding to Challenges in the Assessment of Deaf Children's Signed Language Proficiency. Paper presented at the Disability Studies in Education Conference, July, Melbourne, Australia.

Hoffmeister, R.J. (1999). American Sign Language Assessment Instrument (ASLAI). Unpublished manuscript, Center for the Study of Communication and the Deaf, Boston University.

Hoffmeister, R.J. (2000). A piece of the puzzle: ASL and reading comprehension in deaf children. In C. Chamberlain, J.P. Morford & R. Mayberry (eds.), Language Acquisition by Eye (pp. 143–63). Mahwah, NJ: Lawrence Erlbaum Publishers.

Hoffmeister, R.J., Caldwell-Harris, C.L., Henner, J., Benedict, R., Fish, S., Rosenburg, P., Conlin-Luippold, F., & Novogrodsky, R. (2014). The American Sign Language Assessment Instrument (ASLAI): Progress Report and Preliminary Findings. Working paper: Center for the Study of Communication and the Deaf, Boston University.

Hoffmeister, R., Greenwald, J., Bahan, B., & Cole, J. (1989). The American Sign Language Assessment Instrument. Unpublished instrument. Center for the Study of Communication and the Deaf, Boston University.

Hoskin, J. (2017). Language Therapy in British Sign Language: A Study Exploring the Use of Therapeutic Strategies and Resources by Deaf Adults Working with Young People Who Have Language Learning Difficulties in British Sign Language (BSL). Unpublished doctoral dissertation. University College London.

Ivanova, N. (2012). Nonsense Repetition Test for ITM. Reykjavik, Iceland: The Communication Centre for the Deaf and Hard of Hearing.

Jones, A., Herman, R., Botting, N., Marshall, C., Toscano, E., & Morgan, G. (2015). Narrative Skills in Deaf Children: Assessing Signed and Spoken Modalities with the Same Test. Paper presented at the SRCD preconference: Development of deaf children. Philadelphia, PA.

Kane, M. (1992). An argument-based approach to validity. Psychological Bulletin, 112 (3), 527–35.

Kanto, L. (in progress). Finnish Sign Language Receptive Skills Test. Unpublished test.

Kanto, L., & Mann, W. (in preparation). Finnish Sign Language Vocabulary Test. Unpublished test.

Kline, P. (2000). Handbook of Psychological Testing (2nd ed.). London & New York: Routledge.

Linacre, J.M. (1994). Many-facet Rasch Measurement. Chicago, IL: MESA Press.

Maller, S., Singleton, J.L., Supalla, S.J., & Wix, T. (1999). The development and psychometric properties of the American Sign Language proficiency assessment (ASL-PA). Journal of Deaf Studies and Deaf Education, 4 (4), 249–69.

Mann, W. (2017). Measuring deaf learners’ language progress in school. In M. Marschark, & H. Knoors (eds.), Evidence-based Practice in Deaf Education (pp. 171–90). New York: Oxford University Press.

Mann, W., & Haug, T. (2014). Mapping out guidelines for the development and use of sign language assessment: Some critical issues, comments and suggestions. In D. Quinto-Pozos (ed.), Mulitilingual Aspects of Signed Language Communication and Disorders (pp. 123–39). Bristol, UK: Multilingual Matters.

Mann, W., & Marshall, C. (2012). Investigating deaf children’s vocabulary knowledge in British Sign Language. Language Learning, 62 (4), 1024–51.

Mann, W., Marshall, C.R., Mason, K., & Morgan, G. (2010). The acquisition of sign language: The impact of phonetic complexity on phonology. Language Learning and Development, 6 (1), 60–86.

Mann, W., Peña, E.D., & Morgan, G. (2015). Child modifiability as a predictor of language abilities in deaf children who use American Sign Language. American Journal of Speech-Language Pathology, 24 (3), 374–85.

Mann, W., Peña, E.D., & Morgan, G. (2014). Exploring the use of dynamic language assessment with deaf children, who use American Sign Language: Two case studies. Journal of Communication Disorders, 52, 16–30.

Mann, W., Roy, P., & Marshall, C. (2013). A look at the other 90 per cent: Investigating British Sign Language vocabulary knowledge in deaf children from different language learning backgrounds. Deafness & Education International, 15 (2), 91–116.

Mann, W., Roy, P., & Morgan, G. (2016). Adaptation of a vocabulary test from British Sign Language to American Sign Language. Language Testing, 33 (1), 3–22.

Mayberry, R.I., Lock, E., & Kazmi, H. (2002). Development: Linguistic ability and early language exposure. Nature, 417, 38.

McNamara, T. (1996). Measuring Second Language Performance. London: Longman.

Messick, S. (1990). Validity of test interpretation and use. ETS Research Report Series, 1990 (1), 1487–95.

Newport, E.L. (2002). Critical periods in language development. In L. Nadel (ed.), Encyclopedia of Cognitive Science (pp. 737–40). London: Macmillan Publishers Ltd.

Nunnally, J.C. (1978). Psychometric Theory (2nd ed.). New York: McGraw-Hill.

Palfreyman, N., Sagara, K., & Zeshan, U. (2015). Methods in carrying out language typological research. In E. Orfanidou, B. Woll, & G. Morgan (eds.), Research Methods in Sign Language Studies: A Practical Guide (pp. 173–92). Chichester, UK: Wiley Blackwell.

Pearson Assessment (2014). Stanford Achievement Test (10th ed.) (SAT-10). New York: Pearson Assessment.

Prinz, P.M., Niederberger, N., Gargani, J., & Mann, W. (2005, July). Cross-linguistic and Cross-cultural Issues in the Development of Tests of Sign Language: The Case of the Test of American Sign Language (TASL) and Test de Langue des Signes Francaise (TELSF). Paper presented at the IASCL, Berlin.

Quadros, R.M. de, & Cruz, C.R. (2011). Instrumento de avaliação: língua de sinais. Porto Alegre: Editora ArtMed.

Rust, J., & Golombok, S. (2000). Modern Psychometrics – The Science of Psychological Assessment (2nd ed.). London & New York: Routledge.

Schönström, K., Simper-Allen, P., & Svartholm, K. (2003). Assessment of signing skills in school-aged deaf learners in Sweden. In European Days of Deaf Education, May 8–11, 2003, Örebro, Sweden (pp. 88–95). Örebro, Sweden: Stockholm University.

Sparrow, S.S., Balla, D.A., Cicchetti, D.V., Harrison, P.L., & Doll, E.A. (1984). Vineland Adaptive Behavior Scales. Circle Pines, MN: American Guidance Service.

Strong, M., & Prinz, P. (1997). A study of the relationship between American Sign Language and English literacy. Journal of Deaf Studies and Deaf Education, 2 (1), 37–46.

Strong, M., Prinz, P., & Kuntze, M. (1994). The Test of ASL. Unpublished test. San Francisco: San Francisco State University, California Research Institute.

Supalla, T., Newport, E., Singleton, J.L., Supalla, S.J., Metlay, D., & Coulter, G. (1995). An Overview of the Test Battery for American Sign Language Morphology and Syntax. Paper presented at the Annual Meeting of the American Educational Research Association (AERA), San Francisco, CA.

Valmaseda, M., Pérez, M., Herman, R., Ramírez, N., & Montero, I. (2013). Evaluación de la competencia gramatical en LSE: Proceso de adaptación del BSL Receptive Skills Test (test de habilidades receptivas). Retrieved on July 30, 2018 from www.cnlse.es/sites/default/files/Evaluacion%20de%20la%20competencia%20grammatical%20en%20LSE.pdf

Van Herreweghe, M., & Vermeerbergen, M. (2009). Flemish Sign Language standardisation. Current Issues in Language Planning, 10 (3), 308–26.

Wechsler, D. (2005). Wechsler Individual Achievement Test (2nd ed.) (WIAT-II). London: The Psychological Corporation.

Wiig, E., Semel, E., & Secord, W.A. (2013). Clinical Evaluation of Language Fundamentals (5th ed.) (CELF-5). Bloomington, MN: Pearson Clinical.

Woll, B. (2013). Second language acquisition of sign language. In C.A. Chapelle (ed.), The Encyclopedia of Applied Linguistics. Oxford, UK: Blackwell Publishing Ltd.

Woolfe, T., Herman, R., Roy, P., & Woll, B. (2010). Early vocabulary development in deaf native signers: A British Sign Language adaptation of the communicative development inventories. Journal of Child Psychology and Psychiatry, 51 (3), 322–31.

Yaghmaei, F. (2003). Content validity and its estimation. Journal of Medical Education, 3 (1), 25–7.