David H. Smith, Jeffrey E. Davis, and Dan Hoffman
The number of L2 sign language learners in secondary and postsecondary education has grown rapidly over the past two decades (e.g., Wilcox, 2018; Rosen, 2015, for the US; Leeson et al., 2018, for the EU countries). There is also growing use of sign languages by professionals, whether to teach, to communicate directly with deaf individuals, or to interpret. Given the widespread popularity of learning sign languages as second and additional languages (L2/Ln), and the growth in their professional use, it has become increasingly necessary to evaluate the sign language competency of L2/Ln learners. Evaluation approaches range from informal observations conducted by L2/Ln classroom instructors to rigorous formal examinations. Some L2/Ln sign language tests and assessments are developed for linguistic research purposes; others are developed for qualifying and credentialing individuals with graduation diplomas and professional certifications. This chapter focuses on the latter group. We cover assessment tests that are informal and formal, receptive and expressive, and formative and summative. More specifically, we look at the issues of validity and reliability. A few of the tests developed for assessing L2/Ln sign language learners can also be used with L1 native users. For references on L1 instruments, the reader is referred to the Haug et al. chapter in this volume.
As noted by Norris and Ortega (2012), the choice of testing and assessment protocols in L2/Ln should be guided by four concerns: (a) the target population, or who gets assessed in terms of clearly specified learners; (b) the purpose(s) of assessment, or why we investigate these populations; (c) the domain(s) of assessment, or what gets assessed in terms of the knowledge constructs we want to know about; and (d) the format, measures, and psychometrics, or how to assess, including the ways of acquiring and examining data related to the constructs. We use Norris and Ortega’s approach to examine L2/Ln sign language tests and assessments.
Several L2/Ln groups are regularly evaluated for their skills and knowledge of sign languages: educational learners, sign language teachers, teachers of the deaf, and sign language interpreters. Assessments that test knowledge and skills in sign language and sign language pedagogy are governed by standards developed by different governmental and professional entities. For instance, standards for L2/Ln learners of ASL were developed in the US by the American Sign Language Teachers Association (ASLTA), based on the standards of the American Council on the Teaching of Foreign Languages (ACTFL) (Ashton et al., 2014), and for the different sign languages of the European Union (EU) countries by the ProSign project, based on the Common European Framework of Reference for Languages (CEFR) (Leeson et al., 2016). Individuals who aspire to become sign language teachers, whether for L1 or L2/Ln learners, are subject to the same standards for teacher licensure that governmental certifying entities develop for all content areas. Aside from the pedagogical skills they are expected to learn, beginning teachers are also required to take teaching assessments to demonstrate the knowledge and skills demanded by their profession. Regarding teachers of the deaf, we see a trend towards bilingual education and the use of natural sign languages in classrooms. A growing number of teacher training programs require teacher candidates to show proficiency before graduation, although few programs have any minimum sign language requirements prior to admission (Beal-Alvarez & Scheetz, 2015). Sign language interpreters, given their critical roles as facilitators between sign language and spoken language users, are an obvious group in need of assessment. Standard assessments and formal certification or licensure requirements are in place in several countries (Napier, 2004), and the World Association of Sign Language Interpreters (WASLI) Country Reports (2015) indicate that a growing number of other countries have interpreting associations that grant certifications and licensures. There are currently no standards in sign language knowledge and skills for other practitioners who work with the deaf, such as mental health and rehabilitation professionals, although such knowledge and skills are essential for effective communication with signing deaf clientele.
Sign language assessments are crucial. Without them, teachers who are L2 learners with less than optimal sign language skills could cause language deprivation and/or gaps in learner knowledge. Likewise, interpreters who lack, or poorly demonstrate, knowledge and skills in sign languages may cause grave consequences for signing deaf people, particularly in a criminal court hearing or an emergency room (Holcomb & Smith, 2018). Given these possible negative outcomes, it is appropriate that the assessments are high stakes for the L2/Ln learners taking them. The reasons for sign language assessments differ for each constituency of sign language pedagogy. Assessments are given to ensure that learners have skills in communication using sign languages, that teachers have the knowledge and skills to teach and assess L2/Ln sign languages, and that interpreters have the knowledge and skills to interpret sign languages. Sign language assessments also aid in diagnosis, planning, feedback, and ascertaining qualifications for degrees, certifications, and licensures (Napier & Leeson, 2016).
The “what” of assessment addresses the domain areas, which are distinct for the different constituencies of sign language pedagogy. The sign language assessments for learners, some of whom become teachers and interpreters, test their expressive, receptive, and discourse skills. The domains of language include linguistic forms, such as phonology, morphology, morphosyntax, syntax, semantics, discourse, and pragmatics, and functions, that is, the appropriate use of language across the different situations and contexts that shape form. Form entails specific grammatical features, and function entails judgments of the overall ability to deliver a message with an appropriate meaning, depending on the audience and context (Harris, 2017). The assessments for teachers test their knowledge and skills in sign language pedagogy, including preparing curricula, lesson plans, instructional strategies, and instructional materials, and evaluating learner progress. Teachers and teaching programs should follow the reflective practice cycle of planning, implementation, evaluation, reflection, and revision (e.g., Farrell, 2014); the feedback they obtain from the direct assessment of L2 learners is a significant part of this process and may aid in modifying different areas of sign language pedagogy. For interpreters, the sign language assessments test their ability to translate from a sign language to a spoken language, and vice versa, and to follow a professional code of ethics.
For our purposes here, the “how” of assessment comprises the protocols governing test format, data solicitation, timing, and scoring. These pertain to the standardization of test items, the means of transmitting stimuli and responses, the timeline for administering the tests, and the issue of neutrality in scoring.
Assessments may be either formal or informal; they differ in the standardization of test items and scoring systems. Formal assessment tests are standardized measures that are either norm-referenced or criterion-referenced. Norm-referenced measures are based on a collection of scores from a large sample population, and a subject’s scores are compared against the scores from that sample. Criterion-referenced measures are based on the objectives of a curriculum and whether or not a learner has met the objectives by the end of a course. Formal assessments include quizzes, assignments, or projects that are evaluated and documented as evidence of skill gains or their absence. They are instruments that focus on a specific parameter, such as vocabulary production or comprehension, or sentence comprehension or reproduction. Informal assessment tests are not standardized measures; they are the spontaneous, on-the-spot evaluations given by assessors, who may include instructors, as they observe the learners using the language. An example is the proficiency interview, through which grammar, vocabulary, accent/production, fluency, and/or comprehension can be assessed.
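To make the norm-referenced versus criterion-referenced distinction concrete, here is a minimal sketch in Python; the scores and the cutoff of 80 are hypothetical and purely illustrative.

```python
from statistics import mean, stdev

def norm_referenced(score: float, norm_sample: list[float]) -> float:
    """Locate a subject relative to a norming sample (z-score)."""
    return (score - mean(norm_sample)) / stdev(norm_sample)

def criterion_referenced(score: float, cutoff: float = 80.0) -> bool:
    """Check whether a subject met a fixed curricular objective."""
    return score >= cutoff

norm_sample = [55, 62, 70, 71, 75, 78, 80, 84, 88, 93]  # hypothetical sample scores
print(norm_referenced(82, norm_sample))  # ≈ 0.55: about half an SD above the sample mean
print(criterion_referenced(82))          # True: the assumed 80-point objective was met
```

The same raw score of 82 thus receives two different interpretations: a standing relative to other test takers, and a yes/no judgment against the curriculum’s objective.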
Assessment tests differ in how data are solicited from subjects. Some tests assess only receptive skills, and others assess only expressive skills. Receptive tests assess subjects’ language comprehension, and expressive tests assess their language production. Receptive tests typically entail a subject watching a fluent signer produce a phrase, sentence, or story, and responding either by writing, choosing from a list of possible answers, or signing what they observed. These protocols tend to provide more objective measures, since the set of expected responses is usually narrow. In expressive tests the subjects are asked to respond to a prompt, which may consist of signed utterances on video, or pictures and images of people, things, places, and activities. The latter type is more subjective due to variations in the coherence of signed responses, which require a judgment call from raters.
Different assessment tests are given to subjects at certain points in their learning of sign languages: formative and summative tests. Formative assessment tests are measures of learner progress and are given periodically over time. Informal assessment tests are usually formative evaluations of learners’ mastery given at different temporal points. They are usually given to obtain information and feedback for pedagogical purposes, and to determine whether test results have a positive effect, or “washback,” on instruction (Smith & Davis, 2014). Formative assessment of learners requires that a valid and reliable relationship has been verified between a skill and overall knowledge (concurrent validity). Summative assessment tests are measures of learners’ mastery of the domain areas covered in the tests and are given once. While formal assessment tests are usually summative evaluations of language knowledge and proficiency, summative tests can be either informal or formal.
Assessment tests also vary in their protocols for scoring subject responses, which may be objective or subjective. Objective tests require a response that is either right or wrong, such as multiple-choice or matching questions; no room for interpretation is given. Subjective tests usually contain open-ended questions and rely on broader responses. Such responses are subject to a judgment call from raters that may or may not be based on a rubric, but is largely based on raters’ perceptions and experiences.
Considerations in the selection of a sign language assessment include its psychometrics and feasibility. The psychometric properties of assessments are validity and reliability. A test is considered valid if it accurately measures the domain areas it is intended to assess. A test is reliable if it provides consistently accurate results within and between subjects, both within a single administration and across administrations. We will not go into detail here on validity and reliability; Haug and colleagues in this volume discuss these two factors in depth. The feasibility of an instrument is its practicality in implementation, usually described in terms of ease or difficulty of implementation and the amount of time required to compile the results. If an assessment is deemed valid and reliable, but is complicated to deliver and requires much time to evaluate or score, then one would need to decide whether it is worthwhile to use. An instructor with a large course load is unlikely to see the value in a test that is overly time-consuming; a linguistic researcher, on the other hand, might be willing to disregard the time factor in favor of an accurate measure of a construct. One of the primary reasons the development of available assessments has taken so long is the time, effort, and number of developers, raters, and participants needed to establish the validity, reliability, and feasibility of instruments.
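To illustrate the reliability side of these psychometric properties, the sketch below computes Cohen’s kappa, a chance-corrected index of agreement between two raters; the ratings are hypothetical, and the choice of kappa is ours for illustration rather than a statistic prescribed by any instrument reviewed here.

```python
from collections import Counter

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between two raters over the same subjects."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical proficiency ratings of eight interviews by two raters
a = ["Intermediate", "Advanced", "Novice", "Advanced",
     "Intermediate", "Superior", "Novice", "Advanced"]
b = ["Intermediate", "Advanced", "Novice", "Intermediate",
     "Intermediate", "Superior", "Novice", "Advanced"]
print(round(cohens_kappa(a, b), 2))  # 0.83: substantial chance-corrected agreement
```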
The above aspects of assessment tests and procedures provide the framework from which we review the different L2/Ln sign language assessment instruments and procedures. In the following, we identify the target population, the format of the test, who does the assessment and how they do it, and psychometric information that is available for the instrument.
The rise in L2/Ln sign language learning at schools, colleges, and universities creates the need to test learners for foreign or world language credits, which may be applied to meet requirements for admission and/or graduation. Informal and formal assessments of these learners are examined below.
Instructor-used assessments in classrooms cover the range of topics being taught, and different assessment tests and procedures are used for this purpose (Napier & Leeson, 2016). Classroom-based assessments include formative measures, such as having learners create short vlogs or stories in response to a prompt with embedded scoring rubrics and video or written feedback. The assessments can be expressive and/or receptive. In expressive and receptive tests, learners observe sign language narratives and respond either by signing or by choosing an answer to multiple-choice questions. The assessments also include summative evaluation of material taught at the end of the course. If an assessment accomplishes what it is intended to do, that is, it is valid and reliable, instructors and programs should be able to accurately determine the level of knowledge and proficiency of L2/Ln users of sign languages.
Technology for recording and evaluating signed utterances on video is increasingly used in the assessment process. Holistic evaluations of sign language are time-consuming, and there are programs and software that streamline the process. Learning management systems currently available (e.g., Blackboard, Moodle, Canvas, D2L, GoReact) have embedded video capability and the ability to load learner prompts and rubrics within assessments. Technology-based assessments can be formal or informal, formative or summative, and objective or subjective. Learners can take assessments in their classrooms or in other spaces, such as libraries and homes, and can be monitored remotely. Instructors open the videos and score directly within the program, and learners receive their results as soon as they are assessed. This protocol makes technology-based testing more efficient and feasible, but does not necessarily improve the validity and reliability of the assessment. Instructor-developed formal measures are often constructed haphazardly, without regard to validity or reliability (Mertler, 1999). Rubrics, as well as rater and technology training, may affect the quality of in-classroom assessments.
The most widely used expressive assessments of L2 sign language learners are based on the Oral Proficiency Interview (OPI) developed in the 1950s by the Foreign Service Institute of the US Department of State (Liskin-Gasparro, 2003). The OPI has been adapted to many world languages (ACTFL, 2019), including sign languages.
In the early 1980s, in consultation with the Educational Testing Service in the US, Caccamise and Newell at the National Technical Institute for the Deaf (NTID) developed a Sign Language Proficiency Interview for American Sign Language (SLPI: ASL) (Caccamise & Newell, 2007). It is used in over 50 academic and vocational rehabilitation programs across the US and in Canada, Kenya, and Ghana (Cagle, 2009), and South Africa (National Technical Institute for the Deaf, 2019).
Target population ‒ the SLPI: ASL was originally developed to assess sign language skills of faculty teaching at NTID. It was later adopted by a number of US schools for the deaf and rehabilitation agencies to assess the skills of teachers, counselors, and other professionals.
Target measures ‒ the SLPI: ASL ratings are based on an 11-step rating scale, ranging from No Functional Skills to Superior, benchmarked against the sign language proficiency of highly skilled native signers. Table 17.1 shows the SLPI: ASL rating scale. Raters assess function, or the overall ability to receive and convey messages appropriately according to the levels previously mentioned. Form is also assessed and includes vocabulary knowledge, production and fluency, and use of a range of ASL grammatical features, including linguistic parameters, word order, and classifiers (or depicting signs). Receptive comprehension is also evaluated.
Format of the test ‒ the SLPI: ASL is conducted as a 20-minute conversation between a native or native-like sign language user and the test subject. Questions center on what the subject does in their daily life, such as work or school, family life, and personal interests and hobbies. The conversation is video-recorded for later viewing by the raters.
How measures are assessed ‒ raters work in teams of two or three and follow a rater worksheet. Each rater independently makes an initial overall assessment of the subject’s functional skills. If the raters agree within two levels of each other on functional skills, they proceed to independently assess the sign language forms. Raters are expected to come to a unanimous agreement on the same level of proficiency after discussion, or to independently reassess. If no agreement is reached, a third rater is used. Any subsequent lack of agreement invalidates the interview.
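Read procedurally, the team-rating steps amount to the decision logic below; this is a simplified Python sketch of our reading of the published description, not NTID’s official algorithm, and the scale labels come from Table 17.1.

```python
SCALE = ["No functional skills", "Novice", "Novice plus", "Survival",
         "Survival plus", "Intermediate", "Intermediate plus",
         "Advanced", "Advanced plus", "Superior", "Superior plus"]

def adjudicate(initial: list[str], after_discussion: list[str],
               third_rater: str | None = None) -> str | None:
    """Simplified reading of the SLPI: ASL team-rating procedure."""
    levels = [SCALE.index(r) for r in initial]
    if max(levels) - min(levels) > 2:      # initial function ratings too far apart:
        return None                        # raters reassess before proceeding
    if len(set(after_discussion)) == 1:    # unanimous agreement after discussion
        return after_discussion[0]
    if third_rater is not None:            # a third rater breaks the impasse
        return third_rater
    return None                            # still no agreement: interview is invalid

print(adjudicate(["Advanced", "Advanced plus"], ["Advanced", "Advanced"]))  # Advanced
```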
Psychometrics ‒ as a subjective measure, the SLPI: ASL has questionable validity and reliability. Bochner, Garrison, & Doherty (2015) reported that the SLPI: ASL contains no independent objective criteria, that is, valid and reliable measures for comparison. Subjects’ responses are shaped by interviewers, scored on a qualitative rubric, and susceptible to rater bias, which is a major issue for the SLPI: ASL’s validity (Bochner, Garrison, & Doherty, 2015). One psychometric study has been published, and it found good inter-rater reliability and limited evidence of construct validity (Caccamise & Samar, 2009). Test results can be affected by the diversity in background and experience of raters and interviewers, as well as by regional variation in how the SLPI: ASL is implemented at different test sites. Nonetheless, Bochner, Garrison, & Doherty (2015) argued that the SLPI: ASL remains an important heuristic tool for assessing progress and can provide feedback for improving the ASL skills of L2 learners.
The American Sign Language Proficiency Interview (ASLPI) was developed by Mel Carter in the 1980s, based on the OPI, and was transferred to Gallaudet University in 2008; the university is currently its sole administrator (Gallaudet University, 2019a; Caccamise & Newell, 2007).
Target population ‒ the ASLPI is utilized by agencies, schools, universities, programs, and employers to evaluate L2 and some L1 learners, and to assess the skills of those desiring to work professionally with deaf individuals, including teachers, interpreters, and counselors. The assessment is also used by the Educational Testing Service (ETS), a private nonprofit educational testing and assessment organization in the US, as an ASL teaching licensure examination in some US states.
Target measures ‒ the ASLPI measures expressive and receptive skills in ASL, including evaluating the accuracy, consistency, complexity, and flexibility of the signer’s language forms and functions, to determine ASL proficiency.
Format of test ‒ the assessment consists of a 20-minute face-to-face conversational interview that is video-recorded in person or through a live video feed.
How measures are assessed ‒ this is a subjective, criterion- and rater-based assessment covering five domain areas: vocabulary, grammar, accent/production, fluency, and comprehension. Ratings are numerical, ranging from 0 (no functional skills) to 5 (fully proficient). As in the SLPI: ASL, three ASLPI raters evaluate the criteria noted above and need to come to a unanimous agreement on the proficiency level; otherwise different raters re-evaluate the video (Gallaudet University, 2019b).
Psychometrics ‒ the ASLPI is subject to validity and reliability issues similar to those raised for the SLPI: ASL. That is, subjective measures of language proficiency based on a single interview and rater-generated test results have raised questions of validity, reliability, and interpretation (Chalhoub-Deville & Fulcher, 2003). Gallaudet University addressed these issues in 2013 by mandating and prioritizing validity and reliability research on the ASLPI using an independent outside organization. These studies investigate and document validity and reliability; identify best practices and standards for language assessments that can be applied to the ASLPI; identify the gap between current and desired states, including recommendations to improve the ASLPI process and psychometrics; and (re)train raters to ensure high agreement, approaching 90 percent for initial ratings (Gallaudet University, 2019b).
The NGT Functional Assessment (NFA) was developed based on the SLPI: ASL beginning in 2011 for use with Sign Language of the Netherlands (NGT) by Eveline Boers-Visker and Beppie van den Bogaerde, in consultation with Geoff Poor, NTID’s SLPI: ASL Coordinator (see Boers-Visker et al., 2015; van den Broek-Laven et al., 2014; Poor, 2011).
Target population ‒ the initial subjects were learners in the Interpreter and Teacher Training Program at Utrecht University of Applied Sciences, one of the largest educational institutions in the Netherlands.
Target measures ‒ in 2010 the Utrecht program developed the NFA to align with its new curriculum, which is drawn from the CEFR, requires an assessment of functional communication skills, and is aligned with the CEFR proficiency levels. The rater forms were adapted to NGT grammar (Boers-Visker, Poor, & van den Bogaerde, 2015).
Format of test ‒ the NFA uses the same 20-minute interview procedure as the SLPI: ASL. Both interviews cover the same topical areas that center on work, courses of study, family and background, as well as leisure activities and hobbies.
How measures are assessed ‒ the NFA rating process employs two raters working together, using the same indicators as the SLPI: ASL to establish the initial rating based on language function and the final rating based on language form, though some changes were made to the rating form to reflect grammar specific to NGT. The NFA rating procedure also requires the same process of reaching unanimous decisions between raters as the SLPI: ASL. The NFA proficiency levels align with the CEFR proficiency levels, which range from A1 (beginning) to C2 (fluent), and the NFA is used in the Utrecht program. See Table 17.1 below for the NFA and SLPI: ASL proficiency levels.
Table 17.1 SLPI: ASL and NFA Proficiency Levels

| SLPI: ASL Proficiency Levels | NFA Proficiency Levels (Aligned with CEFR Standards) |
| --- | --- |
| Superior plus | C2 |
| Superior | C1 |
| Advanced plus | |
| Advanced | B2 |
| Intermediate plus | |
| Intermediate | B1 |
| Survival plus | |
| Survival | A2 |
| Novice plus | |
| Novice | A1 |
| No functional skills | No functional skills |
Source: adapted from Caccamise & Newell, 2007, for ASL; van den Broek-Laven, Boers-Visker, & van den Bogaerde, 2014, for NFA.
Psychometrics ‒ the NFA has the same psychometric properties as the SLPI: ASL and ASLPI. It likewise has extensive rating procedures and requires unanimous rating decisions. As with many other formal assessments of language, evaluators need to be extensively (re)trained to ensure that their ratings maintain their initial calibration and reliability.
The 51U test is a derivation of the Controlled Oral Word Association test, which uses the spoken letters F, A, and S (Patterson, 2011), and was modified by Morere, Witkin, & Murphy (2012) to use the handshapes 5, 1, and U in ASL (Beal-Alvarez & Figueroa, 2017; Morere, Witkin, & Murphy, 2012). The handshapes were selected based on their frequency of use in ASL (Morere, Witkin, & Murphy, 2012; Morford & MacFarlane, 2003): the 5-handshape is a high-frequency handshape, the 1-(index) handshape is less frequently used, and the U-handshape is the least used of the three (Morere, Witkin, & Murphy, 2012). The 51U can be used as either a formative or a summative assessment test.
Target population ‒ the 51U test assesses phonological skills in ASL by L1 deaf (Morere, Witkin, & Murphy, 2012) and L2 hearing ASL learners (Beal & Faniel, 2018).
Target measure ‒ the test assesses expressive phonological fluency based on the ASL handshapes 5, 1, and U. It can be used as a formative measure of progress over time, similar to curriculum-based measurements, since it requires only 3 minutes to administer (Beal & Faniel, 2018; Smith & Davis, 2014).
Format of test ‒ the subjects are given the ASL handshapes 5, 1 (index), and U, one at a time, and asked to generate as many ASL signs as possible that use each handshape within a one-minute period per handshape.
How measures are assessed ‒ test results are rated by two fluent ASL users and expressed as the number of correct productions of signs.
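A minimal tally along these lines is sketched below in Python; the rule crediting a production only when both raters judge it correct is our assumption for illustration, not a documented feature of the 51U.

```python
def score_51u(judgments: dict[str, list[tuple[bool, bool]]]) -> int:
    """Tally correct productions across the 5, 1, and U one-minute trials.

    Each (rater_a, rater_b) pair records whether the two fluent raters judged
    a given production correct; crediting only joint approvals is assumed.
    """
    return sum(a and b for trials in judgments.values() for a, b in trials)

judgments = {  # hypothetical rater judgments per handshape
    "5": [(True, True), (True, False), (True, True)],
    "1": [(True, True), (True, True)],
    "U": [(True, True)],
}
print(score_51u(judgments))  # 5 productions credited as correct
```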
Psychometrics ‒ the ratings are subjective. Morere, Witkin, & Murphy (2012) reported a significant correlation between the scores from 51U and the American Sign Language ‒ Sentence Reproduction Test (ASL-SRT), described below, among deaf subjects enrolled at Gallaudet University, which indicated the concurrent validity of 51U with ASL-SRT. Beal and Faniel (2018) reported that the number of years of sign use and instructional approaches were significantly associated with their hearing subjects’ performance. If the test is used as a formative assessment, it should be noted that its validity and reliability may be affected by the washback effect on instruction (see Smith & Davis, 2014).
The following tests measure receptive skills by providing a prompt or stimulus and asking subjects to either select a multiple-choice answer or replicate the stimulus as closely as possible. These are more objective than the expressive proficiency tests reviewed above.
Developed by Bochner and colleagues at NTID, the ASL-DT is based on the NTID Speech Recognition Test (Bochner, Garrison, & Doherty, 2015), a measure of sensitivity to phonological contrasts. They consider sensitivity to phonological contrasts a basic component of language competence, one that reflects a learner’s comprehension and overall language proficiency.
Target population ‒ the ASL-DT is an objective test intended to measure proficiency in phonological contrasts in ASL L2 adult learners.
Target measures ‒ the ASL-DT contains a paired-comparison discrimination task to measure subjects’ ability to recognize phonological and morphophonological contrasts in ASL. Each test item consists of two pairs of ASL utterances, each of which contains a standard sentence followed by a comparison sentence that may either be the same or differ in one element. Each sentence contains between three and nine lexical signs. The test stimuli feature sentence pairs with certain signs that are similar or distinct in form or meaning, including “minimal pairs,” that is, signs that contrast in at least one linguistic parameter such as movement, handshape, palm orientation, location, or complex morphology. An illustrative example is given below with glosses of the sign language utterances (Bochner, 2015):
Trial 1
YOUR APPOINTMENT NEED CHANGE. (standard sentence)
YOUR HABIT NEED CHANGE. (comparison sentence)
Trial 2
YOUR APPOINTMENT NEED CHANGE. (standard sentence)
YOUR APPOINTMENT NEED CHANGE. (comparison sentence)
In the above example, the signs for APPOINTMENT and HABIT in ASL are similar except for one parameter, movement. Test subjects view a video of native ASL users and indicate whether the sentences are the same or different. An item is scored as correct only if the responses to both pairs are correct. This reduces the possibility of chance-level performance (Bochner et al., 2016).
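Since each same/different judgment could be guessed correctly half the time, requiring both trials of an item to be correct lowers the chance-level score from 1/2 to 1/4 per item. The scoring rule, under our reading of the published description, reduces to the Python sketch below.

```python
def score_item(responses: tuple[str, str], keys: tuple[str, str]) -> int:
    """Credit an ASL-DT item only if both same/different trials are correct."""
    return int(all(r == k for r, k in zip(responses, keys)))

# Hypothetical item: Trial 1 sentences differ (APPOINTMENT vs. HABIT),
# Trial 2 sentences are identical.
print(score_item(("different", "same"), ("different", "same")))       # 1: item credited
print(score_item(("different", "different"), ("different", "same")))  # 0: one trial missed
```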
Format of the test ‒ initially developed as a paper test, the ASL-DT is now computer-based. A session consists of 35 prompts, each with two pairs of ASL sentences that are drawn randomly out of a test bank of 350 items, and requires approximately 10 minutes to administer.
How measures are assessed ‒ the ASL-DT is an objective, criterion-based assessment in five domain categories of sign language phonology: handshape, palm orientation, location, movement, and complex morphology.
Psychometrics ‒ Bochner et al. (2016) found that the ASL-DT was able to differentiate learners into beginning, intermediate, and advanced levels of ASL skill, and that the test data fit the Rasch model of person measurement well. They also noted that while the ASL-DT reflected subjects’ sign language knowledge better than the SLPI: ASL, the two assessments can complement each other as objective and subjective measures of subjects’ knowledge and skills in ASL.
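For readers unfamiliar with it, the dichotomous Rasch model locates each person $j$ and item $i$ on a single latent scale, modeling the probability of a correct response from person ability $\theta_j$ and item difficulty $b_i$:

$$P(X_{ij} = 1 \mid \theta_j, b_i) = \frac{e^{\theta_j - b_i}}{1 + e^{\theta_j - b_i}}$$

A good fit to this model supports the claim that the ASL-DT items measure a single underlying dimension of discrimination skill.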
The ASL-SRT was developed by Hauser et al. (2008). It measures subjects’ ability to reproduce signed sentences of increasing complexity. The test does not require a skilled interviewer, labor-intensive training, or highly trained raters.
Target population ‒ the ASL-SRT is given to native and non-native sign language learners, including children and adults.
Target measures ‒ the test is an adaptation of the Speaking Grammar Subtest of the Test of Adolescent and Adult Language ‒ Third Edition (TOAL-3; Hammill et al., 1994), in which subjects listen to and repeat back recorded sentences. The prompts consist of sentences with grammatical features that gradually increase in length and complexity.
Format of the test ‒ the ASL version presents pre-recorded videos of a native signer signing 20 test sentences. These sentences increase in length and in syntactic, thematic, and morphemic complexity. The lexical items used in the test do not show regional variation, do not vary across generations, and are not variants of a sign system. Sentence complexity increases through more fingerspelling, numerical incorporation affixes, polymorphemic signs, and signs with a low frequency of occurrence. Test administration takes between 10 and 15 minutes.
How measures are assessed ‒ two native deaf ASL signers rate the participants’ responses, allowing only narrowly defined deviations. Exact reiterations are scored as “1,” while repetitions with replacements or errors are given a score of “0.”
Psychometrics ‒ according to Hauser and colleagues (2008), the ASL-SRT has high inter-rater reliability (R = .83) and internal consistency (α = .875). They also found that deaf native signers performed better than non-native signers (p < .001) and that deaf adults performed better than deaf children (p < .05). Hauser et al. (2008) added that the ability to repeat sentences correlates with language competence, and that sentence complexity does not increase with increasing sentence length.
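For reference, the internal-consistency index reported here, Cronbach’s alpha, is computed from the number of test items $k$, the variance $\sigma_i^2$ of each item, and the variance $\sigma_t^2$ of total scores:

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_t^2}\right)$$

Values approach 1 when items covary strongly; by common rules of thumb, the .875 reported for the ASL-SRT counts as good internal consistency.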
The DGS-SRT was developed by Kubus and colleagues as an adaptation of the ASL-SRT (Kubus et al., 2015).
Target population ‒ like the ASL-SRT test, the DGS-SRT is given to native and non-native sign language learners, adults and children, and deaf and hearing individuals.
Target measures ‒ just as in the ASL-SRT test, subjects view and repeat back prerecorded utterances in DGS.
Format of test ‒ participants are shown 30 DGS stimulus sentences one at a time and asked to repeat exactly what they viewed immediately after seeing them. Stimulus sentences gradually increase in complexity as the test progresses.
How measures are assessed ‒ two native deaf DGS signers rate the participants’ responses. A limited number of deviations is allowed. Exact reiterations are scored as “1” while repetitions with replacements or errors are given a score of “0.”
Psychometrics ‒ the only published source we could find on the use of DGS-SRT is in a study of adult bilingual German and DGS users by Kubus et al. (2015). They reported that the DGS-SRT reliability is high (Cronbach α = 0.959).
The BSL-SRT was developed at the Deafness Cognition and Language Research Centre (DCAL) by Cormier and colleagues (2012).
Target population ‒ the BSL-SRT is given to L1 and L2 sign language adult learners, both hearing and deaf.
Target measures ‒ as in the SRTs described above, in BSL-SRT the subjects watch and repeat prerecorded utterances in BSL.
Format of test ‒ in contrast to other SRTs, in the BSL-SRT the subjects observe and reproduce 40 BSL sentences of increasing length and complexity.
How measures are assessed ‒ the rating protocol is the same as in the other SRTs.
Psychometrics ‒ as with the ASL-SRT and DGS-SRT, Cormier et al. (2012) found that native signers performed significantly better than non-native signers in repeating observed BSL sentences.
Developed by Hauser et al. (2015), the ASL-CT is a receptive ASL test. Raters are not needed for the test. Scores showing ASL proficiency levels are provided immediately upon test completion.
Target population ‒ the ASL-CT is given to L1 and L2 adult learners, both deaf and hearing.
Target measures ‒ the ASL-CT measures receptive skills in grammatical aspects of ASL such as phonology, vocabulary, role shifting, and depicting verb constructions, including classifiers.
Format of test ‒ this test is web-based and consists of 30 test items. Three practice items are presented with feedback. All test items are presented randomly. Half of the items present line drawings and event videos as prompts, with four choices of signed descriptions in ASL; subjects match a signed description to the prompted drawing or event video. The other half present signed descriptions as prompts, with four choices of line drawings or event videos; subjects match a drawing or video to the signed description.
How measures are assessed ‒ the ASL-CT is a computer scored objective test with one correct choice and three incorrect foils.
Psychometrics ‒ the test has good internal reliability (α = 0.834) and high concurrent validity with the ASL-SRT (r = .715). Hauser et al. (2015) reported that deaf native signers performed significantly better than deaf non-native signers and hearing native signers, and that the level of ASL courses hearing learners were taking correlated with their test results (r = .726).
Assessments that involve the use of sign languages are given not only to learners but also to L2 sign language teachers and sign language interpreters. While learner assessments cover the domain areas of sign languages, teacher and interpreter assessments also include domain areas pertaining to the knowledge and skills of those respective disciplines. Since this chapter focuses on the assessment of sign language knowledge and skills, space limits the following discussion of teacher and interpreter assessments to the certifying entities and their assessment systems and domain areas.
The ever-growing number of L2 classrooms (e.g., for the US, see Rosen, 2008; Wilcox, 2018) means that teachers of L2 sign languages must demonstrate knowledge and skills not only in sign languages, but also in aspects of pedagogy, including instructional strategies and materials; curriculum design at the level of the course, unit, and lesson; learning processes and acquisition; and learner assessment forms and protocols. In many countries they must qualify for teacher certification. Teacher assessments cover sign language skills, the pedagogy of language, the linguistics of sign language, and deaf people, community, culture, and history. Certification examinations typically rely on standards for learning and teaching created by government education entities and professional organizations of practitioners and researchers (see Rosen, Chapter 2 in this volume). For example, teacher certification examinations given by various US states typically follow the language learning standards developed by ACTFL, and in the EU they follow the Common European Framework of Reference for Languages (CEFR). These have been adapted for the evaluation of sign language proficiency by ASLTA and by the ProSign project, which was developed by a consortium of colleges and universities in the EU. Countries, and provinces within countries, vary in whether they use standardized assessments or develop their own assessments of sign language skills, as well as in the requirements and assessments they provide, including test forms, domains, administration, and scoring criteria.
There is also increased enrollment of signing deaf learners in general education schools, colleges, and universities, which creates a need for sign language interpreters. Sign language interpreters are expected to have knowledge and skills in sign languages and interpretation, which can be demonstrated through assessment. In countries where interpreting is recognized as a profession, individuals who want to become professional sign language interpreters need to complete a higher education degree program in interpretation, pass interpreting assessments, and obtain interpreting licensure and certifications. As previously noted, the international, national, and provincial certifying bodies, whether statutory organizations or interpreter associations, differ in the requirements and assessments they provide (for ASL: Liu, 2015; Cokely, 2012; National Association of the Deaf, 2016; for Auslan: Napier, 2004). Different countries develop their own assessment tests and procedures, such as the National Accreditation Authority for Translators and Interpreters (NAATI) in Australia, the National Interpreter Certification (NIC) and Educational Interpreter Performance Assessment (EIPA) in the US, the Council for the Advancement of Communication with Deaf People (CACDP) Signature Level 6 NVQ (National Vocational Qualification) Certificate in the United Kingdom, and the Canadian Evaluation System’s Certificate of Interpretation (COI) in Canada. While requirements and assessments differ, the domain areas of interpreter assessments are similar, typically comprising knowledge, performance, and interview domains. The knowledge domains typically include interpreting issues, theory, and models; sign language linguistics; the interpreting code of ethics and professional conduct; cultural, linguistic, and social issues within the Deaf community; and the business of interpreting. The performance examination typically assesses interpreters’ ability to mediate between deaf and hearing consumers through sign languages and spoken languages. The interpreter assessments vary in test format: the NAATI, NIC, EIPA, and COI are scored assessments, while the Signature Level 6 NVQ is a portfolio assessment.
Due to inconsistencies in format, domain areas, data solicitation, and scoring in the assessments noted above, the profession should continue to seek ways to improve the measurement of the language skills and knowledge of sign language learners as they develop, including the ongoing development of both formative and summative tests, and should increase efforts to standardize domain areas and scoring across L2 sign language assessments.
Many tests measure only a small range of linguistic structures, and are either formal or informal, and expressive or receptive. The overall evaluation of an L2/Ln individual’s skills should therefore be comprehensive rather than relying on only one or a few assessments. We have indeed come a long way from the subjective and often arbitrary evaluations of the past. There is also a need for continuing sign language linguistic research to inform the development of new domain areas in future tests. Future research should also focus on developing comprehensive assessments that include expressive and receptive tests, formal and informal tests, objective rather than subjective evaluation protocols, and standardization in scoring, including norms. High construct validity across different tests needs to be attained by grounding them in the same body of linguistic research and communication studies. In addition, future research should address the need to increase inter-test reliability across different sign language assessment tests. There are recent efforts towards more objective and standardized tests that do not rely on raters.
Informal sign language assessment tests, such as the proficiency interviews reviewed here, are well entrenched as a standard method and can give us reasonable confidence in their results. This presumes that rigorous training and oversight of the raters is maintained, and that scores are objectively determined. However, there is a heavy reliance on subjective ratings by evaluators assessing global levels of proficiency, and this reliance on trained raters has been the core of the problem: it inhibits the distribution and availability of tests developed in the labs of sign language researchers (Hauser et al., 2015). It is critical that raters receive periodic and intensive training to ensure that, at a minimum, reliability and inter-rater agreement remain adequate for a fair assessment. In addition, many assessment tests are not commercially available because they were initially developed for research purposes; these need to be made available to the assessment and evaluation community for use with L2/Ln learners. Although sign languages are increasingly accepted in many countries, some countries still have no L2/Ln assessments in their sign languages, and existing sign language assessments may need to be adapted for them.
American Council on the Teaching of Foreign Languages (2019). Oral Proficiency Assessments. Alexandria, VA: ACTFL Website. Accessed at www.actfl.org/professional-development/assessments-the-actfl-testing-office/oral-proficiency-assessments-including-opi-opic
Ashton, G., Cagle, K., Kurz, K., Newell, W., Peterson, R., & Zinza, J. (2014). Standards for learning American Sign Language. Rochester, NY: American Sign Language Teachers Association. Accessed at www.aslta.org/wp-content/uploads/2014/07/National_ASL_Standards.pdf
Beal, J.S., & Faniel, K. (2018). Hearing L2 sign language learners: How do they perform on ASL phonological fluency? Sign Language Studies, 19 (2), 204–24.
Beal-Alvarez, J.S., & Figueroa, D.M. (2017). Generation of signs within semantic and phonological categories: data from deaf adults and children who use American Sign Language. Journal of Deaf Studies and Deaf Education, 22 (2), 219–32.
Beal-Alvarez, J.S., & Scheetz, N.A. (2015). Preservice teacher and interpreter American Sign Language abilities: Self-evaluations and evaluations of deaf learners’ narrative renditions. American Annals of the Deaf, 160 (3), 316–33.
Bochner, J. (2015). American Sign Language Discrimination Test. Accessed at www.signlang-assessment.info/index.php/american-sign-language-discrimination-test.html
Bochner, J., Garrison, W., & Doherty, K. (2015). The NTID Speech Recognition Test: NSRT. International Journal of Audiology, 54 (7), 490–98.
Bochner, J.H., Samar, V.J., Hauser, P.C., Garrison, W.M., Searls, J.M., & Sanders, C.A. (2016). Validity of the American Sign Language Discrimination Test. Language Testing, 33 (4), 473–495.
Boers-Visker, E., Poor, G.S., & van den Bogaerde, B. (2015). The sign language proficiency interview: Description and use with Sign Language of the Netherlands. Proceedings of the 22nd International Congress on the Education of the Deaf (ICED 2015), Athens, Greece, July 6‒9, 2015.
Caccamise, F., & Newell, W. (2007). SLPI Paper #4: SCPI-SLPI History. Rochester, NY: National Technical Institute for the Deaf. Accessed at www.rit.edu/ntid/slpi/sites/rit.edu.ntid.slpi/files/page_file_attachments/FAQSLPIHistory.pdf
Caccamise, F.C., & Samar, V. J. (2009). Sign Language Proficiency Interview (SLPI): Prenegotiation interrater reliability and rater validity. Contemporary Issues in Communication Science & Disorders, 36, 36–47.
Cagle, K. (2009). SLPI Training in Kenya (2004 and 2009) and Ghana (2009). Rochester, NY: National Technical Institute for the Deaf. Accessed at www.rit.edu/ntid/slpi/events/slpi-training-kenya-2004-and-2009-and-ghana-2009
Chalhoub-Deville, M., & Fulcher, G. (2003). The oral proficiency interview: A research agenda. Foreign Language Annals, 36 (4), 498–506.
Cokely, D. (2012, May). Defenders of Certification: Sign Language Interpreters Question “Enhanced” RID NIC Test. Street Leverage Website. Accessed at https://streetleverage.com/2012/05/defenders-of-certification-sign-language-interpreters-question-enhanced-rid-nic-test/
Cormier, K., Adam, R., Rowley, K., Woll, B., & Atkinson, J. (2012). The British Sign Language Sentence Reproduction Test: Exploring age-of-acquisition effects in British deaf adults. Presented at: Experimental studies in sign language research: Sign language workshop at the Annual Meeting of the German Linguistics Society (DGfS). March 7, 2012. https://dcal.blob.core.windows.net/content-bsl-srt/BSLSRT_DGfS_7Mar2012.pdf
European Centre for Modern Languages (2019). Thematic Areas: Sign Languages. Accessed at www.ecml.at/Thematicareas/SignedLanguages/tabid/1632/language/en-GB/Default.aspx
Farrell, T.S. (2014). Promoting Teacher Reflection in Second Language Education: A Framework for TESOL Professionals. New York: Routledge.
Gallaudet University. (2019a). American Sign Language Proficiency Interview. Accessed at www.gallaudet.edu/the-american-sign-language-proficiency-interview/aslpi
Gallaudet University (2019b). ASLPI Research and Statistics. Accessed at www.gallaudet.edu/the-american-sign-language-proficiency-interview/aslpi/aslpi-research
Hammill, D.D., Brown, V.L., Larsen, S.C., & Wiederholt, J.L. (1994). Test of Adolescent and Adult Language (TOAL-3). Austin, TX: Pro-Ed.
Harris, R. (2017) ASL in Academic Settings: Language Features. ASLized Website. Accessed at www.youtube.com/watch?v=VX18-4m-EN0
Hauser, P.C., Paludneviciene, R., Supalla T., & Bavelier, D. (2008). American Sign Language-sentence reproduction test: Development and implications. In R.M. de Quadros (ed.), Sign Language: Spinning and Unraveling the Past, Present and Future (pp. 160–72). Petropolis, Brazil: Editora Arara Azul.
Hauser, P.C., Paludneviciene, R., Riddle, W., Kurz, K.B., Emmorey, K., & Contreras, J. (2015). American Sign Language Comprehension Test: A tool for sign language researchers. Journal of Deaf Studies and Deaf Education, 21 (1), 64–9.
Holcomb, T.H., & Smith, D. (eds.) (2018). Deaf Eyes on Interpreting. Washington, DC: Gallaudet University Press.
Kubus, O., Villwock, A., Morford, J.P., & Rathmann, C. (2015). Word recognition in deaf readers: Cross-language activation of German Sign Language and German. Applied Psycholinguistics, 36 (4), 831–54.
Leeson, L., van den Bogaerde, B., Rathmann, C., & Haug, T. (2016). Sign Languages and the Common European Framework of Reference for Languages. www.ecml.at/Portals/1/mtp4/pro-sign/documents/Common-Reference-Level-Descriptors-EN.pdf
Leeson, L., Haug, T., Rathmann, C., Sheneman, N., & Van den Bogaerde, B. (2018). Survey Report from the ECML Project ProSign: Sign Languages for Professional Purposes. Accessed at: www.hfh.ch/fileadmin/files/documents/Zentrale_Dienste_Personal/who_is_who/hat_prosigns1_survey_report_2018_06_29_nb.pdf
Liskin-Gasparro, J.E. (2003). The ACTFL proficiency guidelines and the oral proficiency interview: A brief history and analysis of their survival. Foreign Language Annals, 36 (4), 483–90.
Liu, M. (2015). Assessment. In Pöchhacker, F. (ed.), Routledge Encyclopedia of Interpreting Studies (pp. 20–2). New York: Routledge.
Mayberry, R.I. (1993). First-language acquisition after childhood differs from second-language acquisition: The case of American Sign Language. Journal of Speech, Language, and Hearing Research, 36 (6), 1258–70.
Mertler, C.A. (1999). Teachers’ (Mis)conceptions of Classroom Test Validity and Reliability. Paper presented at the Annual Meeting of the Mid-Western Educational Research Association, Chicago, IL, October 13–16, 1999.
Morere, D.A., Witkin, G., & Murphy, L. (2012). Measures of expressive language. In D.A. Morere & T. Allen (eds.), Assessing Literacy in Deaf Individuals: Neurocognitive Measurement and Predictors (pp. 141–57). New York: Springer.
Morford, J.P., & MacFarlane, J. (2003). Frequency characteristics of American Sign Language. Sign Language Studies, 3 (2), 213‒25.
National Association of the Deaf. (2016). President Report about NAD-RID Transcript. Accessed at www.nad.org/about-us/board/president-report-about-nad-rid-transcript/
Napier, J. (2004). Sign language interpreter training, testing, and accreditation: An international comparison. American Annals of the Deaf, 149 (4), 350–59.
Napier, J., & Leeson, L. (2016). Sign Language in Action. London: Palgrave Macmillan.
Norris, J. & Ortega, L. (2012). Assessing learner knowledge. In S. Gass, and A. Mackey (eds.), The Routledge Handbook of Second Language Acquisition (pp. 591–607). New York: Routledge.
Patterson, J. (2011). Controlled Oral Word Association Test. In J.S. Kreutzer, J. DeLuca, & B. Caplan (eds.), Encyclopedia of Clinical Neuropsychology. New York: Springer.
Poor, G. (2011). SLPI in Holland. Rochester, NY: National Technical Institute for the Deaf. Accessed at www.rit.edu/ntid/slpi/content/slpi-holland
Rosen, R.S. (2008). American Sign Language as a foreign language in US high schools: State of the art. The Modern Language Journal, 92 (1), 10–38.
Rosen, R.S. (2015). Learning American Sign Language in High School: Motivation, Strategies, and Achievement. Washington, DC: Gallaudet University Press.
Smith, D., & Davis, J. (2014). Formative assessment for learner progress and program improvement in sign language L2 programs. In D. McKee, R. Rosen, & R. McKee (eds.), Teaching and Learning of Sign Languages: International Perspectives and Practices. Basingstoke, Hampshire, UK: Palgrave Macmillan.
Van den Broek-Laven, A., Boers-Visker, E., & van den Bogaerde, B. (2014). Determining aspects of text difficulty for the Sign Language of the Netherlands (NGT) Functional Assessment instrument. Papers in Language Testing and Assessment, 3 (2), 53–75.
Warshaw, J.S. (2013). Using Organizational Change Theory to Understand the Establishment of Kindergarten through Twelfth-grade American Sign Language Content Standards in Schools for the Deaf: A Study of the Early Innovators. Dissertation. University of Redlands, Redlands, CA.
Wilcox, S. (2018). Universities that Accept ASL in Fulfillment of Foreign Language Requirements. Accessed at www.unm.edu/~wilcox/UNM/univlist.html
World Association of Sign Language Interpreters (2015). Country Reports. Susan Ehrlich (ed.). Accessed at http://wasli.org/wp-content/uploads/2012/11/WASLI-Country-Reports-2015.pdf