WRITTEN AND COMPUTER-BASED APPROACHES ARE VALUABLE TOOLS TO ASSESS A LEARNER’S COMPETENCE
With proper attention to key principles it is possible to create accountable and robust written and online assessment procedures.
Aims and objectives
This chapter is concerned with the use of objective written tests in medical education assessment. By ‘objective’ we mean tests with unambiguous answers that can be dichotomously marked as either correct or incorrect. Objective written tests are predominantly oriented towards the knowledge domain and employ a variety of formats: multiple-choice (single best answer), multiple-response (multiple answers), extended matching, fill in the blanks (cloze), drag and drop, script concordance and hotspot image questions. Increasingly, objective written tests are created and marked by computer; for the purposes of this chapter they exclude essays and short written answers, which cannot yet be marked in this way.
The importance of written objective tests lies in their ubiquitous global use to test knowledge and their increasingly strong association with computer-based assessment systems. There is a common misconception that objective tests can only assess simple recall and understanding, whereas in principle most types of knowledge in Bloom’s taxonomy, from recall through application to problem solving, can be assessed by appropriately constructed questions. Nevertheless, the knowledge that can be tested always needs to be mapped (‘blueprinted’) against the learning outcomes of the course at an appropriate level. Online assessment systems should readily fit into the exam cycle, as shown in Figure 18.1.
Range of assessments
The range of objective written tests that can be marked by computer includes:
• multiple-choice questions (MCQs);
• extended matching items (EMIs);
• ranking questions;
• fill in the gap (cloze) and text/number entry;
• script concordance testing;
• image hotspots;
• labelling (‘drag and drop’);
• video.
The key attributes of each of these will be summarised below.
MCQs
The major objective formats are outlined in the guide produced by Case and Swanson (2002). These formats are employed in most conventional types of assessment and can be readily modified for the online environment by including images, sound and video clips.
These objective tests are structured around the format of a question (stem) followed by a series of possible answers with the correct answer embedded or surrounded by a range of incorrect answers or ‘distractors’. For example a ‘single best answer’ might have the correct answer listed with four distractors (Figure 18.2). It is possible to extend this format into the ‘multiple-response’ style of question, where more than one item should be selected from the list. In these cases sufficient distractors should be provided to ensure that the probability of answering the question correctly by chance does not become too high.
EMIs
The EMI format is really an extension of the multiple-choice format in which selected items from one list are matched to items in another list. The usual format is that there is a short list of say two to four clinical scenarios or patient descriptions which must be matched to appropriate items in a longer list of, for example, seven to 12 diagnoses, drugs, organisms, investigations or other clinical entities (Figures 18.3 and 18.4). The format can be extended by the use of images containing items for identification. The advantage of this format is that assessors can test the ability of individuals to differentiate between closely related concepts which potentially identify deeper levels of understanding. Both MCQs and EMIs can be associated with problem-solving or data interpretation stems so that application and problem solving can be tested.
Nevertheless, it has been argued that these types of questions, where the correct answer is essentially given as a choice that can be recognised, are intrinsically easier than having to recall the answer from memory without prompting. Evidence suggests that single-best-answer (recognition) formats produce significantly higher marks than equivalent free-response (recall) formats (Newble et al. 1979). There is no doubt that there is a cognitive price to pay, but the feasibility of this format has led to its almost universal adoption.
Ranking questions
It is possible to create a question format in which students have to correctly rank, or place in order, a list of items (Figure 18.5). This might be testing their knowledge of the frequency of occurrence of clinical entities or testing their ability to place a sequence of clinical actions into the correct temporal sequence.
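For illustration, the sketch below shows one way a ranking item might be marked, with full credit for an exact ordering and, optionally, partial credit for each pair of items placed in the correct relative order. This is a minimal Python sketch of one possible marking scheme; the function names and the clinical example are hypothetical, and real systems may use different rules.

```python
# Minimal sketch: marking a ranking question.
# Full credit for the exact order; optional partial credit for each pair of
# items placed in the correct relative order (one possible scheme; marking
# rules vary between institutions).

from itertools import combinations

def mark_ranking(candidate_order, correct_order, partial_credit=False):
    """Return a mark between 0 and 1 for a ranking item."""
    if candidate_order == correct_order:
        return 1.0
    if not partial_credit:
        return 0.0
    # Partial credit: proportion of item pairs in the correct relative order.
    position = {item: i for i, item in enumerate(candidate_order)}
    pairs = list(combinations(correct_order, 2))
    correct_pairs = sum(1 for a, b in pairs if position[a] < position[b])
    return correct_pairs / len(pairs)

# Illustrative example: ordering steps of basic life support.
correct = ["check response", "call for help", "open airway", "start compressions"]
candidate = ["call for help", "check response", "open airway", "start compressions"]
print(mark_ranking(candidate, correct, partial_credit=True))  # ~0.83
```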
Fill in the gap (cloze) and text/number entry
These are related systems that involve the student entering single words, phrases or numbers into a section of text or a designated text/numerical box. ‘Cloze’ is the technical term for inserting deleted words into a section of text in order to complete it correctly, and hence for assessing recall of factual information (Taylor 1953). Single words, phrases or numbers can be inserted into designated boxes as answers to a variety of question types (Figure 18.6). A limiting factor in the use of this question format is the difficulty of error-trapping the input and of recognising correct answers among all the possible responses a student might enter.
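As an illustration of that error-trapping problem, the following minimal Python sketch shows one way a typed answer might be matched against a list of accepted responses, with basic normalisation of text and a tolerance for numerical answers. The function names and examples are hypothetical and not drawn from any particular assessment system.

```python
# Minimal sketch (illustrative, not any particular system's algorithm):
# matching a typed answer against accepted responses, with basic
# normalisation for text and a tolerance for numerical answers.

def normalise(text):
    return " ".join(text.strip().lower().split())

def mark_text_entry(response, accepted):
    """accepted: list of acceptable strings, e.g. ['adrenaline', 'epinephrine']."""
    return normalise(response) in {normalise(a) for a in accepted}

def mark_number_entry(response, correct_value, tolerance=0.0):
    """Accept a numeric answer within an agreed tolerance, e.g. for rounding."""
    try:
        value = float(response.replace(",", ""))
    except ValueError:
        return False  # error trapping: non-numeric input scores zero
    return abs(value - correct_value) <= tolerance

print(mark_text_entry("  Adrenaline ", ["adrenaline", "epinephrine"]))  # True
print(mark_number_entry("7.45", 7.4, tolerance=0.05))                   # True
```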
Script concordance testing
Script concordance testing is increasingly being used to design questions that test clinical reasoning. The standard approach is to construct a question based around, for example, the interpretation of history taking, physical examination or investigations and to ask the student if the results of these processes influence differential diagnostic decisions according to a rating scale (Figure 18.7). The student’s response is compared to the consensual decision from a panel of experts and marked accordingly.
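One commonly described way of collating such responses is aggregate scoring, in which the credit for an answer is proportional to the number of panel members who chose it, scaled so that the modal panel response earns full marks. The minimal Python sketch below illustrates this approach; the panel data are invented for illustration.

```python
# Minimal sketch of one common scoring approach for script concordance items:
# the credit for a response is proportional to the number of panel members
# who chose it, scaled so the modal (most popular) panel response scores 1.

from collections import Counter

def score_sct_response(student_choice, panel_choices):
    """panel_choices: list of ratings (e.g. -2..+2) given by the expert panel."""
    counts = Counter(panel_choices)
    modal_count = max(counts.values())
    return counts.get(student_choice, 0) / modal_count

# Illustrative example: a 10-member panel rating the effect of a finding
# on a differential diagnosis.
panel = [+1, +1, +1, +1, +1, +2, +2, 0, 0, -1]
print(score_sct_response(+1, panel))  # 1.0 (modal answer)
print(score_sct_response(+2, panel))  # 0.4
print(score_sct_response(-2, panel))  # 0.0
```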
Image hotspots
Image hotspot questions are good for assessing visual knowledge that would be difficult to achieve through an MCQ or other textual question type. They have a second advantage in that there are no visual cues as to where the correct answer lies: there are no discrete distractors to choose from, and each pixel is a potentially correct or incorrect answer. Questions have to be constructed by outlining the area on an image that must be identified correctly (Figure 18.8). There will inevitably be boundary issues associated with this procedure (what is ‘in’, what is ‘out’), but nevertheless these types of questions can be made highly reliable.
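A minimal sketch of how such an answer might be marked is given below: the author-defined region is stored as a polygon in image coordinates and the candidate's click is tested for containment using a standard ray-casting test. The coordinates and function names are illustrative assumptions rather than any particular system's implementation.

```python
# Minimal sketch: marking an image-hotspot answer.  The question author
# outlines the correct region as a polygon in image (pixel) coordinates;
# the candidate's click is correct if it falls inside that polygon.
# (Real systems may also support rectangles, ellipses or multiple regions.)

def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside polygon [(x1, y1), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        crosses = (y1 > y) != (y2 > y)
        if crosses and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
            inside = not inside
    return inside

# Illustrative region outlined over an anatomical structure.
region = [(120, 80), (180, 85), (190, 140), (130, 150)]
print(point_in_polygon(150, 110, region))  # True  -> mark as correct
print(point_in_polygon(50, 50, region))    # False -> mark as incorrect
```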
Labelling (drag and drop)
Labelling questions, like image hotspots, are ideally suited to assessing visual knowledge, and differ in the cues they provide. With a labelling question a number of ‘place holders’, the empty rectangles (Figure 18.9), are pre-displayed over the image of interest. The examinee must drag labels from the left and drop them into the relevant place holders. Sometimes a larger number of labels than place holders, acting as distractors, are used to make the question more difficult.
Video
The ability to deliver video or moving images to a student during an assessment considerably extends the scope of question formats. Videos of patients, doctor–patient interactions, procedures, consultations and communications can all be used to create appropriate assessment scenarios that have high content validity (Figure 18.10). Video can be used to set up a scenario which can be subsequently assessed by means of the formats described above.
Writing questions
The construction of objective written questions requires some skill. The fundamental issue is to ensure that the question is valid and unambiguous and does not contain information that will allow an individual to identify any element of it as either correct or incorrect without using the knowledge constructs that the question is aimed at. Distractors in particular have to be plausibly incorrect and homogeneous with the correct answer, otherwise they can easily be eliminated without the student necessarily knowing the correct answer. There need to be sufficient distractors so that the question cannot easily be answered by chance alone. In the case of single best answers this probability should not usually be greater than 1 in 5 or 20 per cent. With multiple-response questions more distractors are required to maintain this level of probability. (The creation of plausible distractors is often the most difficult aspect of good item writing.)
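Assuming a candidate guesses blindly, the chance probability can be worked out directly, as in the minimal Python sketch below: 1/n for a single best answer with n options, and 1 over the number of possible option combinations for a ‘select m of n’ multiple-response item marked right or wrong as a whole. The guessing model is an assumption for illustration.

```python
# Minimal sketch: probability of answering correctly by chance alone,
# assuming blind guessing.  For a single-best-answer item with n options it
# is 1/n; for a 'select m of n' multiple-response item marked right/wrong
# as a whole it is 1 over the number of possible combinations of m options.

from math import comb

def chance_single_best(n_options):
    return 1 / n_options

def chance_multiple_response(n_options, n_correct):
    return 1 / comb(n_options, n_correct)

print(chance_single_best(5))            # 0.2    -> the usual 1-in-5 ceiling
print(chance_multiple_response(5, 2))   # 0.1
print(chance_multiple_response(8, 3))   # ~0.018 -> more options keep guessing unprofitable
```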
Grammatical issues in sentence construction can also allow the ‘test-wise’ candidate to identify implausible distractors. Anything that provides inappropriate information will reduce the reliability of the question and the test; it will generate ‘noise’. A useful reference work for writing good-quality items is Case and Swanson (2002).
There are a number of key criteria that can be used to characterise assessments, including validity, reliability and feasibility, and well-constructed objective written tests can satisfy all of them.
Validity
In general, assessment validity is concerned with whether an assessment measures what it is designed to measure and can be subdivided into a variety of different types (Dent and Harden 2013):
• Content validity: does the test measure and sample relevant learning objectives or outcomes?
• Construct validity: does the test measure an underlying cognitive trait, e.g. intelligence?
• Concurrent validity: does the test correlate with the results of an established test?
• Predictive validity: does the test predict future performance?
• Face validity: does it seem like a fair test to the candidates?
For the purposes of this chapter the most important elements that might be influenced by being online would be content validity and possibly the related concept of construct validity. However, Schuwirth and van der Vleuten (2006) argue that assessments must also have face validity for students. This is an important issue, particularly when introducing online e-assessment for the first time to students who may be unfamiliar with its processes and may require reassurance.
Certainly content validity can be enhanced and expanded by means of online assessment technology. For example, the following additional features can be added to online questions:
• animations, video and sound (if headphones are used in the examination room);
• ‘hotspot’ questions which require students to place a mark anywhere on an image or diagram;
• dragging labels directly over an image.
In all these cases the online nature and technological capabilities of the assessment can significantly enhance the authenticity of the questions that can be created in comparison with paper-based assessment media (Sim et al. 2005). Evidence for increased validity can be found in an evaluation of multimedia online examinations by Liu et al. (2001), who investigated student and staff attitudes to multimedia exams and found very strong support for their use. For example, they found that:
• assessment more closely matched the material that was being taught;
• the presentation of more than one medium of information seemed to aid the students’ recall;
• questions reflected real-world situations more accurately;
• students seemed to learn more in these assessments, which helped them as they continued their studies.
Reliability
The reliability of an assessment refers to its ability to give the same measure of learning consistently when used repeatedly, despite sampling error. The most common cause of unreliability in objective testing is poorly constructed questions that are ambiguous, too easy or too hard. In the sort of objective testing described here, where marking criteria are decided beforehand, questions are constructed to avoid the problems described above and marking is performed electronically, this type of reliability problem can be diminished.
A well-known measure of reliability is the internal consistency of the assessment task, usually measured by correlating individual item scores to other items or to the global test score which can be processed to give a value of reliability, such as Cronbach’s alpha statistic (Tavakol and Dennick 2011b). Because with online assessments it is possible to supply a different set of questions from a question bank to different individuals in the same examination, or to generate different numerical values for calculations or problem-solving items within a question, the questions delivered to individuals can vary slightly. Provided the range of these variables is within agreed boundaries overall, the reliability of the test should not be greatly compromised.
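For illustration, the following minimal Python sketch computes Cronbach's alpha from a candidates-by-items matrix of dichotomous scores using the standard formula; a production system would use a tested statistics library rather than this toy implementation, and the data shown are invented.

```python
# Minimal sketch: Cronbach's alpha from a candidates-by-items matrix of
# dichotomous (0/1) scores.
#   alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))

import numpy as np

def cronbach_alpha(scores):
    """scores: 2-D array, rows = candidates, columns = items (0 or 1)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Toy data: five candidates answering four items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(cronbach_alpha(responses), 2))  # ~0.70 for this toy data
```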
In the case of computer-based tests, reliability can also be influenced by how easily individuals are fatigued by using a visual display unit (VDU). Guidance recommends that online tests should last no longer than 2 hours.
Feasibility
Online assessment is not necessarily cheaper than alternative forms simply because a whole cohort can be marked in a matter of seconds. The following costs need to be taken into consideration:
• large numbers of computers are required for a simultaneous start;
• additional invigilators will be required if these machines are located in different computer rooms;
• dedicated assessment servers are required to minimise failure risk;
• assessment software will be required;
• departmental/institutional staff are required to support the system;
• educationalists are required to advise on pedagogic approaches and assessment strategies;
• programmers’ salaries need to be factored in;
• trainers familiar with the assessment software are required;
• IT support technicians are required.
Some of the costs of online assessment are considerable, for example, server hardware, large computer labs and the licence cost of the assessment software itself. Less tangible costs include members of IT support staff spending more time maintaining systems. However, once the investment is made in online testing it rapidly becomes the most feasible and efficient method of assessing knowledge-based examinations.
Question banks and adaptive testing
Collaboration between medical schools in the UK has produced the Medical Schools Council Assessment Alliance (MSC-AA), through which questions can be banked and shared. Metadata can be attached to items to provide information on the learning objectives being tested and on item difficulty and discrimination. Banked questions can be used to generate adaptive test papers in which questions of increasing difficulty or challenge are presented to learners in response to their answers to previous questions.
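A highly simplified sketch of adaptive item selection is given below: a running ability estimate is maintained and the unused banked item whose difficulty metadata is closest to that estimate is served next, with the estimate nudged up or down after each response. This is an illustrative assumption only; it is not the MSC-AA's algorithm, and operational systems typically use item response theory for the ability update.

```python
# Highly simplified sketch of adaptive item selection (not the MSC-AA's or
# any particular system's algorithm): keep a running ability estimate and
# serve the unused banked item whose difficulty metadata is closest to it,
# nudging the estimate up or down after each response.

def next_item(ability, unused_items):
    """unused_items: list of (item_id, difficulty) tuples from the bank."""
    return min(unused_items, key=lambda item: abs(item[1] - ability))

def run_adaptive_test(bank, answer_fn, n_items=5, ability=0.0, step=0.5):
    unused = list(bank)
    for _ in range(n_items):
        item = next_item(ability, unused)
        unused.remove(item)
        correct = answer_fn(item)                 # deliver item, mark response
        ability += step if correct else -step     # crude update; real systems use IRT
    return ability

# Toy bank of (id, difficulty) pairs and a candidate who gets easier items right.
bank = [("q1", -1.0), ("q2", -0.5), ("q3", 0.0), ("q4", 0.5), ("q5", 1.0), ("q6", 1.5)]
print(run_adaptive_test(bank, answer_fn=lambda item: item[1] < 0.8))
```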
Standard setting, external examining and online feedback
Online objective test papers can readily be made available to groups of examiners and external examiners so that they can engage in online standard setting (Figure 18.11). For example, using the Angoff method, each question can be evaluated for the probability that a ‘borderline’ student will answer it correctly (Angoff 1971). By collating the online evaluations a consensus cut score can be obtained.
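One common way of collating the judgements is sketched below: each judge estimates the probability that a borderline candidate will answer each item correctly, the judgements are averaged per item, and the item means are summed to give the cut score. The examiner names and ratings are invented for illustration.

```python
# Minimal sketch of collating Angoff judgements: each judge estimates the
# probability that a borderline candidate answers each item correctly; the
# cut score is the sum over items of the mean judged probability.

def angoff_cut_score(judgements):
    """judgements: dict mapping judge name -> list of per-item probabilities."""
    ratings = list(judgements.values())
    n_items = len(ratings[0])
    item_means = [sum(judge[i] for judge in ratings) / len(ratings)
                  for i in range(n_items)]
    return sum(item_means)

# Hypothetical panel of three examiners judging a four-item paper.
panel = {
    "examiner_a": [0.6, 0.4, 0.8, 0.5],
    "examiner_b": [0.7, 0.5, 0.7, 0.4],
    "examiner_c": [0.5, 0.5, 0.9, 0.6],
}
print(angoff_cut_score(panel))  # ~2.37 out of a maximum of 4 marks
```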
Online systems mark items immediately, although it is wise to allow some time for moderation and checking before releasing marks to students. Such systems also allow students to be given rapid feedback on their answers (Figure 18.12). When tests are used formatively, students can revisit the online paper and see which questions they answered correctly and which they did not. Feedback to answers can be attached to items, leading to an ‘assessment for learning’ environment. In the case of high-stakes summative examinations, when items need to be reused, learning objective metadata for each item can be used to generate a more generic form of feedback that does not allow items to be displayed again.
Psychometric analysis
Online objective assessment systems can be configured to process data using a variety of psychometric methods. As well as mean marks, item difficulty and discrimination can be calculated, in addition to Cronbach’s alpha statistic (Tavakol and Dennick 2011b).
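As an illustration, two classical item statistics, difficulty (the proportion of candidates answering an item correctly) and discrimination (here the point-biserial correlation between the item score and the total score on the remaining items), can be computed as in the minimal Python sketch below. The data are invented and the code is illustrative rather than a description of any particular system.

```python
# Minimal sketch of two classical item statistics: difficulty (proportion
# correct) and discrimination (point-biserial correlation between the item
# score and the total score on the remaining items).

import numpy as np

def item_statistics(scores):
    """scores: 2-D array, rows = candidates, columns = items (0/1)."""
    scores = np.asarray(scores, dtype=float)
    stats = []
    for i in range(scores.shape[1]):
        item = scores[:, i]
        rest = scores.sum(axis=1) - item          # total excluding this item
        difficulty = item.mean()
        discrimination = np.corrcoef(item, rest)[0, 1]
        stats.append((difficulty, discrimination))
    return stats

# Toy data: five candidates answering four items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
for i, (p, r) in enumerate(item_statistics(responses), start=1):
    print(f"item {i}: difficulty {p:.2f}, discrimination {r:.2f}")
```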
Computer-based testing in practice
The following case study from India highlights some of the key issues associated with implementing computer-based testing (CBT) and describes the process undertaken by the National Board of Examinations (NBE) in India.
Case study 18.1 Computer-based testing – a paradigm shift in student assessment in India
Billions of examinations and assessments are administered every year across the globe. In recent times, CBT has drawn the attention of assessment institutions as a new approach to deliver tests and assess performance of candidates or rank them on their abilities.
Conventionally, medical entrance examinations have been conducted as paper-based testing (PBT), in which a booklet containing a predetermined number of questions is provided to the candidates, who mark their responses on optical mark reader (OMR) sheets on the day of the test. PBT, though simple in approach and implementation, is plagued with problems such as the possible leakage of confidential material, the unfair use of electronic gadgets, impersonation and cheating by examinees, and the logistics of transporting question papers.
The NBE, India, is an organisation in the field of medical education that conducts various types of student assessment. Concerned by the relative weaknesses of and threats to PBT, the NBE introduced computer-based testing for entrance examinations. CBT is an IT-driven process which requires computer labs equipped with servers, secure wide-area network connectivity, firewalls, trained human resources and appropriate software and hardware. CBT significantly expands the range of items that can be used in a test and strengthens the test blueprint. The NBE conducted one of the largest tests using CBT, the National Eligibility cum Entrance Test, with 90,377 examinees in December 2012, and the steps undertaken are outlined below.
Pre-test phase
• The plan was tested.
• The estimated number of candidates and the resources required, especially test centres, IT labs, the question/item bank and faculty support, were mapped.
• All stakeholders were engaged with the impending change from PBT to CBT through social media, internet discussions and direct communication.
• The test blueprint was prepared, the size of item bank required was estimated and the requisite numbers of items for use in the test were generated and transferred to the item banking software at specially convened item-writing workshops.
• Candidate registration and test centre scheduling were performed through web-based application.
• Computer labs with predefined technical requirements and hardware specifications were arranged at required locations with appropriate seating capacity.
• Pilot administration was undertaken 2 days before the actual test.
Testing phase
• Test administration was undertaken, with test forms released on the wide-area network immediately before onset of the test.
• Examinee feedback was collected using a structured questionnaire.
• After completion of the test, the responses of examinees were uploaded to the server.
Post-test phase
• Item analysis was performed by computing difficulty and discrimination indices using a two-parameter model (see the sketch after this list).
• A post-test form validation workshop was undertaken to review the difficulty and discrimination indices.
• Item response theory and score equating were applied to ensure the comparability of the different test forms used.
• Equated scores were scaled using a linear transformation.
• Results were published.
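The sketch below illustrates, in minimal Python, the two-parameter logistic (2PL) model referred to in the item analysis step and a linear transformation of an equated ability estimate onto a reporting scale. The parameter values and the reporting scale are assumptions for illustration; this is not the NBE's actual implementation.

```python
# Minimal sketch (not the NBE's actual implementation): the two-parameter
# logistic (2PL) IRT model gives the probability of a correct response as a
# function of candidate ability (theta), item discrimination (a) and item
# difficulty (b); the equated ability can then be reported on a familiar
# scale via a linear transformation.

import math

def p_correct_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def scale_score(theta, mean=500, sd=100):
    """Linear transformation of an equated theta onto a reporting scale (assumed)."""
    return mean + sd * theta

print(round(p_correct_2pl(theta=0.5, a=1.2, b=0.0), 2))  # ~0.65
print(scale_score(0.5))                                   # 550.0
```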
The NBE deployed the latest IT infrastructure to capture examinee biometrics and to video record the testing phase. For 58 per cent of examinees this was their first exposure to CBT, and 95 per cent of examinees felt that the CBT of December 2012 met their expectations.
The conversion from PBT to CBT involved stakeholder acceptance and it was important to engage faculty, students and academic leadership at institutions and universities in the process at all stages. Medical teachers were appropriately sensitised towards use of psychometric tools and underlying principles of assessment. A sound test blueprint and adherence to principles of assessment supported with psychometric tools, ensuring stakeholder confidence and meticulous planning, were the keys to success.
System requirements
When an online exam begins, all the computers that the students are using will send their requests back to a single web server which holds the exam paper. The main drawback of this system is that it introduces a single point of failure, which must be addressed by appropriate security and backup systems. Testing of online exam systems should be routine and, where possible, a dedicated assessment server should be used which is independent of other systems.
Going live
The live delivery of an online summative exam, under conventional exam conditions, is the most crucial phase of the process. There is an international standard produced by the British Standards Institution entitled A Code of Practice for the Use of Information Technology (IT) in the Delivery of Assessments (BS ISO/IEC 23988 2007) which covers many aspects of exam delivery in generic terms.
Security
External security risks are possible with any server attached to the internet: hackers worldwide constantly use automated methods and software tools to seek out vulnerable servers. Firewall systems should be robustly configured to control the requests and protocols accepted and transmitted by a server, and all software subsystems should be patched and kept up to date. Internal security should ensure that online test papers can only be accessed by authorised individuals. Segregating groups of students for consecutive sittings of an exam is essential when the number of computer screens is a limiting factor. Preventing cheating in online exams requires configuring computers to allow access only to the exam and providing visual barriers between screens.
For further information the interested reader is referred to Dennick et al. (2009).
Take-home messages
• Objective tests can be used to assess a wide variety of knowledge at different levels of complexity.
• Objective tests must be carefully constructed using valid and unambiguous items with plausible and homogeneous distractors to increase reliability and eliminate ‘noise’.
• Images and other media can be added to objective tests to extend the range of knowledge tested using hotspots, drag and drop and video.
• Objective tests are ideally suited to computer-based platforms and a variety of online systems can now reliably and securely deliver high-stakes assessments to individuals.
• Online objective test systems can provide psychometric analysis of exam results which can be mapped to learning outcomes and provide feedback to learners.
Angoff, W.H. (1971) ‘Norms, scales, and equivalent scores’. In R.L. Thorndike (ed.) Educational measurement (2nd edn), Washington, DC: American Council on Education.
British Standards Institution (2007) BS ISO/IEC 23988 Information technology – A code of practice for the use of information technology (IT) in the delivery of assessments, London: British Standards Institution.
Case, S.M. and Swanson, D.B. (2002) Constructing written test questions for the basic and clinical sciences (3rd edn), Philadelphia, PA: National Board of Medical Examiners.
Dennick, R.G., Wilkinson, S. and Purcell, N. (2009) Online eAssessment. AMEE guide no. 39. Association for Medical Education in Europe, Medical Teacher, 31(3): 192–206.
Dent, J.A. and Harden, R.M. (2013) A practical guide for medical teachers (4th edn), Edinburgh: Elsevier.
Liu, M., Papathanasiou, E. and Hao, Y. (2001) ‘Exploring the use of multimedia examination formats in undergraduate teaching: Results from the field testing’, Computers in Human Behavior, 17(3): 225–48.
Newble, D.I., Baxter, B. and Elmslie, R.G. (1979) ‘A comparison of multiple-choice tests and free-response tests in examinations of clinical competence’, Medical Education, 13: 263–8.
Schuwirth, L.W.T. and van der Vleuten, C.P.M. (2006) Understanding medical education guides. How to design a useful test: The principles of assessment, Association for the Study of Medical Education, Edinburgh: ASME.
Sim, G., Strong, A. and Holifield, P. (2005) The design of multimedia assessment objects, Proceedings for 9th CAA Conference, Loughborough.
Tavakol, M. and Dennick, R.G. (2011a) ‘Post-examination analysis of objective tests’, Medical Teacher, 33(6): 447–58.
Tavakol, M. and Dennick, R.G. (2011b) ‘Making sense of Cronbach’s alpha’, International Journal of Medical Education, 2: 53–5.
Taylor, W.L. (1953) ‘Cloze procedure: A new tool for measuring readability’, Journalism Quarterly, 30: 415–33.