Designing for learner engagement with computer-based testing

The issues influencing student engagement with high-stakes computer-based exams were investigated, drawing on feedback from two cohorts of international MA Education students encountering this assessment method for the first time. Qualitative data from surveys and focus groups on the students’ examination experience were analysed, leading to the identification of engagement issues in the delivery of high-stakes computer-based assessments. The exam combined short-answer open-response questions with multiple-choice-style items to assess knowledge and understanding of research methods. The findings suggest that engagement with computer-based testing depends, to a lesser extent, on students’ general levels of digital literacy and, to a greater extent, on their information technology (IT) proficiency for assessment and their ability to adapt their test-taking strategies, including organisational and cognitive strategies, to the online assessment environment. The socialisation and preparation of students for computer-based testing therefore emerge as key responsibilities for instructors to address, with students requesting increased opportunities for practice and training to develop the IT skills and test-taking strategies necessary to succeed in computer-based examinations. These findings and their implications in terms of instructional responsibilities form the basis of a proposal for a framework for Learner Engagement with e-Assessment Practices.


Introduction
High-stakes computer-based testing is growing in importance within UK higher education and has developed from computer-based formative assessment practices which are already well established across the sector (Jenkins, Walker, and Voce 2014; JISC 2007). The most recent Universities and Colleges Information Systems Association (UCISA) Technology Enhanced Learning Survey (Walker et al. 2016) reported that 10 out of the 110 institutions surveyed have implemented defined response (multiple choice) summative tests for over 50% of their courses.
We observe that whilst there has been strong institutional uptake of high-stakes e-assessment, there has been little corresponding research conducted on students' attitudes towards and experiences of computer-based testing. A review of the literature reveals that most research to date has focused on investigating the differential impact of computer-based versus pen-and-paper assessments on student achievement (Kingston 2008; Leeson 2006; Mead and Drasgow 1993) and the relationship between individual differences including gender, race, socio-economic status, digital literacy, computer anxiety and performance on computer-based assessments (Leeson 2006). Only a small number of studies have investigated students' attitudes towards computer-based assessment (e.g. Dermo 2009; Deutsch et al. 2012; Hillier 2014; Walker, Topping, and Rodrigues 2008), and these studies have tended to focus on undergraduate students and report on a single experience of computer-based testing, with few studies exploring engagement over a series of assessments (Deutsch et al. 2012 is a notable exception).
Research studies comparing performance on computer-based and pen-and-paper examinations have found that differences are generally small and not of practical significance, in particular for objective and multiple-choice assessments (Kingston 2008; Leeson 2006; Mead and Drasgow 1993). Results of studies investigating the relationship between learner variables and performance on computer-based assessments are mixed, and where effects of learner variables are observed, these tend to be small (Leeson 2006). Whilst these studies are important and serve to allay concerns about equity that have been raised in explorations of students' attitudes towards and experiences of computer-based testing (e.g. Hillier 2014), the few post-examination surveys and focus groups that have been conducted have identified a wider range of concerns which merit further exploration. Concerns frequently cited in these studies include computer anxiety and perceptions of computer self-efficacy, concerns relating to test security and the potential for cheating, and technical difficulties (Deutsch et al. 2012; Hillier 2014; Ozden, Erturk, and Sanli 2004).
This paper attempts to fill the gap in the literature on the postgraduate student experience with computer-based assessment. Postgraduate students are important subjects for investigation because they present unique challenges with respect to the introduction of computer-based assessment; they are often mature students who may have less experience of using technology in education, and the short duration of MA programmes can limit the time available to socialise and prepare them for different examination formats and procedures.
Our study investigates student engagement issues for a computer-based examination of research methods, drawing on research from two cohorts of international MA students who were encountering this assessment method for the first time in their programme. It combines pre- and post-examination surveys and follow-up focus group interviews to provide insights into students' concerns and the steps tutors ought to take to prepare students for computer-based examinations. These insights have informed the development of a framework for Learner Engagement with e-Assessment Practices (LEe-AP), which is presented in this paper.

Context and exam design
In 2010, the University of York introduced the requirement that all taught content on a study programme should be assessed. Consequently, we were required to assess our master's level research methods module. By the time this was introduced, our research methods module was taken by more than 150 international students, mainly Chinese students, and the number of students on the programmes was increasing year by year. Given the size of the cohort and the need to complete marking and delivery of feedback to students within 6 weeks, the decision was taken to introduce a predominantly multiple-choice-style examination.
In academic years 2011–2012 and 2012–2013, the MA Research Methods module was therefore assessed through a 2-hour pen-and-paper examination, comprising two-thirds multiple-choice-style questions and one-third longer answer questions. Multiple-choice questions were used to assess knowledge and understanding of a wide range of methods for carrying out research. Longer answer questions were employed to assess students' ability to critically examine research reports in terms of the suitability of the methods used and the implications for the substantive claims that are made (see the Assessment Design section).
The module team, however, still found it challenging to mark the paper-based multiple-choice examination within the required 6-week period whilst busy teaching other modules. The decision was therefore taken to switch to an e-assessment format in 2013–2014, which would permit the automation of the marking of the multiple-choice questions and consequently reduce the turn-around time in the delivery of marks and feedback to students. The 161 students who sat the module in 2013–2014 completed a 1-hour formative exam in their first term (November 2013) and a 2-hour summative exam at the beginning of the second term (January 2014). Multiple-choice items accounted for the main component (70%) of the summative examination, with open questions employed to assess critical thinking and higher order skills. The former were worth between 1 and 6 marks depending on the number of parts and difficulty; the latter required students to write around five sentences and were worth 10 marks each. The questions were presented in a randomised order within a locked down and secure version of the institutional Blackboard Learn Virtual Learning Environment (VLE).
The 191 students of the 2014–2015 cohort followed a similar procedure. However, in response to the previous cohort's concerns over typing proficiency and noise levels in the completion of open-response questions, adjustments were made to the summative exam design, with a reduction in the number of both open- and multiple-choice-style question items. As discussed in the Assessment Design section, these adjustments were also informed by difficulty and discrimination analyses. As in 2013–2014, the order of presentation of the questions was randomised in both the formative and summative examinations because this was essential to maintaining the integrity of the examination. Privacy screens were not available in all the personal computer (PC) rooms which were being used as test centres, and in this context there remained the possibility that students could look at each other's screens; hence the randomised design. In response to feedback on the organisation of the 2013–2014 summative exam, questions were, however, grouped by weighting: items worth the highest number of marks were presented first, and randomisation was applied within those groups (as opposed to across the examination as a whole).
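The grouped randomisation described above can be illustrated with a minimal sketch. It is not the Blackboard Learn implementation: the item bank, the `marks` field and the per-student seeding are all assumptions introduced purely for illustration. Items are grouped by weighting, the groups are presented from highest to lowest weighting, and the order is shuffled only within each group.

```python
import random
from collections import defaultdict

# Hypothetical item bank: identifiers and mark weightings are illustrative only.
ITEM_BANK = [
    {"id": "open-1", "marks": 10},
    {"id": "open-2", "marks": 10},
    {"id": "mcq-1", "marks": 6},
    {"id": "mcq-2", "marks": 6},
    {"id": "mcq-3", "marks": 1},
    {"id": "mcq-4", "marks": 1},
]


def grouped_random_order(questions, rng):
    """Order questions by descending weighting, shuffling within each weighting group."""
    groups = defaultdict(list)
    for question in questions:
        groups[question["marks"]].append(question)

    ordered = []
    for marks in sorted(groups, reverse=True):  # highest-weighted group first
        block = list(groups[marks])
        rng.shuffle(block)                      # randomise only inside the group
        ordered.extend(block)
    return ordered


def paper_for(student_id):
    """Assumption: seeding by student id makes each student's order reproducible."""
    return grouped_random_order(ITEM_BANK, random.Random(student_id))


if __name__ == "__main__":
    for student in ("student-001", "student-002"):
        print(student, [q["id"] for q in paper_for(student)])
```

Under this design every student meets the 10-mark items first, but neighbouring students are unlikely to be working on the same item at the same moment.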

Research methods
We recognise students as key stakeholders in the assessment process, and feedback was actively sought from each cohort and used to guide improvements to the examination as part of our participant-informed design approach along with analyses of pitch of the examination (see the Assessment Design section). Feedback on students' assessment experience was solicited through questionnaires and focus groups. In 2013–2014, questionnaire and focus group data were collected after the summative examination. In 2014–2015, questionnaire data were collected after the formative examination and again after the summative examination, with focus group interviews also conducted after the summative examination. The questionnaires and focus group instruments were informed by Dermo's (2009) Student Perceptions of e-Assessment Questionnaire (SPEAQ) and a review of previous instruments and research on students' attitudes towards and experiences of computer-based testing (Deutsch et al. 2012; Ferrao 2010; Frein 2011; Hillier 2014; Williams and Wong 2009). Students' prior exposure to computer-based testing and their reflections on their preparation for this exam were explored. Students were also invited to reflect on their experiences of taking the exam, comment on the suitability of the assessment method and make recommendations for how it could be improved in the future.
The study was approved by the departmental ethics committee, and information sheets and consent forms were distributed to all participants. The 2013–2014 survey was completed by 48 of the 161 students who sat the end of module examination. The 2014–2015 pre- and post-examination surveys were completed by 42 of the 191 students who sat the module. Most of these participants were aged 20–24, and, reflecting the demographics of the cohort as a whole, the majority of respondents were Chinese females (see Table 1). This paper focuses on the data collected from the 36 Chinese females who completed the survey in 2013–2014 and the 28 Chinese females who completed it in 2014–2015.
A total of 18 Chinese female students from the 2014–2015 cohort volunteered to participate in three separate focus group interviews in January 2015, with five students from the 2013–2014 cohort participating in February 2014. Transcripts from the focus groups were generated, and a qualitative content analysis was performed (Hsieh and Shannon 2005). The unit of analysis was a line in the transcript, which, in turn, could address multiple units of meaning. Comments were categorised and then mapped against an evaluative framework based on Dermo's (2009) key themes, addressing affective variables, validity, practicality, reliability, security and pedagogical issues related to the assessment method. The categorisation and mapping processes were then repeated for the open comments from the surveys, and the outputs from these analyses were then compared to form a rich picture of student experiences with computer-based testing, from which common themes in the student assessment experience were derived.
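For readers who wish to see the survey coverage as proportions, the short sketch below works through the arithmetic using only the counts reported above. The function name and the dictionary layout are illustrative assumptions, and the 2014–2015 figure treats the 42 respondents as a single group, as the text does.

```python
def response_rate(respondents, cohort_size):
    """Return the survey response rate as a percentage of the cohort."""
    return 100 * respondents / cohort_size


# Counts taken from the text: respondents and students sitting the module.
RATES = {
    "2013-2014": response_rate(48, 161),
    "2014-2015": response_rate(42, 191),
}

if __name__ == "__main__":
    for year, rate in RATES.items():
        print(f"{year}: {rate:.1f}% of the cohort responded")
```

On these counts, roughly 30% of the 2013–2014 cohort and 22% of the 2014–2015 cohort responded.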

Findings and discussion
The findings of the questionnaires and interviews highlight a range of issues, which appear to have influenced our students' reception of computer-based testing. These issues are discussed below along with recommended actions for instructors to consider when introducing computer-based testing. Together they form a proposal for a framework for LEe-AP (see Table 2).

Socialisation of learners
Feedback indicated a need for the instructor to articulate the rationale and the suitability of the assessment methods to the discipline being assessed, with some students unconvinced by the need for computer-based testing methods.
I think this exam could also be paper based because in research methods we always use computers to analyse some data, always computer but in this exam only a few questions is about how to analyse data. The others could also be shown on paper, so for me, I think, it's not really necessary to be done with computer. (Post-Test Focus Group 2014–2015)
This resonates with findings from Hillier's cross-disciplinary review of undergraduate students' attitudes towards e-assessment and the perceived 'readiness' of disciplines to match a computerised assessment approach (Hillier 2014). When introducing e-assessment in disciplines and module areas where there is a less obvious fit with this approach to assessment, it will be necessary to provide students with greater support in managing their anxiety levels.
The rationale for e-assessment should also address the benefits of this approach. Our findings confirm previous research (e.g. Noubandegani 2012) in highlighting the perceived presentational benefits of typed open responses as opposed to poorly handwritten answers and the favourable impression that students believe this will have on markers:
If we are bad at writing, yes especially in China, we lose marks because of that. (Post-Test Focus Group 2014–2015)
Notwithstanding these benefits, some students felt that there was a generational bias in favour of more technically literate students (i.e. those who had proceeded directly to graduate study and not spent time working) and an equity issue affecting mature students returning to education, who might take longer to adjust to computer-based exams:
Some of our classmates after they have had some experience . . . they return to school to get more experience in teaching. It could be some difficulty for them to use a computer in typing when they attend the examination, so it could take them longer time to get used to the system, so I think it could unfair for them. (Post-Test Focus Group 2014–2015)
This reflects a commonplace perspective on the existence of a Net Generation (Tapscott 2008) or Digital Natives (Palfrey and Gasser 2008; Prensky 2001a, b) with distinctive skills and aptitudes for digital learning. This perspective still persists and is perhaps more keenly felt by mature students and those returning to full-time education, despite recent studies in the UK rejecting these claims (Jones et al. 2010; Margaryan, Littlejohn, and Vojt 2011). Whilst research on computer-based testing (e.g. Frein 2011; Leeson 2006) has found no difference in test results between highly experienced and less experienced users of computers, our findings align with previous studies of student perceptions of e-assessment (Fluck 2013; Hillier 2014) in underlining the need to address students' concerns about fairness in the orientation phase and to offer reassurance from the outset. The orientation of learners to a new assessment method, particularly when this method departs from established assessment norms within a study programme, emerges as a key responsibility for instructors to address. This is consistent with previous studies (e.g. Deutsch et al. 2012; Zakrzewski and Steven 2003) which have identified the need for instructors to integrate the assessment method at an early stage into the curriculum and in this way address and reduce student anxiety levels, which may stem from a variety of factors ranging from computer self-efficacy and concerns over security and cheating to fears over computers crashing during an online examination (Mogey and Sarab 2006).

Preparation of students for assessment
Computer aversion or anxiety as a barrier to the adoption of computer-based assessment is a well-researched concept (e.g. Durndell and Lightbody 1993; Meier 1985, 1988). Meier defines this as a negative affective and cognitive state which often occurs when individuals have low expectations about the rewards of using computers or low confidence in their ability to use computers effectively. Adequate preparation appears key to counteracting this mindset, but there is less agreement on the nature of the interventions that are required to address student anxiety levels. Student feedback in our study highlighted two areas where preparation may be needed: digital skills and examination technique.
Zakrzewski and Steven (2003) have highlighted the enhancement of students' information technology (IT) skills as a prerequisite for student preparation, but it is far from clear what this actually entails. Leeson's (2006) review suggests that there is no established relationship between examinees' level of computer familiarity and performance on computer-based tests. Similarly, we found no association between any of the indicators of digital literacy that we surveyed, which included previous experience of computer-based testing and keyboarding skills, and preferences for computer-based versus paper-based assessment. Indeed, whilst keyboarding proficiency was highlighted by some individuals as a differentiating factor in exam performance, more commonly concerns were expressed over the use of university-provided hardware and software in the exam:
I'm not used to using the keyboard because it's different from laptop keyboard. (Post-Test Focus Group 2014–2015)
I find that when I type really faster and quickly, there are typing errors I won't recognise because there are no red lines on the line of words that I typed wrong. (Post-Test Focus Group 2014–2015)
These findings suggest that we need to draw a distinction between digital proficiency, reflected in the effective day-to-day use of technology for learning (e.g. from email to essay writing), and IT proficiency for assessment, reflected in the capability to use unfamiliar technology under time pressure in computer-based exams. In other words, our research suggests that we need to go beyond equipping students with basic IT skills and familiarising them with the assessment environment and provide them with opportunities to develop proficiency in typing under time pressure, including use of unfamiliar hardware and software under authentic examination conditions. This might be achieved through the creation of computer-based formative assessments which align with the format of the summative examination.

Exam technique: test-taking strategies for online examinations
Another important determinant of performance in examinations which students identified was effective test-taking strategies. Like the students in Hong, Sas, and Sas' (2006) investigation of the test-taking strategies of high school students in a paper-based mathematics examination and Walker, Topping, and Rodrigues' (2008) investigation of first- and second-year university students' expectations and perceptions of a computer-based science examination, the students in our study attempted to deploy a range of organisational (e.g. time management and sequencing) and cognitive (e.g. checking, eliminating and using memory aids) strategies. Transferring strategies developed for paper-based examinations to the computer-based examination was, however, not always straightforward for our students. For example, difficulties with time management appeared to be associated with the fact that questions were presented one at a time and students were not provided with an overview of the exam content. Suggestions provided by the students to address this and other issues raised here are discussed in the next section, which focuses on assessment design.
Annotating the examination and externalising their knowledge of topics were among the most frequently noted cognitive strategies that students expected to be able to apply in the online environment:
I do not like not having the ability to circle questions I am unsure about or make notes to myself about which questions to come back to. During written assessments, I often write all over my test questions with arrows, circles, and other brainstorming sketches and it is difficult to work through the online assessment without these techniques. (2014–2015 Pre-Test Questionnaire)
Similarly, some students felt that the lack of annotation facilities made it difficult for them to check their work:
Easy-go-back and check approach. Don't just provide question numbers and the function of flagging as in some online exams. Provide key terms of the question for later reminding or shown only in checking process. Students could even have a chance to make notes for reminding. (2014–2015 Pre-Test Questionnaire)
As the students suggest, some of the above concerns about the appropriation of test-taking strategies might be addressed through modifications to the design of the test and the platform on which it is delivered. However, it is also clear from the feedback that one assessment design will not fit all students' test-taking strategies. As recommended by Hillier (2014) and Zakrzewski and Steven (2003), the best solution may be to provide students with opportunities to familiarise themselves with the computer-based assessment environment and adjust the test-taking strategies that they have developed in paper-based contexts to suit the computer-based environment:
More practice to help the students be familiar with the system as well as question type. (2014–2015 Pre-Test Questionnaire)

Citation: Research in Learning Technology 2016, 24: 30083 - http://dx.doi.org/10.3402/rlt.v24.30083

Assessment design
Our research, like much before it, suggests that appropriately pitching a multiple-choice examination requires careful planning and iterative cycles of question review (Crisp and Palmer 2007; Haladyna 1999). Analyses of students' performance on the 2013–2014 examination suggested that the balance of open-response to multiple-choice questions was not appropriate. Despite careful review of the 2012–2013 paper version of the examination, including content and cognitive behaviour reviews and subsequent revisions, students' marks on the 2013–2014 examination were skewed towards the lower end of the university scale. The automatic difficulty and discrimination analyses generated by Blackboard Learn's assessment engine suggested that students found the open-response questions particularly difficult. The balance between open-response and multiple-choice questions was therefore adjusted for 2014–2015. Careful planning and reviewing of questions before administration and annual reviews including difficulty and discrimination analyses are therefore recommended until an appropriate and stable pitch is achieved.
Randomisation of the order of the questions was also necessary to prevent cheating because there was only one PC lab at the university equipped with privacy screens. Whilst some students were positive about this aspect of the design of the assessment, others expressed concerns about the equity of randomisation:
Random questions for each student don't represent the level of difficulties, for some students could encounter long answer question at Q1 which gives little confidence of students to move on. More it could also waste time in trying to answer that question and therefore time is not enough. (2013–2014 Post-Test Questionnaire)
Furthermore, randomisation caused distraction and appeared to raise some students' levels of anxiety, because the noise associated with typing meant that students were aware of the progress their neighbours were making with the questions:
Maybe the first three questions should all be typing, maybe, so all the people are typing maybe, so they were not just distracted. (Post-Test Focus Group 2014–2015)
. . . I will tidy so much words but other students they finish one by one and very very quickly because the person needs a multiple choice item and I was 'oh my god' maybe they have written number 7 or number question, but I still do the first. (2014–2015 Post-Test Focus Group)
Ideally, all PC labs used for examination purposes should be equipped with privacy screens between work stations. This, however, may not be feasible in 'greenfield' e-assessment sites like our own where dedicated computer-based testing venues have not been established on campus. Where this is not feasible and randomisation of questions is necessary to prevent cheating, consistency in the presentation of questions to students is essential to promote fairness, avoid distractions and avoid increasing students' levels of anxiety. Furthermore, possible test-taking strategies ought to be taken into account. It is, however, acknowledged that students may have individual preferences with respect to test-taking strategies and the presentation of the examination. For example:
It's no sense to put an open question for ten points at the beginning, so because our brain doesn't work at the beginning to write/type so much. (Post-Test Focus Group 2014–2015)
It is unlikely to be possible to design the examination to accommodate all students' preferences, and it is therefore essential that the assessment platform is flexible and that students have ample opportunities for 'practice' to develop dedicated test-taking strategies. Such opportunities for 'practice' might be provided through formative assessments.
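As an illustration of what difficulty and discrimination analyses involve, the sketch below computes two conventional classical test theory statistics for a single item. It is an assumption-laden example and does not claim to reproduce the calculations performed by Blackboard Learn: difficulty is taken as the mean item score expressed as a proportion of the maximum mark, and discrimination as the Pearson correlation between the item score and each student's score on the rest of the paper. The scores are invented for demonstration and are not data from the study.

```python
from math import sqrt


def difficulty(item_scores, max_mark):
    """Mean item score as a proportion of the maximum mark (higher = easier)."""
    return sum(item_scores) / (len(item_scores) * max_mark)


def pearson(xs, ys):
    """Pearson correlation; returns None when either variable has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def discrimination(item_scores, total_scores):
    """Correlate the item with the rest-of-test score (total minus the item itself)."""
    rest_scores = [total - item for item, total in zip(item_scores, total_scores)]
    return pearson(item_scores, rest_scores)


# Invented scores for five students on a 1-mark item, with their exam totals.
EXAMPLE_ITEM = [1, 1, 1, 0, 0]
EXAMPLE_TOTALS = [9, 8, 7, 4, 3]

if __name__ == "__main__":
    print("difficulty:", difficulty(EXAMPLE_ITEM, max_mark=1))
    print("discrimination:", discrimination(EXAMPLE_ITEM, EXAMPLE_TOTALS))
```

A low difficulty value flags an item that most students struggled with, while a discrimination value near zero or below suggests that the item does not separate stronger from weaker candidates and may merit review.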

Design of assessment interface
Our research also suggests that the design of the assessment interface requires careful consideration in order to accommodate students' varying test-taking strategies. As discussed previously, students reported using a range of organisational and cognitive strategies. In relation to organisational strategies, in order to help students get an overview of the examination and allocate time accordingly, examinees wanted to be able to see all the questions on one screen first, a protocol which has been reported on in previous studies (e.g. Frein 2011):
Maybe one page with all the questions on the first page and then one by one. (Post-Test Focus Group 2014–2015)
As exemplified above, there was no consensus on item presentation thereafter. Leeson's (2006) summary of the research literature, however, suggests that multiple items on screen may have a facilitating effect in allowing examinees to skip, scan and build off previous item information and may counter the effects of single-item presentation on screen, which may encourage hurried responses and increase errors.
Easy navigation of the examination is also essential, given that it is unlikely to be possible to accommodate students' differing preferences for the sequencing of questions. To support navigation, students made a number of suggestions including tagging questions with keywords and grouping them into categories:
Easy-go-back and check approach. Don't just provide question numbers and the function of flagging as in some online exams. Provide key tests of the question for later reminding or shown only in checking process. Students could even have the chance to make notes for reminding. (2014–2015 Pre-Test Questionnaire)
Yes, also, I don't think it's necessary to just complete one question one time because maybe two or three that's OK, because one and you have to click, click, click. How about one hundred questions . . . categorise the options, which one appears here to go back and which one to out of that, and which to submit. (2014–2015 Post-Test Focus Group)
[...] ensure all technical issues have been resolved before students enter assessment centres:
It is better not to let the students enter into the IT room until confirmed all the computers work as well. (2014–2015 Post-Test Questionnaire)
Students also highlighted the importance of ensuring equity in conditions across assessment centres. Noise levels were a particular concern as in previous research (Fluck 2013; Hillier 2014), and our data suggest that the impact on noise levels of the size of assessment centres as well as the quality of keyboards ought to be considered:
There may be too many people at one computer room which leads the noise of typing (2014–2015 Post-Test Questionnaire)
keyboard in [named] college is much quieter than in library which minimises the noise during the exam. (2014–2015 Post-Test Questionnaire)

Conclusions
In drawing conclusions from this study, we should acknowledge the limitations of the research evidence that has been generated. The cohorts under review predominantly comprised female Chinese students with limited prior exposure to e-assessment practices in UK higher education. Moreover, they comprised international master's students who had only been at the university for 3 months prior to the examination. As a result, the opportunities for tutors to normalise online assessment procedures prior to the delivery of the research methods module were considerably limited. We acknowledge the distinctive features of this study, which may impact the generalisability of the findings, and recognise that the assessment context for other institutions and cohorts will differ greatly. Notably, the findings are likely to differ from those for undergraduate students on multi-year programmes of study for whom there are likely to be greater opportunities to embed and normalise e-assessment practices.
Notwithstanding these limitations, the study offers an insight into the factors influencing postgraduate students' reception of computer-based testing when encountering these methods for the first time. Building on the existing literature (Dermo 2009; Hillier 2014), our research suggests that the issues influencing students' acceptance of e-assessment methods can be grouped into three key categories focusing on: (1) socialisation of learners to the assessment method, (2) preparation of students for the computer-based assessment and (3) assessment design and infrastructure.
Of the factors identified, we observed that the socialisation of learners is a key step in the process of learner engagement and that it requires an upfront investment of time to explain to students the rationale and value of computer-based assessment methods, particularly when it involves a departure from established assessment practices. This appears to be particularly important for mature students and those returning to education who may not have been previously exposed to e-assessment methods and may share heightened concerns over the equity and reliability of computer-based testing. Furthermore, we observed that students will need adequate preparation to negotiate the transition from formative to summative computer-based assessments. Specifically, our findings suggest that the provision of opportunities for students to familiarise themselves with the exam format under authentic conditions (i.e. timed conditions within the test environment) is as important as orientation to the question types that students will encounter in the exam and outweighs issues of digital literacy and keyboarding skills. Attention should instead be directed to fostering dedicated IT skills for assessment, and in particular 'online exam craft'. This will require students to adapt rather than directly transfer paper-based exam techniques to the online context, addressing both organisational and cognitive strategies.
We can't see the exam as a whole. Reading the exam as a whole by clicking may waste time. (2014–2015, Pre-Test Questionnaire)
In terms of time management, I think we, when we are doing the handwriting exam, I know what questions I have, but in e-exam I just didn't know what I am currently facing and I don't know what kind of questions, you know closed or open question is coming next. (2014–2015, Post-Test Focus Group)

Table 1. Demographics of the cohort and sample.