Implementation of Computer Assisted Assessment: Lessons from the Literature

This paper draws attention to the literature on computer-assisted assessment (CAA). It first gives a brief overview of traditional methods of assessment and highlights areas of concern in existing techniques. It then defines CAA and identifies instances of its introduction in various educational spheres; the main focus of the paper is the implementation of CAA. Through the referenced articles, it offers evidence to inform practitioners and to direct further research into CAA from a technological and pedagogical perspective. This includes issues relating to the interoperability of questions, security, test construction and the testing of higher cognitive skills. The paper concludes by suggesting that an institutional strategy for CAA, coupled with staff development in test construction for a CAA environment, can increase the chances of successful implementation.


Introduction
This paper presents evidence that the more traditional methods of assessment within universities have their limitations. These limitations, together with the continued increase in the use of technology to deliver the curriculum, are widening the gap between assessment methods and learning.
Students entering higher education directly from schools and colleges are likely to have been exposed to Information Technology as part of the UK National Curriculum. Pilot studies of web-based summative assessment in schools (Ashton et al., 2003; Nugent, 2003) and of basic key skills tests in both Learn Direct and army centres (Sealey et al., 2003) indicate that CAA can successfully assess students and provide timely feedback on class and individual progress. There is also empirical evidence that students find CAA an acceptable assessment technique (Sambell et al., 1999; Croft et al., 2001; Ricketts & Wilks, 2002a). Therefore it could be argued that for many students CAA may become a […] subject domains. It has been suggested that scientific subjects produce more First Class Degrees than the humanities for two reasons: their marking criteria use the full range of marks, and subjectivity is eliminated where there is a predefined correct answer (Yorke et al., 2002; Horney, 2003). Higher Education Statistics Agency (HESA) figures appear to corroborate these findings. Of the students graduating from UK universities in 2001/02, 25.5% of those in Mathematical Science obtained a First Class Degree, compared with 10.4% in Humanities (HESA, 2002). This trend was also evident in other years, for example 1994/95 (HESA, 1995). Like mathematics and some science subjects, CAA also tends to use the full range of marks. Therefore, a similar trend towards a high proportion of First Class Degrees may appear in other subject domains that adopt this technique in the future.
There is pressure on lecturers not to fail students, and one study found that in professional subjects there is a tendency to leave the award of a fail to the next assessor (Hawe, 2003). When lecturers form a close working relationship with a student, they face emotional and ethical dilemmas that increase their reluctance to award a fail (Sabar, 2002). The automatic marking offered by CAA software may remove the emotional and subjectivity issues that are evident in human-centred marking.
It is important to recognize that some of the issues discussed are still prevalent in CAA, and that CAA also brings new challenges. Adopting a diverse assessment strategy may lead to a fairer assessment of the student (Race, 1995).

Computer-assisted assessment defined
The literature shows no universal consensus on the terminology or its definition. However, Bull and McKenna (2001) argue that computer-assisted assessment is the common term for the use of computers in the assessment of students, and that other terms tend to focus on specific activities. The definition of CAA used in this review is therefore: CAA encompasses the use of computers to deliver, mark or analyse assignments or exams.

Variations in CAA
Higher education institutions have applied CAA in a number of ways, including:

- adaptive testing (Latu & Chapman, 2002; Mills et al., 2002);
- analysis of the content of discussion boards (Macdonald & Twining, 2002; Wiltfelt et al., 2002);
- automated essay marking (Christie, 1999; Burstein et al., 2001);
- delivery of exam papers (Sim et al., 2003);
- objective testing (Walker & Thompson, 2001; Pain & Le Heron, 2003).

These methods vary considerably. This review focuses on the issues involved in implementing objective tests via CAA.

Testing cognitive skills with CAA
The literature raises concerns about the ability of CAA to test higher cognitive skills across subject domains (Daly & Waldron, 2002; Paterson, 2002). The higher cognitive skills are often associated with 'Analysis, Synthesis, and Evaluation' as defined in Bloom's Taxonomy (Bloom, 1956). A revised taxonomy adds a 'Knowledge Dimension' (Anderson & Krathwohl, 2001), and CAA research has also used it to classify questions (King & Duke-Williams, 2002; Mayer, 2002).

Opinion is divided on whether CAA can test these skills. Paterson (2002) indicated that it is not feasible to test the higher-level cognitive skills using CAA within mathematics. Bloom states that in most instances Synthesis and Evaluation promote divergent thinking, and answers cannot be determined in advance (Bloom et al., 1971). Heinrich and Wang (2003) argue that objective testing is still not sophisticated enough to examine complex content and thinking patterns. However, other research in linguistics and computer programming concluded that innovative approaches allow CAA to assess higher-level skills (Cox & Clark, 1998; Reid, 2002). In the study by Reid (2002), the researchers devised a new language, and students had to apply linguistic techniques to answer MCQs. It has been suggested that CAA tests of higher-level skills are more complex and costly to produce (Dowsing, 1998), possibly because they require more innovative approaches.
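Classifying questions against a taxonomy is essentially a tagging exercise. The sketch below assumes a question author has already assigned each item one of Bloom's six 1956 levels, and reports what share of a test targets the higher-order levels. The `Bloom` enum and `higher_order_share` function are hypothetical names. The code does not judge cognitive level; it only summarises the author's tags. It also ignores the Knowledge Dimension of the revised taxonomy.

```python
from enum import IntEnum
from typing import Iterable

# Hypothetical tagging scheme: the six levels of Bloom's (1956) cognitive domain.
class Bloom(IntEnum):
    KNOWLEDGE = 1
    COMPREHENSION = 2
    APPLICATION = 3
    ANALYSIS = 4
    SYNTHESIS = 5
    EVALUATION = 6

# The levels the literature associates with higher cognitive skills.
HIGHER_ORDER = {Bloom.ANALYSIS, Bloom.SYNTHESIS, Bloom.EVALUATION}

def higher_order_share(tags: Iterable[Bloom]) -> float:
    """Fraction of questions whose author-assigned tag is a higher-order level."""
    tags = list(tags)
    if not tags:
        return 0.0
    return sum(tag in HIGHER_ORDER for tag in tags) / len(tags)

# Example: a four-question test tagged by its author.
test_tags = [Bloom.KNOWLEDGE, Bloom.COMPREHENSION, Bloom.ANALYSIS, Bloom.EVALUATION]
print(higher_order_share(test_tags))  # 0.5
```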

Question styles
Objective testing has been used within assessment for over forty years (Wood, 1960), and computer programs delivering MCQs date back to the 1970s (Morgan, 1979). More sophisticated question styles have since emerged, enabling more diverse assessment methods. The TRIADS software developed at the University of Derby illustrates this evolution: it offered 17 question styles in 1999 (Mackenzie, 1999) and 39 in 2003 (CIAD, 2003). However, staff at the University of Liverpool using TRIADS found that this created an additional problem. They were unfamiliar with the new question styles and lacked confidence in writing suitable questions (McLaughlin et al., 2004). Staff development and guidelines on writing suitable questions can help to overcome these problems. For example, Haladyna (1996) has developed generic guidelines. Herd and Clark (2002) present examples of the various question styles used in further education. Examples used within higher education can be found at http://www.caacentre.ac.uk.
Although there are a large number of possible formats for CAA questions, it is possible to classify them into four distinct groups based on the human interaction technique required (CIAD, 2003).These groups are defined as point and click, move object, text entry and draw object.
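A question bank could record this four-way classification as a simple lookup table. The style names in `STYLE_GROUP` are hypothetical; each CAA package uses its own labels and many more styles than are shown here. The sketch illustrates the grouping only and does not describe any real package's data model.

```python
from enum import Enum
from typing import Dict, List

class Interaction(Enum):
    """The four interaction groups described in the text (CIAD, 2003)."""
    POINT_AND_CLICK = "point and click"
    MOVE_OBJECT = "move object"
    TEXT_ENTRY = "text entry"
    DRAW_OBJECT = "draw object"

# Hypothetical style names mapped to the interaction technique they require.
STYLE_GROUP: Dict[str, Interaction] = {
    "multiple choice": Interaction.POINT_AND_CLICK,
    "multiple response": Interaction.POINT_AND_CLICK,
    "label diagram": Interaction.MOVE_OBJECT,
    "short answer": Interaction.TEXT_ENTRY,
    "plot graph": Interaction.DRAW_OBJECT,
}

def styles_in_group(group: Interaction) -> List[str]:
    """List the known question styles that share one interaction technique."""
    return sorted(style for style, g in STYLE_GROUP.items() if g is group)

print(styles_in_group(Interaction.POINT_AND_CLICK))  # ['multiple choice', 'multiple response']
```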

Point and click
Point and click questions include Multiple Choice (MCQ) and Multiple Response (MRQ) items. Both have been used in assessment practice for a considerable time and, as a result, are often converted to CAA (Ricketts & Wilks, 2002b). Ebel (1972) suggests that MCQs can test any understanding or ability that other techniques, such as essays, can test. Assertion-reasoning formats allow more complex MCQs that test higher cognitive skills (Bull & McKenna, 2001). Both MCQ and MRQ have inherent problems, such as reliance on true/false style questions, which students might perceive as unfair (Wood, 1960). Davies also argues that the quality of an MCQ depends on the quality of its distracters rather than on the question itself (Davies, 2002).
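Davies' point about distracters can be checked after a test has been sat with a simple item analysis. The sketch below counts how often each option was chosen and flags distracters that almost nobody picked. Such distracters do little work, because candidates can rule them out without knowing the answer. The function names and the 5% cut-off are hypothetical choices for illustration, not values from the cited studies.

```python
from collections import Counter
from typing import Dict, Iterable, List

def option_shares(responses: Iterable[str], options: List[str]) -> Dict[str, float]:
    """Proportion of (valid) responses that chose each option."""
    counts = Counter(responses)
    total = sum(counts[o] for o in options)
    return {o: (counts[o] / total if total else 0.0) for o in options}

def weak_distracters(responses: Iterable[str], options: List[str], key: str,
                     threshold: float = 0.05) -> List[str]:
    """Distracters chosen by fewer than `threshold` of candidates (hypothetical cut-off)."""
    shares = option_shares(responses, options)
    return [o for o in options if o != key and shares[o] < threshold]

# Example: 50 candidates answer a four-option MCQ whose key is "A".
responses = ["A"] * 30 + ["B"] * 12 + ["C"] * 7 + ["D"] * 1
print(weak_distracters(responses, ["A", "B", "C", "D"], key="A"))  # ['D']
```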

Move object
Move object questions require students to move objects to predetermined positions on the screen. They are a variation of the MCQ format and are good for assessing students' understanding of relationships (Bull & McKenna, 2001). For example, in computing they could be used to label entity-relationship diagrams. In linguistics, students could be presented with a poem and asked to move the highlighted words to the appropriate word class. One problem arises when the number of movable objects equals the number of targets: a student who knows all but one answer will automatically get full marks (Wood, 1960).
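The one-to-one matching problem can be quantified. The sketch below assumes each object may be placed on at most one target. It computes the chance of completing the remaining targets correctly by pure guessing once a student has placed some objects correctly. Supplying more objects than targets is one common remedy (an assumption here, not a recommendation from the source): it removes the guaranteed final placement. `guess_probability` is a hypothetical name, and `math.perm` requires Python 3.8 or later.

```python
from math import perm

def guess_probability(num_targets: int, num_objects: int, known: int) -> float:
    """Chance of placing the remaining objects correctly by pure guessing.

    Assumes each object can be dropped on at most one target and that the
    student has already placed `known` objects on their correct targets.
    """
    if not 0 <= known <= num_targets <= num_objects:
        raise ValueError("need 0 <= known <= num_targets <= num_objects")
    unknown_targets = num_targets - known
    unused_objects = num_objects - known
    # Number of ordered ways to fill the unknown targets from the unused objects.
    return 1 / perm(unused_objects, unknown_targets)

print(guess_probability(num_targets=5, num_objects=5, known=4))  # 1.0: last placement is forced
print(guess_probability(num_targets=5, num_objects=7, known=4))  # 0.333...: two extra objects
```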

Text entry
Text entry questions require students to type short, predefined answers, such as factual knowledge or syntax in computer programming. Because students must supply the correct answer themselves, this format removes the possibility of guessing (Bull & McKenna, 2001). Students have also found it the most demanding format (Reid, 2002). Text entry is problematic in some subject domains, such as mathematics, because most commercial software cannot easily include mathematical expressions (Croft et al., 2001; Paterson, 2002). A further problem is that a spelling mistake may cause an answer to be marked incorrect. If lecturers then need to check manually for spelling errors, the time saved by automatic marking is reduced.
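One way to limit manual checking is to flag near misses automatically rather than review every wrong answer. The sketch below uses Levenshtein edit distance, a technique chosen here for illustration and not taken from the cited studies. It marks exact matches (after ignoring case and extra spaces) as correct, routes answers within one edit of an accepted answer to a lecturer for review, and marks everything else incorrect. The function names and the one-edit tolerance are hypothetical. Even a small tolerance can confuse genuinely different short words, which is why near misses are sent for review rather than accepted.

```python
from typing import List

def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]

def mark_text_entry(response: str, accepted: List[str], max_edits: int = 1) -> str:
    """Return 'correct', 'review' (possible misspelling) or 'incorrect'."""
    answer = normalise(response)
    targets = [normalise(a) for a in accepted]
    if answer in targets:
        return "correct"
    if any(edit_distance(answer, t) <= max_edits for t in targets):
        return "review"
    return "incorrect"

print(mark_text_entry(" Photosynthesis", ["photosynthesis"]))  # correct
print(mark_text_entry("photosynthsis", ["photosynthesis"]))    # review
print(mark_text_entry("respiration", ["photosynthesis"]))      # incorrect
```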

Draw object
Draw object questions involve drawing simple objects or lines. For example, students may be asked to plot graphs, which can then be marked automatically. This style of question discriminates well between strong and weak candidates (Mackenzie, 1999). There is little evidence in the literature about the effectiveness of this format. This may be because commercial software such as Questionmark and I-Assess does not include this style in its templates.
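Automatic marking of a plotted graph can be reduced to checking each required point against the expected curve within a tolerance. The sketch below assumes the student's plot has already been captured as a mapping from x values to plotted y values. How a real draw object tool records input is not described in the source. `mark_plot`, the example line y = 2x + 1 and the 0.25 tolerance are hypothetical.

```python
from typing import Callable, Dict, Sequence

def mark_plot(plotted: Dict[float, float], required_xs: Sequence[float],
              expected: Callable[[float], float], tolerance: float = 0.25) -> float:
    """Fraction of required points plotted within `tolerance` of the expected y value."""
    if not required_xs:
        raise ValueError("at least one required x value is needed")
    hits = 0
    for x in required_xs:
        y = plotted.get(x)  # a point the student did not plot earns no credit
        if y is not None and abs(y - expected(x)) <= tolerance:
            hits += 1
    return hits / len(required_xs)

def line(x: float) -> float:
    return 2 * x + 1

print(mark_plot({0: 1.0, 1: 3.1, 2: 5.0}, [0, 1, 2], line))  # 1.0
print(mark_plot({0: 1.0, 1: 4.0}, [0, 1, 2], line))          # 0.333...
```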

Interoperability and question banks
Question banks authored and peer reviewed by academics are emerging. For example, the Electrical and Electronic Engineering Assessment Network developed a database of questions in electrical and electronic engineering (Bull et al., 2002). A single bank typically requires 5,000 questions, which makes it infeasible for one institution to develop alone (Maughan et al., 2001). Constructing high-quality questions is difficult, time consuming and expensive (Sclater et al., 2003), and questions are not always interoperable between CAA software packages (Lay & Sclater, 2001).

Several international standards have been established to enable the interoperability of questions between software applications (Herd & Clark, 2003). These specifications define a metadata structure for questions and for grouping them together. Unless these standards are developed and used, question banks will have a limited life, because they cannot be used on a variety of delivery platforms (White & Davis, 2000). Systems compliant with IMS-QTI (the Instructional Management Systems Question and Test Interoperability Specification) are emerging to facilitate the exchange of questions (Daly, 2002; Bacon, 2003). The Centre for Educational Technology Interoperability Standards (www.cetis.ac.uk) offers comprehensive resources and information on interoperability, which may help direct further research.
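To show what a metadata structure for questions looks like in practice, the sketch below serialises a single-answer MCQ as XML. The element names follow the IMS QTI 2.1 vocabulary as we understand it. The systems cited above predate QTI 2 and used QTI 1.x, which has a different structure. The output has not been validated against the official schema, so treat it as illustrative only. `mcq_to_qti` and the sample question are hypothetical. The design point is that the correct answer, the displayed question and the scoring rule are stored separately, so another QTI-aware system can import the item without knowing how it was authored.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

QTI_NS = "http://www.imsglobal.org/xsd/imsqti_v2p1"
MATCH_CORRECT = "http://www.imsglobal.org/question/qti_v2p1/rptemplates/match_correct"

def qti(tag: str) -> str:
    """Qualify a tag name with the QTI 2.1 namespace."""
    return f"{{{QTI_NS}}}{tag}"

def mcq_to_qti(identifier: str, title: str, prompt: str,
               choices: List[Tuple[str, str]], correct: str) -> str:
    """Serialise a single-answer MCQ as a simplified QTI 2.1-style assessmentItem."""
    ET.register_namespace("", QTI_NS)  # emit QTI as the default namespace
    item = ET.Element(qti("assessmentItem"), {
        "identifier": identifier, "title": title,
        "adaptive": "false", "timeDependent": "false"})
    # What counts as correct, kept separate from how the question is displayed.
    response = ET.SubElement(item, qti("responseDeclaration"), {
        "identifier": "RESPONSE", "cardinality": "single", "baseType": "identifier"})
    correct_response = ET.SubElement(response, qti("correctResponse"))
    ET.SubElement(correct_response, qti("value")).text = correct
    ET.SubElement(item, qti("outcomeDeclaration"), {
        "identifier": "SCORE", "cardinality": "single", "baseType": "float"})
    # The displayed question: a prompt and the options.
    body = ET.SubElement(item, qti("itemBody"))
    interaction = ET.SubElement(body, qti("choiceInteraction"), {
        "responseIdentifier": "RESPONSE", "shuffle": "true", "maxChoices": "1"})
    ET.SubElement(interaction, qti("prompt")).text = prompt
    for choice_id, text in choices:
        ET.SubElement(interaction, qti("simpleChoice"), {"identifier": choice_id}).text = text
    # Scoring rule referenced from a standard template rather than written per item.
    ET.SubElement(item, qti("responseProcessing"), {"template": MATCH_CORRECT})
    return ET.tostring(item, encoding="unicode")

xml_text = mcq_to_qti("q1", "Capital city", "What is the capital of France?",
                      [("A", "Lyon"), ("B", "Paris"), ("C", "Marseille")], correct="B")
print(xml_text)
```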

Guessing
Several question styles associated with CAA can produce artificially high marks through guessing (Bush, 1999), which has implications for setting a test's pass mark. For example, a pass mark of 40% would be inappropriate for a test made up of true/false questions, because guessing alone would give an average of 50% (Harper, 2002). Various marking schemes can address the problem of guessing:

- post-test correction (Bull & McKenna, 2001);
- negative marking (Bush, 1999);
- increasing the number of questions, or combining the results from several tests (Burton & Miller, 1999);
- increasing both the number of distracters and the pass mark (Mackenzie & O'Hare, 2002).

However, it has been suggested that negative marking is not generally implemented in the UK (McAlpine, 2002). Post-test correction is only suitable for a single question style, because the formula varies with the number of distracters (Harper, 2003).
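The arithmetic behind these points can be made concrete. The sketch below computes the expected mark from blind guessing, 100/k percent for items with k options, which gives Harper's 50% for true/false items. It also applies the classic correction for guessing, R − W/(k − 1), where R is the number right and W the number wrong. This textbook formula is assumed here as an illustration of post-test correction; the source does not say which formula the cited authors use. Because k appears in the divisor, items with different numbers of options need different corrections, which is the limitation Harper describes.

```python
def chance_score_percent(num_options: int) -> float:
    """Expected percentage mark from blind guessing on items with `num_options` options."""
    return 100.0 / num_options

def corrected_score(right: int, wrong: int, num_options: int) -> float:
    """Classic correction for guessing: R - W / (k - 1).

    Omitted items are neither rewarded nor penalised. The divisor depends on k,
    so items with different numbers of options cannot share one formula.
    """
    if num_options < 2:
        raise ValueError("an item needs at least two options")
    return right - wrong / (num_options - 1)

print(chance_score_percent(2))     # 50.0  (true/false)
print(chance_score_percent(4))     # 25.0
print(corrected_score(20, 20, 2))  # 0.0   typical blind guesser on 40 true/false items
print(corrected_score(30, 12, 4))  # 26.0
```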
Statistical analysis has led to several methods that help test constructors reduce the effects of guessing. Mackenzie and O'Hare (2002) developed an empirical marking simulator, based on a base-level guess factor, to assist in scoring and test construction. The program examines the mark distribution and measurement scale for a set of random answers, so tutors can establish how guessing affects their assessment. McCabe and Barrett (2003) have also investigated a formula that awards partial credit based on the mean score of an uneducated guesser. This allows MCQs to be unconstrained, like MRQs: students may select more than one answer, and their score is weighted according to the number of options chosen. For example, consider an MCQ with one correct answer out of four options and a maximum score of 3. A student who selects two options, one of which is correct, scores only 2 (2 = 3 - 1). Davies combined negative marking with a rating of the students' confidence in answering each question, given before they saw the distracters. Students perceived this to be a fairer test of their abilities (Davies, 2002).
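The idea behind an empirical marking simulator can be sketched as a Monte Carlo simulation: generate many answer sheets filled in at random and examine the resulting marks. The sketch below is not Mackenzie and O'Hare's program; it illustrates the general technique only. It also implements the partial-credit rule exactly as the worked example above describes it: a maximum of k − 1, minus one for each extra option selected. Scoring a selection that misses the key as 0 is an assumption, because the text does not say. All function names are hypothetical.

```python
import random
from statistics import mean
from typing import List

def simulate_guessers(num_questions: int, num_options: int,
                      num_students: int = 5000, seed: int = 1) -> List[float]:
    """Percentage marks for simulated students who guess every answer at random."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    marks = []
    for _ in range(num_students):
        # Option 0 stands for the key; each guess hits it with probability 1/k.
        correct = sum(rng.randrange(num_options) == 0 for _ in range(num_questions))
        marks.append(100.0 * correct / num_questions)
    return marks

def share_passing(marks: List[float], pass_mark: float) -> float:
    """Proportion of simulated guessers at or above the pass mark."""
    return sum(m >= pass_mark for m in marks) / len(marks)

def partial_credit(num_options: int, num_selected: int, includes_key: bool) -> int:
    """Worked example from the text: maximum k - 1, minus one per extra option.

    Scoring a selection that misses the key as 0 is an assumption.
    """
    if not includes_key or not 1 <= num_selected <= num_options:
        return 0
    return num_options - num_selected

marks = simulate_guessers(num_questions=40, num_options=2)
print(round(mean(marks), 1))       # close to 50
print(share_passing(marks, 40.0))  # most blind guessers clear a 40% pass mark
print(partial_credit(4, 2, True))  # 2, as in the example (3 - 1)
```

For a 40-item true/false test, the binomial distribution implies that roughly nine in ten blind guessers reach 40%. A simulation of this kind makes that risk visible before the pass mark is set.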
There is no evidence that any one of these techniques produces more accurate results than the others. It could be argued that these techniques are unnecessary if tests are well constructed (Bull & McKenna, 2001).

Accessibility
UK institutions now have to comply with the Special Educational Needs and Disability Act when preparing both teaching and assessment material (SENDA, 2001). In 2000, 22,290 students in UK higher education registered a disability, which has implications for CAA (Phipps & McCarthy, 2001). For example, a student with dyslexia may need to expend more cognitive effort interpreting a question, so appropriate language is essential (Wiles & Ball, 2003). Such students may also need extra time to complete the test, which may require publishing two versions of the assessment, one with a longer duration. One dyslexic student reported that CAA provided a more level playing field on which to demonstrate their knowledge (Jefferies et al., 2000). Students with visual or physical impairments may struggle to answer move object and draw object questions without assistive technology. They may need specially adapted input software and hardware, such as touch screens, eye-gaze systems or speech browsers.
Guidelines exist for general teaching, but there is little evidence that guidelines for inclusive and accessible design in CAA are emerging (Wiles, 2002). For example, when an assessment includes multimedia elements such as video, students with sensory impairments may need an alternative paper-based version. Introducing an alternative, in this instance paper, raises the problem of ensuring comparability (Bennett et al., 1999). Identical tests presented on computer and on paper are not comparable (Clariana & Wallace, 2002), because numerous variables affect students' performance when questions are presented on a computer. These variables include:

- the monitor (Schenkman et al., 1999);
- the way text is displayed on screen (Dyson & Kipping, 1997);
- reading speed, which is slower from a monitor than from paper (Mayes et al., 2001);
- the difficulty of getting a feel for the whole exam when only one question is presented at a time (Liu et al., 2001).

The Web Accessibility Initiative (http://www.w3c.org/WAI/) has produced useful guidelines for promoting online accessibility that may be applicable to CAA. However, this initiative does not address comparability between questions.

Institutional strategies for the adoption of CAA
The greatest barrier to the adoption of CAA by academics is lack of time, both to develop questions and to learn the software (Warburton & Conole, 2003). This may explain why the adoption of CAA has usually been driven by enthusiastic individuals rather than by strategic decisions (O'Leary & Cook, 2001; Daly & Waldron, 2002). Without institutional strategy or support, three problems can arise:

- the perceived benefit of CAA in freeing lecturers' time can prove elusive (Stephens, 1994);
- successful implementation may be left to chance (Stephens et al., 1998);
- CAA may be developed in an anarchic fashion (McKenna & Bull, 2000).

Research at the University of Portsmouth indicates that CAA saves no time for courses with fewer than twenty students (Callear & King, 1997). Staff need training and development to use the features of software packages (Boyle & O'Hare, 2003), and this may not be feasible without institutional support.
Institutions adopting CAA face the difficulty of evaluating and selecting the most appropriate CAA software. Without an institutional strategy, individual departments may adopt their own systems (O'Leary & Cook, 2001). Students then have to cope with several different user interfaces and CAA formats, licence costs increase, and administrative and technical support becomes harder to provide. Even an institution with a clear strategy will find it difficult to determine selection criteria for assessment software, and the literature offers little analysis of this question (Valenti et al., 2002). Sclater and Howie (2003) contributed to this literature by defining the ultimate online assessment engine. They examined the user requirements of the system and established its stakeholders and their functional requirements. This research may help institutions to identify their needs and establish an appropriate evaluation methodology.
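One common way to turn agreed requirements into a selection decision is a weighted scoring matrix. Stakeholders weight each criterion, rate each package against it, and compare the weighted means. This technique is offered as an assumption for illustration; it is not the method of Sclater and Howie. The criteria, weights, ratings and package names below are entirely hypothetical.

```python
from typing import Dict, List

def weighted_score(weights: Dict[str, float], ratings: Dict[str, float]) -> float:
    """Weighted mean of a package's ratings; unrated criteria count as 0."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * ratings.get(c, 0.0) for c, w in weights.items()) / total_weight

def rank_packages(weights: Dict[str, float],
                  candidates: Dict[str, Dict[str, float]]) -> List[str]:
    """Candidate names ordered from best to worst weighted score."""
    return sorted(candidates, key=lambda name: weighted_score(weights, candidates[name]),
                  reverse=True)

# Hypothetical criteria, weights and 0-5 ratings for two unnamed packages.
weights = {"question interoperability": 3, "question styles": 2,
           "accessibility": 3, "security": 2}
candidates = {
    "Package A": {"question interoperability": 4, "question styles": 5,
                  "accessibility": 2, "security": 3},
    "Package B": {"question interoperability": 3, "question styles": 3,
                  "accessibility": 4, "security": 4},
}
print(rank_packages(weights, candidates))  # ['Package B', 'Package A']
```

Keeping the weights explicit lets different stakeholder groups see how their priorities change the ranking.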
Loughborough University and the University of Luton have formulated the following guidelines for an institutional strategy (Stephens et al., 1998):

- establish a coordinated CAA management policy for CAA unit(s) and each discipline on campus;
- establish a CAA unit;
- establish CAA discipline groups/committees;
- provide funding;
- organize staff development programmes;
- establish evaluation procedures;
- identify technical issues;
- establish operational and administrative procedures.

BS7988 is a new British Standard Code of Practice governing the use of information technology in the delivery of assessments (BS7988, 2002). It has various implications for delivery. For example, it recommends that students take a break after 1.5 hours, which affects invigilation. Institutions following this recommendation need either procedures to prevent collusion between students during the break, or tests split into two separate sections. Many institutions using CAA also lack the resources to accommodate large cohorts of students sitting an exam simultaneously (Mackenzie et al., 2004). Institutional support can alleviate this problem. To realise the full benefits of CAA, therefore, an institutional strategy appears necessary. Institutions with such strategies demonstrate these benefits, for example Ulster (Stevenson et al., 2002), Derby (Mackenzie et al., 2002), Coventry (Lloyd et al., 1996) and Loughborough (Croft et al., 2001).

Security
The move away from traditional teaching environments and examination settings raises additional security issues. Frohlich (2000) states that in traditional environments it is possible to ensure the security of exam papers and scripts, including their transport to and from the exam venue. Even so, breaches do occur: for example, AQA had to replace 500,000 English and English Literature exam papers after a box had been tampered with (Curtis, 2003). Tannenbaum (1999) defines security in computer systems as procedures that ensure individuals cannot access material for which they lack authorisation. This is essential in a CAA environment, because questions and student details are stored in a database, and test data is usually sent over a local network or the Internet. Effective security was relatively easy before computers were connected to the Internet (Mason, 2003), but transmitting sensitive data over an insecure network requires additional security measures.
Encryption can protect questions and answers when they are transmitted over the Internet (Sim et al., 2003). To increase security, examinations can be loaded onto the server at the last minute (Whittington, 1999). Submitting results by email carries a potential risk because of the lack of authentication (Hatton et al., 2002). Luck and Joy (1999) identified four security requirements:

- all submissions must be logged;
- it must be verified that the stored document used for the assessment is the same as the one used by the student;
- a feedback mechanism must inform students that their submission has been received;
- the identity of the student must be established.
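The first three of Luck and Joy's requirements can be illustrated with a cryptographic hash. The sketch below logs each submission with a SHA-256 digest of its content and returns a receipt to the student. Later, it checks that the stored copy still matches what was submitted. Identity, the fourth requirement, is assumed to have been established upstream, for example by an authenticated login, and is not handled here. `SubmissionLog` is a hypothetical class, not part of any system cited in the text. In practice, the log itself would also need protection, since anyone able to alter both the stored copy and the log entry could defeat the check.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class SubmissionRecord:
    student_id: str     # assumed to come from an already-authenticated session
    sha256: str         # fingerprint of exactly what the student submitted
    received_at: str    # UTC timestamp

@dataclass
class SubmissionLog:
    records: List[SubmissionRecord] = field(default_factory=list)

    def submit(self, student_id: str, content: bytes) -> str:
        """Log a submission (requirement 1) and return a receipt (requirement 3)."""
        digest = hashlib.sha256(content).hexdigest()
        stamp = datetime.now(timezone.utc).isoformat()
        self.records.append(SubmissionRecord(student_id, digest, stamp))
        return f"Submission from {student_id} received at {stamp}; SHA-256 {digest}"

    def verify(self, student_id: str, stored_content: bytes) -> bool:
        """Check the stored copy matches the latest logged submission (requirement 2)."""
        own = [r for r in self.records if r.student_id == student_id]
        return bool(own) and own[-1].sha256 == hashlib.sha256(stored_content).hexdigest()

log = SubmissionLog()
print(log.submit("s123", b"Q1=A;Q2=C"))
print(log.verify("s123", b"Q1=A;Q2=C"))  # True: stored copy unchanged
print(log.verify("s123", b"Q1=A;Q2=D"))  # False: stored copy differs
```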
Most CAA software requires students and administrators to have passwords, which are often the weakest link in protection (Hindle, 2003). Although it is unlikely, a student could obtain the administrator password and then change their results or access the questions. Authenticating and invigilating students are further concerns, particularly in remote locations (Thomas et al., 2002). At present, students enrolled on overseas distance learning courses must sit exams in a specific location, such as British Council offices, so that they can be authenticated and invigilated. Research is under way to overcome these problems. Unless solutions are found, geographical barriers will remain, because students need access to test centres.
During the test, computers need to be locked down to prevent access to other content. Secure browsers such as Questionmark Secure have been developed for this purpose (Kleeman & Osborne, 2002). Some operational risks associated with CAA, such as the server crashing, have security implications. These risks need to be identified, and procedures established to minimize them (Zakrzewski & Steven, 2003).
Standards also exist for information security, for example the British Standard on Information Security Management, BS7799, which has been adopted as the international standard ISO/IEC 17799. In addition, once data from a test has been collected, institutions within the UK should abide by the Data Protection Act 1998 (Mason, 2003). Provided security measures are in place, there is no evidence that delivery over the Internet compromises the integrity of an examination more than delivery on paper.

Conclusion
Implementing CAA is a complex process from both a technical and a pedagogical perspective. The first, and perhaps the most important, lesson is that an institutional strategy would seem to greatly increase the chances of success. Published recommendations can help policy makers to formulate an effective strategy. Without institutional support, security procedures such as locking down PCs may be harder to implement. However, authentication and invigilation in remote locations remain unresolved.
The other important lesson concerns staff development and training in test construction for a CAA environment. Focused staff development may help with a number of issues, such as guessing, testing various cognitive skills, choosing appropriate question styles and accessibility. The emergence of question banks may also help with these issues, depending on how interoperable the banks are. Finally, although guidelines exist for accessible online content, there are still no formal accessibility guidelines for CAA.
Relying on a single method of assessment is problematic, and a diverse assessment strategy is usually necessary. With student numbers increasing and staff-to-student ratios declining, CAA appears to offer a partial solution. This study has highlighted the issues surrounding the implementation of CAA in order to inform and direct further research in the field.