Peer-graded individualised student homework in a single-instructor undergraduate engineering course

Research in Learning Technology 2020. © 2020 O. Cleynen et al. Research in Learning Technology is the journal of the Association for Learning Technology (ALT), a UK-based professional and scholarly society and membership organisation. ALT is registered charity number 1063519. http://www.alt.ac.uk/. This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), allowing third parties to copy and redistribute the material in any medium or format and to remix, transform, and build upon the material for any purpose, even commercially, provided the original work is properly cited and states its license.


Introduction
In undergraduate engineering studies, large improvements in the quality of learning can be obtained by reworking traditional "top-down" fundamental courses that consist of lectures, exercises and a written examination. Approaches include adopting techniques developed for MOOCs in the classroom (Brita-Paja et al. 2019), fully flipping courses (Schrlau et al. 2016), and flipping or restructuring entire educational programmes to focus foremost on student development (Cheville and Bunting 2011).
Such far-reaching changes require resources that are often unavailable outside top-ranking institutions. Teachers may nevertheless wish to innovate despite high student-to-teacher ratios, low institutional support or minimal technical infrastructure. In such situations, it may still be possible to develop and implement some of the techniques with the lowest cost-to-benefit ratio, progressing step-wise within a continuous quality improvement process (Cheville and Bunting 2011; Brita-Paja et al. 2019). Some of these methods foster student engagement through increased feedback and networking with peers (Hepplestone et al. 2011; Badge et al. 2012). This article reports on the set-up of a homework programme in which students grade their peers anonymously. Several such programmes are reported in the literature (e.g. Gehringer 2000; Bhalerao and Ward 2001; Pope 2001); here, the focus is on deployment with minimal resources.
A new graded homework programme was implemented in a mandatory course of the Chemical and Energy Engineering Master's programme at the University Otto von Guericke of Magdeburg, a German public university. The programme consists of a series of individualised homework assignments that are anonymously graded by peers; it was developed following a successful pilot in an unrelated course at the same university (Magdowski 2018; Magdowski 2019a). The course discussed in this report, Fluid Dynamics (Cleynen 2019), is a 40-h, one-semester course with a single instructor and a class of 121 students. Before the programme was introduced, assessment consisted of a single 2-h, closed-book, anonymised examination, which is the official standard practice in the master's programme.
In this article, the new peer-graded homework programme is described and its efficacy is assessed, with a focus on its feasibility and on data relevant to instructors working under similar conditions: high student-to-teacher ratios and minimal resources.

Objectives
The homework programme seeks to compensate for the limitations of a single-examination evaluation. Although the grading process in an examination is systematic and rigorous, it tests only one subset of the skills required of an engineer: arguably, most real-world engineering problems are not solved in isolation, without documentation and under stringent time constraints. More importantly, the examination represents a single point of failure for students. Taking the examination is, therefore, an intense, understandably stressful and unenjoyable experience.
The potential of peer evaluation for improving learning is well described in the literature (e.g. Gibbs and Simpson 2005, and especially Gielen 2007). Here, as part of the new optional peer-graded homework programme, students would be offered three problems over the semester. For each problem, each participant would:
1. receive an individualised problem, whose structure is identical for all students but whose questions contain unique numerical data;
2. submit their anonymised answer within a 1-week deadline;
3. receive two anonymous submissions from peers, each accompanied by the corresponding complete worked solution and a grading form;
4. assess and grade the two peer submissions within a 4-day deadline;
5. receive their own grade for the problem, as attributed by two anonymous peers.
Thus, over the semester, student input would be requested twice for each of the three problems (six inputs in total). Students would be encouraged to work in groups and discuss their answers, but would be required to write and submit their own answers.

While implementing the programme, the authors had the following three objectives:
1. Increase the share of group work and problem solving in the course assessment, for better relevance to the professional skills of engineers. If successful, the programme should provide students with an opportunity to extend beyond the conceptual knowledge level of Bloom's revised taxonomy (Anderson et al. 2001) into the procedural level, and from the cognitive process "understand" into the process "apply".
2. Provide an opportunity for students to receive grades for meaningful work performed in a relaxed and fun atmosphere, conditions well recognised to be conducive to learning (Barkley 2009). If successful, the (optional) programme should elicit and maintain high participation rates.
3. Deliver the above at a low time cost for the instructor. If successful, the programme should be rapidly deployable, require no additional grading by the instructor and involve minimal operations management time.

Pedagogical aspects
The homework programme was designed to support students' "ordinary" course work and examination preparation during the semester, rather than to expand the curriculum. Regular collaborative work towards answering examination questions has been recognised in the literature as contributing to deeper learning and understanding (Duret et al. 2018); formalising and rewarding this type of activity was the focus here. The problems were designed to be immediately relevant to the course problem sheets and consisted exclusively of examinable material. The homework programme was completed 3 weeks before the examination period, so as not to interfere with it.
In each homework problem, two technical questions were asked (e.g. "What is the force applying on the aquarium window?"; Figure 1), with no guidance towards the solution.
Students were asked to grade their peers as they would review a colleague's work in a professional setting. For each question, the marking was split across three categories:
1. (40%) Is the correct general formula for the answer provided?
2. (30%) Is the correct corresponding calculation provided?
3. (30%) Is a sensible final result provided, expressed with the correct units?
A comment field was provided. Finally, up to a quarter of the points in the final grade could be removed for poor formatting or poor organisation, at the discretion of the grading peer.
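The marking scheme above amounts to a weighted sum per question. The sketch below is not taken from the published scripts; it assumes that each category is marked as a fraction between 0 and 1, that the two questions of a problem carry equal weight in a grade out of 100 pts, and that "up to a quarter of the points" means a formatting deduction of at most 25 pts. All names are hypothetical.

```python
# Minimal sketch of the peer-grading rubric. The 40/30/30 weights follow the
# text; partial credit per category, equal question weights and a 25-pt cap
# on the formatting deduction are assumptions.
WEIGHTS = {"formula": 0.40, "calculation": 0.30, "result": 0.30}
MAX_POINTS = 100.0
MAX_DEDUCTION = 0.25 * MAX_POINTS  # assumed reading of "up to a quarter of the points"


def question_score(marks: dict[str, float]) -> float:
    """Weighted score in [0, 1] for one question; each mark is a fraction in [0, 1]."""
    for category in WEIGHTS:
        if not 0.0 <= marks[category] <= 1.0:
            raise ValueError(f"mark for {category!r} must lie in [0, 1]")
    return sum(weight * marks[category] for category, weight in WEIGHTS.items())


def problem_grade(questions: list[dict[str, float]], deduction: float = 0.0) -> float:
    """Grade out of MAX_POINTS: mean question score minus a capped formatting deduction."""
    raw = MAX_POINTS * sum(question_score(q) for q in questions) / len(questions)
    deduction = min(max(deduction, 0.0), MAX_DEDUCTION)  # clamp to [0, MAX_DEDUCTION]
    return round(max(raw - deduction, 0.0), 2)


if __name__ == "__main__":
    full = {"formula": 1.0, "calculation": 1.0, "result": 1.0}
    formula_only = {"formula": 1.0, "calculation": 0.0, "result": 0.0}
    print(problem_grade([full, formula_only], deduction=7.5))  # prints 62.5
```

The example shows why the 40% share for the general formula matters: a student who writes the correct formula for the second question but no calculation still keeps 70 pts before any formatting deduction.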
It is important to note that in the programme, the grading was fully carried out by the students themselves (with students given the possibility to appeal their mark directly to the instructor). Within the framework for blended learning assessment proposed by Mirriahi et al. 2015, the peer assessment process partly reaches level B ("Online technologies are used to provide feedback on student progress", "standards-based rubrics are used by students to self-assess and peer-assess"). A manual, instructor-approved review of the grading would involve considerably more human resources, as evidenced in Zare et al. 2017, and would reduce student accountability and empowerment, which are recognised as important ingredients of engagement (Barkley 2009).
For this first deployment, participation was optional. If a student completed all three assignments and performed all three gradings, their average grade would count 20% towards their course mark, with the examination amounting to the remaining 80%.
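The 20/80 weighting can be written as a small function. This is a sketch under one assumption not stated in the text: students who did not complete all three assignments and all three grading rounds are marked on the examination alone. Names are hypothetical.

```python
from typing import Optional, Sequence

COURSEWORK_WEIGHT = 0.20
N_PROBLEMS = 3


def course_mark(exam: float, coursework: Optional[Sequence[float]], gradings_done: int) -> float:
    """Final course mark out of 100 under the 20/80 weighting described in the text.

    Assumption: non-completers (fewer than three assignments or three grading
    rounds) are marked on the examination alone.
    """
    eligible = (
        coursework is not None
        and len(coursework) == N_PROBLEMS
        and gradings_done == N_PROBLEMS
    )
    if not eligible:
        return exam
    average = sum(coursework) / N_PROBLEMS
    return COURSEWORK_WEIGHT * average + (1.0 - COURSEWORK_WEIGHT) * exam
```

Under this weighting, the coursework raises a student's mark exactly when their coursework average exceeds their examination mark: 85 pts of coursework and 60 pts in the examination give 0.2 × 85 + 0.8 × 60 = 65 pts.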

Technical implementation
On the instructor's side, the programme was run using a series of scripts on a local computer. These scripts, written in Python, LaTeX and Bash, are published together with a working example under an open-source license in an online Git repository (Cleynen and Santa-Maria 2019). Communication and file transfer took place entirely by email.
The choice of implementation technologies matters. Using an e-learning platform such as the university's complex and unevenly used Moodle instance would have raised known issues with the integration of online grades (Mayhew 2018) and added another learning step for most participants. Setting up a purpose-built web server, as chosen in reports of earlier peer-graded programmes (Gehringer 2000; Bhalerao and Ward 2001), was out of scope. Finally, requiring participants to use the services of an external company would have raised well-known privacy issues associated with educational data mining (Kyritsi et al. 2019).
By contrast, only simple, portable technologies were chosen here, allowing for a crude but rapid implementation by a single instructor with no external help and minimal budget. Only one email account and one personal computer are required, and a very satisfactory level of information security and privacy protection is attained.
Figure 1. Example of a student-unique homework problem. Note: In the provided text, the dimensions and input data for the problem are unique to each student. The resulting problem answers may vary by several orders of magnitude. In this project, assignment illustrations were not adapted to the custom values in the problem; however, this was done in a separate implementation of the programme (Magdowski 2019b).
Running through one homework problem involves the following steps for the instructor:
1. Prepare a generic homework assignment and its complete corresponding solution, written in LaTeX. In this assignment, the numerical values of the input data (e.g. the velocity of a water jet) are all replaced with corresponding text keywords (e.g. velocity_water), to be replaced later, automatically, with student-assigned values on a case-by-case basis.
2. Solve the generic assignment (a Python script). This involves defining, for an arbitrary set of input data, every number to be displayed in the corresponding solution sheet.
3. Individualise the homework assignments (a Python script). This script assigns to each participant, based on a hash of their email address, a series of arbitrary (apparently random, but reproducibly generated) numbers to be used as input for their homework assignment. The text keywords (e.g. velocity_water) in the generic assignment are then replaced by student-unique values (Figure 1). The script then creates two LaTeX files for each participant: one for the assignment and one for the solution.
4. Create the homework assignment PDF files (a Bash script). This script compiles each assignment LaTeX file into a PDF file.
5. Send the homework assignments (a Python script). The script sends each participant a personalised email with their personalised assignment attached. The participant is requested to submit their answer as an anonymous PDF document with a specified, unique file name.
6. Receive the homework submissions. This is done using a purpose-built email account and a standard email client (here, Thunderbird with the FiltaQuilla extension). The attachments of all emails sent by participants are exported at once to a folder.
7. Assign each submission to two peers (a Python script). Among the participants, each submission is assigned to two randomly chosen peers. The script ensures that no submission is assigned to its author, or twice to the same peer.
8. Create the grading assignment PDF files (a Bash script). This script concatenates the PDF file of each participant's homework submission with the corresponding solution and a grading form.
9. Send the grading assignments (a Python script). The script sends each participant a personalised email with two grading assignments attached. The participant is requested to fill in the PDF forms and submit them by returning the files without changing their file names.
10. Receive and store the graded PDF documents (again using an email client).
11. Post-process the graded assignments (a Python script). The script parses the graded assignments, extracts the grades and comments therein, and outputs the result into a spreadsheet.
12. Send each participant their grade (a Python script). The script sends each participating student a report of the grades they were assigned by their peers.
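Steps 1 to 3 can be illustrated with a short Python sketch. It is not the code from the published repository: the keywords, value ranges, hashing scheme (SHA-256 of the lower-cased address, used to seed Python's `random.Random`) and the aquarium-window formula (gauge hydrostatic force on a vertical rectangular window, F = ρ g h_c A) are all assumptions chosen for illustration. The point it demonstrates is that the same email address always yields the same numbers, so assignments and solutions can be regenerated at any time without storing a table of values.

```python
import hashlib
import random
import re

# Hypothetical input keywords and value ranges (SI units) for an
# aquarium-window problem; real assignments define their own.
INPUT_RANGES: dict[str, tuple[float, float]] = {
    "depth_top": (0.2, 2.0),       # depth of the window's top edge, m
    "window_height": (0.3, 1.5),   # m
    "window_width": (0.5, 3.0),    # m
}
RHO_WATER = 1000.0  # density of water, kg/m^3
G = 9.81            # gravitational acceleration, m/s^2


def student_values(email: str, ranges: dict[str, tuple[float, float]] = INPUT_RANGES) -> dict[str, float]:
    """Step 3: reproducible, student-unique input values derived from an email address."""
    digest = hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()
    rng = random.Random(int(digest, 16))  # same address -> same seed -> same numbers
    # Sorted iteration keeps the draw order independent of dictionary order.
    return {key: round(rng.uniform(low, high), 3) for key, (low, high) in sorted(ranges.items())}


def solve(values: dict[str, float]) -> dict[str, float]:
    """Step 2: every number shown on the solution sheet, computed from the inputs."""
    depth_centroid = values["depth_top"] + values["window_height"] / 2
    area = values["window_height"] * values["window_width"]
    # Gauge hydrostatic force on a vertical rectangular plate: F = rho * g * h_c * A
    return {"force_window": RHO_WATER * G * depth_centroid * area}


def fill_template(template: str, values: dict[str, float]) -> str:
    """Step 3: replace whole-word keywords (e.g. depth_top) with formatted numbers."""
    if not values:
        return template
    keys = sorted(values, key=len, reverse=True)  # longest keywords first
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: f"{values[m.group(1)]:.6g}", template)


if __name__ == "__main__":
    values = student_values("jane.doe@example.org")
    print(fill_template(r"The top edge of the window lies depth_top~m below the surface.", values))
    print(fill_template(r"$F = force_window$~N", {**values, **solve(values)}))
```

Matching on word boundaries, rather than with plain string replacement, avoids corrupting a longer keyword (for instance a hypothetical `window_height_max`) that happens to contain a shorter one.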
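For step 7, one simple way to meet both constraints (no submission assigned to its author, and none assigned twice to the same peer) is to shuffle the participants into a circle and give each submission to the next two people in it. This is a swapped-in technique, not necessarily the one used in the authors' scripts; participant identifiers are hypothetical.

```python
import random
from typing import Optional


def assign_peers(participants: list[str], seed: Optional[int] = None) -> dict[str, list[str]]:
    """Map each author's submission to the two peers who will grade it.

    Participants are shuffled into a circle; each submission goes to the next
    two people around it. With at least three participants this guarantees
    that nobody grades their own work, that the two graders differ, and that
    every participant receives exactly two submissions.
    """
    if len(set(participants)) != len(participants):
        raise ValueError("participant identifiers must be unique")
    if len(participants) < 3:
        raise ValueError("at least three participants are needed")
    order = list(participants)
    random.Random(seed).shuffle(order)  # a fixed seed makes the draw reproducible
    n = len(order)
    return {
        author: [order[(i + 1) % n], order[(i + 2) % n]]
        for i, author in enumerate(order)
    }
```

Compared with drawing two peers independently for each submission, the circular scheme also balances the workload, so that every participant grades exactly two submissions, as the programme requires. Fixing the seed (for example to the problem number) would allow the allocation to be regenerated if a script has to be rerun.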
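Steps 5, 9 and 12 all amount to sending personalised messages, optionally with PDF attachments. A sketch using only Python's standard library (`email.message.EmailMessage` and `smtplib`) follows; the server settings, addresses and subject lines are placeholders, and the authors' scripts may be organised differently.

```python
import smtplib
from email.message import EmailMessage


def build_email(sender: str, recipient: str, subject: str, body: str,
                attachments: list[tuple[str, bytes]]) -> EmailMessage:
    """Compose a personalised plain-text email with zero or more PDF attachments."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    for filename, data in attachments:
        # Adding an attachment turns the message into multipart/mixed.
        msg.add_attachment(data, maintype="application", subtype="pdf", filename=filename)
    return msg


def send_all(messages: list[EmailMessage], host: str, port: int, user: str, password: str) -> None:
    """Send the messages over one authenticated SMTP connection using STARTTLS."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        for msg in messages:
            server.send_message(msg)
```

For step 5, the body would state the unique file name the participant must use for their submission; for step 9, the same function would be called with the two grading PDFs as attachments; for step 12, with an empty attachment list. Reusing one SMTP connection avoids re-authenticating for every participant.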
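Step 11 ends with a spreadsheet. Assuming the grades and comments have already been extracted from the PDF forms (that part depends on a PDF library and is not shown), the aggregation itself might look like the sketch below. It assumes, as a reading of the text rather than a documented rule, that a participant's grade for a problem is the mean of the peer grades received; the anonymous submission and grader codes are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def grades_table(peer_grades: list[tuple[str, str, float, str]]) -> str:
    """Aggregate extracted peer grades into CSV text, one row per submission.

    Each input record is (submission_code, grader_code, grade_out_of_100, comment).
    Assumption: the problem grade is the mean of the available peer grades.
    """
    by_submission: dict[str, list[tuple[str, float, str]]] = defaultdict(list)
    for submission, grader, grade, comment in peer_grades:
        by_submission[submission].append((grader, grade, comment))

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["submission", "n_grades", "mean_grade", "grades", "graders", "comments"])
    for submission in sorted(by_submission):
        entries = by_submission[submission]
        grades = [grade for _, grade, _ in entries]
        writer.writerow([
            submission,
            len(grades),  # fewer than 2 flags a missing grading
            round(mean(grades), 1),
            ";".join(f"{g:g}" for g in grades),
            ";".join(grader for grader, _, _ in entries),
            " | ".join(c for _, _, c in entries if c),  # skip empty comments
        ])
    return out.getvalue()
```

Keeping the grader codes in the table makes it possible to trace a disputed grade back to the two peers involved when handling an appeal.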

Time budget
On the instructor's side, the time spent preparing and sending one homework assignment (starting from a completely defined and solved hand-written exercise, i.e. going through steps 1-5 in the section above) was about 4 h. Two more hours were needed to process student submissions and send peer grading assignments (steps 6-9), and a final 4 h to post-process and communicate the peer-assigned grades (steps 10-12). Thus, once the instructor is acquainted with the programme, a time budget of about one long workday (roughly 10 h) is needed to run through each homework exercise.
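Summing the three figures above gives the instructor time per problem and, for the three problems of the semester, per run of the programme (these figures cover steps 1 to 12 only):

$$t_{\text{problem}} = \underbrace{4\,\text{h}}_{\text{steps 1--5}} + \underbrace{2\,\text{h}}_{\text{steps 6--9}} + \underbrace{4\,\text{h}}_{\text{steps 10--12}} = 10\,\text{h}, \qquad t_{\text{semester}} = 3 \times t_{\text{problem}} = 30\,\text{h}$$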

Student participation
The churn rate for the programme (the event-to-event percentage of participants dropping out) averaged slightly over 4% for each of the six inputs required of participants, adding up to 23% overall (Figure 2). A total of 103 students registered for the coursework, and 113 registered for the final examination; the union of those two groups represents 121 students. Participation in the coursework programme is thus relatively high, and markedly higher than lecture hall attendance, which was measured at 51 students at the end of the semester.
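If the per-event churn is read as a constant fraction c of the remaining participants leaving at each of the six inputs (an assumption: the text reports an average), per-event and overall drop-out are linked by compounding, overall = 1 - (1 - c)^6. The short check below suggests that 23% overall corresponds to about 4.3% per event, consistent with "slightly over 4%".

```python
def overall_dropout(per_event: float, events: int = 6) -> float:
    """Overall drop-out fraction if a fraction `per_event` leaves at every event."""
    return 1.0 - (1.0 - per_event) ** events


def per_event_churn(overall: float, events: int = 6) -> float:
    """Constant per-event churn that compounds to the given overall drop-out."""
    return 1.0 - (1.0 - overall) ** (1.0 / events)


if __name__ == "__main__":
    print(f"{per_event_churn(0.23):.4f}")  # about 0.0426 per event
    print(f"{overall_dropout(0.04):.4f}")  # exactly 4% per event gives about 0.2172
```

Because losses compound on a shrinking group, six events at 4% give slightly less than 6 × 4% = 24% overall.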
Attempts were made to correlate student drop-out from the coursework programme with the grading data available to the instructor during the programme. The peer-assigned grades were compared for students who completed the programme and for those who ended up dropping out (Figure 3). The differences between the numbers of points assigned by the two peers on each submission were also monitored for both groups (Figure 4). These distributions show that the grades of students who dropped out did not stand out from those of other students, and that their submissions did not result in noticeably greater disagreement between peers. Thus, no indicators were found in the peer grading data that would allow the instructor to infer that a student was at risk of dropping out of the programme during the semester.
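The disagreement measure plotted in Figure 4 (for each answer, the absolute difference between the two peer grades, summed over the answers of a submission) and a per-group comparison can be sketched as follows. The group labels and the use of the median as a summary statistic are assumptions; no data from the study are reproduced here.

```python
from statistics import median
from typing import Sequence


def cumulative_disagreement(grades_a: Sequence[float], grades_b: Sequence[float]) -> float:
    """Sum over answers of |grade from peer A - grade from peer B|, in pts."""
    if len(grades_a) != len(grades_b):
        raise ValueError("both peers must grade the same answers")
    return sum(abs(a - b) for a, b in zip(grades_a, grades_b))


def median_disagreement(
    groups: dict[str, list[tuple[Sequence[float], Sequence[float]]]],
) -> dict[str, float]:
    """Median cumulative disagreement per group (e.g. completers vs. drop-outs)."""
    return {
        name: median(cumulative_disagreement(a, b) for a, b in pairs)
        for name, pairs in groups.items()
    }
```

Each pair in a group holds the per-answer grades given by the two peers to one submission, so one pair produces one data point in a distribution such as Figure 4.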

Technical obstacles
Most of the technical difficulties encountered were related to students' low fluency with file management and the handling of PDF files. On the first assignment, 27% of participants could not send their submission with the requested file name, and 16% of them submitted grading forms which were either empty or not machine-readable. Upon investigation, it was found that most problem files had either been created using PDFium (the PDF reader built into the Chrome browser, which is unable to record user-input data in forms) or piped through a converter (the 'Microsoft Print to PDF' printer utility built into Windows). More than a quarter of participants sent duplicate or even triplicate submissions. These problems became less frequent over the semester. Student computing equipment and computing proficiency were found to be extremely heterogeneous, with some students not owning a laptop.
Figure 3. Distribution of grades assigned by peers within two groups of students: on top, students who completed the peer-graded coursework programme; at the bottom, students who dropped out of the programme. Note: All grades are out of 100 pts.
Figure 4. Distribution of cumulative peer disagreement for two groups of students: on top, students who completed the peer-graded coursework programme; at the bottom, students who dropped out of the programme. Note: For each answer in a submission, the absolute difference between the two peer-assigned grades is calculated, and the sum over all answers constitutes one data point in this graph. Submissions further to the right therefore show greater disagreement between peers. The horizontal scale is in points, out of 100 pts.
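Wrong file names and duplicate submissions can at least be detected automatically before peer assignment. The sketch below assumes a hypothetical naming scheme (`hw<problem>_<8 hex digits>.pdf`); it keeps the most recent of several valid files from the same participant and sets aside unrecognised files for manual follow-up. It does not address unreadable form fields, which require inspecting the PDF itself.

```python
import re
from datetime import datetime

# Hypothetical naming scheme: "hw<problem number>_<8 hex digits>.pdf"
NAME_PATTERN = re.compile(r"hw(?P<problem>\d)_(?P<code>[0-9a-f]{8})\.pdf")


def triage_submissions(
    received: list[tuple[datetime, str]],
    expected_codes: set[str],
    problem: int,
) -> tuple[dict[str, tuple[datetime, str]], list[str]]:
    """Split received attachments into accepted submissions and files needing follow-up.

    Returns (accepted, rejected): accepted maps each participant code to the
    (timestamp, filename) of its latest valid file; rejected lists file names
    that do not match the scheme, the problem number or a known code.
    """
    accepted: dict[str, tuple[datetime, str]] = {}
    rejected: list[str] = []
    for timestamp, filename in received:
        match = NAME_PATTERN.fullmatch(filename.strip().lower())
        if (
            match is None
            or int(match["problem"]) != problem
            or match["code"] not in expected_codes
        ):
            rejected.append(filename)
            continue
        code = match["code"]
        # Duplicates: keep only the most recently received file.
        if code not in accepted or timestamp > accepted[code][0]:
            accepted[code] = (timestamp, filename)
    return accepted, rejected
```

The set difference `expected_codes - set(accepted)` then lists the participants who have not (yet) submitted a usable file and may need a reminder.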

Student experience
Feedback from the students was gathered orally by the instructor during tutorial sessions, in an unstructured manner. No systematic feedback channel was used (results from the Faculty's student evaluation programme, run at the end of the semester on a small sample of students and not focused on student learning, were unchanged compared to previous years).
Based on oral feedback and interaction during tutorial sessions, students' experiences of the coursework programme were overwhelmingly positive. Students mentioned the ability to work in groups, and the scope of the assignments (non-obvious yet achievable), as strong points.
Student dissatisfaction arose when the grades given by peers differed significantly (either between the two grades received by one participant, or between the grades received by two students with similar submissions). Most of those disagreements were found to occur when answers had been ambiguously formulated: for example, when peers were not able to determine whether a mistake resulted from a calculation error or from a misunderstanding.
Students were given the possibility to appeal their mark to the instructor, provided they gave a brief and precise argument. Six per cent of participants appealed their grade; one in five of those appeals was rejected. Many appeal conversations were tense. Overall, the time spent by the instructor processing appeals was found to be disproportionate to the resulting pedagogical benefits. It is concluded that more time must be invested at the start of the programme to communicate its pedagogical objectives (learning through problem-solving practice, collaboration and peer review) as well as its limitations (occasionally imperfect grading; minimal instructor time budget).

Student learning
By design, the coursework programme requires individual effort: even an unmotivated participant copying from their neighbour needs to work with different input values for their calculations, which often lead to results differing by orders of magnitude. Cooperation between students is encouraged, which results in a relaxed work atmosphere.
In unstructured oral feedback, multiple students reported that the grading scheme, which allocated 40% of points to providing the general formula for the answer, helped them improve their working methods. Separating the solving of the problem from the numerical calculation of the answer is an important engineering skill, and an incentive was provided for students to practise this ahead of the end-of-semester examination. The grading of peers, when submissions contain errors, confronts students with the practice of debugging and identification of errors in external documents, which also highlights the benefits of adequate structuring of answers.

Finally, being graded by peers also confronts students with differing expectations regarding presentation. On average, 23% of participants had points removed from their submission for cleanliness and presentation (losing 7.5 pts on average), at the full discretion of their peers. In this respect, grading by peers is more effective than grading by an instructor, and brings students closer to a professional workplace environment.

Grades and student performance
Coursework grades were overwhelmingly positive (average 85 pts, median 85 pts, fail rate 0%; see Figure 3) and helped 82% of participating students, adding an average of 5 pts to their final mark (Figure 5). Nevertheless, the distribution of final marks for the course was almost unchanged compared to previous years (average 63 pts, median 69 pts, fail rate 26%). Lower performance was observed for students who dropped out of the coursework programme (median 51 pts) or who did not take part in it (median 33 pts), although their marks were very broadly distributed. No correlation was found between marks obtained in the coursework programme and in the examination (Figure 5).
The reasons for the absence of year-to-year change in course grades, and for the absence of correlation between coursework and examination grades, have not been investigated, and many factors may play a role. Examination grades vary from year to year, which in this case may offset some of the effect of the coursework on student performance. It is possible that some students adapt their examination preparation effort to the number of points they secured in the coursework. Student demographics vary from year to year and between the groups of interest in this work. The skills involved, as well as the work atmosphere, differ strongly between the two forms of assessment, which may therefore each benefit different students. In the context where the programme was implemented, it is common for students to attend the course but not the final examination, or the other way around, sometimes several times; this information is not available to teachers. The project presented here was not aimed at improving examination grades, and the authors never intended for their students to be the subject of their research. No information is known to the authors about the students beyond their names and academic email addresses, because the University does not make this information available; this prevents the observation (and mitigation of related biases) of factors such as prior academic performance, seniority or other demographics of interest.

Future improvements
A system is needed to gather student feedback in a systematic, structured manner. In future implementations, students' input about their experience of going through the programme, about whether the grades they receive match their expectations, and about their views of the programme will be solicited and gathered using web forms throughout the semester.
The coursework's technical infrastructure may be extended to allow for more flexible submission timing or group work, or to propose different levels of difficulty for students to choose from. Nevertheless, since keeping the programme simple is a key objective, any such extension must be weighed carefully against the complexity it adds.
Finally, a weakness of the current implementation is its reliance on the PDF format for the grading system. PDF files can easily be concatenated, which simplifies the participants' management of submissions, solutions and grading forms. However, such documents are hard to read with screen readers and hard to interact with without a mouse. Better accessibility for visually impaired participants, motor-impaired participants, or participants using mobile devices would be achieved if more flexible formats (e.g. HTML webpages) were used instead. Here, a better balance between simplicity and accessibility may be struck.

Conclusions
This implementation of a peer-graded coursework system in a large undergraduate engineering class is deemed successful. Solving individualised problems and grading peers provide students with a new perspective and involve skills not usually part of closed-book examinations. The optional programme elicits high participation rates, supporting observations that participants enjoy the work and study experience and receive grades for meaningful work. The described technical solution, based on an open-source set of scripts run on a local computer, allows a single instructor to deploy the programme rapidly. Operating the programme requires only moderate time input from the instructor. These characteristics make the programme a suitable option for instructors seeking to enhance the learning experience of students in low-budget settings with high student-to-instructor ratios.