Assessment Blog | RM

AI Marking - The importance of keeping humans in the loop

Written by Lisa Holloway | Mar 7, 2025 3:17:27 PM

The rapid advancement of Artificial Intelligence (AI) technologies is driving new possibilities and aspirations across the education and assessment sector, but how could AI and human expertise work together to ensure fair and accurate assessments, whilst providing rich and insightful feedback to support the learning journey?

At RM’s Spring Conference, Bridging AI in Assessment, in February 2025, our customer panel of experts from within the assessment space debated the benefits of AI in marking with a panel of AI experts.

In this blog post, we explore how AI can be ethically integrated into the marking process without diminishing the vital role of human expertise. We discuss the role of AI in marking and the challenges and benefits with our panel of experts, backed up by the results of RM’s AI Marking Proof of Concept (PoC) projects.

The role of the human in AI marking

During the conference, Dr Gráinne Watson, Chief Operating Officer at RM, presented the results of RM’s first two Proof of Concept (PoC) projects: one testing summative assessment of English language skills in schools, and the other testing proficiency in English as a foreign language. The results show how our customers and prospects will be able to harness the benefits of AI responsibly, tailoring it to their organisations’ assessment and qualification processes in ways that improve learner outcomes and drive continuous improvement, without introducing risk and without diminishing the role of the human.

RM has demonstrated that:

  • AI marking is as effective as human marking – the PoCs showed that AI marking is close to or on a par with human marking, even when marking an essay (or long-form answer), while also improving feedback quality. AI doesn’t just mark at the equivalent level to human markers; because it marks with speed, reliability and accuracy, it offers the potential to expedite candidate results.

  • AI marking is more consistent than human marking – AI can identify an appropriate grade of a response and mark responses of that grade range as consistently or better than a human marker, without bias and without the risk of potential fatigue.

The PoC testing proficiency in English as a foreign language showed AI marks were just 0.22 of a mark away from the average of the human markers when marking at scale. By comparison, the overall deviation between the marks of the human markers themselves was 0.55, underlining the variation present in human marking today.

During the PoC testing summative assessment of English language skills in schools, the AI marks were just 1.76 of a mark away from the average of the human markers when marking at scale.

When we compared that to the overall deviation between the marks of the human markers themselves (1.82), the result again indicates that AI marking is more consistent than human marking.
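The comparison above can be made concrete with a small sketch. This is purely illustrative, using hypothetical marks rather than RM’s data or methodology: it measures how far AI marks sit from the average of the human markers, and contrasts that with how far the human markers sit from each other, using mean absolute deviation.

```python
# Illustrative sketch (hypothetical marks, not RM's data or method):
# comparing AI deviation from the human average against human-to-human
# deviation, using mean absolute deviation.
from statistics import mean

def mean_abs_deviation(values, reference):
    """Average absolute distance between paired values and reference marks."""
    return mean(abs(v - r) for v, r in zip(values, reference))

# Hypothetical marks for five scripts from two human markers and the AI.
human_a = [12, 15, 9, 14, 11]
human_b = [14, 13, 10, 15, 13]
ai      = [13, 14, 9, 14, 12]

# Per-script average of the human markers, used as the reference point.
human_avg = [mean(pair) for pair in zip(human_a, human_b)]

ai_dev    = mean_abs_deviation(ai, human_avg)     # AI vs human average
human_dev = mean_abs_deviation(human_a, human_b)  # human vs human

print(f"AI deviation from human average: {ai_dev:.2f}")   # 0.20
print(f"Deviation between human markers: {human_dev:.2f}")  # 1.60
```

In this toy example the AI sits 0.20 of a mark from the human average while the humans differ from each other by 1.60 marks on average, which is the shape of the comparison the PoC results describe.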

  • AI marking enhances teacher support - we have shown how AI marking can:

    • Reduce the workload burden felt by teachers – automating repetitive tasks, enabling more focus on teaching and learner engagement to drive better outcomes

    • Offer the ability to upscale practice exam content/quizzes - giving students extra support, whether in the lead up to summative assessments evaluating the final learning at the end of a unit or course, or as part of the formative assessments used throughout the learning process to monitor progress and provide feedback for improvement

    • Provide learners with further valuable feedback to improve their marks – using AI at low cost, without requiring additional time and cost investment from teachers and awarding bodies

RM demonstrated that AI could mark effectively and at speed, enabling provision of results to candidates more quickly. 

  • There is no barrier to smaller organisations with smaller numbers of assessments using AI marking - historically, AI had to be trained using a large volume of data to build an accurate marking tool, but RM has overcome this barrier by using synthetic data generation and the latest Large Language Models (LLMs) to enable the testing of the marking engine.
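One way synthetic data generation like this can work is to prompt an LLM for candidate-style answers targeted at each mark band, producing a small labelled test set for a marking engine. The sketch below is a hypothetical illustration only: `call_llm` is a placeholder for whichever LLM API an organisation uses, and RM’s actual pipeline has not been published.

```python
# Hypothetical sketch of synthetic data generation for testing a marking
# engine when real candidate scripts are scarce. `call_llm` is a stub
# standing in for a real LLM API call; this is not RM's implementation.
def call_llm(prompt: str) -> str:
    # Placeholder: in practice this would invoke an LLM API.
    return f"[synthetic answer for: {prompt[:40]}...]"

def generate_synthetic_answers(question: str, mark_scheme: str,
                               bands=(0, 1, 2, 3, 4, 5), per_band=3):
    """Request candidate-style answers aimed at each mark band,
    yielding a small labelled dataset for testing a marking engine."""
    dataset = []
    for band in bands:
        for _ in range(per_band):
            prompt = (
                f"Question: {question}\n"
                f"Mark scheme: {mark_scheme}\n"
                f"Write a student answer that should score {band} marks."
            )
            dataset.append({"answer": call_llm(prompt),
                            "expected_mark": band})
    return dataset

sample = generate_synthetic_answers(
    "Explain how a thermostat keeps a room at a set temperature.",
    "1 mark per valid point: sensing, comparison, switching, feedback.",
)
print(len(sample))  # 6 bands x 3 answers per band = 18 labelled examples
```

Because each synthetic answer carries the mark it was written to earn, the resulting set can be used to check whether a marking engine reproduces those marks, without needing a large archive of real scripts.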

AI Marking: An ethical approach

During her presentation, Gráinne described how RM’s AI marking engine had initially marked misspellings harshly, an unexpected problem that would have prevented fair marks being awarded to learners with dyslexia or dyspraxia, for example. The team at RM reworked the benchmark and changed the model to make sure the AI recognised that, as long as the answer was understandable, spelling did not affect the mark.

“Core for us - this had to be additive. It wasn't a replacement, particularly when you're talking about summative assessment, which is your end final goal. The percentage of a mark matters. You have to have the ability to show why you've given that mark…We have to be able to help you show that, and that's what the point of the marking engine was,” explained Gráinne.

RM’s PoCs illustrated the potential of AI and the opportunities it presents when used in a responsible manner. If the human resource is focused on more complex areas of the marking process, learners can have confidence that their exam grade is the right one and that it has come out of a process that is fair and transparent with an appropriate level of human engagement.

During their discussions, the expert panel at the conference emphasised the importance of supporting teachers in adopting AI while maintaining their essential role in education.

Lynsey Meakin, Senior Lecturer at the University of Derby, compared the use of AI to the earlier introduction of the calculator. It was feared at the time that students would no longer have to learn maths, but that did not happen, because the calculator is a tool, a time-saving resource, and the same is true today of generative AI.

Looking specifically at the importance of keeping humans in the loop, the panel’s consensus was for measured, thoughtful implementation of AI that complements and enhances, rather than replaces human expertise. They shared their views on how they thought the benefits of AI marking could be gradually introduced to augment marking.

Richard Eckersley, Head of Assessment at the Institute of Chartered Accountants in England and Wales (ICAEW), talked about the benefits of AI marking both in helping humans with marking and in undertaking the marking itself.

Paul Houghton, Director of IT Transformation at NEBOSH, explained that they have an AI special interest group exploring how AI can help them write assessments that people then review.

Sara Pierson, Director of Assessment and of Sales and Marketing for the global English language teaching business at Oxford University Press, said, “It can really support efficiency around marking, item generation, test banks, etc and the management of that.” Sara asked everyone to think about using AI to complement and supplement the learner's journey, and commented particularly on the use of AI for formative assessment within the classroom, rather than for high-stakes, summative assessment.

Dr Matthew Glanville, Director of Assessment, International Baccalaureate agreed: “Where I think we can be doing it right now is in using it as a tool to help teachers spend more time teaching. In those low stake environments.”

 

To summarise, in the words of Dr Gráinne Watson, Chief Operating Officer at RM:

“You are never going to lose the human in the loop. Never. I wouldn’t want something purely marked by AI... It will provide a level base and that will help whoever that human is making that decision but the core here is not replacement. It is a way to make both of those things work intricately together.”

Exam boards and awarding bodies can use AI to drive continuous growth, improve learner outcomes, increase productivity, drive cost efficiencies, support the human marker, and attract new learners.

By combining AI with human expertise, we can achieve fairer and more accurate assessments to achieve better learner outcomes.

 

AI in marking - What could AI mean for you and your organisation?

Fill in the form to register your interest in being part of our AI in Marking proof of concept work group.

Join us and our community of top awarding bodies, regulators and technology partners to find out about AI, qualifications and assessments.

Whether you want to take part in testing or just have us consider how it would work for your unique challenges, we will be in touch.

Watch the recordings of the conference sessions or download the transcripts.

Download the paper, AI in Education Assessment: Responsible improvements for learner outcomes – without diminishing the role of the human to find out more.