AI usage among students has reportedly jumped from 66% to 92% in just a year. What’s driving this surge, and what are universities underestimating about how students are actually using these tools during exams?
Students know that a lot depends on their exam results, and they know, or at least assume, that other students will use AI in their assessments if they can, so there’s a very natural incentive and pressure to use it. It’s more nuanced than just ‘laziness’: there’s a valid calculation that, if other people are using AI and getting away with it, then I’m putting myself at a disadvantage if I don’t.
I don’t think universities are necessarily underestimating the use of AI, or the incentives to use it. As far as I can tell, everyone in the sector is using AI in some form or another, and it would be incredibly naive to think that students would not use it in exams. Universities are adapting, but this is a significant change to how exams are conducted and assessed, and the technology itself is evolving at a rapid pace.
From your work with organisations delivering high-stakes exams, what are the biggest vulnerabilities in current online exam systems that you’re seeing in practice, not just in theory?
I think “contract cheating”, where a student pays someone else to write their essay or even sit their exam for them, is more widespread than people realise. The spotlight may be on AI at the moment, and likely will be for the foreseeable future, but contract cheating is a completely different category which is often overlooked. It is actually easier to mitigate, through more stringent identity verification, and I think we’ll see that become more commonplace in the future.
How realistic is it for universities to distinguish between genuine student work and AI-assisted responses in an exam setting today?
I think it’s difficult. There are tools out there which can detect, in a probabilistic manner, whether text has been generated by AI. But there are several challenges.
Firstly – AI is evolving rapidly. Many of the signals are just “current patterns”. For example, many people have noticed that AI-generated text tends to use long dashes (em dashes) instead of short hyphens, but this is a minor stylistic quirk of particular model versions, not a long-term ‘fundamental’ of how LLMs generate text.
Secondly – if a student does not simply copy and paste from AI, but instead uses AI to formulate their argument and then types it in their own words, that’s still using AI and would be considered cheating in a closed-book exam, yet the actual output is written by the student themselves. And for open-book assessments, where students may be expected to use the internet to research and write an essay, where is the line drawn between using Google and using GPT?
Thirdly – how sure are you? If a tool says that an essay has a high probability of being written by AI, is that evidence strong enough to stand up to appeal if the student contests it? Almost certainly not. I don’t think these detection tools have been validated enough for us to really know what their false positive and false negative rates are across different segments of the population. False positives are particularly important because we currently have no idea whether, for example, certain personality types happen to write in a style that resembles that of LLMs, but it seems entirely plausible.
Ultimately, I think trying to detect whether a given output has been generated by AI is, at best, a potential deterrent to cheating and a way to add some evidence to the overall determination. It should not be relied on as the primary method of detection, and its conclusions should be interpreted with extreme caution, in conjunction with other sources of data such as proctoring and samples of the student’s previous writing.
Are we reaching a point where unsupervised online exams are no longer credible?
Exam security is a spectrum. It’s not always practical, or even possible, to run supervised online exams, so I think there will always be a role for unsupervised ones. But that role is certainly shrinking, and for high-stakes exams with real academic or professional implications, I think institutions will find it increasingly difficult to justify not enforcing supervised conditions.
Are traditional assessment formats like essays and multiple-choice exams still fit for purpose in an AI-enabled world, or do they need to be fundamentally rethought?
I don’t think we should forgo traditional assessment formats because of AI. Formats such as essays and MCQs are extremely well validated, and ultimately there are only so many ways you can assess knowledge in a standardised way; AI poses a challenge to any of them you could think of. AI is a general-purpose technology that can write an essay, answer questions, generate a video and much more, so I don’t think we can develop a question type which is immune.
What does this shift in AI use mean for how universities design and deliver assessments going forward?
Where we should adapt is in how assessments are structured and delivered, and we need to think very carefully about what the purpose of an assessment is. I believe there is a role for closed-book assessments (where the candidate is tasked with demonstrating their knowledge without access to external resources), as well as open-book assessments, where the candidate is expected to use external resources to complete their task.
For closed-book assessments, this really comes down to how the assessment is secured, locked down and supervised. We are already in an age where many assessments are delivered online and tools exist to lock down and supervise them. I think AI puts greater pressure on institutions to use these tools to their full potential, and to design their assessments with more of a ‘zero trust’ paradigm.
Open-book assessments have been gaining in popularity for many years. A popular argument in their favour is that, in a world where people do have access to the internet, assessments should reflect that and assess candidates’ ability to use the tools available to them to research and synthesise their argument. You could argue that AI tools are simply a natural extension of this paradigm (“how well can you use your LLM?”), but I’d be hesitant to go that far.
I think there is a real difference between “how well can you use Google to find and interrogate information and synthesise it into an essay” and “how well can you write a prompt”, or even “how good is your Claude subscription”. I think the solution is an approach which allows access to certain tools only, and where student activity is monitored and reviewed: essentially, a partial open-book assessment which still takes place under supervised, restricted conditions.
As universities rapidly scale online exams, what are the most common mistakes that end up undermining integrity?
Specifically when dealing with online exams at scale, I think the most common mistake is not using a platform that has been specifically designed to deliver online exams.
There are many low-stakes quiz tools which an individual lecturer can subscribe to, and almost all Learning Management Systems will have some form of quiz or assessment module. At smaller scales, for ad-hoc use, or for low-stakes assessments – they will get the job done, but cracks start to appear when you move beyond that.
A large-scale online exam is very different from how most online services work: spikes in traffic can be extremely sudden, technically savvy candidates will try to find ways to circumvent your security, and candidates in locked-down exams cannot simply refresh their browser or restart their computer if anything goes wrong. These issues will all come to light when delivering any kind of high-stakes assessment.
From what you’re seeing on the ground, what practical steps or technologies are actually working right now to safeguard exam credibility?
Institutions are adapting as quickly as they can, but this is the early stage of a kind of arms race between offensive and defensive technologies: those used to cheat, and those used to prevent cheating.
Because we are still at such an early stage, the immediate response is about quickly adapting current assessment modalities to the new environment. The exact technologies used vary widely depending on the region and the type of assessment, but typically involve a combination of lockdown enforcement (placing the candidate’s device into a restricted mode), proctoring (remote monitoring of a candidate’s screen and webcam) and AI writing detection.
Longer term, I think we’ll start to see deeper changes in the way assessments are formulated and devised.
How can universities strengthen exam security without creating a worse experience or disadvantaging students?
Exam security is always a trade-off. Unfortunately, increased security on anything inevitably introduces friction into a process, but it is of the utmost importance that this does not unfairly disadvantage any student.
I think great care and attention should be taken when developing and configuring tools which increase exam security. These tools are inherently invasive, and candidates and their data should be treated with a great deal of respect and care. Vendors of these tools should clearly and transparently explain how candidates’ data is being used and why it is necessary, and they should do so in plain language, not legal jargon. Institutions should also explain to their students why these measures are being used and how they benefit them. I think this is often overlooked, but at the end of the day these tools protect the integrity of the exam, and therefore of the credential the student receives from it.
Over the next 3–5 years, how do you expect assessment to evolve as AI becomes fully embedded in how students learn and work?
I think we will start to see more ‘hybrid’ assessments where students are allowed to use external resources, but under restricted or supervised conditions. Traditionally we have categorised assessments as ‘closed’ or ‘open’ book, but I think AI creates a significant need for a type of assessment where students have access to some, but not all, online materials. For this to happen, on the institutional side we’ll need a rethink of how these assessments are designed and what skills are being assessed. On the technology side, we’ll need assessment platforms that can create an effective sandbox, where students are monitored and can access a range of materials while others are blocked. As part of this, we may see more emphasis placed not just on the output of an exam, but on the process the student took to derive it.