As summer draws to an end, students are begrudgingly heading back to school, and they’re bringing ChatGPT with them. Educators are grappling with how to deal with this new technological landscape. While tools like GPTZero have been developed to help them identify bot-written essays and assignments, large language models (LLMs) are growing ever more sophisticated and harder to catch, which means they are getting better at completing assignments for students without anyone getting wise to it.
Case in point: A new paper released today in the journal Scientific Reports found that ChatGPT performed as well as or better than college students on certain writing assignments. The authors also found that AI-text detectors like GPTZero and OpenAI’s AI classifier did an inadequate job of catching the bot-completed assignments.
Moreover, nearly 75 percent of students surveyed said they would use ChatGPT to help them complete their homework, underscoring the challenges educators face going into the new school year.
“We find that ChatGPT’s performance is comparable, if not superior, to that of students in a multitude of courses,” the study’s authors wrote. “Moreover, current AI-text classifiers cannot reliably detect ChatGPT’s use in school work, due to both their propensity to classify human-written answers as AI-generated, as well as the relative ease with which AI-generated text can be edited to evade detection.”
The team assigned 10 assessment questions to students across 32 different courses at New York University’s Abu Dhabi campus (NYUAD). They also prompted ChatGPT to provide three sets of answers to the questions. Both student and bot answers were then given to a panel of three graders who were not told which answers had been written by ChatGPT.
The chatbot performed as well as or better than the human students in nine courses, primarily in political science, engineering, and computer science. However, students consistently outperformed ChatGPT in math and economics courses. This is likely because LLMs focus primarily on language rather than the calculations and formal reasoning those courses require.
In surveys at NYUAD and at educational institutions in Brazil, India, Japan, the U.K. and the U.S., the team found that the majority of students intend to use ChatGPT to help complete homework assignments in the upcoming semester despite the ethical issues.
This problem is only exacerbated by the team’s other major finding: AI detectors like GPTZero and OpenAI’s AI classifier did a fairly bad job of spotting bot-written text. For example, GPTZero misclassified ChatGPT’s answers as human-written 32 percent of the time, while OpenAI’s detector did so 49 percent of the time. The detectors also flagged genuine student work: OpenAI’s classifier misidentified 5 percent of human submissions as bot-generated, while GPTZero misidentified 18 percent of them.
“When comparing ChatGPT’s performance to that of students across university-level courses, the result indicates a clear need to take ‘AI-plagiarism’ seriously,” the authors wrote.
So school’s back, and with it comes a technological reality that many schools are simply unprepared for. This new normal is going to challenge educators and students alike. Only time will tell whether they pass the test or receive a failing grade.