A Tech Ethics Case Study
[Image credit: Anton Grabolle / Better Images of AI / Classification Cupboard / CC-BY 4.0]
In January 2023, OpenAI released a tool designed to identify AI-written text. Earlier that month, a Princeton student had released his own app aiming to detect text produced by ChatGPT. Both efforts were, at least in part, a response to concerns expressed by many instructors that students were submitting assignments written by AI while claiming the work as their own. OpenAI’s announcement included a caveat:
In our evaluations on a ‘challenge set’ of English texts, our classifier correctly identifies 26% of AI-written text… as ‘likely AI-written,’ while incorrectly labeling human-written text as AI-written 9% of the time…. We’re making this classifier publicly available to get feedback on whether imperfect tools like this one are useful.
In February, Turnitin announced that it had “developed an AI writing detector that, in its lab, identifies 97 percent of ChatGPT and GPT3 authored writing, with a very low (less than 1/100) false positive rate.” The company’s Chief Product Officer noted that it was “essential that [the company’s] detector and any others limit false positives that may impact student engagement or motivation.” In April, Turnitin “made its AI detection feature available to 10,700 secondary and higher educational institutions.”
In May, Stanford researchers reported that AI-detection tools (seven available on the market by that point) were “especially unreliable when the real author (a human) is not a native English speaker.” The researchers cautioned that their analysis raised “serious questions about the objectivity of AI detectors and… the potential that foreign-born students and workers might be unfairly accused of or, worse, penalized for cheating.”
In July, an article titled “Tools to Detect AI-Generated Content Just Don’t Work” reported on an academic study that reviewed 14 AI-detection tools then available for use. The researchers, who built on previously published papers, wrote that the tools (which included Turnitin’s) were “neither accurate nor reliable (all scored below 80 percent of accuracy and only 5 over 70 percent…). In general, they have been found to diagnose human-written documents as AI-generated (false positives) and often diagnose AI-generated texts as human-written (false negatives).”
Also in July, OpenAI updated its blog post to announce that its tool was “no longer available due to its low rate of accuracy.”
Some students have spoken publicly about being wrongly accused of academic dishonesty after such detection tools flagged their work. Turnitin’s AI-detection feature remains available. The company notes on its website that its “model may not always be accurate (it may misidentify both human and AI-generated text) so it should not be used as the sole basis for adverse actions against a student.”
Discussion Questions:
- Who are the stakeholders involved? Who should be consulted in the process of developing and deploying tools like AI-writing detectors in the education context?
- How might the deployment of such tools be evaluated through the ethical 'lenses' of rights, justice, utilitarianism, the common good, virtue ethics, and care ethics? See “A Framework for Ethical Decision Making”.
- What ethical challenges might arise in the interactions among students, instructors, and educational administrators as they navigate the use of AI detectors? How might those challenges be addressed?