Is it possible to detect Gen AI text?
Experiments show that people are close to random guessing when asked to identify content produced by modern LLMs. But is it possible to build a tool that identifies AI-generated content? If you don't have time, the short answer is no; if you have a few minutes, the longer answer is below.
In January 2023, OpenAI released a tool to classify AI-written text. In July 2023, OpenAI shut it down. The main reason was poor performance: the classifier correctly flagged only 26% of AI-written text, while labeling 9% of human-written text as AI-generated. If you want to build such a tool, you should also think about how to cover the potential reputational damage from false accusations. From my observation, far more papers were published on this subject in 2023 than in the last six months; it looks like the research community's focus has shifted to other areas.
There are four primary methods to identify AI-generated text: neural networks, zero-shot detection, watermarking, and information retrieval. Some papers show how to train a neural network to identify AI text; the problem is that, to work properly, this approach needs access to the model's internals. Zero-shot detection is based on the idea that an LLM can identify AI-generated text by itself. Watermarking is an interesting approach: we embed some information in the output that marks the text as AI-generated. The problem is that model providers need to support it, and it also constrains the output, reducing the model's capabilities. Moreover, we can simply ask an LLM to rephrase the text and strip the watermark. The best method is information retrieval: to implement it, one must store all generated content and compare it to the text in question. This method, however, raises privacy concerns.
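As a toy illustration of the information-retrieval idea, one could keep a store of previously generated outputs and flag any candidate text that closely matches one of them. The store, function names, and threshold below are my own illustrative choices (a real system would need a database or an embedding index, not a list and string similarity):

```python
import difflib

# Hypothetical in-memory store of previously generated outputs.
generated_store = [
    "The quick brown fox jumps over the lazy dog.",
    "Large language models generate fluent text at scale.",
]

def retrieval_score(candidate: str) -> float:
    """Highest similarity between the candidate and any stored output."""
    return max(
        difflib.SequenceMatcher(None, candidate.lower(), stored.lower()).ratio()
        for stored in generated_store
    )

def looks_generated(candidate: str, threshold: float = 0.8) -> bool:
    # High similarity to a stored output suggests the text was generated before.
    return retrieval_score(candidate) >= threshold
```

The privacy concern is visible right in the sketch: the detector only works because the provider retains everything its model ever produced.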
The Raidar paper proposes a different approach, based on one observation: LLMs tend to modify human-written text more than AI-generated text. In other words, if you ask an LLM to rewrite a text, human-written input is changed drastically, while AI-generated input is left almost untouched; perhaps LLMs consider AI-generated text good enough by default. The paper uses three simple prompts: 'Help me polish this:', 'Rewrite this for me:' and 'Refine this for me please:'. The amount of modification is measured with bag-of-words overlap and Levenshtein distance; if the scores are low, the text is likely AI-generated.
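A minimal sketch of this kind of check, assuming we already have the LLM's rewrite of the text in hand. The helper names and the 0.2 threshold are illustrative choices of mine, not values from the paper:

```python
from collections import Counter

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def modification_scores(original: str, rewritten: str) -> tuple[float, float]:
    """How much the rewrite changed the text: edit distance and bag-of-words change."""
    edit = levenshtein(original, rewritten) / max(len(original), 1)
    orig_bow, new_bow = Counter(original.split()), Counter(rewritten.split())
    changed = sum((orig_bow - new_bow).values()) + sum((new_bow - orig_bow).values())
    bow = changed / max(sum(orig_bow.values()), 1)
    return edit, bow

def likely_ai(original: str, rewritten: str, threshold: float = 0.2) -> bool:
    # Few edits after a 'Rewrite this for me:' prompt suggests AI-generated input.
    edit, bow = modification_scores(original, rewritten)
    return edit < threshold and bow < threshold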
There is another approach in the 'Monitoring AI-Modified Content at Scale' paper, which also reports that 'Roughly 7-15% of sentences in ML conference reviews were substantially modified by AI'. The paper compares the probability of certain adjectives appearing in texts. Take, for instance, 'commendable': on a large corpus of human-written text the probability of this adjective is X, while in AI-generated texts it is Y; comparing X to Y, we can tell whether a text is AI-generated. From the paper: 'adjectives such as "commendable", "meticulous", and "intricate" show 9.8, 34.7, and 11.2-fold increases in the probability of occurring in a sentence.' An interesting discovery of the paper is the deadline effect: 'Estimated ChatGPT usage in reviews spikes significantly within 3 days of review deadlines'.
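The frequency comparison can be sketched like this, using tiny made-up corpora. The function and the data are purely illustrative; the paper itself estimates these probabilities statistically over large corpora of reviews:

```python
def fold_increase(word: str, human_docs: list[str], ai_docs: list[str]) -> float:
    """Ratio of the word's per-sentence occurrence probability: AI vs human."""
    def sentence_prob(docs: list[str]) -> float:
        hits = sum(word in doc.lower().split() for doc in docs)
        return hits / max(len(docs), 1)
    human_p = sentence_prob(human_docs) or 1e-9  # guard against division by zero
    return sentence_prob(ai_docs) / human_p

# Tiny made-up corpora, purely for illustration.
human_reviews = [
    "a commendable effort",
    "the method works well",
    "results look good",
    "a solid contribution",
]
ai_reviews = [
    "a commendable and meticulous study",
    "this commendable analysis is intricate",
    "results are commendable",
]
```

Note that this tells you something about a population of texts, not any single document: a lone review containing 'commendable' proves nothing, which is why the paper frames its estimates at corpus scale.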
References:
https://arxiv.org/abs/2401.12970 - Raidar: Generative AI Detection via Rewriting
https://arxiv.org/abs/2403.07183 - Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
https://arxiv.org/abs/2303.11156 - Can AI-Generated Text be Reliably Detected?
https://openai.com/index/new-ai-classifier-for-indicating-ai-written-text/ - New AI classifier for indicating AI-written text