arXiv's Crackdown on AI Hallucinations: The Cost of Academic Slop

The rise of Large Language Models (LLMs) has fundamentally altered the speed of content production, but it has also introduced a pervasive problem into the scientific record: "slop." In academic publishing, this manifests most dangerously as hallucinated references—citations to papers that do not exist, generated by AI to provide a veneer of legitimacy to a claim.

In a recent announcement via X, Thomas G. Dietterich highlighted a critical reinforcement of arXiv's Code of Conduct. The core message is clear: authors are fully responsible for every word and reference in their submissions, regardless of whether the content was generated by a human or an AI. To enforce this, arXiv is implementing a severe penalty for those who fail this responsibility.

The Penalty: A One-Year Ban

According to community discussions and reports, the penalty for submitting papers with hallucinated references is a one-year ban from arXiv. Furthermore, once the ban expires, any subsequent submissions from the offending authors must first be accepted at a reputable peer-reviewed venue before they can be uploaded to the preprint server.

This move signals a shift from viewing AI-generated errors as simple mistakes to treating them as a breach of academic integrity. As one commenter noted, "arXiv is free, but it's a privilege not a right."

The Crisis of "Academic Slop"

The reaction from the technical community has been largely supportive, reflecting a growing frustration with the degradation of scholarly literature. The proliferation of low-effort, AI-generated papers has placed an undue burden on the volunteer peer-review system.

"Reviewers are expert volunteers who do it for free. It is incredibly frustrating to have spent 4 hours reading a paper... just to realize that it is hallucinations. The authors should value the time of the reviewers higher than their own time."

Beyond the waste of time, there is a deeper concern regarding the pollution of the scientific record. The ease with which LLMs can generate plausible-sounding but entirely fabricated bibliographies means that the "slop" can easily slip through if authors do not manually verify every entry.

Implementation Challenges and Counterpoints

While the policy is welcomed in principle, several technical and ethical questions remain regarding its enforcement:

1. Detection at Scale

How will arXiv detect these hallucinations? Some commenters suggest automated DOI verification. Others caution that identifying a hallucination is itself error-prone: a check can miss a fabricated entry or wrongly flag a real one. Still others argue that references are among the easiest parts of a paper to check automatically, which makes an author's failure to verify them inexcusable.
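As a rough illustration of what automated DOI verification might involve, the sketch below pulls DOI-like strings out of a BibTeX file and reports the ones a lookup function cannot confirm. It is a minimal sketch and not arXiv's actual mechanism: the names `extract_dois` and `flag_unresolved`, the regular expression, and the `resolver` callback are all assumptions made for this example. In practice the resolver would query a DOI registry, which is left out here so the example runs offline.

```python
import re
from typing import Callable, List

# Loose pattern for DOI-like strings: "10.", a registrant code, "/", a suffix.
# This is an approximation chosen for the example, not a formal DOI grammar.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"{},]+")


def extract_dois(bibtex: str) -> List[str]:
    """Return the distinct DOI-like strings found in a BibTeX source, in order."""
    dois: List[str] = []
    seen = set()
    for match in DOI_PATTERN.findall(bibtex):
        doi = match.rstrip(".;")  # drop trailing punctuation picked up by the regex
        if doi.lower() not in seen:  # DOIs are compared case-insensitively
            seen.add(doi.lower())
            dois.append(doi)
    return dois


def flag_unresolved(dois: List[str], resolver: Callable[[str], bool]) -> List[str]:
    """Return the DOIs for which the (hypothetical) resolver reports no record."""
    return [doi for doi in dois if not resolver(doi)]


if __name__ == "__main__":
    # Both entries and both DOIs are made up for illustration.
    sample = """
    @article{real_entry, doi = {10.1000/xyz123}}
    @article{suspect_entry, doi = "10.5555/fake.2024"}
    """
    known = {"10.1000/xyz123"}  # stand-in for a registry lookup
    print(flag_unresolved(extract_dois(sample), lambda d: d in known))
```

A flagged DOI is only a prompt for manual review. An entry without a DOI is not necessarily fabricated, and a DOI that resolves does not prove that the title and authors listed alongside it are correct, which is the error-proneness that critics of automated detection point to.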

2. The "Reputable Venue" Ambiguity

Critics have pointed out that the requirement for subsequent papers to be accepted at a "reputable peer-reviewed venue" is vaguely defined. Without clear criteria for what constitutes "reputable," the policy could be applied inconsistently.

3. The BibTeX Struggle

Some researchers argue that the pull toward AI-generated citations stems from the genuine difficulty of keeping citation data consistent. Discrepancies between the arXiv and conference versions of the same paper often lead authors to take shortcuts, using AI tools to generate clean BibTeX entries, and those shortcuts introduce the risk of hallucination.
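One way to tackle the version discrepancy without asking a model to write entries is to find the duplicates deterministically and reconcile them by hand. The sketch below groups bibliography entries whose titles match after normalization. The function names, the normalization rules, and the sample keys and titles are all assumptions made for this example, not a description of any existing tool.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize_title(title: str) -> str:
    """Reduce a title to lowercase alphanumeric words for comparison."""
    text = title.replace("{", "").replace("}", "")  # BibTeX case-protection braces
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))  # strip accents
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())  # punctuation becomes spaces
    return text.strip()


def group_versions(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each normalized title shared by two or more entries to their keys."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, title in entries:
        groups[normalize_title(title)].append(key)
    return {title: keys for title, keys in groups.items() if len(keys) > 1}


if __name__ == "__main__":
    # Hypothetical citation keys and titles, for illustration only.
    entries = [
        ("smith2023arxiv", "A Study of {G}raph Pooling"),
        ("smith2024conf", "A study of graph pooling."),
        ("lee2022", "An Unrelated Paper"),
    ]
    print(group_versions(entries))
```

Exact matching on normalized titles is deliberately conservative: it misses papers retitled between versions, but every entry it reports already exists in the author's own bibliography, so nothing is generated and nothing can be hallucinated.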

4. Collective Responsibility

There is a concern regarding multi-author papers. If one author uses an LLM to generate a bibliography and submits the paper without the others' knowledge, does the entire author list face a one-year ban? The policy's stance that each author takes full responsibility suggests the answer is yes, emphasizing the need for rigorous internal vetting.

Broader Implications for AI in Science

This policy is part of a larger conversation about the role of AI in research. While the ban targets fabrication, it opens the door to discussions about other requirements, such as mandatory reproducibility for AI-driven papers. As one community member suggested, the next step should be ensuring that code and configurations are not "fudged" to make results appear better than they are.

Ultimately, the arXiv policy serves as a stark reminder: while AI can be a powerful tool for drafting and organizing, the responsibility for truth remains human. In the eyes of the scientific community, a hallucinated reference isn't just a technical glitch—it is a fabrication.
