How Many Academic Papers are Written with the Help of ChatGPT? This AI Paper Delves into ChatGPT Usage in Academic Writing through Excess Vocabulary

There has been a rapid increase in the use of large language models (LLMs), such as ChatGPT, in academic writing. This study investigates how prevalent these AI tools are in scholarly literature by detecting changes in writing style and vocabulary in biomedical research abstracts indexed in PubMed between 2010 and 2024. The widespread availability of LLMs has raised concerns about the authenticity and originality of scientific texts, with implications for research integrity and for how academic contributions are evaluated.

Previous attempts to quantify the presence of LLM-generated text in academic literature have relied on three main methods. One common approach uses LLM detectors, trained on known samples to distinguish between human- and AI-generated text. Another models word frequency distributions in scientific texts as mixtures of human- and AI-generated content. A third relies on lists of marker words overused by LLMs, typically stylistic terms rather than content-specific vocabulary.

The researchers propose a data-driven approach that avoids some limitations of these methods. Instead of relying on predefined datasets of human- and LLM-generated texts, their method looks for excess word usage as a sign of LLM involvement. Inspired by studies of excess mortality during the COVID-19 pandemic, the technique tracks words whose frequency rose significantly after ChatGPT's release compared with the level expected from trends in earlier years. Because it needs no labeled training texts, the method allows a less biased and more comprehensive analysis of LLMs' impact on scientific writing.

The researchers analyzed over 14 million PubMed abstracts from 2010 to 2024. They built a matrix of word occurrences across these abstracts and calculated the annual frequency of each word. They then compared the observed frequencies in 2023 and 2024 with counterfactual projections based on trends from 2021 and 2022, and identified words with significant increases in usage. These "excess words" were then used to gauge the influence of LLMs.
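The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code. The tokenizer, the toy corpus, and the helper names (`tokenize`, `yearly_frequencies`, `counterfactual`) are hypothetical. "Frequency" is taken here to mean the fraction of a year's abstracts that contain a word at least once. The counterfactual simply extends the 2021-to-2022 trend in a straight line.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case word tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z]+", text.lower())

def yearly_frequencies(abstracts_by_year):
    """Map year -> {word: fraction of that year's abstracts containing the word}."""
    freqs = {}
    for year, abstracts in abstracts_by_year.items():
        counts = Counter()
        for text in abstracts:
            # Binary occurrence: each word counts at most once per abstract.
            counts.update(set(tokenize(text)))
        n = len(abstracts)
        freqs[year] = {word: c / n for word, c in counts.items()}
    return freqs

def counterfactual(freqs, word, target_year, base_years=(2021, 2022)):
    """Expected frequency in target_year, assuming the base-year trend
    continues linearly (a hypothetical projection rule for this sketch)."""
    y0, y1 = base_years
    p0 = freqs[y0].get(word, 0.0)
    p1 = freqs[y1].get(word, 0.0)
    projected = p1 + (p1 - p0) * (target_year - y1)
    return max(projected, 0.0)  # a frequency cannot be negative

# Hypothetical toy corpus: four "abstracts" per year.
corpus = {
    2021: ["We study cell growth.", "Cell growth is measured.",
           "We report new data.", "The data show a trend."],
    2022: ["We study cell growth.", "This underscores a trend.",
           "We report new data.", "The data show a trend."],
    2024: ["This study delves into cell growth.", "The result underscores a trend.",
           "The review delves into new data.", "This underscores the data."],
}

freqs = yearly_frequencies(corpus)
for word in ("delves", "underscores", "data"):
    observed = freqs[2024].get(word, 0.0)
    expected = counterfactual(freqs, word, 2024)
    print(f"{word}: observed={observed}, expected={expected}")
```

In the toy data, "delves" is absent in 2021 and 2022, so its expected 2024 frequency is zero even though half of the 2024 abstracts contain it. That is the kind of jump the excess-word approach is designed to surface. A real analysis over millions of abstracts would use a sparse document-term matrix rather than Python dictionaries.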

The analysis revealed that certain words, especially stylistic ones like "delves," "showcasing," and "underscores," showed marked increases in frequency, suggesting LLM involvement. The researchers quantified this excess usage with two measures: the excess frequency gap (the difference between observed and expected frequencies) and the excess frequency ratio (the ratio of observed to expected frequencies). They found a substantial rise in the number of excess words in 2024, coinciding with the widespread availability of ChatGPT. This increase was unprecedented, surpassing the vocabulary changes observed during the COVID-19 pandemic.
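In symbols, let p_t be the observed fraction of abstracts in year t that contain a given word, and let q_t be its counterfactual expected fraction. The two measures are then the gap and the ratio shown below. The straight-line rule for q_t through the 2021 and 2022 values is an assumption for illustration. The article only says that the projection is based on 2021 and 2022 trends.

```latex
q_t = p_{2022} + (t - 2022)\,\bigl(p_{2022} - p_{2021}\bigr), \qquad
\delta_t = p_t - q_t, \qquad
r_t = \frac{p_t}{q_t}
```

As a purely hypothetical worked example, take p_{2024} = 0.02 and q_{2024} = 0.005. The gap is then δ = 0.02 − 0.005 = 0.015, and the ratio is r = 0.02 / 0.005 = 4. The two measures complement each other. The ratio flags rare words with large relative jumps, while the gap flags common words whose absolute frequency rose noticeably.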

To estimate the extent of LLM usage, the researchers used the frequency gap of excess words as a lower bound. For example, the word "potential" showed an excess frequency gap indicating that at least 4% of 2024 abstracts included this word because of LLM influence. By analyzing abstracts containing any of the excess words, the authors obtained a lower bound of 10% for LLM-assisted abstracts in 2024. This bound is conservative: the actual figure could be higher, because some LLM-processed abstracts contain none of the tracked excess words. The estimate varied across disciplines (e.g., 20% in computation), countries (e.g., 16% in China vs. 3% in the UK), and journals and publishers (e.g., 24% in Sensors, 17% in Frontiers/MDPI, 6% in Nature/Science/Cell). The highest estimate, 35%, was for computation papers from China.
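The lower-bound idea can be illustrated with a short sketch. This is a hypothetical reconstruction, not the authors' implementation. It treats "contains at least one marker word" as a single event and computes that event's frequency gap. It uses the same straight-line counterfactual assumed above. The function names, marker set, and toy corpus are illustrative only.

```python
import re

def contains_any(text, markers):
    """True if the abstract contains at least one of the marker words."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return not tokens.isdisjoint(markers)

def group_frequency(abstracts, markers):
    """Fraction of abstracts that contain at least one marker word."""
    return sum(contains_any(t, markers) for t in abstracts) / len(abstracts)

def excess_lower_bound(abstracts_by_year, markers, target_year,
                       base_years=(2021, 2022)):
    """Frequency gap p - q for the event 'contains any marker word'.

    q extends the base-year trend linearly (an assumption of this sketch).
    LLM-edited abstracts that use none of the markers are not counted, so
    the gap is meant as a lower bound rather than a point estimate.
    """
    y0, y1 = base_years
    p0 = group_frequency(abstracts_by_year[y0], markers)
    p1 = group_frequency(abstracts_by_year[y1], markers)
    q = max(p1 + (p1 - p0) * (target_year - y1), 0.0)
    p = group_frequency(abstracts_by_year[target_year], markers)
    return max(p - q, 0.0)  # a negative gap gives no evidence of excess use

# Hypothetical toy corpus and marker set.
corpus = {
    2021: ["We study cell growth.", "Cell growth is measured.",
           "We report new data.", "The data show a trend."],
    2022: ["We study cell growth.", "This underscores a trend.",
           "We report new data.", "The data show a trend."],
    2024: ["This study delves into cell growth.", "The result underscores a trend.",
           "The review delves into new data.", "This underscores the data."],
}
markers = {"delves", "underscores", "showcasing"}

print(excess_lower_bound(corpus, markers, 2024))
```

Grouping the words matters. An abstract containing several marker words is counted only once, so the group gap estimates a fraction of abstracts rather than a sum of per-word effects.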

The research highlights a significant shift in academic writing styles driven by LLMs like ChatGPT. By developing a new methodology to track excess word usage, the study provides compelling evidence that LLMs have had a notable impact on scientific literature, with at least 10% of biomedical abstracts published in 2024 showing signs of AI assistance. This underscores the transformative effect of LLMs on scholarly communication and raises important questions about research integrity and the future of academic writing.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post How Many Academic Papers are Written with the Help of ChatGPT? This AI Paper Delves into ChatGPT Usage in Academic Writing through Excess Vocabulary appeared first on MarkTechPost.