Machines in the Crowd? Measuring the Footprint of Machine-Generated Text on Reddit
- URL: http://arxiv.org/abs/2510.07226v1
- Date: Wed, 08 Oct 2025 16:57:45 GMT
- Title: Machines in the Crowd? Measuring the Footprint of Machine-Generated Text on Reddit
- Authors: Lucio La Cava, Luca Maria Aiello, Andrea Tagarelli
- Abstract summary: We present the first large-scale characterization of Machine-Generated Text (MGT) on Reddit. Using a state-of-the-art statistical method for detection of MGT, we analyze over two years of activity (2022-2024) across 51 subreddits. Our very conservative estimate of MGT prevalence indicates that synthetic text is marginally present on Reddit, but it can reach peaks of up to 9% in some communities.
- Score: 8.318350327150437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Artificial Intelligence is reshaping online communication by enabling large-scale production of Machine-Generated Text (MGT) at low cost. While its presence is rapidly growing across the Web, little is known about how MGT integrates into social media environments. In this paper, we present the first large-scale characterization of MGT on Reddit. Using a state-of-the-art statistical method for detection of MGT, we analyze over two years of activity (2022-2024) across 51 subreddits representative of Reddit's main community types such as information seeking, social support, and discussion. We study the concentration of MGT across communities and over time, and compare MGT to human-authored text in terms of the social signals it expresses and the engagement it receives. Our very conservative estimate of MGT prevalence indicates that synthetic text is marginally present on Reddit, but it can reach peaks of up to 9% in some communities in some months. MGT is unevenly distributed across communities, more prevalent in subreddits focused on technical knowledge and social support, and often concentrated in the activity of a small fraction of users. MGT also conveys distinct social signals of warmth and status giving typical of the language of AI assistants. Despite these stylistic differences, MGT achieves engagement levels comparable to those of human-authored content and in a few cases even higher, suggesting that AI-generated text is becoming an organic component of online social discourse. This work offers the first perspective on the MGT footprint on Reddit, paving the way for new investigations involving platform governance, detection strategies, and community dynamics.
Related papers
- SoMe: A Realistic Benchmark for LLM-based Social Media Agents [64.05026384906915]
SoMe is a benchmark designed to evaluate social media agents equipped with various agent tools for accessing and analyzing social media data. SoMe comprises a diverse collection of 8 social media agent tasks, 9,164,284 posts, 6,591 user profiles, and 25,686 reports from various social media platforms and external websites. By extensive quantitative and qualitative analysis, we provide the first overview into the performance of mainstream agentic LLMs in realistic social media environments.
arXiv Detail & Related papers (2025-12-09T08:36:09Z) - RedNote-Vibe: A Dataset for Capturing Temporal Dynamics of AI-Generated Text in Social Media [48.63633320837672]
We introduce RedNote-Vibe, the first longitudinal (5-year) dataset for social media AIGT analysis. This dataset is sourced from the Xiaohongshu platform and contains user engagement metrics and timestamps spanning from the pre-LLM period to July 2025. To detect AIGT in the context of social media, we propose the PsychoLinguistic AIGT Detection Framework (PLAD), an interpretable approach.
arXiv Detail & Related papers (2025-09-26T08:36:45Z) - Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media [38.99664377299462]
Social media platforms are experiencing a growing presence of AI-Generated Texts (AIGTs). Despite its importance, it remains unclear how prevalent AIGTs are on social media. This paper aims to quantify and monitor the AIGTs on online social media platforms.
arXiv Detail & Related papers (2024-12-24T04:04:54Z) - LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection [87.43727192273772]
It is often hard to tell whether a piece of text was human-written or machine-generated. We present LLM-DetectAIve, designed for fine-grained detection. It supports four categories: (i) human-written, (ii) machine-generated, (iii) machine-written, then machine-humanized, and (iv) human-written, then machine-polished.
arXiv Detail & Related papers (2024-08-08T07:43:17Z) - M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection [69.41274756177336]
Large Language Models (LLMs) have brought an unprecedented surge in machine-generated text (MGT) across diverse channels.
This raises legitimate concerns about its potential misuse and societal implications.
We introduce a new benchmark based on a multilingual, multi-domain, and multi-generator corpus of MGTs -- M4GT-Bench.
arXiv Detail & Related papers (2024-02-17T02:50:33Z) - Anti-Sexism Alert System: Identification of Sexist Comments on Social Media Using AI Techniques [0.0]
Sexist comments that are publicly posted in social media (newspaper comments, social networks, etc.) usually obtain a lot of attention and become viral, with consequent damage to the persons involved.
In this paper, we introduce an anti-sexism alert system, based on natural language processing (NLP) and artificial intelligence (AI).
This system analyzes any public post, and decides if it could be considered a sexist comment or not.
arXiv Detail & Related papers (2023-11-28T19:48:46Z) - GPT-4V(ision) as A Social Media Analysis Engine [77.23394183063238]
This paper explores GPT-4V's capabilities for social multimedia analysis.
We select five representative tasks, including sentiment analysis, hate speech detection, fake news identification, demographic inference, and political ideology detection.
GPT-4V demonstrates remarkable efficacy in these tasks, showcasing strengths such as joint understanding of image-text pairs, contextual and cultural awareness, and extensive commonsense knowledge.
arXiv Detail & Related papers (2023-11-13T18:36:50Z) - Gender Gaps in Online Social Connectivity, Promotion and Relocation Reports on LinkedIn [0.7373617024876725]
This paper analyses anonymised data from almost 10 million LinkedIn users in the UK and US information technology (IT) sector.
We find there are fewer women compared to men on LinkedIn in IT.
Women are more likely than men to have reported a recent promotion at work, suggesting high-achieving women may be self-selecting onto LinkedIn.
arXiv Detail & Related papers (2023-08-25T10:43:30Z) - MGTBench: Benchmarking Machine-Generated Text Detection [54.81446366272403]
This paper proposes the first benchmark framework for MGT detection against powerful large language models (LLMs).
We show that a larger number of words in general leads to better performance and most detection methods can achieve similar performance with much fewer training samples.
Our findings indicate that the model-based detection methods still perform well in the text attribution task.
arXiv Detail & Related papers (2023-03-26T21:12:36Z) - ChatGPT: A Meta-Analysis after 2.5 Months [16.62394237011141]
We analyze over 300,000 tweets and more than 150 scientific papers to investigate how ChatGPT is perceived and discussed.
Our findings show that ChatGPT is generally viewed as being of high quality, with positive sentiment and emotions of joy dominating in social media.
arXiv Detail & Related papers (2023-02-20T15:43:22Z) - Echo Chambers on Social Media: A comparative analysis [64.2256216637683]
We introduce an operational definition of echo chambers and perform a massive comparative analysis on 1B pieces of content produced by 1M users on four social media platforms.
We infer the leaning of users about controversial topics and reconstruct their interaction networks by analyzing different features.
We find support for the hypothesis that platforms implementing news feed algorithms like Facebook may elicit the emergence of echo-chambers.
arXiv Detail & Related papers (2020-04-20T20:00:27Z)