RedNote-Vibe: A Dataset for Capturing Temporal Dynamics of AI-Generated Text in Social Media
- URL: http://arxiv.org/abs/2509.22055v1
- Date: Fri, 26 Sep 2025 08:36:45 GMT
- Authors: Yudong Li, Yufei Sun, Yuhan Yao, Peiru Yang, Wanyue Li, Jiajun Zou, Yongfeng Huang, Linlin Shen,
- Abstract summary: We introduce RedNote-Vibe, the first longitudinal (5-year) dataset for social media AIGT analysis. This dataset is sourced from the Xiaohongshu platform and contains user engagement metrics and timestamps spanning from the pre-LLM period to July 2025. To detect AIGT in the context of social media, we propose the PsychoLinguistic AIGT Detection Framework (PLAD), an interpretable approach that leverages psycholinguistic features.
- Score: 48.63633320837672
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The proliferation of Large Language Models (LLMs) has led to widespread AI-Generated Text (AIGT) on social media platforms, creating unique challenges where content dynamics are driven by user engagement and evolve over time. However, existing datasets mainly depict static AIGT detection. In this work, we introduce RedNote-Vibe, the first longitudinal (5-year) dataset for social media AIGT analysis. This dataset is sourced from the Xiaohongshu platform and contains user engagement metrics (e.g., likes, comments) and timestamps spanning from the pre-LLM period to July 2025, which enables research into the temporal dynamics and user interaction patterns of AIGT. Furthermore, to detect AIGT in the context of social media, we propose the PsychoLinguistic AIGT Detection Framework (PLAD), an interpretable approach that leverages psycholinguistic features. Our experiments show that PLAD achieves superior detection performance and provides insights into the signatures distinguishing human and AI-generated content. More importantly, it reveals the complex relationship between these linguistic features and social media engagement. The dataset is available at https://github.com/testuser03158/RedNote-Vibe.
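The abstract states that each post comes with engagement metrics and timestamps, which is what makes temporal analysis of AIGT possible. The following is a minimal sketch, not the authors' code, of one such analysis: bucketing posts by calendar month and reporting the AIGT share and the mean likes per class. The `Post` record, its field names, and the binary `is_ai` label (which might come from annotations or from a detector's predictions) are hypothetical assumptions; the actual RedNote-Vibe schema may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    # Hypothetical record schema; the real RedNote-Vibe field names may differ.
    published: datetime
    is_ai: bool  # assumed binary label: True = AI-generated, False = human-written
    likes: int


def monthly_summary(posts):
    """Group posts by calendar month; report the AIGT share and mean likes per class."""
    buckets = defaultdict(list)
    for p in posts:
        buckets[p.published.strftime("%Y-%m")].append(p)

    summary = {}
    for month in sorted(buckets):  # "YYYY-MM" strings sort chronologically
        group = buckets[month]
        ai = [p for p in group if p.is_ai]
        human = [p for p in group if not p.is_ai]
        summary[month] = {
            "n_posts": len(group),
            "ai_share": len(ai) / len(group),
            # None when a class has no posts that month, avoiding division by zero
            "mean_likes_ai": sum(p.likes for p in ai) / len(ai) if ai else None,
            "mean_likes_human": sum(p.likes for p in human) / len(human) if human else None,
        }
    return summary
```

Passing a list of `Post` records returns a dict keyed by "YYYY-MM" strings in chronological order. Months in which one class is absent report `None` for that class's mean instead of raising an error, so pre-LLM months with no AI-labeled posts remain in the output.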
Related papers
- ChatGpt Content detection: A new approach using xlm-roberta alignment [0.0]
We present a comprehensive methodology to detect AI-generated text using XLM-RoBERTa, a state-of-the-art multilingual transformer model.
We fine-tuned the model on a balanced dataset of human and AI-generated texts and evaluated its performance.
Our findings offer a valuable tool for maintaining academic integrity and contribute to the broader field of AI ethics.
arXiv Detail & Related papers (2025-11-26T03:16:57Z)
- DTECT: Dynamic Topic Explorer & Context Tracker [0.8962460460173959]
We introduce DTECT (Dynamic Topic Explorer & Context Tracker), an end-to-end system that bridges the gap between raw textual data and meaningful temporal insights.
DTECT provides a unified workflow that supports data preprocessing, multiple model architectures, and dedicated evaluation metrics to analyze the topic quality of temporal topic models.
It significantly enhances interpretability by introducing LLM-driven automatic topic labeling, trend analysis via temporally salient words, interactive visualizations with document-level summarization, and a natural language chat interface for intuitive data querying.
arXiv Detail & Related papers (2025-07-10T16:44:33Z)
- Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media [38.99664377299462]
Social media platforms are experiencing a growing presence of AI-Generated Texts (AIGTs).
Despite the importance of this trend, it remains unclear how prevalent AIGTs are on social media.
This paper aims to quantify and monitor AIGTs on online social media platforms.
arXiv Detail & Related papers (2024-12-24T04:04:54Z)
- CityGPT: Towards Urban IoT Learning, Analysis and Interaction with Multi-Agent System [4.612237040042468]
CityGPT employs three agents to accomplish the temporal analysis of IoT data.
We have agentized the framework, facilitated by a large language model (LLM), to increase data comprehensibility.
Our evaluation results on real-world data spanning different time periods show that the CityGPT framework can guarantee robust computing performance.
arXiv Detail & Related papers (2024-05-23T15:27:18Z)
- An Attention-Based Denoising Framework for Personality Detection in Social Media Texts [1.6947975326871145]
Personality detection based on user-generated text is a method with broad application prospects.
We propose an attention-based information extraction mechanism (AIEM) for long texts.
We obtain an average accuracy improvement of 10.2% on the gold-standard Twitter-Myers-Briggs Type Indicator dataset.
arXiv Detail & Related papers (2023-11-16T14:56:09Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts.
We include a Jupyter Notebook-based walkthrough of the core interoperability features, available on GitHub.
We present GAIA Search - a search engine built following previously laid out principles, giving access to four popular large-scale text collections.
arXiv Detail & Related papers (2023-06-02T12:09:59Z)
- A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT [63.58711128819828]
ChatGPT and other Generative AI (GAI) techniques belong to the category of Artificial Intelligence Generated Content (AIGC).
The goal of AIGC is to make the content creation process more efficient and accessible, allowing for the production of high-quality content at a faster pace.
arXiv Detail & Related papers (2023-03-07T20:36:13Z)
- Evidential Temporal-aware Graph-based Social Event Detection via Dempster-Shafer Theory [76.4580340399321]
We propose ETGNN, a novel Evidential Temporal-aware Graph Neural Network.
We construct view-specific graphs whose nodes are the texts and edges are determined by several types of shared elements respectively.
Considering the view-specific uncertainty, the representations of all views are converted into mass functions through evidential deep learning (EDL) neural networks.
arXiv Detail & Related papers (2022-05-24T16:22:40Z)
- End-to-End Active Speaker Detection [58.7097258722291]
We propose an end-to-end training network where feature learning and contextual predictions are jointly learned.
We also introduce intertemporal graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem.
Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance.
arXiv Detail & Related papers (2022-03-27T08:55:28Z)