Related papers: CLMIR: A Textual Dataset for Rumor Identification and Marking

URL: http://arxiv.org/abs/2508.11138v1
Date: Fri, 15 Aug 2025 01:09:27 GMT
Title: CLMIR: A Textual Dataset for Rumor Identification and Marking
Authors: Bin Ma, Yifei Zhang, Yongjin Xian, Qi Li, Linna Zhou, Gongxun Miao,
Abstract summary: This paper constructs a dataset for rumor detection with fine-grained markings, named CLMIR. In addition to determining whether a post is a rumor, this dataset further marks the specific content upon which the rumor is based.
Score: 15.703292627605304
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rise of social media, rumor detection has drawn increasing attention. Although numerous methods have been proposed with the development of rumor classification datasets, they focus on identifying whether a post is a rumor, lacking the ability to mark the specific rumor content. This limitation largely stems from the lack of fine-grained marks in existing datasets. Constructing a rumor dataset with rumor content information marking is of great importance for fine-grained rumor identification. Such a dataset can facilitate practical applications, including rumor tracing, content moderation, and emergency response. Beyond being utilized for overall performance evaluation, this dataset enables the training of rumor detection algorithms to learn content marking, and thus improves their interpretability and reasoning ability, enabling systems to effectively address specific rumor segments. This paper constructs a dataset for rumor detection with fine-grained markings, named CLMIR (Content-Level Marking Dataset for Identifying Rumors). In addition to determining whether a post is a rumor, this dataset further marks the specific content upon which the rumor is based.

Related papers

LLM Unlearning on Noisy Forget Sets: A Study of Incomplete, Rewritten, and Watermarked Data [69.5099112089508]
Large language models (LLMs) exhibit remarkable generative capabilities but raise ethical and security concerns by memorizing sensitive data. This work presents the first study of unlearning under perturbed or low-fidelity forget data, referred to as noisy forget sets. We find that unlearning remains surprisingly robust to perturbations, provided that core semantic signals are preserved.
arXiv Detail & Related papers (2025-10-10T05:10:49Z)
Insight Rumors: A Novel Textual Rumor Locating and Marking Model Leveraging Att_BiMamba2 Network [15.703292627605304]
This paper proposes a novel rumor detection model named Insight Rumors to locate and mark rumor content within textual data. The proposed scheme not only detects rumors accurately but also locates and marks them in context precisely, outperforming state-of-the-art schemes that can only discriminate rumors roughly.
arXiv Detail & Related papers (2025-08-18T02:20:57Z)
Towards Real-World Rumor Detection: Anomaly Detection Framework with Graph Supervised Contrastive Learning [3.2803526084968904]
We construct two large-scale conversation datasets from Weibo and Twitter. We find obvious differences between rumor and non-rumor distributions. We propose the Anomaly Detection framework Graph Supervised Contrastive Learning.
arXiv Detail & Related papers (2025-08-10T06:59:33Z)
Scribbles for All: Benchmarking Scribble Supervised Segmentation Across Datasets [51.74296438621836]
We introduce Scribbles for All, a label and training data generation algorithm for semantic segmentation trained on scribble labels. The main limitation of scribbles as source for weak supervision is the lack of challenging datasets for scribble segmentation. Scribbles for All provides scribble labels for several popular segmentation datasets and provides an algorithm to automatically generate scribble labels for any dataset with dense annotations.
arXiv Detail & Related papers (2024-08-22T15:29:08Z)
Detecting Rumor Veracity with Only Textual Information by Double-Channel Structure [7.931904787652709]
We propose a double-channel structure to determine the ex-ante veracity of rumors on social media. We first assign each text into either certain (informed rumor) or uncertain (uninformed rumor) category. Then, we apply lie detection algorithm to informed rumors and thread-reply agreement detection algorithm to uninformed rumors.
arXiv Detail & Related papers (2023-12-06T00:08:44Z)
PNT-Edge: Towards Robust Edge Detection with Noisy Labels by Learning Pixel-level Noise Transitions [119.17602768128806]
It is hard to manually label edges accurately, especially for large datasets. This paper proposes to learn Pixel-level Noise Transitions to model the label-corruption process.
arXiv Detail & Related papers (2023-07-26T09:45:17Z)
FedNoisy: Federated Noisy Label Learning Benchmark [53.73816587601204]
Federated learning has gained popularity for distributed learning without aggregating sensitive data from clients. The distributed and isolated nature of data isolation may be complicated by data quality, making it more vulnerable to noisy labels. We serve the first standardized benchmark that can help researchers fully explore potential federated noisy settings.
arXiv Detail & Related papers (2023-06-20T16:18:14Z)
Probing Spurious Correlations in Popular Event-Based Rumor Detection Benchmarks [28.550143417847373]
Open-source benchmark datasets suffer from spurious correlations, which are ignored by existing studies. We propose event-separated rumor detection as a solution to eliminate spurious cues. Our method outperforms existing baselines in terms of effectiveness, efficiency and generalizability.
arXiv Detail & Related papers (2022-09-19T07:11:36Z)
Rumor Detection with Self-supervised Learning on Texts and Social Graph [101.94546286960642]
We propose contrastive self-supervised learning on heterogeneous information sources, so as to reveal their relations and characterize rumors better. We term this framework as Self-supervised Rumor Detection (SRD). Extensive experiments on three real-world datasets validate the effectiveness of SRD for automatic rumor detection on social media.
arXiv Detail & Related papers (2022-04-19T12:10:03Z)
Audio Tagging by Cross Filtering Noisy Labels [26.14064793686316]
We present a novel framework, named CrossFilter, to combat the noisy labels problem for audio tagging. Our method achieves state-of-the-art performance and even surpasses the ensemble models.
arXiv Detail & Related papers (2020-07-16T07:55:04Z)
Fine-Tune Longformer for Jointly Predicting Rumor Stance and Veracity [27.661609140918916]
We propose a multi-task learning framework for jointly predicting rumor stance and veracity. Our framework consists of two parts: a) The bottom part of our framework classifies the stance for each post in the conversation thread discussing a rumor via modelling the multi-turn conversation and making each post aware of its neighboring posts. Experimental results on SemEval 2019 Task 7 dataset show that our method outperforms previous methods on both rumor stance classification and veracity prediction.
arXiv Detail & Related papers (2020-07-15T17:09:17Z)
DenoiSeg: Joint Denoising and Segmentation [75.91760529986958]
We propose DenoiSeg, a new method that can be trained end-to-end on only a few annotated ground truth segmentations. We achieve this by extending Noise2Void, a self-supervised denoising scheme that can be trained on noisy images alone, to also predict dense 3-class segmentations.
arXiv Detail & Related papers (2020-05-06T17:42:54Z)
Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks [89.13567439679709]
We propose a novel bi-directional graph model, named Bi-Directional Graph Convolutional Networks (Bi-GCN), to explore both characteristics by operating on both top-down and bottom-up propagation of rumors. It leverages a GCN with a top-down directed graph of rumor spreading to learn the patterns of rumor propagation, and a GCN with an opposite directed graph of rumor diffusion to capture the structures of rumor dispersion.
arXiv Detail & Related papers (2020-01-17T15:12:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.