Related papers: COMMUNITYNOTES: A Dataset for Exploring the Helpfulness of Fact-Checking Explanations

URL: http://arxiv.org/abs/2510.24810v1
Date: Tue, 28 Oct 2025 05:28:47 GMT
Title: COMMUNITYNOTES: A Dataset for Exploring the Helpfulness of Fact-Checking Explanations
Authors: Rui Xing, Preslav Nakov, Timothy Baldwin, Jey Han Lau
Abstract summary: We present a large-scale dataset of 104k posts with user-provided notes and helpfulness labels. We propose a framework that automatically generates and improves reason definitions via automatic prompt optimization. Our experiments show that the optimized definitions can improve both helpfulness and reason prediction.
Score: 89.37527535663433
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fact-checking on major platforms, such as X, Meta, and TikTok, is shifting from expert-driven verification to a community-based setup, where users contribute explanatory notes to clarify why a post might be misleading. An important challenge here is determining whether an explanation is helpful for understanding real-world claims and the reasons why, which remains largely underexplored in prior research. In practice, most community notes remain unpublished due to slow community annotation, and the reasons for helpfulness lack clear definitions. To bridge these gaps, we introduce the task of predicting both the helpfulness of explanatory notes and the reason for this. We present COMMUNITYNOTES, a large-scale multilingual dataset of 104k posts with user-provided notes and helpfulness labels. We further propose a framework that automatically generates and improves reason definitions via automatic prompt optimization, and integrate them into prediction. Our experiments show that the optimized definitions can improve both helpfulness and reason prediction. Finally, we show that the helpfulness information is beneficial for existing fact-checking systems.

Related papers

GRainsaCK: a Comprehensive Software Library for Benchmarking Explanations of Link Prediction Tasks on Knowledge Graphs [0.0]
Explanation methods tackle this issue by identifying supporting knowledge explaining the predicted facts. We propose GRainsaCK, a reusable software resource that fully streamlines all the tasks involved in benchmarking explanations. GRainsaCK furthers modularity/extensibility by implementing the main components as functions that can be easily replaced.
arXiv Detail & Related papers (2025-08-12T10:15:58Z)
FIRE: Faithful Interpretable Recommendation Explanations [2.6499018693213316]
Natural language explanations in recommender systems are often framed as a review generation task. FIRE is a lightweight and interpretable framework that combines SHAP-based feature attribution with structured, prompt-driven language generation. Our results demonstrate that FIRE not only achieves competitive recommendation accuracy but also significantly improves explanation quality along critical dimensions such as alignment, structure, and faithfulness.
arXiv Detail & Related papers (2025-08-07T10:11:02Z)
ClaimVer: Explainable Claim-Level Verification and Evidence Attribution of Text Through Knowledge Graphs [13.608282497568108]
ClaimVer is a human-centric framework tailored to meet users' informational and verification needs. It highlights each claim, verifies it against a trusted knowledge graph, and provides succinct, clear explanations for each claim prediction.
arXiv Detail & Related papers (2024-03-12T17:07:53Z)
ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness [67.49087159888298]
ReCEval is a framework that evaluates reasoning chains via two key properties: correctness and informativeness. We show that ReCEval effectively identifies various error types and yields notable improvements compared to prior methods.
arXiv Detail & Related papers (2023-04-21T02:19:06Z)
Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting [80.9896041501715]
Explanations that have not been "tuned" for a task, such as off-the-shelf explanations written by nonexperts, may lead to mediocre performance. This paper tackles the problem of how to optimize explanation-infused prompts in a blackbox fashion.
arXiv Detail & Related papers (2023-02-09T18:02:34Z)
ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning [63.77667876176978]
Large language models show improved downstream task interpretability when prompted to generate step-by-step reasoning to justify their final answers. These reasoning steps greatly improve model interpretability and verification, but objectively studying their correctness is difficult. We present ROSCOE, a suite of interpretable, unsupervised automatic scores that improve and extend previous text generation evaluation metrics.
arXiv Detail & Related papers (2022-12-15T15:52:39Z)
The Unreliability of Explanations in Few-Shot In-Context Learning [50.77996380021221]
We focus on two NLP tasks that involve reasoning over text, namely question answering and natural language inference. We show that explanations judged as good by humans--those that are logically consistent with the input--usually indicate more accurate predictions. We present a framework for calibrating model predictions based on the reliability of the explanations.
arXiv Detail & Related papers (2022-05-06T17:57:58Z)
Generating Fluent Fact Checking Explanations with Unsupervised Post-Editing [22.5444107755288]
We present an iterative edit-based algorithm that uses only phrase-level edits to perform unsupervised post-editing of ruling comments. We show that our model generates explanations that are fluent, readable, non-redundant, and cover important information for the fact check.
arXiv Detail & Related papers (2021-12-13T15:31:07Z)
Human Evaluation of Spoken vs. Visual Explanations for Open-Domain QA [22.76153284711981]
We study whether explanations help users correctly decide when to accept or reject an ODQA system's answer. Our results show that explanations derived from retrieved evidence passages can outperform strong baselines (calibrated confidence) across modalities. We show common failure cases of current explanations, emphasize end-to-end evaluation of explanations, and caution against evaluating them in proxy modalities that are different from deployment.
arXiv Detail & Related papers (2020-12-30T08:19:02Z)
Generating Fact Checking Explanations [52.879658637466605]
A crucial piece of the puzzle that is still missing is to understand how to automate the most elaborate part of the process. This paper provides the first study of how these explanations can be generated automatically based on available claim context. Our results indicate that optimising both objectives at the same time, rather than training them separately, improves the performance of a fact checking system.
arXiv Detail & Related papers (2020-04-13T05:23:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.