Althea: Human-AI Collaboration for Fact-Checking and Critical Reasoning
- URL: http://arxiv.org/abs/2602.11161v1
- Date: Mon, 29 Dec 2025 18:23:35 GMT
- Title: Althea: Human-AI Collaboration for Fact-Checking and Critical Reasoning
- Authors: Svetlana Churina, Kokil Jaidka, Anab Maulana Barik, Harshit Aneja, Cai Yang, Wynne Hsu, Mong Li Lee,
- Abstract summary: We introduce Althea, a retrieval-augmented system that integrates question generation, evidence retrieval, and structured reasoning to support user-driven evaluation of online claims.<n>On the AVeriTeC benchmark, Althea achieves a Macro-F1 of 0.44, outperforming standard verification pipelines and improving discrimination between supported and refuted claims.
- Score: 26.796186521236194
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The web's information ecosystem demands fact-checking systems that are both scalable and epistemically trustworthy. Automated approaches offer efficiency but often lack transparency, while human verification remains slow and inconsistent. We introduce Althea, a retrieval-augmented system that integrates question generation, evidence retrieval, and structured reasoning to support user-driven evaluation of online claims. On the AVeriTeC benchmark, Althea achieves a Macro-F1 of 0.44, outperforming standard verification pipelines and improving discrimination between supported and refuted claims. We further evaluate Althea through a controlled user study and a longitudinal survey experiment (N = 642), comparing three interaction modes that vary in the degree of scaffolding: an Exploratory mode with guided reasoning, a Summary mode providing synthesized verdicts, and a Self-search mode that offers procedural guidance without algorithmic intervention. Results show that guided interaction produces the strongest immediate gains in accuracy and confidence, while self-directed search yields the most persistent improvements over time. This pattern suggests that performance gains are not driven solely by effort or exposure, but by how cognitive work is structured and internalized.
Related papers
- Strong Reasoning Isn't Enough: Evaluating Evidence Elicitation in Interactive Diagnosis [29.630872344186873]
Interactive medical consultation requires an agent to proactively elicit missing clinical evidence under uncertainty.<n>Existing evaluations largely remain static or outcome-centric, neglecting the evidence-gathering process.<n>We propose an interactive evaluation framework that explicitly models the consultation process using a simulated patient and a revsimulated reporter grounded in atomic evidences.
arXiv Detail & Related papers (2026-01-27T16:36:35Z) - From Transcripts to AI Agents: Knowledge Extraction, RAG Integration, and Robust Evaluation of Conversational AI Assistants [0.0]
Building reliable conversational AI assistants for customer-facing industries remains challenging due to noisy conversational data, fragmented knowledge, and the requirement for accurate human hand-off.<n>This paper presents an end-to-end framework for constructing and evaluating a conversational AI assistant directly from historical call transcripts.
arXiv Detail & Related papers (2026-01-26T07:44:47Z) - Factuality and Transparency Are All RAG Needs! Self-Explaining Contrastive Evidence Re-ranking [0.2864713389096699]
This extended abstract introduces Self-Explaining Contrastive Evidence Re-Ranking (CER)<n>CER restructures retrieval around factual evidence by fine-tuning embeddings with contrastive learning and generating token-level attribution rationales for each retrieved passage.<n>We evaluated our method on clinical trial reports, and initial experimental results show that CER improves retrieval accuracy, mitigates the potential for hallucinations in RAG systems, and provides transparent, evidence-based retrieval that enhances reliability, especially in safety-critical domains.
arXiv Detail & Related papers (2025-12-04T17:24:35Z) - SelfAI: Building a Self-Training AI System with LLM Agents [79.10991818561907]
SelfAI is a general multi-agent platform that combines a User Agent for translating high-level research objectives into standardized experimental configurations.<n>An Experiment Manager orchestrates parallel, fault-tolerant training across heterogeneous hardware while maintaining a structured knowledge base for continuous feedback.<n>Across regression, computer vision, scientific computing, medical imaging, and drug discovery benchmarks, SelfAI consistently achieves strong performance and reduces redundant trials.
arXiv Detail & Related papers (2025-11-29T09:18:39Z) - Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics [89.1999907891494]
We present WebDetective, a benchmark of hint-free multi-hop questions paired with a controlled Wikipedia sandbox.<n>Our evaluation of 25 state-of-the-art models reveals systematic weaknesses across all architectures.<n>We develop an agentic workflow, EvidenceLoop, that explicitly targets the challenges our benchmark identifies.
arXiv Detail & Related papers (2025-10-01T07:59:03Z) - STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning [54.28691219536054]
We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities.<n>We develop anchored reinforcement training - a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping.<n>Experiments on MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines.
arXiv Detail & Related papers (2025-08-26T08:47:58Z) - Towards Robust Fact-Checking: A Multi-Agent System with Advanced Evidence Retrieval [1.515687944002438]
The rapid spread of misinformation in the digital era poses significant challenges to public discourse.<n>Traditional human-led fact-checking methods, while credible, struggle with the volume and velocity of online content.<n>This paper proposes a novel multi-agent system for automated fact-checking that enhances accuracy, efficiency, and explainability.
arXiv Detail & Related papers (2025-06-22T02:39:27Z) - Active Test-time Vision-Language Navigation [60.69722522420299]
ATENA is a test-time active learning framework that enables a practical human-robot interaction via episodic feedback on uncertain navigation outcomes.<n>In particular, ATENA learns to increase certainty in successful episodes and decrease it in failed ones, improving uncertainty calibration.<n>In addition, we propose a self-active learning strategy that enables an agent to evaluate its navigation outcomes based on confident predictions.
arXiv Detail & Related papers (2025-06-07T02:24:44Z) - Visual Agents as Fast and Slow Thinkers [88.1404921693082]
We introduce FaST, which incorporates the Fast and Slow Thinking mechanism into visual agents.<n>FaST employs a switch adapter to dynamically select between System 1/2 modes.<n>It tackles uncertain and unseen objects by adjusting model confidence and integrating new contextual data.
arXiv Detail & Related papers (2024-08-16T17:44:02Z) - Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z) - SAIS: Supervising and Augmenting Intermediate Steps for Document-Level
Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.