SEVADE: Self-Evolving Multi-Agent Analysis with Decoupled Evaluation for Hallucination-Resistant Irony Detection
- URL: http://arxiv.org/abs/2508.06803v1
- Date: Sat, 09 Aug 2025 03:25:45 GMT
- Title: SEVADE: Self-Evolving Multi-Agent Analysis with Decoupled Evaluation for Hallucination-Resistant Irony Detection
- Authors: Ziqi Liu, Yangbin Chen, Ziyang Zhou, Yilin Li, Mingxuan Hu, Yushan Pan, Zhijie Xu
- Abstract summary: We propose a novel **S**elf-**Ev**olving multi-agent **A**nalysis framework with **D**ecoupled **E**valuation for hallucination-resistant sarcasm detection. Our framework achieves state-of-the-art performance, with average improvements of **6.75%** in Accuracy and **6.29%** in Macro-F1 score.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sarcasm detection is a crucial yet challenging Natural Language Processing task. Existing Large Language Model methods are often limited by single-perspective analysis, static reasoning pathways, and a susceptibility to hallucination when processing complex ironic rhetoric, which impacts their accuracy and reliability. To address these challenges, we propose **SEVADE**, a novel **S**elf-**Ev**olving multi-agent **A**nalysis framework with **D**ecoupled **E**valuation for hallucination-resistant sarcasm detection. The core of our framework is a Dynamic Agentive Reasoning Engine (DARE), which utilizes a team of specialized agents grounded in linguistic theory to perform a multifaceted deconstruction of the text and generate a structured reasoning chain. Subsequently, a separate lightweight rationale adjudicator (RA) performs the final classification based solely on this reasoning chain. This decoupled architecture is designed to mitigate the risk of hallucination by separating complex reasoning from the final judgment. Extensive experiments on four benchmark datasets demonstrate that our framework achieves state-of-the-art performance, with average improvements of **6.75%** in Accuracy and **6.29%** in Macro-F1 score.
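To make the decoupled architecture concrete, below is a minimal Python sketch of the two-stage idea the abstract describes. All names (`run_dare`, `adjudicate`, the toy agents) are hypothetical stand-ins for illustration, not the authors' implementation; the lambdas merely simulate LLM calls.

```python
# Hedged sketch of a decoupled analyze-then-adjudicate pipeline.
# Agent roles and stubs are illustrative assumptions, not SEVADE's code.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RationaleStep:
    agent: str     # which specialized analyst produced this step
    finding: str   # one linguistic observation about the text


def run_dare(text: str, agents: Dict[str, Callable[[str], str]]) -> List[RationaleStep]:
    """Stage 1 (analysis): each specialized agent deconstructs the text from
    its own linguistic angle; the results form a structured reasoning chain."""
    return [RationaleStep(name, analyze(text)) for name, analyze in agents.items()]


def adjudicate(chain: List[RationaleStep], classify: Callable[[str], bool]) -> bool:
    """Stage 2 (judgment): the lightweight adjudicator sees only the serialized
    reasoning chain, never the raw input text."""
    rationale = "\n".join(f"[{step.agent}] {step.finding}" for step in chain)
    return classify(rationale)


# Toy usage with lambdas standing in for real LLM calls:
agents = {
    "pragmatics": lambda t: f"Positive wording applied to a negative event: {t!r}",
    "incongruity": lambda t: "Stated sentiment conflicts with the described situation.",
}
chain = run_dare("Great, another Monday morning flat tire.", agents)
print(adjudicate(chain, classify=lambda r: "conflicts" in r))  # True
```

The key structural property is that `adjudicate` never receives the original text, mirroring the paper's claim that separating complex reasoning from the final judgment limits hallucination carry-over into the label.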
Related papers
- Unlocking Cognitive Capabilities and Analyzing the Perception-Logic Trade-off [29.48293757752123]
We present a progressive training pipeline that integrates Perception and Reasoning capabilities. We identify Temporal Drift in long-context audio, where extended reasoning desynchronizes the model from acoustic timestamps. This report details the architecture, the data-efficient training recipe, and a diagnostic analysis of the trade-offs between robust perception and structured reasoning.
arXiv Detail & Related papers (2026-02-27T06:56:50Z) - RAM-SD: Retrieval-Augmented Multi-agent framework for Sarcasm Detection [17.814793753195723]
RAM-SD is a Retrieval-Augmented Multi-Agent framework for Sarcasm Detection. It operates through four stages: (1) contextual retrieval grounds the query in both sarcastic and non-sarcastic exemplars; (2) a meta-planner classifies the sarcasm type and selects an optimal reasoning plan from a predefined set; and (3) an ensemble of specialized agents performs complementary, multi-view analysis. Evaluated on four standard benchmarks, RAM-SD achieves a state-of-the-art Macro-F1 of 77.74%, outperforming the strong GPT-4o+CoC baseline by 7.01 points. A sketch of this staged flow follows below.
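As a rough illustration of the retrieve-plan-analyze flow enumerated above, here is a short Python sketch; every callable and parameter name is an assumption made for illustration, not RAM-SD's actual API.

```python
# Hedged sketch of a staged retrieval -> planning -> ensemble pipeline.
# All components are placeholder stubs, not RAM-SD's real designs.
from typing import Callable, List


def ram_sd_style_pipeline(
    query: str,
    retrieve: Callable[[str], List[str]],           # stage 1: exemplar retrieval
    select_plan: Callable[[str, List[str]], str],   # stage 2: meta-planner
    agents: List[Callable[[str, str], str]],        # stage 3: multi-view ensemble
) -> List[str]:
    exemplars = retrieve(query)              # ground the query in labeled examples
    plan = select_plan(query, exemplars)     # classify sarcasm type, pick a plan
    return [agent(query, plan) for agent in agents]  # complementary analyses
```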
arXiv Detail & Related papers (2026-01-14T03:19:40Z) - Reasoning-Driven Amodal Completion: Collaborative Agents and Perceptual Evaluation [17.405818788700234]
We present a Collaborative Multi-Agent Reasoning Framework that explicitly decouples Semantic Planning from Visual Synthesis. Our method generates a structured, explicit plan before pixel generation, enabling visually and semantically coherent single-pass synthesis. Addressing the limitations of traditional metrics in assessing inferred invisible content, we introduce the MAC-Score, a novel human-aligned evaluation metric.
arXiv Detail & Related papers (2025-12-24T04:39:45Z) - Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images [96.43608872116347]
AnomReason is a large-scale benchmark with structured quadruple annotations, introduced alongside **AnomAgent**. AnomReason and AnomAgent serve as a foundation for measuring and improving the semantic plausibility of AI-generated images.
arXiv Detail & Related papers (2025-10-11T14:09:24Z) - Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning [53.45095336430027]
We develop a unified framework that combines implicit retrieval and structured collaboration. On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3% accuracy. Results on SuperGPQA and TRQA confirm robustness across domains.
arXiv Detail & Related papers (2025-09-25T14:05:55Z) - CAMF: Collaborative Adversarial Multi-agent Framework for Machine Generated Text Detection [16.113113157328662]
Existing zero-shot detection paradigms often exhibit significant deficiencies. We introduce **CAMF**, a novel architecture using multiple LLM-based agents. This structured collaborative-adversarial process enables a deep analysis of subtle, cross-dimensional textual incongruities indicative of non-human origin.
arXiv Detail & Related papers (2025-08-16T06:25:27Z) - Are All Prompt Components Value-Neutral? Understanding the Heterogeneous Adversarial Robustness of Dissected Prompt in Large Language Models [11.625319498017733]
We introduce PromptAnatomy, an automated framework that dissects prompts into functional components. We generate adversarial examples by selectively perturbing each component using our proposed method, ComPerturb. As a complementary resource, we annotate four public instruction-tuning datasets using the PromptAnatomy framework.
arXiv Detail & Related papers (2025-08-03T02:46:30Z) - Zero-Shot Event Causality Identification via Multi-source Evidence Fuzzy Aggregation with Large Language Models [11.541829239773643]
Event Causality Identification (ECI) aims to detect causal relationships between events in textual contexts. Existing ECI models predominantly rely on supervised methodologies, suffering from dependence on large-scale annotated data. We propose MEFA, a novel zero-shot framework based on Multi-source Evidence Fuzzy Aggregation.
arXiv Detail & Related papers (2025-06-06T01:56:05Z) - CLATTER: Comprehensive Entailment Reasoning for Hallucination Detection [60.98964268961243]
We propose that guiding models to perform a systematic and comprehensive reasoning process allows them to execute much finer-grained and accurate entailment decisions. We define a 3-step reasoning process, consisting of (i) claim decomposition, (ii) sub-claim attribution and entailment classification, and (iii) aggregated classification, showing that such guided reasoning indeed yields improved hallucination detection. A small sketch of this aggregation follows below.
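The three-step process itemized above maps naturally onto a small aggregation function. This hedged Python sketch assumes placeholder `decompose` and `entails` components rather than CLATTER's actual models.

```python
# Hedged sketch of decompose -> per-sub-claim entailment -> aggregate.
# The helper callables are hypothetical stubs, not CLATTER's components.
from typing import Callable, List


def guided_entailment_check(
    claim: str,
    source: str,
    decompose: Callable[[str], List[str]],   # step (i): claim decomposition
    entails: Callable[[str, str], bool],     # step (ii): per-sub-claim entailment
) -> bool:
    """Step (iii): aggregate. The claim counts as grounded only if every
    sub-claim is entailed by the source; one unsupported sub-claim flags
    a hallucination."""
    return all(entails(source, sub) for sub in decompose(claim))
```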
arXiv Detail & Related papers (2025-06-05T17:02:52Z) - Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models [12.270274049887298]
Reasoning traces can be redundant or logically inconsistent, making them a new source of hallucination. Existing hallucination detection methods focus primarily on answer-level uncertainty. We propose RACE, a novel framework specifically tailored for hallucination detection in LRMs.
arXiv Detail & Related papers (2025-06-05T09:54:04Z) - Towards Long Context Hallucination Detection [49.195854802543714]
Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. However, they are prone to contextual hallucination, generating information that is either unsubstantiated or contradictory to the given context. We propose a novel architecture that enables pre-trained encoder models, such as BERT, to process long contexts and effectively detect contextual hallucinations.
arXiv Detail & Related papers (2025-04-28T03:47:05Z) - Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic [51.967603572656266]
We introduce a consistent and theoretically grounded approach to annotating decompositional entailment.
We find that our new dataset, RDTE, has a substantially higher internal consistency (+9%) than prior decompositional entailment datasets.
We also find that training an RDTE-oriented entailment classifier via knowledge distillation and employing it in an entailment tree reasoning engine significantly improves both accuracy and proof quality.
arXiv Detail & Related papers (2024-02-22T18:55:17Z) - Sentiment Analysis through LLM Negotiations [58.67939611291001]
A standard paradigm for sentiment analysis is to rely on a single LLM that makes its decision in a single round.
This paper introduces a multi-LLM negotiation framework for sentiment analysis.
arXiv Detail & Related papers (2023-11-03T12:35:29Z) - "You Are An Expert Linguistic Annotator": Limits of LLMs as Analyzers of Abstract Meaning Representation [60.863629647985526]
We examine the successes and limitations of the GPT-3, ChatGPT, and GPT-4 models in analysis of sentence meaning structure.
We find that models can reliably reproduce the basic format of AMR, and can often capture core event, argument, and modifier structure.
Overall, our findings indicate that these models can capture aspects of semantic structure out of the box, but there remain key limitations in their ability to support fully accurate semantic analyses or parses.
arXiv Detail & Related papers (2023-10-26T21:47:59Z)