From Ambiguity to Verdict: A Semiotic-Grounded Multi-Perspective Agent for LLM Logical Reasoning
- URL: http://arxiv.org/abs/2509.24765v2
- Date: Tue, 30 Sep 2025 03:40:31 GMT
- Title: From Ambiguity to Verdict: A Semiotic-Grounded Multi-Perspective Agent for LLM Logical Reasoning
- Authors: Yunyao Zhang, Xinglang Zhang, Junxi Sheng, Wenbing Li, Junqing Yu, Wei Yang, Zikai Song
- Abstract summary: LogicAgent is a semiotic-square-guided framework designed to jointly address logical complexity and semantic complexity. To overcome the semantic simplicity and low logical complexity of existing datasets, we introduce RepublicQA, a benchmark that reaches college-level difficulty. Experiments demonstrate that LogicAgent achieves state-of-the-art performance on RepublicQA, with a 6.25% average gain over strong baselines.
- Score: 16.381034926435074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Logical reasoning is a fundamental capability of large language models (LLMs). However, existing studies largely overlook the interplay between logical complexity and semantic complexity, resulting in methods that struggle to address challenging scenarios involving abstract propositions, ambiguous contexts, and conflicting stances, which are central to human reasoning. To address this gap, we propose LogicAgent, a semiotic-square-guided framework designed to jointly address logical complexity and semantic complexity. LogicAgent explicitly performs multi-perspective deduction in first-order logic (FOL), while mitigating vacuous reasoning through existential import checks that incorporate a three-valued decision scheme (True, False, Uncertain) to handle boundary cases more faithfully. Furthermore, to overcome the semantic simplicity and low logical complexity of existing datasets, we introduce RepublicQA, a benchmark that reaches college-level difficulty (FKGL = 11.94) and exhibits substantially greater lexical and structural diversity than prior benchmarks. RepublicQA is grounded in philosophical concepts, featuring abstract propositions and systematically organized contrary and contradictory relations, making it the most semantically rich resource for evaluating logical reasoning. Experiments demonstrate that LogicAgent achieves state-of-the-art performance on RepublicQA, with a 6.25% average gain over strong baselines, and generalizes effectively to mainstream logical reasoning benchmarks including ProntoQA, ProofWriter, FOLIO, and ProverQA, achieving an additional 7.05% average gain. These results highlight the strong effectiveness of our semiotic-grounded multi-perspective reasoning in boosting LLMs' logical performance.
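The abstract's core mechanism, multi-perspective deduction combined with a three-valued verdict and an existential-import check, can be illustrated with a small self-contained sketch. The code below is an assumption-laden toy, not the authors' LogicAgent implementation: the fact-base format, the function names, and the aggregation rule are invented for illustration, and the actual system reasons over FOL translations produced by an LLM rather than a hand-coded fact base.

```python
# Illustrative sketch only (not the paper's code): judge "All S are P" by also consulting
# its semiotic-square opposites -- the contrary "No S is P" and the contradictory
# "Some S are not P" -- over a toy closed fact base, returning a three-valued verdict and
# refusing vacuous truth via an existential-import check.
from enum import Enum

class Verdict(Enum):
    TRUE = "True"
    FALSE = "False"
    UNCERTAIN = "Uncertain"

# Toy closed fact base: entity -> predicates known to hold for it (an assumption for this sketch).
FACTS = {
    "socrates": {"human", "mortal"},
    "plato": {"human", "mortal"},
    "zeus": {"deity", "immortal"},
}

def holds(entity: str, pred: str) -> bool:
    return pred in FACTS.get(entity, set())

def subjects(pred: str) -> set[str]:
    return {e for e in FACTS if holds(e, pred)}

def all_s_are_p(s: str, p: str) -> Verdict:
    """'All S are P', with an existential-import check against vacuous truth."""
    ext = subjects(s)
    if not ext:                       # no S exists -> boundary case, not vacuously True
        return Verdict.UNCERTAIN
    return Verdict.TRUE if all(holds(e, p) for e in ext) else Verdict.FALSE

def no_s_is_p(s: str, p: str) -> Verdict:     # contrary corner of the square
    ext = subjects(s)
    if not ext:
        return Verdict.UNCERTAIN
    return Verdict.TRUE if not any(holds(e, p) for e in ext) else Verdict.FALSE

def some_s_not_p(s: str, p: str) -> Verdict:  # contradictory corner of the square
    ext = subjects(s)
    if not ext:
        return Verdict.UNCERTAIN
    return Verdict.TRUE if any(not holds(e, p) for e in ext) else Verdict.FALSE

def verdict_all_s_are_p(s: str, p: str) -> Verdict:
    """Aggregate the perspectives; conflict or failed existential import -> Uncertain."""
    a, e, o = all_s_are_p(s, p), no_s_is_p(s, p), some_s_not_p(s, p)
    if a is Verdict.TRUE and o is Verdict.FALSE:
        return Verdict.TRUE
    if o is Verdict.TRUE or e is Verdict.TRUE:
        return Verdict.FALSE
    return Verdict.UNCERTAIN

if __name__ == "__main__":
    print(verdict_all_s_are_p("human", "mortal"))   # Verdict.TRUE
    print(verdict_all_s_are_p("deity", "mortal"))   # Verdict.FALSE
    print(verdict_all_s_are_p("unicorn", "mortal")) # Verdict.UNCERTAIN (no subject exists)
```

The existential-import check is what keeps "All unicorns are mortal" from coming out vacuously True when no subject exists; that boundary case is exactly what the three-valued scheme is meant to absorb.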
Related papers
- OrLog: Resolving Complex Queries with LLMs and Probabilistic Reasoning [51.58235452818926]
We introduce OrLog, a neuro-symbolic retrieval framework that decouples predicate-level plausibility estimation from logical reasoning. A large language model (LLM) provides plausibility scores for atomic predicates in one decoding-free forward pass, from which a probabilistic reasoning engine derives the posterior probability of query satisfaction. (A toy sketch of this decoupling appears after this list.)
arXiv Detail & Related papers (2026-01-30T15:31:58Z) - From Hypothesis to Premises: LLM-based Backward Logical Reasoning with Selective Symbolic Translation [8.104087344683604]
We propose a novel framework, Hypothesis-driven Backward Logical Reasoning (HBLR). The core idea is to integrate confidence-aware symbolic translation with hypothesis-driven backward reasoning. HBLR consistently outperforms strong baselines in both accuracy and efficiency.
arXiv Detail & Related papers (2025-12-03T01:52:31Z) - LOGicalThought: Logic-Based Ontological Grounding of LLMs for High-Assurance Reasoning [33.30049437667383]
High-assurance reasoning requires conclusions that are accurate, verifiable, and grounded in evidence. This paper proposes a novel neurosymbolically-grounded architecture called LOGicalThought. It uses an advanced logical language and reasoner in conjunction with an LLM to construct a dual symbolic graph context and logic-based context.
arXiv Detail & Related papers (2025-10-02T00:06:23Z) - Logic Unseen: Revealing the Logical Blindspots of Vision-Language Models [58.456656119178064]
Vision-Language Models (VLMs) have emerged as foundational for multimodal intelligence. However, their capacity for logical understanding remains significantly underexplored. We introduce LogicBench, a benchmark with over 50,000 vision-language pairs across 9 logical categories and 4 diverse scenarios. We propose LogicCLIP, a training framework designed to boost VLMs' logical sensitivity.
arXiv Detail & Related papers (2025-08-15T08:40:13Z) - From Language to Logic: A Bi-Level Framework for Structured Reasoning [6.075080928704587]
Structured reasoning over natural language inputs remains a core challenge in artificial intelligence. We propose a novel framework that maps language to logic through a two-stage process: high-level task abstraction and low-level logic generation. Our approach significantly outperforms existing baselines, with accuracy gains reaching as high as 40%.
arXiv Detail & Related papers (2025-07-11T11:24:09Z) - PixelThink: Towards Efficient Chain-of-Pixel Reasoning [70.32510083790069]
PixelThink is a simple yet effective scheme that integrates externally estimated task difficulty and internally measured model uncertainty. It learns to compress reasoning length in accordance with scene complexity and predictive confidence. Experimental results demonstrate that the proposed approach improves both reasoning efficiency and overall segmentation performance.
arXiv Detail & Related papers (2025-05-29T17:55:49Z) - Learning to Reason via Mixture-of-Thought for Logical Reasoning [56.24256916896427]
Mixture-of-Thought (MoT) is a framework that enables LLMs to reason across three complementary modalities: natural language, code, and truth-table. MoT adopts a two-phase design: (1) self-evolving MoT training, which jointly learns from filtered, self-generated rationales across modalities; and (2) MoT inference, which fully leverages the synergy of three modalities to produce better predictions.
arXiv Detail & Related papers (2025-05-21T17:59:54Z) - LogicTree: Structured Proof Exploration for Coherent and Rigorous Logical Reasoning with Large Language Models [9.339988760379915]
LogicTree is an inference-time modular framework employing algorithm-guided search to automate structured proof exploration. We introduce two LLM-free heuristics for premise prioritization, enabling strategic proof search. Within LogicTree, GPT-4o outperforms o3-mini by 7.6% on average.
arXiv Detail & Related papers (2025-04-18T22:10:02Z) - Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning [1.3003982724617653]
Large Language Models (LLMs) have revolutionized natural language processing, yet they struggle with inconsistent reasoning.
This research introduces Proof of Thought, a framework that enhances the reliability and transparency of LLM outputs.
Key contributions include a robust type system with sort management for enhanced logical integrity, and an explicit representation of rules that clearly distinguishes factual from inferential knowledge.
arXiv Detail & Related papers (2024-09-25T18:35:45Z) - Modeling Hierarchical Reasoning Chains by Linking Discourse Units and Key Phrases for Reading Comprehension [80.99865844249106]
We propose a holistic graph network (HGN) that handles context at both the discourse level and the word level as the basis for logical reasoning.
Specifically, node-level and type-level relations, which can be interpreted as bridges in the reasoning process, are modeled by a hierarchical interaction mechanism.
arXiv Detail & Related papers (2023-06-21T07:34:27Z) - Exploring Self-supervised Logic-enhanced Training for Large Language Models [59.227222647741094]
In this paper, we make the first attempt to investigate the feasibility of incorporating logical knowledge through self-supervised post-training.
We devise an auto-regressive objective variant of MERIt and integrate it with two LLM series, i.e., FLAN-T5 and LLaMA, with parameter sizes ranging from 3 billion to 13 billion.
The results on two challenging logical reasoning benchmarks demonstrate the effectiveness of LogicLLM.
arXiv Detail & Related papers (2023-05-23T06:13:10Z)
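As referenced in the OrLog entry above, the decoupling of predicate-level plausibility estimation from logical reasoning can be pictured with a short sketch. The scores below are hard-coded stand-ins for what an LLM would produce in a single forward pass, the query encoding is invented for illustration, and atoms are assumed independent; none of this reflects OrLog's actual interface.

```python
# Toy sketch of the decoupling described in the OrLog entry (not OrLog's API): plausibility
# scores for atomic predicates are supplied upfront, and a simple probabilistic engine
# combines them into a probability of query satisfaction under an independence assumption.
from typing import Union

# Stand-in for atomic plausibilities that an LLM would score in one decoding-free pass.
ATOM_PLAUSIBILITY = {
    "bird(tweety)": 0.9,
    "penguin(tweety)": 0.2,
    "flies(tweety)": 0.7,
}

# A query is either an atom (str) or a tuple ("and"/"or"/"not", *sub-queries).
Query = Union[str, tuple]

def prob(query: Query) -> float:
    """Posterior probability of query satisfaction, assuming independent atoms."""
    if isinstance(query, str):
        return ATOM_PLAUSIBILITY[query]
    op, *args = query
    if op == "not":
        return 1.0 - prob(args[0])
    if op == "and":                   # product rule under independence
        p = 1.0
        for a in args:
            p *= prob(a)
        return p
    if op == "or":                    # noisy-OR under independence
        p = 1.0
        for a in args:
            p *= 1.0 - prob(a)
        return 1.0 - p
    raise ValueError(f"unknown operator: {op}")

if __name__ == "__main__":
    q = ("and", "bird(tweety)", ("not", "penguin(tweety)"), "flies(tweety)")
    print(round(prob(q), 3))  # 0.9 * 0.8 * 0.7 = 0.504
```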