Aligning with Logic: Measuring, Evaluating and Improving Logical Preference Consistency in Large Language Models
- URL: http://arxiv.org/abs/2410.02205v3
- Date: Sun, 09 Feb 2025 17:13:51 GMT
- Title: Aligning with Logic: Measuring, Evaluating and Improving Logical Preference Consistency in Large Language Models
- Authors: Yinhong Liu, Zhijiang Guo, Tianya Liang, Ehsan Shareghi, Ivan Vulić, Nigel Collier
- Abstract summary: Large Language Models (LLMs) are expected to be predictable and trustworthy to support reliable decision-making systems.
This work examines logical preference consistency as a foundational requirement for building more dependable LLM systems.
We show that improving consistency leads to better performance in LLM-driven logic-based algorithms.
- Score: 31.558429029429863
- Abstract: Large Language Models (LLMs) are expected to be predictable and trustworthy to support reliable decision-making systems. Yet current LLMs often show inconsistencies in their judgments. In this work, we examine logical preference consistency as a foundational requirement for building more dependable LLM systems, ensuring stable and coherent decision-making while minimizing erratic or contradictory outputs. To quantify the logical preference consistency, we propose a universal evaluation framework based on three fundamental properties: transitivity, commutativity and negation invariance. Through extensive experimentation across diverse LLMs, we demonstrate that these properties serve as strong indicators of judgment robustness. Furthermore, we introduce a data refinement and augmentation technique, REPAIR, that enhances logical consistency while maintaining alignment with human preferences. Finally, we show that improving consistency leads to better performance in LLM-driven logic-based algorithms, reinforcing stability and coherence in decision-making systems.
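The framework above is stated abstractly, so here is a minimal, self-contained sketch of how one might score a pairwise LLM judge on the three properties (transitivity, commutativity, negation invariance). Everything in it is an illustrative assumption rather than the paper's implementation: `mock_judge` is a stand-in for a real LLM call (e.g., a "which response is better/worse?" prompt), its length-based rule is a toy, the scores are plain satisfaction rates rather than the paper's exact metrics, and the REPAIR refinement step is not implemented.
```python
"""Hedged sketch: scoring a pairwise judge for logical preference consistency."""
from itertools import combinations, permutations


def mock_judge(a: str, b: str, negated: bool = False) -> str:
    # Placeholder for an LLM call such as "Which response is better, A or B?"
    # (or "worse" when negated=True). Toy deterministic rule: prefer the longer text.
    better = a if len(a) >= len(b) else b
    worse = b if better == a else a
    return worse if negated else better


def transitivity_score(items) -> float:
    # Fraction of (a, b, c) triples where a > b and b > c also yield a > c.
    ok = total = 0
    for a, b, c in permutations(items, 3):
        if mock_judge(a, b) == a and mock_judge(b, c) == b:
            total += 1
            ok += mock_judge(a, c) == a
    return ok / total if total else 1.0


def commutativity_score(items) -> float:
    # Fraction of pairs whose verdict is unchanged when the presentation order is swapped.
    pairs = list(combinations(items, 2))
    return sum(mock_judge(a, b) == mock_judge(b, a) for a, b in pairs) / len(pairs)


def negation_invariance_score(items) -> float:
    # Fraction of pairs where the negated question ("which is worse?") flips the verdict.
    pairs = list(combinations(items, 2))
    return sum(mock_judge(a, b) != mock_judge(a, b, negated=True) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    responses = ["short answer", "a somewhat longer answer", "the most detailed answer of all"]
    print("transitivity:", transitivity_score(responses))
    print("commutativity:", commutativity_score(responses))
    print("negation invariance:", negation_invariance_score(responses))
```
With an actual model behind `mock_judge`, scores below 1.0 on any property would flag the kind of judgment inconsistency the paper measures.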
Related papers
- Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving [55.895917967408586]
Existing approaches to mathematical reasoning with large language models rely on Chain-of-Thought (CoT) for generalizability or Tool-Integrated Reasoning (TIR) for precise computation.
We propose TATA (Teaching LLMs According to Their Aptitude), an adaptive framework that enables LLMs to personalize their reasoning strategy spontaneously.
arXiv Detail & Related papers (2025-02-17T16:56:23Z) - Towards a Probabilistic Framework for Analyzing and Improving LLM-Enabled Software [0.0]
Large language model (LLM)-enabled systems pose a significant challenge for software engineering.
We propose a probabilistic framework for systematically analyzing and improving these systems.
We apply the framework to the autoformalization problem, where natural language documentation is transformed into formal program specifications.
arXiv Detail & Related papers (2025-01-10T22:42:06Z) - Aligning Large Language Models for Faithful Integrity Against Opposing Argument [71.33552795870544]
Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks.
They can be easily misled by unfaithful arguments during conversations, even when their original statements are correct.
We propose a novel framework, named Alignment for Faithful Integrity with Confidence Estimation.
arXiv Detail & Related papers (2025-01-02T16:38:21Z) - Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up [9.42385235462794]
Large language models (LLMs) have shown remarkable performance in reasoning tasks but face limitations in mathematical and complex logical reasoning.
We propose Reversal of Thought (RoT) to enhance the logical reasoning abilities of LLMs during the warm-up phase prior to batch inference.
RoT utilizes a Preference-Guided Reverse Reasoning warm-up strategy, which integrates logical symbols for pseudocode planning.
arXiv Detail & Related papers (2024-10-16T07:44:28Z) - Proof of Thought: Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning [1.3003982724617653]
Large Language Models (LLMs) have revolutionized natural language processing, yet they struggle with inconsistent reasoning.
This research introduces Proof of Thought, a framework that enhances the reliability and transparency of LLM outputs.
Key contributions include a robust type system with sort management for enhanced logical integrity, and an explicit representation of rules that clearly distinguishes factual from inferential knowledge.
arXiv Detail & Related papers (2024-09-25T18:35:45Z) - Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling [9.44858963874474]
Self-Consistency mitigates hallucinations in Large Language Models (LLMs) by sampling multiple reasoning paths.
We introduce Reasoning-Aware Self-Consistency (RASC), a novel framework that enhances sampling efficiency and reasoning faithfulness.
arXiv Detail & Related papers (2024-08-30T05:14:59Z) - Making Large Language Models Better Planners with Reasoning-Decision Alignment [70.5381163219608]
We motivate an end-to-end decision-making model based on a multimodality-augmented LLM.
We propose a reasoning-decision alignment constraint between the paired CoTs and planning results.
We dub our proposed large language planners with reasoning-decision alignment as RDA-Driver.
arXiv Detail & Related papers (2024-08-25T16:43:47Z) - LLMs for Relational Reasoning: How Far are We? [8.840750655261251]
Large language models (LLMs) have revolutionized many areas by achieving state-of-the-art performance on downstream tasks.
Recent efforts have demonstrated that LLMs are poor at solving sequential decision-making problems.
arXiv Detail & Related papers (2024-01-17T08:22:52Z) - CLOMO: Counterfactual Logical Modification with Large Language Models [109.60793869938534]
We introduce a novel task, Counterfactual Logical Modification (CLOMO), and a high-quality human-annotated benchmark.
In this task, LLMs must adeptly alter a given argumentative text to uphold a predetermined logical relationship.
We propose an innovative evaluation metric, the Self-Evaluation Score (SES), to directly evaluate the natural language output of LLMs.
arXiv Detail & Related papers (2023-11-29T08:29:54Z) - Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models [61.28463542324576]
Vision-language models (VLMs) have recently demonstrated strong efficacy as visual assistants that can generate human-like outputs.
We evaluate existing state-of-the-art VLMs and find that even the best-performing model is unable to demonstrate strong visual reasoning capabilities and consistency.
We propose a two-stage training framework aimed at improving both the reasoning performance and consistency of VLMs.
arXiv Detail & Related papers (2023-09-08T17:49:44Z) - Exploring Self-supervised Logic-enhanced Training for Large Language Models [59.227222647741094]
In this paper, we make the first attempt to investigate the feasibility of incorporating logical knowledge through self-supervised post-training.
We devise an auto-regressive objective variant of MERIt and integrate it with two LLM series, i.e., FLAN-T5 and LLaMA, with parameter sizes ranging from 3 billion to 13 billion.
The results on two challenging logical reasoning benchmarks demonstrate the effectiveness of LogicLLM.
arXiv Detail & Related papers (2023-05-23T06:13:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.