Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection
- URL: http://arxiv.org/abs/2504.09440v1
- Date: Sun, 13 Apr 2025 05:47:52 GMT
- Title: Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection
- Authors: MingShan Liu, Shi Bo, Jialing Fang,
- Abstract summary: We introduce a structured self-consistency framework designed to enhance the reliability of mathematical reasoning.<n>Our method enforces self-consistency across intermediate steps and final outputs, reducing logical inconsistencies and hallucinations.<n> Experimental results demonstrate that SC significantly improves proof validity, symbolic reasoning accuracy, and numerical stability.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have demonstrated strong mathematical reasoning capabilities but remain susceptible to hallucinations producing plausible yet incorrect statements especially in theorem proving, symbolic manipulation, and numerical computation. While self-consistency (SC) has been explored as a means to improve factuality in LLMs, existing approaches primarily apply SC to final-answer selection, neglecting the logical consistency of intermediate reasoning steps. In this work, we introduce a structured self-consistency framework designed to enhance the reliability of mathematical reasoning. Our method enforces self-consistency across intermediate steps and final outputs, reducing logical inconsistencies and hallucinations. We evaluate our approach across three core mathematical tasks: theorem proving, symbolic transformation, and numerical computation. Experimental results demonstrate that SC significantly improves proof validity, symbolic reasoning accuracy, and numerical stability while maintaining computational efficiency. Further analysis reveals that structured self-consistency not only enhances problem-solving accuracy but also reduces the variance of model-generated outputs. These findings highlight self-consistency as a robust mechanism for improving mathematical reasoning in LLMs, paving the way for more reliable and interpretable AI-driven mathematics.
Related papers
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [54.04678363287392]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks.<n>Recent advancements in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains.
arXiv Detail & Related papers (2025-03-20T17:59:38Z) - Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models [17.673293240849787]
We introduce SPHERE, a self-evolving data generation pipeline that enhances reasoning in small language models (SLMs)<n> SPHERE operates in three stages: (i) Self-Generation, where the model autonomously constructs problem-solving steps; (ii) Self-Correction, enabling it to identify and rectify errors; and (iii) Diversity Induction, improving robustness through multiple valid reasoning trajectories.<n>We show that SPHERE-trained models achieve significant gains over their base versions and match/surpass GPT-4o on certain benchmarks.
arXiv Detail & Related papers (2025-03-04T14:43:25Z) - Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training [66.48331530995786]
We propose syMmetry-ENhanceD (MEND) Data Augmentation, a data-centric approach that improves the model's ability to extract useful information from context.
Unlike existing methods that emphasize reasoning chain augmentation, our approach improves model robustness at the knowledge extraction stage.
Experiments on both logical and arithmetic reasoning tasks show that MEND enhances reasoning performance across diverse query variations.
arXiv Detail & Related papers (2025-02-25T03:03:35Z) - Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment [21.12989936864145]
Chain-of-Thought (CoT) prompting has shown promise in enhancing the reasoning capabilities of large language models (LLMs)<n>We propose Reasoning-as-Logic-Units (RaLU), which constructs a more reliable reasoning path by aligning logical units between the generated program and their corresponding NL descriptions.
arXiv Detail & Related papers (2025-02-05T08:23:18Z) - Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning [53.25336975467293]
We present the first theoretical error decomposition analysis of methods such as perplexity and self-consistency.<n>Our analysis reveals a fundamental trade-off: perplexity methods suffer from substantial model error due to the absence of a proper consistency function.<n>We propose Reasoning-Pruning Perplexity Consistency (RPC), which integrates perplexity with self-consistency, and Reasoning Pruning, which eliminates low-probability reasoning paths.
arXiv Detail & Related papers (2025-02-01T18:09:49Z) - BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks.
We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model.
We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z) - JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models [51.99046112135311]
We introduce JustLogic, a synthetically generated deductive reasoning benchmark for rigorous evaluation of Large Language Models.
JustLogic is highly complex, capable of generating a diverse range of linguistic patterns, vocabulary, and argument structures.
Our experimental results reveal that most state-of-the-art (SOTA) LLMs perform significantly worse than the human average.
arXiv Detail & Related papers (2025-01-24T15:49:10Z) - Large Language Models for Mathematical Analysis [3.7325315394927023]
This work addresses critical gaps in mathematical reasoning and contributes to advancing trustworthy AI.<n>We developed the DEMI-MathAnalysis dataset, comprising proof-based problems from mathematical analysis topics.<n>We also designed a guiding framework to rigorously enhance LLMs' ability to solve these problems.
arXiv Detail & Related papers (2024-12-28T20:37:55Z) - Emergence of Self-Identity in AI: A Mathematical Framework and Empirical Study with Generative Large Language Models [4.036530158875673]
This paper introduces a mathematical framework for defining and quantifying self-identity in AI systems.<n>Our framework posits that self-identity emerges from two mathematically quantifiable conditions.<n>The implications of our study are immediately relevant to the fields of humanoid robotics and autonomous systems.
arXiv Detail & Related papers (2024-11-27T17:23:47Z) - Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data [53.433309883370974]
This work explores the potential and limitations of using graph-based synthetic reasoning data as training signals to enhance Large Language Models' reasoning capabilities.<n>Our experiments, conducted on two established natural language reasoning tasks, demonstrate that supervised fine-tuning with synthetic graph-based reasoning data effectively enhances LLMs' reasoning performance without compromising their effectiveness on other standard evaluation benchmarks.
arXiv Detail & Related papers (2024-09-19T03:39:09Z) - Improved Logical Reasoning of Language Models via Differentiable
Symbolic Programming [12.984852480664378]
Pre-trained large language models (LMs) struggle to perform logical reasoning reliably despite advances in scale and compositionality.
We propose DSR-LM, a Differentiable Symbolic Reasoning framework where pre-trained LMs govern the perception of factual knowledge, and a symbolic module performs deductive reasoning.
arXiv Detail & Related papers (2023-05-05T07:24:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.