Related papers: CAMA: Enhancing Mathematical Reasoning in Large Language Models with Causal Knowledge

CAMA: Enhancing Mathematical Reasoning in Large Language Models with Causal Knowledge

URL: http://arxiv.org/abs/2508.02583v2
Date: Thu, 07 Aug 2025 15:57:23 GMT
Title: CAMA: Enhancing Mathematical Reasoning in Large Language Models with Causal Knowledge
Authors: Lei Zan, Keli Zhang, Ruichu Cai, Lujia Pan,
Abstract summary: Large Language Models (LLMs) have demonstrated strong performance across a wide range of tasks, yet they still struggle with complex mathematical reasoning.<n>We propose textbfCAusal textbfMAthematician (textbfCAMA), a two-stage causal framework that equips LLMs with explicit, reusable mathematical structure.
Score: 14.367146529900609
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have demonstrated strong performance across a wide range of tasks, yet they still struggle with complex mathematical reasoning, a challenge fundamentally rooted in deep structural dependencies. To address this challenge, we propose \textbf{CA}usal \textbf{MA}thematician (\textbf{CAMA}), a two-stage causal framework that equips LLMs with explicit, reusable mathematical structure. In the learning stage, CAMA first constructs the \textbf{M}athematical \textbf{C}ausal \textbf{G}raph (\textbf{MCG}), a high-level representation of solution strategies, by combining LLM priors with causal discovery algorithms applied to a corpus of question-solution pairs. The resulting MCG encodes essential knowledge points and their causal dependencies. To better align the graph with downstream reasoning tasks, CAMA further refines the MCG through iterative feedback derived from a selected subset of the question-solution pairs. In the reasoning stage, given a new question, CAMA dynamically extracts a task-relevant subgraph from the MCG, conditioned on both the question content and the LLM's intermediate reasoning trace. This subgraph, which encodes the most pertinent knowledge points and their causal dependencies, is then injected back into the LLM to guide its reasoning process. Empirical results on real-world datasets show that CAMA significantly improves LLM performance on challenging mathematical problems. Furthermore, our experiments demonstrate that structured guidance consistently outperforms unstructured alternatives, and that incorporating asymmetric causal relationships yields greater improvements than using symmetric associations alone.

Related papers

KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? [4.473915603131591]
Chain-of-thought traces have been shown to improve performance of large language models in a plethora of reasoning tasks.<n>We introduce Causal CoT Graphs (CCGs), which are directed acyclic graphs automatically extracted from reasoning traces.
arXiv Detail & Related papers (2025-07-15T15:28:37Z)
Reinforcing Question Answering Agents with Minimalist Policy Gradient Optimization [80.09112808413133]
Mujica is a planner that decomposes questions into acyclic graph of subquestions and a worker that resolves questions via retrieval and reasoning.<n>MyGO is a novel reinforcement learning method that replaces traditional policy updates with gradient Likelihood Maximum Estimation.<n> Empirical results across multiple datasets demonstrate the effectiveness of MujicaMyGO in enhancing multi-hop QA performance.
arXiv Detail & Related papers (2025-05-20T18:33:03Z)
MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer [37.81465564673498]
Large Language Models (LLMs) have demonstrated promising capabilities in solving mathematical reasoning tasks.<n>We propose textbfMetaLadder, a framework that explicitly prompts LLMs to recall and reflect on meta-problems.<n>Our experiments on mathematical benchmarks demonstrate that our MetaLadder significantly boosts LLMs' problem-solving accuracy.
arXiv Detail & Related papers (2025-03-19T04:36:35Z)
Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures [0.0]
We introduce Adaptive Graph of Thoughts (AGoT), a dynamic, graph-based inference framework.<n>AGoT enhances Large Language Models (LLMs) reasoning solely at test time.<n>We validate our approach on diverse benchmarks spanning multi-hop retrieval, scientific reasoning, and mathematical problem-solving.
arXiv Detail & Related papers (2025-02-07T16:54:19Z)
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework.<n>This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings.<n>Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z)
Path-of-Thoughts: Extracting and Following Paths for Robust Relational Reasoning with Large Language Models [62.12031550252253]
We present Path-of-Thoughts (PoT), a novel framework designed to tackle relation reasoning.<n>PoT efficiently extracts a task-agnostic graph that identifies crucial entities, relations, and attributes within the problem context.<n>PoT identifies relevant reasoning chains within the graph corresponding to the posed question, facilitating inference of potential answers.
arXiv Detail & Related papers (2024-12-23T20:27:12Z)
Prompting Strategies for Enabling Large Language Models to Infer Causation from Correlation [68.58373854950294]
We focus on causal reasoning and address the task of establishing causal relationships based on correlation information.<n>We introduce a prompting strategy for this problem that breaks the original task into fixed subquestions.<n>We evaluate our approach on an existing causal benchmark, Corr2Cause.
arXiv Detail & Related papers (2024-12-18T15:32:27Z)
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs [80.96119560172224]
MathGAP generates problem statements and chain-of-thought reasoning traces according to specifications about their arithmetic proof structure.<n>Using MathGAP, we find that LLMs show a significant decrease in performance as proofs get deeper and wider.
arXiv Detail & Related papers (2024-10-17T12:48:14Z)
Causal Reasoning in Large Language Models: A Knowledge Graph Approach [6.5344638992876085]
Large language models (LLMs) typically improve performance by either retrieving semantically similar information, or enhancing reasoning abilities through structured prompts like chain-of-thought. This paper proposes a knowledge graph (KG)-based random-walk reasoning approach that leverages causal relationships.
arXiv Detail & Related papers (2024-10-15T13:24:44Z)
MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models. It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths. It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z)
Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions [47.83142414018448]
We focus on two popular reasoning tasks: arithmetic reasoning and code generation. We introduce (i) a general ontology of perturbations for math and coding questions, (ii) a semi-automatic method to apply these perturbations, and (iii) two datasets. We show a significant performance drop across all the models against perturbed questions.
arXiv Detail & Related papers (2024-01-17T18:13:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.