KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning?
- URL: http://arxiv.org/abs/2507.11408v1
- Date: Tue, 15 Jul 2025 15:28:37 GMT
- Title: KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning?
- Authors: Soumadeep Saha, Akshay Chaturvedi, Saptarshi Saha, Utpal Garain, Nicholas Asher
- Abstract summary: Chain-of-thought traces have been shown to improve performance of large language models in a plethora of reasoning tasks. We introduce Causal CoT Graphs (CCGs), which are directed acyclic graphs automatically extracted from reasoning traces.
- Score: 4.473915603131591
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chain-of-thought traces have been shown to improve performance of large language models in a plethora of reasoning tasks, yet there is no consensus on the mechanism through which this performance boost is achieved. To shed more light on this, we introduce Causal CoT Graphs (CCGs), which are directed acyclic graphs automatically extracted from reasoning traces that model fine-grained causal dependencies in the language model output. A collection of 1671 mathematical reasoning problems from MATH500, GSM8K and AIME, and their associated CCGs are compiled into our dataset, KisMATH. Our detailed empirical analysis with 15 open-weight LLMs shows that (i) reasoning nodes in the CCG are mediators for the final answer, a condition necessary for reasoning; and (ii) LLMs emphasise reasoning paths given by the CCG, indicating that models internally realise structures akin to our graphs. KisMATH enables controlled, graph-aligned interventions and opens up avenues for further investigation into the role of chain-of-thought in LLM reasoning.
Related papers
- CAMA: Enhancing Mathematical Reasoning in Large Language Models with Causal Knowledge [14.367146529900609]
Large Language Models (LLMs) have demonstrated strong performance across a wide range of tasks, yet they still struggle with complex mathematical reasoning. We propose CAusal MAthematician (CAMA), a two-stage causal framework that equips LLMs with explicit, reusable mathematical structure.
arXiv Detail & Related papers (2025-08-04T16:39:24Z)
- Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLM [11.181783720439563]
Large Language Models (LLMs) display sophisticated reasoning abilities via extended Chain-of-Thought (CoT) generation. RLMs often demonstrate counterintuitive and unstable behaviors, such as performance degradation under few-shot prompting. We introduce a unified graph-based analytical framework for better modeling the reasoning processes of RLMs.
arXiv Detail & Related papers (2025-05-20T03:54:57Z)
- Hallucination Detection in LLMs with Topological Divergence on Attention Graphs [64.74977204942199]
Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models. We introduce TOHA, a TOpology-based HAllucination detector in the RAG setting.
arXiv Detail & Related papers (2025-04-14T10:06:27Z)
- Do Larger Language Models Imply Better Generalization? A Pretraining Scaling Law for Implicit Reasoning [89.17086632436363]
We introduce a synthetic multihop reasoning environment designed to replicate the structure and distribution of real-world large-scale knowledge graphs. Our reasoning task involves completing missing edges in the graph, which requires advanced multi-hop reasoning and mimics real-world reasoning scenarios. To predict the optimal model size for a specific knowledge graph, we find an empirical scaling that linearly maps the knowledge graph search entropy to the optimal model size.
arXiv Detail & Related papers (2025-04-04T17:57:22Z)
- Grounding LLM Reasoning with Knowledge Graphs [4.279373869671241]
We propose integrating reasoning strategies with Knowledge Graphs to anchor every step or "thought" of the reasoning chains in KG data. We evaluate both agentic and automated search methods across several reasoning strategies, including Chain-of-Thought (CoT), Tree-of-Thought (ToT), and Graph-of-Thought (GoT). Our experiments demonstrate that this approach consistently outperforms baseline models.
arXiv Detail & Related papers (2025-02-18T19:20:46Z)
- Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures [0.0]
We introduce Adaptive Graph of Thoughts (AGoT), a dynamic, graph-based inference framework. AGoT enhances Large Language Models (LLMs) reasoning solely at test time. We validate our approach on diverse benchmarks spanning multi-hop retrieval, scientific reasoning, and mathematical problem-solving.
arXiv Detail & Related papers (2025-02-07T16:54:19Z)
- Reasoning with Graphs: Structuring Implicit Knowledge to Enhance LLMs Reasoning [73.2950349728376]
Large language models (LLMs) have demonstrated remarkable success across a wide range of tasks. However, they still encounter challenges in reasoning tasks that require understanding and inferring relationships between pieces of information. This challenge is particularly pronounced in tasks involving multi-step processes, such as logical reasoning and multi-hop question answering. We propose Reasoning with Graphs (RwG) by first constructing explicit graphs from the context.
arXiv Detail & Related papers (2025-01-14T05:18:20Z)
- GRS-QA -- Graph Reasoning-Structured Question Answering Dataset [50.223851616680754]
We introduce the Graph Reasoning-Structured Question Answering dataset (GRS-QA), which includes both semantic contexts and reasoning structures for QA pairs.
Unlike existing M-QA datasets, GRS-QA explicitly captures intricate reasoning pathways by constructing reasoning graphs.
Our empirical analysis reveals that LLMs perform differently when handling questions with varying reasoning structures.
arXiv Detail & Related papers (2024-11-01T05:14:03Z)
- Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models [92.71304585906624]
Large language models (LLMs) struggle with faithful reasoning due to knowledge gaps and hallucinations. We introduce graph-constrained reasoning (GCR), a novel framework that bridges structured knowledge in KGs with unstructured reasoning in LLMs. GCR achieves state-of-the-art performance and exhibits strong zero-shot generalizability to unseen KGs without additional training.
arXiv Detail & Related papers (2024-10-16T22:55:17Z)
- Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data [53.433309883370974]
This work explores the potential and limitations of using graph-based synthetic reasoning data as training signals to enhance Large Language Models' reasoning capabilities. Our experiments, conducted on two established natural language reasoning tasks, demonstrate that supervised fine-tuning with synthetic graph-based reasoning data effectively enhances LLMs' reasoning performance without compromising their effectiveness on other standard evaluation benchmarks.
arXiv Detail & Related papers (2024-09-19T03:39:09Z)
- Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions [47.83142414018448]
We focus on two popular reasoning tasks: arithmetic reasoning and code generation.
We introduce (i) a general ontology of perturbations for math and coding questions, (ii) a semi-automatic method to apply these perturbations, and (iii) two datasets.
We show a significant performance drop across all the models against perturbed questions.
arXiv Detail & Related papers (2024-01-17T18:13:07Z)
- GraphReason: Enhancing Reasoning Capabilities of Large Language Models through A Graph-Based Verification Approach [0.0]
Large Language Models (LLMs) have showcased impressive reasoning capabilities.
In this paper, we introduce a novel graph-based method to further augment the reasoning capabilities of LLMs.
arXiv Detail & Related papers (2023-08-18T03:12:59Z)