Evaluating Large Language Models on Solved and Unsolved Problems in Graph Theory: Implications for Computing Education
- URL: http://arxiv.org/abs/2602.05059v1
- Date: Wed, 04 Feb 2026 21:20:25 GMT
- Title: Evaluating Large Language Models on Solved and Unsolved Problems in Graph Theory: Implications for Computing Education
- Authors: Adithya Kulkarni, Mohna Chakraborty, Jay Bagga,
- Abstract summary: Large Language Models are increasingly used by students to explore advanced material in computer science.<n>This study examines the performance of a LLM on two related graph theoretic problems.
- Score: 4.64684924758613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models are increasingly used by students to explore advanced material in computer science, including graph theory. As these tools become integrated into undergraduate and graduate coursework, it is important to understand how reliably they support mathematically rigorous thinking. This study examines the performance of a LLM on two related graph theoretic problems: a solved problem concerning the gracefulness of line graphs and an open problem for which no solution is currently known. We use an eight stage evaluation protocol that reflects authentic mathematical inquiry, including interpretation, exploration, strategy formation, and proof construction. The model performed strongly on the solved problem, producing correct definitions, identifying relevant structures, recalling appropriate results without hallucination, and constructing a valid proof confirmed by a graph theory expert. For the open problem, the model generated coherent interpretations and plausible exploratory strategies but did not advance toward a solution. It did not fabricate results and instead acknowledged uncertainty, which is consistent with the explicit prompting instructions that directed the model to avoid inventing theorems or unsupported claims. These findings indicate that LLMs can support exploration of established material but remain limited in tasks requiring novel mathematical insight or critical structural reasoning. For computing education, this distinction highlights the importance of guiding students to use LLMs for conceptual exploration while relying on independent verification and rigorous argumentation for formal problem solving.
Related papers
- Automatic Question Generation for Intuitive Learning Utilizing Causal Graph Guided Chain of Thought Reasoning [8.587087233323038]
We propose a novel framework that combines causal-graph-guided Chain-of-Thought reasoning with a multi-agent language model.<n>This approach ensures the generation of accurate, meaningful, and curriculum-aligned questions.<n> Experimental results demonstrate up to a 70% improvement in quality compared to reference methods.
arXiv Detail & Related papers (2026-01-02T08:49:58Z) - Interpretability Framework for LLMs in Undergraduate Calculus [0.0]
Large Language Models (LLMs) are increasingly being used in education, yet their correctness alone does not capture the quality, reliability, or pedagogical validity of their problem-solving behavior.<n>We introduce a novel interpretability framework for analyzing LLM-generated solutions using undergraduate calculus problems as a representative domain.<n>Our approach combines reasoning flow extraction and decomposing solutions into semantically labeled operations and concepts with prompt ablation analysis to assess input salience and output stability.
arXiv Detail & Related papers (2025-10-19T17:20:36Z) - Computational Thinking Reasoning in Large Language Models [69.28428524878885]
Computational Thinking Model (CTM) is a novel framework that incorporates computational thinking paradigms into large language models (LLMs)<n>Live code execution is seamlessly integrated into the reasoning process, allowing CTM to think by computing.<n>CTM outperforms conventional reasoning models and tool-augmented baselines in terms of accuracy, interpretability, and generalizability.
arXiv Detail & Related papers (2025-06-03T09:11:15Z) - Large Language Models and Mathematical Reasoning Failures [1.6114012813668932]
This paper investigates the mathematical reasoning capabilities of large language models (LLMs) using 50 newly constructed high-school-level word problems.<n>We rigorously analyze both final answers and solution steps to identify reasoning failures.<n>We find that while newer models (e.g., o3-mini, deepseek-r1) achieve higher accuracy, all models exhibit errors in spatial reasoning, strategic planning, and arithmetic.
arXiv Detail & Related papers (2025-02-17T09:07:32Z) - Reasoning with Graphs: Structuring Implicit Knowledge to Enhance LLMs Reasoning [73.2950349728376]
Large language models (LLMs) have demonstrated remarkable success across a wide range of tasks.<n>However, they still encounter challenges in reasoning tasks that require understanding and inferring relationships between pieces of information.<n>This challenge is particularly pronounced in tasks involving multi-step processes, such as logical reasoning and multi-hop question answering.<n>We propose Reasoning with Graphs (RwG) by first constructing explicit graphs from the context.
arXiv Detail & Related papers (2025-01-14T05:18:20Z) - LLM-based Cognitive Models of Students with Misconceptions [55.29525439159345]
This paper investigates whether Large Language Models (LLMs) can be instruction-tuned to meet this dual requirement.
We introduce MalAlgoPy, a novel Python library that generates datasets reflecting authentic student solution patterns.
Our insights enhance our understanding of AI-based student models and pave the way for effective adaptive learning systems.
arXiv Detail & Related papers (2024-10-16T06:51:09Z) - Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path [53.71787069694794]
We focus on the graph reasoning ability of Large Language Models (LLMs)<n>We revisit the ability of LLMs on three fundamental graph tasks: graph description translation, graph connectivity, and the shortest-path problem.<n>Our findings suggest that LLMs can fail to understand graph structures through text descriptions and exhibit varying performance for all these fundamental tasks.
arXiv Detail & Related papers (2024-08-18T16:26:39Z) - MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z) - LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought [28.122761006724925]
The Pedagogical Chain-of-Thought (PedCoT) is designed to guide the identification of reasoning mistakes.<n>PedCoT consists of pedagogical principles for prompts (PPP) design, two-stage interaction process (TIP) and grounded PedCoT prompts.<n>The proposed method can achieve the goal of reliable mathematical mistake identification and provide a foundation for automatic math answer grading.
arXiv Detail & Related papers (2024-05-09T07:37:34Z) - Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? [140.9751389452011]
We study the biases of large language models (LLMs) in relation to those known in children when solving arithmetic word problems.
We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features.
arXiv Detail & Related papers (2024-01-31T18:48:20Z) - GraphReason: Enhancing Reasoning Capabilities of Large Language Models through A Graph-Based Verification Approach [0.0]
Large Language Models (LLMs) have showcased impressive reasoning capabilities.
In this paper, we introduce a novel graph-based method to further augment the reasoning capabilities of LLMs.
arXiv Detail & Related papers (2023-08-18T03:12:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.