Related papers: SymCode: A Neurosymbolic Approach to Mathematical Reasoning via Verifiable Code Generation

SymCode: A Neurosymbolic Approach to Mathematical Reasoning via Verifiable Code Generation

URL: http://arxiv.org/abs/2510.25975v1
Date: Wed, 29 Oct 2025 21:17:57 GMT
Title: SymCode: A Neurosymbolic Approach to Mathematical Reasoning via Verifiable Code Generation
Authors: Sina Bagheri Nezhad, Yao Li, Ameeta Agrawal,
Abstract summary: We introduce SymCode, a neurosymbolic framework that reframes mathematical problem-solving as a task of verifiable code generation.<n>We evaluate SymCode on challenging benchmarks, including MATH-500 and OlympiadBench, demonstrating significant accuracy improvements.
Score: 5.88623604115872
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) often struggle with complex mathematical reasoning, where prose-based generation leads to unverified and arithmetically unsound solutions. Current prompting strategies like Chain of Thought still operate within this unreliable medium, lacking a mechanism for deterministic verification. To address these limitations, we introduce SymCode, a neurosymbolic framework that reframes mathematical problem-solving as a task of verifiable code generation using the SymPy library. We evaluate SymCode on challenging benchmarks, including MATH-500 and OlympiadBench, demonstrating significant accuracy improvements of up to 13.6 percentage points over baselines. Our analysis shows that SymCode is not only more token-efficient but also fundamentally shifts model failures from opaque logical fallacies towards transparent, programmatic errors. By grounding LLM reasoning in a deterministic symbolic engine, SymCode represents a key step towards more accurate and trustworthy AI in formal domains.

Related papers

Imandra CodeLogician: Neuro-Symbolic Reasoning for Precise Analysis of Software Logic [23.59512682324697]
Large Language Models (LLMs) have shown strong performance on code understanding tasks.<n>LLMs lack the ability to perform precise, exhaustive mathematical reasoning about program behavior.<n>We present CodeLogician, a neurosymbolic agent for precise analysis of software logic, integrated with ImandraX.
arXiv Detail & Related papers (2026-01-17T00:16:41Z)
Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning [65.20602712957725]
Caco is a novel framework that automates the synthesis of high-quality, verifiable, and diverse instruction-CoT reasoning data.<n>Our work establishes a paradigm for building self-sustaining, trustworthy reasoning systems without human intervention.
arXiv Detail & Related papers (2025-10-05T07:59:24Z)
Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking [54.43083499412643]
Test-time algorithms that combine the generative power of language models with process verifiers offer a promising lever for eliciting new reasoning capabilities.<n>We introduce a new process-guided test-time sampling algorithm, VGB, which uses theoretically grounded backtracking to achieve provably better robustness to verifier errors.
arXiv Detail & Related papers (2025-10-03T16:21:14Z)
SciML Agents: Write the Solver, Not the Solution [69.5021018644143]
We introduce two new datasets: a diagnostic dataset of adversarial "misleading" problems; and a large-scale benchmark of 1,000 diverse ODE tasks.<n>We evaluate open- and closed-source LLM models along two axes: (i) unguided versus guided prompting with domain-specific knowledge; and (ii) off-the-shelf versus fine-tuned variants.<n>Preliminary results indicate that careful prompting and fine-tuning can yield a specialized LLM agent capable of reliably solving simple ODE problems.
arXiv Detail & Related papers (2025-09-12T02:53:57Z)
Worst-Case Symbolic Constraints Analysis and Generalisation with Large Language Models [7.658134651527103]
Worst-case symbolic constraints analysis requires inferring the symbolic constraints that characterise worst-case program executions.<n>We show that even state-of-the-art large language models (LLMs) struggle when applied directly on this task.<n>We propose WARP, an innovative neurosymbolic approach that computes worst-case constraints on smaller concrete input sizes.
arXiv Detail & Related papers (2025-06-09T19:33:30Z)
Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation [0.3495246564946556]
Large Language Models (LLMs) have achieved remarkable success in tasks requiring complex reasoning.<n>Do these models truly reason, or do they merely exploit shallow statistical patterns?<n>We introduce Chain-of-Code Collapse, where we investigate the robustness of reasoning LLMs by introducing a suite of semantically faithful yet adversarially structured prompt perturbations.
arXiv Detail & Related papers (2025-06-08T02:43:46Z)
Computational Thinking Reasoning in Large Language Models [69.28428524878885]
Computational Thinking Model (CTM) is a novel framework that incorporates computational thinking paradigms into large language models (LLMs)<n>Live code execution is seamlessly integrated into the reasoning process, allowing CTM to think by computing.<n>CTM outperforms conventional reasoning models and tool-augmented baselines in terms of accuracy, interpretability, and generalizability.
arXiv Detail & Related papers (2025-06-03T09:11:15Z)
SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning [30.938876549335067]
This paper presents SymRTLO, a novel neuron-symbolic RTL optimization framework.<n>A symbolic module is proposed for analyzing and optimizing finite state machine (FSM) logic.<n> Experiments on the RTL-Rewriter benchmark with Synopsys Design Compiler and Yosys show that SymRTLO improves power, performance, and area (PPA) by up to 43.9%, 62.5%, and 51.1%, respectively.
arXiv Detail & Related papers (2025-04-14T16:15:55Z)
Improving Rule-based Reasoning in LLMs using Neurosymbolic Representations [3.5604294978773265]
Large language models (LLMs) continue to face challenges in reliably solving reasoning tasks.<n>This paper introduces a novel neurosymbolic method that improves LLM reasoning by encoding hidden states into neurosymbolic vectors.
arXiv Detail & Related papers (2025-01-31T20:29:51Z)
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability [53.51560766150442]
Critical tokens are elements within reasoning trajectories that significantly influence incorrect outcomes.<n>We present a novel framework for identifying these tokens through rollout sampling.<n>We show that identifying and replacing critical tokens significantly improves model accuracy.
arXiv Detail & Related papers (2024-11-29T18:58:22Z)
Linear Temporal Logic Modulo Theories over Finite Traces (Extended Version) [72.38188258853155]
This paper studies Linear Temporal Logic over Finite Traces (LTLf) proposition letters are replaced with first-order formulas interpreted over arbitrary theories. The resulting logic, called Satisfiability Modulo Theories (LTLfMT), is semi-decidable.
arXiv Detail & Related papers (2022-04-28T17:57:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.