Related papers: Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning

Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning

URL: http://arxiv.org/abs/2509.18083v1
Date: Mon, 22 Sep 2025 17:56:38 GMT
Title: Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning
Authors: Valentin Lacombe, Valentin Quesnel, Damien Sileo,
Abstract summary: Reasoning Core is a new scalable environment for Reinforcement Learning with Verifiable Rewards (RLVR)<n> Reasoning Core procedurally generates problems across core formal domains, including PDDL planning, first-order logic, context-free grammar parsing, causal reasoning, and system equation solving.
Score: 2.62112541805429
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce Reasoning Core, a new scalable environment for Reinforcement Learning with Verifiable Rewards (RLVR), designed to advance foundational symbolic reasoning in Large Language Models (LLMs). Unlike existing benchmarks that focus on games or isolated puzzles, Reasoning Core procedurally generates problems across core formal domains, including PDDL planning, first-order logic, context-free grammar parsing, causal reasoning, and system equation solving. The environment is built on key design principles of high-generality problem distributions, verification via external tools, and continuous difficulty control, which together provide a virtually infinite supply of novel training instances. Initial zero-shot evaluations with frontier LLMs confirm the difficulty of Reasoning Core's tasks, positioning it as a promising resource to improve the reasoning capabilities of future models.

Related papers

Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training [2.62112541805429]
Reasoning Core is a scalable suite that procedurally generates verifiable symbolic reasoning data across core formal domains.<n>Each task is paired with an external solver for rigorous verification and admits continuous difficulty control for curriculum design.<n>Experiments show that mixing Reasoning Core data into pre-training improves downstream reasoning while preserving, or slightly improving, language modeling quality.
arXiv Detail & Related papers (2026-03-02T18:59:29Z)
ALIVE: Awakening LLM Reasoning via Adversarial Learning and Instructive Verbal Evaluation [4.265094703231012]
We introduce textbfALIVE (emphAdrial Learning with Instructive Verbal Evaluation), a hands-free alignment framework.<n>By coupling adversarial learning with instructive verbal feedback, ALIVE enables models to internalize evaluative criteria directly from raw corpora.<n>With identical data and compute, ALIVE achieves markedly improved cross-domain generalization, and higher self-correction rates.
arXiv Detail & Related papers (2026-02-05T09:20:23Z)
Enhancing Mathematical Problem Solving in LLMs through Execution-Driven Reasoning Augmentation [18.636244209466266]
Iteratively Improved Program Construction (IIPC) is a reasoning method that iteratively refines programmatic reasoning chains and combines execution feedback with the native Chain-of-thought abilities of the base LLM.<n>IIPC surpasses competing approaches in the majority of reasoning benchmarks on multiple base LLMs.
arXiv Detail & Related papers (2026-02-03T19:13:31Z)
KBQA-R1: Reinforcing Large Language Models for Knowledge Base Question Answering [64.62317305868264]
We present textbfKBQA-R1, a framework that shifts the paradigm from text imitation to interaction optimization via Reinforcement Learning.<n>Treating KBQA as a multi-turn decision process, our model learns to navigate the knowledge base using a list of actions.<n>Experiments on WebQSP, GrailQA, and GraphQuestions demonstrate that KBQA-R1 achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-12-10T17:45:42Z)
Step-Aware Policy Optimization for Reasoning in Diffusion Large Language Models [57.42778606399764]
Diffusion language models (dLLMs) offer a promising, non-autoregressive paradigm for text generation.<n>Current reinforcement learning approaches often rely on sparse, outcome-based rewards.<n>We argue that this stems from a fundamental mismatch with the natural structure of reasoning.
arXiv Detail & Related papers (2025-10-02T00:34:15Z)
Implicit Reasoning in Large Language Models: A Comprehensive Survey [67.53966514728383]
Large Language Models (LLMs) have demonstrated strong generalization across a wide range of tasks.<n>Recent studies have shifted attention from explicit chain-of-thought prompting toward implicit reasoning.<n>This survey introduces a taxonomy centered on execution paradigms, shifting the focus from representational forms to computational strategies.
arXiv Detail & Related papers (2025-09-02T14:16:02Z)
Scalable Chain of Thoughts via Elastic Reasoning [61.75753924952059]
Elastic Reasoning is a novel framework for scalable chain of thoughts.<n>It separates reasoning into two phases--thinking and solution--with independently allocated budgets.<n>Our approach produces more concise and efficient reasoning even in unconstrained settings.
arXiv Detail & Related papers (2025-05-08T15:01:06Z)
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 [53.894789613838654]
We introduce SEED-Bench-R1, a benchmark designed to evaluate post-training methods for MLLMs in video understanding.<n>It includes intricate real-world videos and complex everyday planning tasks in the format of multiple-choice questions.<n>Using Qwen2-VL-Instruct-7B as a base model, we compare RL with supervised fine-tuning (SFT)<n>Our detailed analysis reveals that RL enhances visual perception but often produces less coherent reasoning chains.
arXiv Detail & Related papers (2025-03-31T17:55:23Z)
Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation [24.081573908824353]
First-order logic (FOL) reasoning is pivotal for intelligent systems.<n>Existing benchmarks often rely on extensive human annotation or handcrafted templates.<n>We propose a novel framework called ProverGen that synergizes the generative strengths of Large Language Models with the rigor and precision of symbolic provers.
arXiv Detail & Related papers (2025-02-10T15:31:54Z)
Instantiation-based Formalization of Logical Reasoning Tasks using Language Models and Logical Solvers [4.897782942277061]
We introduce Semantic Self-Verification (SSV), a novel approach to accurately formulate the reasoning problem from natural language to the formal language of the solver.<n>SSV uses a consistency-based approach to produce strong abstract formalizations of problems using concrete instantiations that are generated by the model and verified by the solver.<n>We propose such *near-certain reasoning* as a new approach to reduce the need for manual verification in many cases, taking us closer to more dependable and autonomous AI reasoning systems.
arXiv Detail & Related papers (2025-01-28T14:04:49Z)
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level. We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing [61.98556945939045]
We propose a framework to learn planning-based reasoning through Direct Preference Optimization (DPO) on collected trajectories. Our results on challenging logical reasoning benchmarks demonstrate the effectiveness of our learning framework.
arXiv Detail & Related papers (2024-02-01T15:18:33Z)
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems. LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning. We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning technique to assess the performance of model.
arXiv Detail & Related papers (2023-10-02T01:00:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.