RL-Struct: A Lightweight Reinforcement Learning Framework for Reliable Structured Output in LLMs
- URL: http://arxiv.org/abs/2512.00319v1
- Date: Sat, 29 Nov 2025 04:47:14 GMT
- Title: RL-Struct: A Lightweight Reinforcement Learning Framework for Reliable Structured Output in LLMs
- Authors: Ruike Hu, Shulei Wu,
- Abstract summary: Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language generation and reasoning.<n>Their integration into automated software ecosystems is often hindered by the "Structure Gap"<n>We propose a lightweight, efficient Reinforcement Learning framework to bridge this gap.
- Score: 0.08594140167290097
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language generation and reasoning. However, their integration into automated software ecosystems is often hindered by the "Structure Gap" - the inherent tension between the probabilistic nature of token generation and the deterministic requirements of structured data formats (e.g., JSON, XML). Traditional Supervised Fine-Tuning (SFT) often fails to enforce strict syntactic constraints, leading to "hallucinated" keys or malformed structures, while constrained decoding methods impose significant inference latency. In this paper, we propose a lightweight, efficient Reinforcement Learning (RL) framework to bridge this gap. We introduce a novel Multi-dimensional Reward Function that decomposes the structured output task into a hierarchy of constraints: structural integrity, format correctness, content accuracy, and validity. Leveraging Gradient Regularized Policy Optimization (GRPO), we enable the model to internalize these constraints without the need for a separate critic network, reducing peak VRAM usage by 40% compared to PPO. We validate our approach on multiple tasks, including complex recipe generation and structured math reasoning (GSM8K-JSON). Experimental results demonstrate that our method achieves 89.7% structural accuracy and 92.1% JSON validity, significantly outperforming both zero-shot baselines (e.g., GPT-3.5) and SFT on larger models like LLaMA-3-8B. Furthermore, we provide a detailed analysis of training dynamics, revealing a distinct self-paced curriculum where the model sequentially acquires syntactic proficiency before semantic accuracy. Our model is publicly available at https://huggingface.co/Freakz3z/Qwen-JSON.
Related papers
- STED and Consistency Scoring: A Framework for Evaluating LLM Structured Output Reliability [11.095198847819573]
Large Language Models (LLMs) are increasingly deployed for structured data generation.<n>We introduce a comprehensive framework for evaluating and improving consistency in LLM-generated structured outputs.
arXiv Detail & Related papers (2025-11-27T02:49:52Z) - Structured Uncertainty guided Clarification for LLM Agents [126.26213027785813]
LLM agents extend large language models with tool-calling capabilities, but ambiguous user instructions often lead to incorrect invocations and task failures.<n>We introduce a principled formulation of structured uncertainty over tool-call parameters, modeling joint tool-argument clarification as a POMDP with Expected Value of Perfect Information (EVPI) objective for optimal question selection and aspect-based cost modeling to prevent redundancy.<n>Our SAGE-Agent leverages this structured uncertainty to achieve superior efficiency: increasing coverage on ambiguous tasks by 7-39% while reducing clarification questions by 1.5-2.7$times$ compared to strong prompting and uncertainty-based baselines.
arXiv Detail & Related papers (2025-11-11T21:50:44Z) - CoT-Saliency: Unified Chain-of-Thought Reasoning for Heterogeneous Saliency Tasks [96.64597365827046]
We present the first unified framework that jointly handles three operationally heterogeneous saliency tasks.<n>We introduce a Chain-of-Thought (CoT) reasoning process in a Vision-Language Model (VLM) to bridge task heterogeneity.<n>We show our model matches or outperforms specialized SOTA methods and strong closed-source VLMs across all tasks.
arXiv Detail & Related papers (2025-11-01T04:37:01Z) - Deep Hierarchical Learning with Nested Subspace Networks [53.71337604556311]
We propose Nested Subspace Networks (NSNs) for large neural networks.<n>NSNs enable a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets.<n>We show that NSNs can be surgically applied to pre-trained LLMs and unlock a smooth and predictable compute-performance frontier.
arXiv Detail & Related papers (2025-09-22T15:13:14Z) - HEFT: A Coarse-to-Fine Hierarchy for Enhancing the Efficiency and Accuracy of Language Model Reasoning [0.0]
HEFT is a novel hierarchical adaptation strategy that composes two distinct PEFT methods in a coarse-to-fine manner.<n>A model fine-tuned for only three epochs with our HEFT strategy achieves an accuracy of 85.17%, exceeding the performance of models trained for 20 epochs.
arXiv Detail & Related papers (2025-09-11T19:06:46Z) - SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL)<n>We propose textbfSPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z) - Enhancing Spatial Reasoning in Vision-Language Models via Chain-of-Thought Prompting and Reinforcement Learning [0.42855555838080844]
This study investigates the spatial reasoning capabilities of vision-language models (VLMs) through Chain-of-Thought prompting and reinforcement learning.<n>We find that simple CoT formats, where the model generates a reasoning step before the answer, can harm the model's original performance.<n>In contrast, structured multi-stage prompting based on scene graphs (SceneGraph CoT) significantly improves spatial reasoning accuracy.
arXiv Detail & Related papers (2025-07-06T10:51:12Z) - SLOT: Structuring the Output of Large Language Models [5.683327173793259]
We present SLOT (Structured LLM Output Transformer), a model-agnostic approach that transforms unstructured LLM outputs into precise structured formats.<n>Our results demonstrate that fine-tuned Mistral-7B model with constrained decoding achieves near perfect schema accuracy.<n> Notably, even compact models like Llama-3.2-1B can match or exceed the structured output capabilities of much larger proprietary models.
arXiv Detail & Related papers (2025-05-06T23:29:43Z) - Pushing the boundary on Natural Language Inference [49.15148871877941]
Natural Language Inference (NLI) is a central task in natural language understanding with applications in fact-checking, question answering and information retrieval.<n>Despite its importance, current NLI systems heavily rely on learning with limiting artifacts and biases, inference and real-world applicability.<n>This work provides a framework for building robust NLI systems without sacrificing quality or real-world applicability.
arXiv Detail & Related papers (2025-04-25T14:20:57Z) - Elucidating the Design Space of Multimodal Protein Language Models [69.3650883370033]
Multimodal protein language models (PLMs) integrate sequence and token-based structural information.<n>This paper systematically elucidates the design space of multimodal PLMs to overcome their limitations.<n>Our advancements approach finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling.
arXiv Detail & Related papers (2025-04-15T17:59:43Z) - Unlock the Correlation between Supervised Fine-Tuning and Reinforcement Learning in Training Code Large Language Models [12.656574142412484]
We make an attempt to understand the correlation between supervised fine-tuning and reinforcement learning.<n>We find that both atomic and synthetic functions are indispensable for SFT's generalization.
arXiv Detail & Related papers (2024-06-14T03:39:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.