AGRO-SQL: Agentic Group-Relative Optimization with High-Fidelity Data Synthesis
- URL: http://arxiv.org/abs/2512.23366v1
- Date: Mon, 29 Dec 2025 10:49:35 GMT
- Title: AGRO-SQL: Agentic Group-Relative Optimization with High-Fidelity Data Synthesis
- Authors: Cehua Yang, Dongyu Xiao, Junming Lin, Yuyang Song, Hanxu Yan, Shawn Guo, Wei Zhang, Jian Yang, Mingjie Tang, Bryan Dai
- Abstract summary: We propose a holistic framework that addresses the scarcity of high-quality training data and the limited reasoning capabilities of models in complex scenarios. From a Data-Centric perspective, we construct an iterative data factory that synthesizes RL-ready data characterized by high correctness. From a Model-Centric perspective, we introduce a novel Agentic Reinforcement Learning framework.
- Score: 10.05616886251577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advancement of Text-to-SQL systems is currently hindered by the scarcity of high-quality training data and the limited reasoning capabilities of models in complex scenarios. In this paper, we propose a holistic framework that addresses these issues through a dual-centric approach. From a Data-Centric perspective, we construct an iterative data factory that synthesizes RL-ready data characterized by high correctness and precise semantic-logic alignment, ensured by strict verification. From a Model-Centric perspective, we introduce a novel Agentic Reinforcement Learning framework. This framework employs a Diversity-Aware Cold Start stage to initialize a robust policy, followed by Group Relative Policy Optimization (GRPO) to refine the agent's reasoning via environmental feedback. Extensive experiments on BIRD and Spider benchmarks demonstrate that our synergistic approach achieves state-of-the-art performance among single-model methods.
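The GRPO stage described in the abstract scores a group of sampled rollouts against one another instead of against a learned value critic. A minimal sketch of the group-relative advantage computation, assuming a simple execution-based reward of 1 for a correct SQL query and 0 otherwise (the function name and reward scheme are illustrative, not taken from the paper):

```python
from statistics import mean, pstdev

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each rollout's reward by the
    mean and (population) std of its own sampled group, so no separate
    value function is needed."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

# Example: a group of 4 SQL rollouts scored by execution feedback.
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
# Correct rollouts get positive advantage, incorrect ones negative;
# the advantages sum to (approximately) zero within the group.
```

These advantages then weight the policy-gradient update for each token of the corresponding rollout, which is what lets environmental feedback (query execution results) refine the agent's reasoning.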
Related papers
- Principled Synthetic Data Enables the First Scaling Laws for LLMs in Recommendation [27.59197535041953]
Large Language Models (LLMs) represent a promising frontier for recommender systems. This paper introduces a novel, layered framework for generating high-quality synthetic data. We empirically demonstrate, for the first time, robust power-law scaling for an LLM that is continually pre-trained on our high-quality, recommendation-specific data.
arXiv Detail & Related papers (2026-02-07T01:15:15Z) - O-Researcher: An Open Ended Deep Research Model via Multi-Agent Distillation and Agentic RL [28.10102994309489]
We introduce a novel framework for the automated synthesis of sophisticated, research-grade instructional data. Our approach centers on a multi-agent workflow where collaborative AI agents simulate complex tool-integrated reasoning. We develop a two-stage training strategy that integrates supervised fine-tuning with a novel reinforcement learning method.
arXiv Detail & Related papers (2026-01-07T09:31:10Z) - TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning [28.052232941379884]
TableGPT-R1 is a specialized model built on a systematic Reinforcement Learning framework. Our approach synthesizes difficulty-stratified agentic trajectories for both supervised alignment and RL rollouts. It achieves state-of-the-art performance on authoritative benchmarks.
arXiv Detail & Related papers (2025-12-23T12:30:37Z) - STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning [54.28691219536054]
We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities. We develop anchored reinforcement training - a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping. Experiments on MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines.
arXiv Detail & Related papers (2025-08-26T08:47:58Z) - SyGra: A Unified Graph-Based Framework for Scalable Generation, Quality Tagging, and Management of Synthetic Data [0.0]
We present a comprehensive synthetic data generation framework for large language models (LLMs). Our approach employs a modular and configuration-based pipeline capable of modeling complex dialogue flows with minimal manual intervention. The resulting datasets are structured under a flexible schema supporting both SFT and DPO use cases, enabling seamless integration into diverse training pipelines.
arXiv Detail & Related papers (2025-08-21T10:35:41Z) - RecLLM-R1: A Two-Stage Training Paradigm with Reinforcement Learning and Chain-of-Thought v1 [20.92548890511589]
This paper introduces RecLLM-R1, a novel recommendation framework leveraging Large Language Models (LLMs). RecLLM-R1 significantly surpasses existing baseline methods across a spectrum of evaluation metrics, including accuracy, diversity, and novelty.
arXiv Detail & Related papers (2025-06-24T01:39:34Z) - SMOTExT: SMOTE meets Large Language Models [19.394116388173885]
We propose a novel technique, SMOTExT, that adapts the idea of Synthetic Minority Over-sampling (SMOTE) to textual data. Our method generates new synthetic examples by interpolating between BERT-based embeddings of two existing examples. In early experiments, training models solely on generated data achieved comparable performance to models trained on the original dataset.
arXiv Detail & Related papers (2025-05-19T17:57:36Z) - Scenario-Wise Rec: A Multi-Scenario Recommendation Benchmark [65.13288661320364]
We introduce our benchmark, Scenario-Wise Rec, which comprises 6 public datasets and 12 benchmark models, along with a training and evaluation pipeline. We aim for this benchmark to offer researchers valuable insights from prior work, enabling the development of novel models.
arXiv Detail & Related papers (2024-12-23T08:15:34Z) - Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration [81.45763823762682]
This work aims to bridge the gap by investigating the problem of data synthesis through multi-agent sampling. We introduce Tree Search-based Orchestrated Agents (TOA), where the workflow evolves iteratively during the sequential sampling process. Our experiments on alignment, machine translation, and mathematical reasoning demonstrate that multi-agent sampling significantly outperforms single-agent sampling as inference compute scales.
arXiv Detail & Related papers (2024-12-22T15:16:44Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
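Among the related techniques above, the SMOTExT-style generation step is simple enough to sketch: a synthetic example is produced by linearly interpolating between the embeddings of two existing examples. A minimal illustration, using plain lists of floats in place of BERT embeddings (the function name and toy vectors are hypothetical):

```python
import random

def smote_interpolate(emb_a, emb_b, alpha=None):
    """SMOTE-style synthesis: return a point on the line segment between
    two embedding vectors. With a random alpha in [0, 1), repeated calls
    yield distinct synthetic points between the same pair of examples."""
    if alpha is None:
        alpha = random.random()
    return [a + alpha * (b - a) for a, b in zip(emb_a, emb_b)]

# Midpoint of two toy 3-d embeddings
mid = smote_interpolate([0.0, 0.0, 0.0], [1.0, 2.0, 4.0], alpha=0.5)
# → [0.5, 1.0, 2.0]
```

In the textual setting described by SMOTExT, the interpolated embedding would then be decoded back into text; this sketch covers only the interpolation itself.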
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.