SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
- URL: http://arxiv.org/abs/2506.02096v1
- Date: Mon, 02 Jun 2025 17:45:16 GMT
- Title: SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
- Authors: Zijian Wu, Jinjie Ni, Xiangyan Liu, Zichen Liu, Hang Yan, Michael Qizhe Shieh
- Abstract summary: We propose SynthRL, a scalable and guaranteed pipeline for automatic data scaling in reasoning-oriented RL training. Our empirical experiments demonstrate SynthRL's scalability and effectiveness. Models trained with our synthesized data achieve consistent gains across five out-of-domain visual math reasoning benchmarks.
- Score: 9.47779155214011
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-language models (VLMs) trained via reinforcement learning with verifiable reward (RLVR) have shown notable progress in scaling test-time compute effectively. In this work, we investigate how synthesized RL data can further improve RLVR. To this end, we propose SynthRL, a scalable and guaranteed pipeline for automatic data scaling in reasoning-oriented RL training. SynthRL comprises three key stages: (1) selecting seed questions with appropriate distribution, (2) augmenting them into more challenging variants while preserving the original answers, and (3) a guaranteed verification stage that ensures near-perfect correctness and difficulty enhancement. Our empirical experiments demonstrate SynthRL's scalability and effectiveness. When applied to the MMK12 dataset, SynthRL synthesizes over 3.3K additional verifiable, challenging questions from approximately 8K seed samples. Models trained with our synthesized data achieve consistent gains across five out-of-domain visual math reasoning benchmarks, with a significant improvement over baseline models trained on seed data alone. Notably, detailed analysis reveals that the gains are more pronounced on the most challenging evaluation samples, highlighting SynthRL's effectiveness in eliciting deeper and more complex reasoning patterns.
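The three stages named in the abstract (seed selection, answer-preserving augmentation, verification) can be pictured with a rough sketch. This is not the authors' implementation: the `Question` type, the `synthesize` function, the `pass_rate` and `augment` callables, and the `low`/`high` thresholds are all hypothetical names and values chosen for illustration. In particular, using an estimated solver pass rate as the difficulty signal is an assumption, not something the abstract states.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Question:
    """Hypothetical container for a question and its reference answer."""
    text: str
    answer: str


def synthesize(
    seeds: List[Question],
    pass_rate: Callable[[Question], float],   # assumed difficulty estimate in [0, 1]
    augment: Callable[[Question], Question],  # assumed rewriter producing a harder variant
    low: float = 0.3,                         # illustrative thresholds, not from the paper
    high: float = 0.9,
) -> List[Question]:
    """Select seeds, augment them, and keep only verified harder variants."""
    accepted = []
    for seed in seeds:
        seed_rate = pass_rate(seed)
        # Stage 1 (assumed criterion): keep seeds that are neither trivial nor hopeless.
        if not (low <= seed_rate <= high):
            continue
        # Stage 2: rewrite the seed into a candidate that should be more challenging.
        candidate = augment(seed)
        # Stage 3a: the original answer must be preserved.
        if candidate.answer != seed.answer:
            continue
        # Stage 3b: the candidate must be harder than the seed yet still solvable.
        if 0.0 < pass_rate(candidate) < seed_rate:
            accepted.append(candidate)
    return accepted
```

In practice `pass_rate` and `augment` would wrap model calls; here they are plain callables so the control flow can be checked with stubs.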
Related papers
- Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL). Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z)
- The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models [63.98194996746229]
Large language models (LLMs) have significantly advanced in reasoning tasks through reinforcement learning (RL) optimization. However, reasoning-oriented RL fine-tuning significantly increases the prevalence of hallucinations. We propose Factuality-aware Step-wise Policy Optimization (FSPO), an innovative RL fine-tuning algorithm incorporating explicit factuality verification.
arXiv Detail & Related papers (2025-05-30T14:23:32Z)
- Synthline: A Product Line Approach for Synthetic Requirements Engineering Data Generation using Large Language Models [0.5156484100374059]
This paper introduces Synthline, a Product Line (PL) approach that leverages Large Language Models to generate synthetic Requirements Engineering (RE) data. Our analysis reveals that while synthetic datasets exhibit less diversity than real data, they are good enough to serve as viable training resources. Our evaluation shows that combining synthetic and real data leads to substantial performance improvements.
arXiv Detail & Related papers (2025-05-06T07:57:16Z)
- Scaling Laws of Synthetic Data for Language Models [132.67350443447611]
We introduce SynthLLM, a scalable framework that transforms pre-training corpora into diverse, high-quality synthetic datasets. Our approach achieves this by automatically extracting and recombining high-level concepts across multiple documents using a graph algorithm.
arXiv Detail & Related papers (2025-03-25T11:07:12Z)
- Towards Effective and Efficient Continual Pre-training of Large Language Models [163.34610964970258]
Continual pre-training (CPT) has been an important approach for adapting language models to specific domains or tasks.
This paper presents a technical report for continually pre-training Llama-3 (8B).
It significantly enhances the Chinese language ability and scientific reasoning ability of the backbone model.
arXiv Detail & Related papers (2024-07-26T13:55:21Z)
- Unlock the Correlation between Supervised Fine-Tuning and Reinforcement Learning in Training Code Large Language Models [12.656574142412484]
We make an attempt to understand the correlation between supervised fine-tuning and reinforcement learning. We find that both atomic and synthetic functions are indispensable for SFT's generalization.
arXiv Detail & Related papers (2024-06-14T03:39:01Z)
- Retrosynthesis prediction enhanced by in-silico reaction data augmentation [66.5643280109899]
We present RetroWISE, a framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation.
On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models.
arXiv Detail & Related papers (2024-01-31T07:40:37Z)
- A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation [42.2398858786125]
Deep learning in computer vision has achieved great success with the price of large-scale labeled training data.
The uncontrollable data collection process produces non-IID training and test data, where undesired duplication may exist.
To circumvent them, an alternative is to generate synthetic data via 3D rendering with domain randomization.
arXiv Detail & Related papers (2023-03-16T09:03:52Z)
- Synthetic Experience Replay [48.601879260071655]
We propose Synthetic Experience Replay (SynthER), a diffusion-based approach to flexibly upsample an agent's collected experience.
We show that SynthER is an effective method for training RL agents across offline and online settings.
We believe that synthetic training data could open the door to realizing the full potential of deep learning for replay-based RL algorithms from limited data.
arXiv Detail & Related papers (2023-03-12T09:10:45Z)