Smooth Operator: Smooth Verifiable Reward Activates Spatial Reasoning Ability of Vision-Language Model
- URL: http://arxiv.org/abs/2601.07695v2
- Date: Thu, 15 Jan 2026 03:58:36 GMT
- Title: Smooth Operator: Smooth Verifiable Reward Activates Spatial Reasoning Ability of Vision-Language Model
- Authors: Siwen Jiao, Tianxiong Lv, Kangan Qian, Chenxu Zhao, Xiuyuan Zhu, Tianlun Li, Xiaolong Cheng, Jinyu Li, Zhihao Liao, Yang Cai,
- Abstract summary: Vision-Language Models (VLMs) face a critical bottleneck in achieving precise numerical prediction for 3D scene understanding.<n>Traditional reinforcement learning approaches, primarily based on relative ranking, often suffer from severe reward sparsity and gradient instability.<n>We introduce the Smooth Numerical Reward Activation (SNRA) operator and the Absolute-Preserving GRPO framework.
- Score: 18.526821056010384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-Language Models (VLMs) face a critical bottleneck in achieving precise numerical prediction for 3D scene understanding. Traditional reinforcement learning (RL) approaches, primarily based on relative ranking, often suffer from severe reward sparsity and gradient instability, failing to effectively exploit the verifiable signals provided by 3D physical constraints. Notably, in standard GRPO frameworks, relative normalization causes "near-miss" samples (characterized by small but non-zero errors) to suffer from advantage collapse. This leads to a severe data utilization bottleneck where valuable boundary samples are discarded during optimization. To address this, we introduce the Smooth Numerical Reward Activation (SNRA) operator and the Absolute-Preserving GRPO (AP-GRPO) framework. SNRA employs a dynamically parameterized Sigmoid function to transform raw feedback into a dense, continuous reward continuum. Concurrently, AP-GRPO integrates absolute scalar gradients to mitigate the numerical information loss inherent in conventional relative-ranking mechanisms. By leveraging this approach, we constructed Numerical3D-50k, a dataset comprising 50,000 verifiable 3D subtasks. Empirical results indicate that AP-GRPO achieves performance parity with large-scale supervised methods while maintaining higher data efficiency, effectively activating latent 3D reasoning in VLMs without requiring architectural modifications.
Related papers
- Opacity-Gradient Driven Density Control for Compact and Efficient Few-Shot 3D Gaussian Splatting [0.0]
This paper presents a framework that revises the core 3DGS optimization to prioritize efficiency.<n>We replace the standard positional gradient with a novel densification trigger that uses the opacity gradient as a lightweight proxy for rendering error.<n>On the 3-view LLFF dataset, our model is over 40% more compact (32k vs. 57k primitives) than FSGS, and on the Mip-NeRF 360 dataset, it achieves a reduction of approximately 70%.
arXiv Detail & Related papers (2025-10-11T15:33:50Z) - Intern-GS: Vision Model Guided Sparse-View 3D Gaussian Splatting [95.61137026932062]
Intern-GS is a novel approach to enhance the process of sparse-view Gaussian splatting.<n>We show that Intern-GS achieves state-of-the-art rendering quality across diverse datasets.
arXiv Detail & Related papers (2025-05-27T05:17:49Z) - Steepest Descent Density Control for Compact 3D Gaussian Splatting [72.54055499344052]
3D Gaussian Splatting (3DGS) has emerged as a powerful real-time, high-resolution novel view.<n>We propose a theoretical framework that demystifies and improves density control in 3DGS.<n>We introduce SteepGS, incorporating steepest density control, a principled strategy that minimizes loss while maintaining a compact point cloud.
arXiv Detail & Related papers (2025-05-08T18:41:38Z) - Uncertainty-Aware Normal-Guided Gaussian Splatting for Surface Reconstruction from Sparse Image Sequences [21.120659841877508]
3D Gaussian Splatting (3DGS) has achieved impressive rendering performance in novel view synthesis.<n>We propose Uncertainty-aware Normal-Guided Gaussian Splatting (UNG-GS) to quantify geometric uncertainty within the 3DGS pipeline.<n>UNG-GS significantly outperforms state-of-the-art methods in both sparse and dense sequences.
arXiv Detail & Related papers (2025-03-14T08:18:12Z) - RW-Net: Enhancing Few-Shot Point Cloud Classification with a Wavelet Transform Projection-based Network [6.305913808037513]
This work introduces RW-Net, a novel framework designed to address the challenges above by integrating Rate-Distortion Explanation (RDE) and wavelet transform.<n>By emphasizing low-frequency components of the input data, the wavelet transform captures fundamental geometric and structural attributes of 3D objects.<n>The results demonstrate that our approach achieves state-of-the-art performance and exhibits superior generalization and robustness in few-shot learning scenarios.
arXiv Detail & Related papers (2025-01-06T18:55:59Z) - GAQAT: gradient-adaptive quantization-aware training for domain generalization [54.31450550793485]
We propose a novel Gradient-Adaptive Quantization-Aware Training (GAQAT) framework for DG.<n>Our approach begins by identifying the scale-gradient conflict problem in low-precision quantization.<n>Extensive experiments validate the effectiveness of the proposed GAQAT framework.
arXiv Detail & Related papers (2024-12-07T06:07:21Z) - ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions [91.55655961014027]
3D semantic occupancy and flow prediction are fundamental to understanding scene scene.<n>This paper proposes a vision-based framework with three targeted improvements.<n>Our purely convolutional architecture establishes new SOTA performance on multiple benchmarks for both semantic occupancy and joint semantic-flow prediction.
arXiv Detail & Related papers (2024-11-12T11:32:56Z) - Federated Smoothing Proximal Gradient for Quantile Regression with Non-Convex Penalties [3.269165283595478]
Distributed sensors in the internet-of-things (IoT) generate vast amounts of sparse data.
We propose a federated smoothing proximal gradient (G) algorithm that integrates a smoothing mechanism with the view, thereby both precision and computational speed.
arXiv Detail & Related papers (2024-08-10T21:50:19Z) - Improving Gaussian Splatting with Localized Points Management [52.009874685460694]
Localized Point Management (LPM) is capable of identifying those error-contributing zones in greatest need for both point addition and geometry calibration.<n>LPM applies point densification in the identified zones and then reset the opacity of the points in front of these regions, creating a new opportunity to correct poorly conditioned points.<n> Notably, LPM improves both static 3DGS and dynamic SpaceTimeGS to achieve state-of-the-art rendering quality while retaining real-time speeds.
arXiv Detail & Related papers (2024-06-06T16:55:07Z) - FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models [59.13757801286343]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data.<n>We introduce the FILP-3D framework with two novel components: the Redundant Feature Eliminator (RFE) for feature space misalignment and the Spatial Noise Compensator (SNC) for significant noise.
arXiv Detail & Related papers (2023-12-28T14:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.