Related papers: Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning

URL: http://arxiv.org/abs/2512.06533v1
Date: Sat, 06 Dec 2025 18:57:38 GMT
Title: Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning
Authors: Ming Chen, Sheng Tang, Rong-Xi Tan, Ziniu Li, Jiacheng Chen, Ke Xue, Chao Qian
Abstract summary: We propose to unlock the potential of decoding-based regression via Reinforcement Learning (RL). We formulate the generation process as a Markov Decision Process, utilizing sequence-level rewards to enforce global numerical coherence. Our analysis further reveals that RL significantly enhances sampling efficiency and predictive precision, establishing decoding-based regression as a robust and accurate paradigm for general-purpose numerical prediction.
Score: 39.920697401868885
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Decoding-based regression, which reformulates regression as a sequence generation task, has emerged as a promising paradigm of applying large language models for numerical prediction. However, its progress is hindered by the misalignment between discrete token-level objectives (e.g., cross-entropy) and continuous numerical values. Existing approaches relying on token-level constraints often fail to capture the global magnitude of the target value, limiting their precision and generalization. In this paper, we propose to unlock the potential of decoding-based regression via Reinforcement Learning (RL). We formulate the generation process as a Markov Decision Process, utilizing sequence-level rewards to enforce global numerical coherence. Extensive experiments on tabular regression and code metric regression demonstrate that our method (specifically with ReMax and GRPO) consistently outperforms both state-of-the-art token-level baselines and traditional regression heads, showing the superiority of introducing sequence-level signals. Our analysis further reveals that RL significantly enhances sampling efficiency and predictive precision, establishing decoding-based regression as a robust and accurate paradigm for general-purpose numerical prediction.
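To make the contrast between token-level and sequence-level signals concrete, here is a minimal Python sketch. It is not the authors' implementation: the character-level tokenization, the negative squared-error reward, the `invalid_penalty` value, and all function names are assumptions chosen for illustration. The advantage computation follows the group-relative normalization commonly associated with GRPO (reward minus group mean, divided by group standard deviation).

```python
import math
import statistics


def decode_tokens(tokens):
    """Join character tokens such as ['1', '2', '.', '5'] into a float.

    Returns None when the sequence is not a finite number.
    """
    try:
        value = float("".join(tokens))
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def sequence_reward(tokens, target, invalid_penalty=-100.0):
    """Hypothetical sequence-level reward: negative squared error of the decoded value."""
    value = decode_tokens(tokens)
    if value is None:
        return invalid_penalty  # assumed penalty for undecodable outputs
    return -((value - target) ** 2)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples drawn for the same input."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Three sampled decodings for a single input whose true target is 12.5.
target = 12.5
samples = [
    ["1", "2", ".", "5"],  # exact
    ["1", "3", ".", "0"],  # two tokens differ, numerically close
    ["9", "2", ".", "5"],  # one token differs, numerically far
]
rewards = [sequence_reward(s, target) for s in samples]
advantages = group_relative_advantages(rewards)
```

In this toy group, the third sample differs from the target in a single token yet receives by far the lowest reward, while the second sample differs in two tokens and is penalized only slightly. A per-token loss would rank them the other way round, which is the kind of misalignment the abstract attributes to token-level objectives.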

Related papers

Sequential Regression for Continuous Value Prediction using Residual Quantization [8.96389388600604]
Continuous value prediction plays a crucial role in industrial-scale recommendation systems. Existing generative approaches rely on rigid parametric distribution assumptions. We propose a residual quantization (RQ)-based sequence learning framework that represents target continuous values as a sum of ordered quantization codes.
arXiv Detail & Related papers (2026-02-26T13:52:54Z)
Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective [85.06838178922791]
Reinforcement Learning (RL) has proven highly effective for autoregressive language models. But adapting these methods to diffusion large language models (dLLMs) presents fundamental challenges. We propose a principled RL framework that treats entire sequence generation as a single action and uses the ELBO as a tractable sequence-level likelihood proxy.
arXiv Detail & Related papers (2025-12-03T13:05:32Z)
Latent Chain-of-Thought for Visual Reasoning [53.541579327424046]
Chain-of-thought (CoT) reasoning is critical for improving the interpretability and reliability of Large Vision-Language Models (LVLMs). We reformulate reasoning in LVLMs as posterior inference and propose a scalable training algorithm based on amortized variational inference. We empirically demonstrate that the proposed method enhances the state-of-the-art LVLMs on seven reasoning benchmarks.
arXiv Detail & Related papers (2025-10-27T23:10:06Z)
Uncertainty Quantification for Regression using Proper Scoring Rules [76.24649098854219]
We introduce a unified UQ framework for regression based on proper scoring rules, such as CRPS, logarithmic, squared error, and quadratic scores. We derive closed-form expressions for the uncertainty measures under practical parametric assumptions and show how to estimate them using ensembles of models. Our broad evaluation on synthetic and real-world regression datasets provides guidance for selecting reliable UQ measures.
arXiv Detail & Related papers (2025-09-30T17:52:12Z)
RL as Regressor: A Reinforcement Learning Approach for Function Approximation [0.0]
We propose framing regression as a Reinforcement Learning (RL) problem. We demonstrate this by treating a model's prediction as an action and defining a custom reward signal based on the prediction error. We show that the RL framework not only successfully solves the regression problem but also offers enhanced flexibility in defining objectives and guiding the learning process.
arXiv Detail & Related papers (2025-07-31T21:39:24Z)
Decoding-based Regression [29.15816693410931]
We investigate the utility of causal sequence decoding models as numeric regression heads given any feature representation. We find that, despite being trained in the usual way, decoder-based heads are as performant as standard pointwise heads when benchmarked over standard regression tasks.
arXiv Detail & Related papers (2025-01-31T18:37:42Z)
Generative Regression Based Watch Time Prediction for Short-Video Recommendation [36.95095097454143]
Watch time prediction (WTP) has emerged as a pivotal task in short video recommendation systems. Recent studies have attempted to address these issues by converting the continuous watch time estimation into an ordinal regression task. We propose a novel Generative Regression (GR) framework that reformulates WTP as a sequence generation task.
arXiv Detail & Related papers (2024-12-28T16:48:55Z)
Deep Generative Symbolic Regression [83.04219479605801]
Symbolic regression aims to discover concise closed-form mathematical equations from data. Existing methods, ranging from search to reinforcement learning, fail to scale with the number of input variables. We propose an instantiation of our framework, Deep Generative Symbolic Regression.
arXiv Detail & Related papers (2023-12-30T17:05:31Z)
Utilizing Multiple Inputs Autoregressive Models for Bearing Remaining Useful Life Prediction [3.448070371030467]
We introduce a novel multi-input autoregressive model to address this challenge in RUL prediction for bearings. Through autoregressive iterations, the model attains a global receptive field, effectively overcoming the limitations in generalization. Empirical evaluation on the PMH2012 dataset demonstrates that our model, compared to other backbone networks using similar autoregressive approaches, achieves significantly lower Root Mean Square Error (RMSE) and Score.
arXiv Detail & Related papers (2023-11-26T09:50:32Z)
ResMem: Learn what you can and memorize the rest [79.19649788662511]
We propose the residual-memorization (ResMem) algorithm to augment an existing prediction model. By construction, ResMem can explicitly memorize the training labels. We show that ResMem consistently improves the test set generalization of the original prediction model.
arXiv Detail & Related papers (2023-02-03T07:12:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.