Connections between reinforcement learning with feedback, test-time scaling, and diffusion guidance: An anthology
- URL: http://arxiv.org/abs/2509.04372v1
- Date: Thu, 04 Sep 2025 16:29:38 GMT
- Title: Connections between reinforcement learning with feedback, test-time scaling, and diffusion guidance: An anthology
- Authors: Yuchen Jiao, Yuxin Chen, Gen Li
- Abstract summary: We clarify some intimate connections and equivalences between reinforcement learning with human feedback, reinforcement learning with internal feedback, and test-time scaling. We introduce a resampling approach for alignment and reward-directed diffusion models, sidestepping the need for explicit reinforcement learning techniques.
- Score: 20.827441524264945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this note, we reflect on several fundamental connections among widely used post-training techniques. We clarify some intimate connections and equivalences between reinforcement learning with human feedback, reinforcement learning with internal feedback, and test-time scaling (particularly soft best-of-$N$ sampling), while also illuminating intrinsic links between diffusion guidance and test-time scaling. Additionally, we introduce a resampling approach for alignment and reward-directed diffusion models, sidestepping the need for explicit reinforcement learning techniques.
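The abstract names soft best-of-$N$ sampling without defining it. As a rough illustration only, the sketch below assumes one common formulation: draw $N$ candidates from a base sampler, then pick one with probability proportional to `exp(reward / temperature)`. The names `soft_best_of_n`, `sample_fn`, `reward_fn`, and `temperature` are hypothetical and are not taken from the paper, whose exact formulation may differ.

```python
import math
import random


def soft_best_of_n(sample_fn, reward_fn, n, temperature, rng=random):
    """Pick one of n sampled candidates with probability ∝ exp(reward / temperature).

    sample_fn and reward_fn are hypothetical stand-ins for a base model's
    sampler and a reward model; temperature must be positive.
    """
    candidates = [sample_fn() for _ in range(n)]
    rewards = [reward_fn(c) for c in candidates]
    # Subtract the maximum reward before exponentiating for numerical stability.
    max_reward = max(rewards)
    weights = [math.exp((r - max_reward) / temperature) for r in rewards]
    # Resample a single candidate according to the softmax weights.
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Under this assumed formulation, a temperature near zero approaches ordinary best-of-$N$ (always return the highest-reward candidate), while a very large temperature approaches a uniform choice among the $N$ samples from the base sampler.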
Related papers
- Accelerated Stochastic ExtraGradient: Mixing Hessian and Gradient Similarity to Reduce Communication in Distributed and Federated Learning [50.382793324572845]
Distributed computing involves communication between devices, which requires solving two key problems: efficiency and privacy.
In this paper, we analyze a new method that incorporates the ideas of data similarity and client sampling.
To address privacy concerns, we apply the technique of additional noise and analyze its impact on the convergence of the proposed method.
arXiv Detail & Related papers (2024-09-22T00:49:10Z)
- Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching [53.05954114863596]
We propose a brand-new Deep Boosting Learning (DBL) algorithm for image-text matching.
An anchor branch is first trained to provide insights into the data properties.
A target branch is concurrently tasked with more adaptive margin constraints to further enlarge the relative distance between matched and unmatched samples.
arXiv Detail & Related papers (2024-04-28T08:44:28Z)
- Foundations of Reinforcement Learning and Interactive Decision Making [81.76863968810423]
We present a unifying framework for addressing the exploration-exploitation dilemma using frequentist and Bayesian approaches.
Special attention is paid to function approximation and flexible model classes such as neural networks.
arXiv Detail & Related papers (2023-12-27T21:58:45Z)
- From Patches to Objects: Exploiting Spatial Reasoning for Better Visual Representations [2.363388546004777]
We propose a novel auxiliary pretraining method that is based on spatial reasoning.
Our proposed method takes advantage of a more flexible formulation of contrastive learning by introducing spatial reasoning as an auxiliary task for discriminative self-supervised methods.
arXiv Detail & Related papers (2023-05-21T07:46:46Z)
- Learning Trajectories are Generalization Indicators [44.53518627207067]
This paper explores the connection between learning trajectories of Deep Neural Networks (DNNs) and their generalization capabilities.
We present a novel perspective for analyzing generalization error by investigating the contribution of each update step to the change in generalization error.
Our approach can also track changes in generalization error when adjustments are made to learning rates and label noise levels.
arXiv Detail & Related papers (2023-04-25T05:08:57Z)
- Deep Bregman Divergence for Contrastive Learning of Visual Representations [4.994260049719745]
Deep Bregman divergence uses neural networks to measure the divergence between data points, going beyond Euclidean distance.
We aim to enhance contrastive loss used in self-supervised learning by training additional networks based on functional Bregman divergence.
arXiv Detail & Related papers (2021-09-15T17:44:40Z)
- Self-Supervised Structure-from-Motion through Tightly-Coupled Depth and Egomotion Networks [11.888728516442905]
We introduce several notions of coupling, categorize existing approaches, and present a novel tightly-coupled approach.
We demonstrate that our approach promotes consistency between the depth and egomotion predictions at test time, improves generalization on new data, and leads to state-of-the-art accuracy on indoor and outdoor depth and egomotion evaluation benchmarks.
arXiv Detail & Related papers (2021-06-07T23:30:45Z)
- Stylized Adversarial Defense [105.88250594033053]
Adversarial training creates perturbation patterns and includes them in the training set to robustify the model.
We propose to exploit additional information from the feature space to craft stronger adversaries.
Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses.
arXiv Detail & Related papers (2020-07-29T08:38:10Z)
- Learning Representations that Support Extrapolation [39.84463809100903]
We consider the challenge of learning representations that support extrapolation.
We introduce a novel visual analogy benchmark that allows the graded evaluation of extrapolation.
We also introduce a simple technique, temporal context normalization, that encourages representations that emphasize the relations between objects.
arXiv Detail & Related papers (2020-07-09T20:53:45Z)
- Knowledge-guided Deep Reinforcement Learning for Interactive Recommendation [49.32287384774351]
Interactive recommendation aims to learn from dynamic interactions between items and users to achieve responsiveness and accuracy.
We propose Knowledge-Guided deep Reinforcement learning to harness the advantages of both reinforcement learning and knowledge graphs for interactive recommendation.
arXiv Detail & Related papers (2020-04-17T05:26:47Z)
- Disentangling Adaptive Gradient Methods from Learning Rates [65.0397050979662]
We take a deeper look at how adaptive gradient methods interact with the learning rate schedule.
We introduce a "grafting" experiment which decouples an update's magnitude from its direction.
We present some empirical and theoretical retrospectives on the generalization of adaptive gradient methods.
arXiv Detail & Related papers (2020-02-26T21:42:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.