Squeezing More from the Stream: Learning Representation Online for Streaming Reinforcement Learning
- URL: http://arxiv.org/abs/2602.09396v1
- Date: Tue, 10 Feb 2026 04:06:32 GMT
- Title: Squeezing More from the Stream: Learning Representation Online for Streaming Reinforcement Learning
- Authors: Nilaksh, Antoine Clavaud, Mathieu Reymond, François Rivest, Sarath Chandar
- Abstract summary: In streaming Reinforcement Learning (RL), transitions are observed and discarded immediately after a single update. We propose extending Self-Predictive Representations (SPR) to the streaming pipeline to maximize the utility of every observed frame. We show that our method learns significantly richer representations, bridging the performance gap caused by the absence of a replay buffer.
- Score: 14.799267729619428
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In streaming Reinforcement Learning (RL), transitions are observed and discarded immediately after a single update. While this minimizes resource usage for on-device applications, it makes agents notoriously sample-inefficient, since value-based losses alone struggle to extract meaningful representations from transient data. We propose extending Self-Predictive Representations (SPR) to the streaming pipeline to maximize the utility of every observed frame. However, due to the highly correlated samples induced by the streaming regime, naively applying this auxiliary loss results in training instabilities. Thus, we introduce orthogonal gradient updates relative to the momentum target and resolve gradient conflicts arising from streaming-specific optimizers. Validated across the Atari, MinAtar, and Octax suites, our approach systematically outperforms existing streaming baselines. Latent-space analysis, including t-SNE visualizations and effective-rank measurements, confirms that our method learns significantly richer representations, bridging the performance gap caused by the absence of a replay buffer, while remaining efficient enough to train on just a few CPU cores.
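To make the idea concrete, below is a minimal, illustrative sketch of the general recipe the abstract describes: a single-sample (streaming) update that combines a TD loss with an SPR-style self-predictive auxiliary loss against a momentum (EMA) target, and resolves conflicts between the two gradients by projecting out the opposing component. This is not the authors' code; all module names, hyperparameters, and the particular projection scheme are assumptions, and the paper's exact orthogonalization relative to the momentum target and its optimizer-specific conflict handling are not reproduced here.

```python
# Hedged sketch of streaming SPR with conflict-aware gradient combination.
import copy
import torch
import torch.nn.functional as F

class StreamingSPRAgent:
    def __init__(self, encoder, transition_model, projector, q_head,
                 lr=1e-4, tau=0.005, gamma=0.99):
        self.encoder = encoder                      # online encoder phi(s)
        self.transition = transition_model          # latent dynamics h(z, a)
        self.projector = projector                  # projection head g(z)
        self.q_head = q_head                        # value head Q(z, .)
        self.target_encoder = copy.deepcopy(encoder)      # EMA (momentum) target
        self.target_projector = copy.deepcopy(projector)
        self.params = (list(encoder.parameters()) + list(transition_model.parameters())
                       + list(projector.parameters()) + list(q_head.parameters()))
        self.opt = torch.optim.SGD(self.params, lr=lr)  # stand-in for a streaming optimizer
        self.tau, self.gamma = tau, gamma

    def update(self, s, a, r, s_next, done):
        """Single-transition update; the transition is discarded afterwards."""
        z = self.encoder(s)

        # --- TD loss (value-based objective) ---
        q = self.q_head(z)[a]
        with torch.no_grad():
            q_next = self.q_head(self.encoder(s_next)).max()
            td_target = r + (1.0 - done) * self.gamma * q_next
        td_loss = F.smooth_l1_loss(q, td_target)

        # --- SPR-style auxiliary loss against the momentum target ---
        z_pred = self.projector(self.transition(z, a))
        with torch.no_grad():
            z_tgt = self.target_projector(self.target_encoder(s_next))
        spr_loss = -F.cosine_similarity(z_pred, z_tgt, dim=-1).mean()

        # --- resolve gradient conflict: drop the SPR component opposing the TD gradient ---
        g_td = torch.autograd.grad(td_loss, self.params, retain_graph=True, allow_unused=True)
        g_spr = torch.autograd.grad(spr_loss, self.params, allow_unused=True)
        g_td = [torch.zeros_like(p) if g is None else g for p, g in zip(self.params, g_td)]
        g_spr = [torch.zeros_like(p) if g is None else g for p, g in zip(self.params, g_spr)]
        dot = sum((gt * gs).sum() for gt, gs in zip(g_td, g_spr))
        if dot < 0:  # conflict: project g_spr onto the plane orthogonal to g_td
            norm_sq = sum((gt ** 2).sum() for gt in g_td) + 1e-12
            g_spr = [gs - (dot / norm_sq) * gt for gs, gt in zip(g_spr, g_td)]

        self.opt.zero_grad()
        for p, gt, gs in zip(self.params, g_td, g_spr):
            p.grad = gt + gs
        self.opt.step()

        # EMA update of the momentum target
        with torch.no_grad():
            for p, tp in zip(self.encoder.parameters(), self.target_encoder.parameters()):
                tp.mul_(1 - self.tau).add_(self.tau * p)
            for p, tp in zip(self.projector.parameters(), self.target_projector.parameters()):
                tp.mul_(1 - self.tau).add_(self.tau * p)
```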
Related papers
- Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach [78.4812458793128]
We propose TACO, a test-time-scaling framework that applies a lightweight pseudo-count estimator as a high-fidelity verifier of action chunks. The method resembles the classical anti-exploration principle in offline reinforcement learning (RL) and, being gradient-free, offers significant computational savings.
arXiv Detail & Related papers (2025-12-02T14:42:54Z) - Mitigating Catastrophic Forgetting in Streaming Generative and Predictive Learning via Stateful Replay [0.0]
We present a unified study of stateful replay for streaming autoencoding, time-series forecasting, and classification. We evaluate a single replay mechanism on six streaming scenarios built from Rotated MNIST, ElectricityLoadDiagrams 2011-2014, and Airlines delay data. On heterogeneous multi-task streams, replay reduces average forgetting by a factor of two to three, while on benign time-based streams both methods perform similarly.
arXiv Detail & Related papers (2025-11-22T06:25:54Z) - PRNet: Original Information Is All You Have [3.1373048585002254]
PRNet is a real-time detection framework that prioritizes the preservation and efficient utilization of primitive shallow spatial features. PRNet outperforms state-of-the-art methods under comparable computational constraints.
arXiv Detail & Related papers (2025-10-10T16:44:39Z) - Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion [67.94300151774085]
We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must, at inference time, generate sequences conditioned on their own imperfect outputs.
arXiv Detail & Related papers (2025-06-09T17:59:55Z) - Enhanced DACER Algorithm with High Diffusion Efficiency [30.13489573844592]
We propose DACERv2, which exploits a Q-gradient field objective to guide the denoising process at each diffusion step. We show that DACERv2 achieves higher performance in most complex control environments with only five diffusion steps and shows greater multimodality.
arXiv Detail & Related papers (2025-05-29T13:21:58Z) - AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse [19.13826316844611]
Diffusion models have demonstrated remarkable success in generative tasks, yet their iterative denoising process results in slow inference. We provide a theoretical understanding by analyzing the denoising process through the second-order Adams-Bashforth method. Building on this, we propose a caching-based acceleration approach for diffusion models that goes beyond directly reusing cached results.
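For intuition, the sketch below shows only the numerical backbone this summary refers to: a generic second-order Adams-Bashforth (AB2) step for a probability-flow ODE that keeps the previous model output in a cache. It is not the AB-Cache implementation, and the function and cache names are placeholders.

```python
# Illustrative AB2 step reusing the cached drift from the previous step.
def ab2_step(x, t, dt, drift_model, cache):
    """Advance the ODE x' = f(x, t) by one step of Adams-Bashforth order 2."""
    f_now = drift_model(x, t)                  # current drift estimate
    f_prev = cache.get("f_prev")               # cached drift from the last step
    if f_prev is None:
        x_next = x + dt * f_now                # fall back to Euler on the first step
    else:
        x_next = x + dt * (1.5 * f_now - 0.5 * f_prev)   # AB2 extrapolation
    cache["f_prev"] = f_now                    # update the cache for the next step
    return x_next
```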
arXiv Detail & Related papers (2025-04-13T08:29:58Z) - Rethinking Video Tokenization: A Conditioned Diffusion-based Approach [58.164354605550194]
The new tokenizer, a Conditioned Diffusion-based Tokenizer (CDT), replaces the GAN-based decoder with a conditional diffusion model. Training uses only a basic MSE diffusion loss for reconstruction, along with a KL term and an LPIPS perceptual loss, from scratch. Even a scaled-down version of CDT (3× inference speedup) still performs comparably with top baselines.
arXiv Detail & Related papers (2025-03-05T17:59:19Z) - E2ED^2:Direct Mapping from Noise to Data for Enhanced Diffusion Models [15.270657838960114]
Diffusion models have established themselves as the de facto paradigm in visual generative modeling. We present a novel end-to-end learning paradigm that establishes direct optimization from the final generated samples back to the initial noise. Our method achieves substantial gains in Fréchet Inception Distance (FID) and CLIP score, even with fewer sampling steps.
arXiv Detail & Related papers (2024-12-30T16:06:31Z) - Efficient Diffusion as Low Light Enhancer [63.789138528062225]
Reflectance-Aware Trajectory Refinement (RATR) is a simple yet effective module to refine the teacher trajectory using the reflectance component of images.
Reflectance-aware Diffusion with Distilled Trajectory (ReDDiT) is an efficient and flexible distillation framework tailored for Low-Light Image Enhancement (LLIE).
arXiv Detail & Related papers (2024-10-16T08:07:18Z) - Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think [72.48325960659822]
One main bottleneck in training large-scale diffusion models for generation lies in effectively learning useful internal representations. We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders. The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion- and flow-based transformers such as DiTs and SiTs.
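A rough sketch of such an alignment regularizer is given below; it is an assumption-laden illustration (all names are hypothetical, and the projection/encoder choices are placeholders), not REPA's actual code.

```python
# Sketch of a REPA-style alignment term added to the diffusion loss.
import torch
import torch.nn.functional as F

def alignment_loss(hidden_state, clean_image, proj_head, pretrained_encoder):
    z_denoiser = proj_head(hidden_state)              # projected noisy-input features
    with torch.no_grad():
        z_clean = pretrained_encoder(clean_image)     # frozen pretrained visual encoder
    # negative cosine similarity pulls the two representations together
    return -F.cosine_similarity(z_denoiser, z_clean, dim=-1).mean()

# total_loss = diffusion_loss + lambda_align * alignment_loss(h, x_clean, proj, enc)
```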
arXiv Detail & Related papers (2024-10-09T14:34:53Z) - Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from time-consuming inference, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z) - The Challenges of Continuous Self-Supervised Learning [40.941767578622745]
Self-supervised learning (SSL) aims to eliminate one of the major bottlenecks in representation learning - the need for human annotations.
We show that a direct application of current methods to such a continuous setup is inefficient both computationally and in the amount of data required.
We propose the use of replay buffers as an approach to alleviate the issues of inefficiency and temporal correlations.
arXiv Detail & Related papers (2022-03-23T20:05:06Z)