CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity
- URL: http://arxiv.org/abs/1902.05605v4
- Date: Mon, 25 Mar 2024 10:20:18 GMT
- Title: CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity
- Authors: Aditya Bhatt, Daniel Palenicek, Boris Belousov, Max Argus, Artemij Amiranashvili, Thomas Brox, Jan Peters
- Abstract summary: CrossQ matches or surpasses current state-of-the-art methods in terms of sample efficiency.
It substantially reduces the computational cost compared to REDQ and DroQ.
It is easy to implement, requiring just a few lines of code on top of SAC.
- Score: 34.36803740112609
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sample efficiency is a crucial problem in deep reinforcement learning. Recent algorithms, such as REDQ and DroQ, found a way to improve the sample efficiency by increasing the update-to-data (UTD) ratio to 20 gradient update steps on the critic per environment sample. However, this comes at the expense of a greatly increased computational cost. To reduce this computational burden, we introduce CrossQ: A lightweight algorithm for continuous control tasks that makes careful use of Batch Normalization and removes target networks to surpass the current state-of-the-art in sample efficiency while maintaining a low UTD ratio of 1. Notably, CrossQ does not rely on advanced bias-reduction schemes used in current methods. CrossQ's contributions are threefold: (1) it matches or surpasses current state-of-the-art methods in terms of sample efficiency, (2) it substantially reduces the computational cost compared to REDQ and DroQ, (3) it is easy to implement, requiring just a few lines of code on top of SAC.
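As a rough illustration of the recipe the abstract describes, here is a minimal PyTorch sketch of a CrossQ-style critic: BatchNorm layers inside the Q-network, no target network, and a joint forward pass over current and next state-action pairs so the batch statistics cover both distributions. Layer sizes, the discount factor, and the exact network layout are illustrative assumptions, not the authors' reference implementation.

```python
# Hypothetical sketch of a CrossQ-style critic update (not the authors' code).
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.BatchNorm1d(hidden),  # BN takes over the stabilizing role of target networks
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))

def critic_loss(q_net, obs, act, rew, next_obs, next_act, done, gamma=0.99):
    # Joint forward pass: (s, a) and (s', a') share one batch, so the BatchNorm
    # statistics are computed over both distributions at once.
    q_both = q_net(torch.cat([obs, next_obs], dim=0),
                   torch.cat([act, next_act], dim=0))
    q, next_q = torch.chunk(q_both, 2, dim=0)
    # No target network: the bootstrap target simply detaches the next-state value.
    target = rew + gamma * (1.0 - done) * next_q.detach()
    return ((q - target) ** 2).mean()
```

A UTD ratio of 1 then means this loss is minimized with a single gradient step per environment transition.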
Related papers
- A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks [81.2624272756733]
In dense retrieval, deep encoders provide embeddings for both inputs and targets.
We train a small parametric corrector network that adjusts stale cached target embeddings.
Our approach matches state-of-the-art results even when no target embedding updates are made during training.
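A minimal sketch of the corrector idea as summarized above: a small MLP nudges a stale cached target embedding toward what the current encoder would produce. The residual architecture and regression loss are assumptions for illustration.

```python
# Hypothetical corrector network for stale embeddings (illustrative only).
import torch
import torch.nn as nn

class CorrectorNetwork(nn.Module):
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, stale_emb: torch.Tensor) -> torch.Tensor:
        # Predict a residual correction on top of the stale cached embedding.
        return stale_emb + self.net(stale_emb)

def corrector_loss(corrector: CorrectorNetwork, stale: torch.Tensor, fresh: torch.Tensor):
    # Assumed training signal: regress corrected embeddings onto freshly
    # re-encoded ones for a small subset of targets.
    return ((corrector(stale) - fresh) ** 2).mean()
```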
arXiv Detail & Related papers (2024-09-03T13:29:13Z) - Efficient Sparse-Reward Goal-Conditioned Reinforcement Learning with a High Replay Ratio and Regularization [1.57731592348751]
Reinforcement learning (RL) methods with a high replay ratio (RR) and regularization have gained interest due to their superior sample efficiency.
In this paper, we aim to extend these RL methods to sparse-reward goal-conditioned tasks.
arXiv Detail & Related papers (2023-12-10T06:30:19Z) - Deep Equilibrium Optical Flow Estimation [80.80992684796566]
Recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms.
These RNNs impose large computation and memory overheads and are not directly trained to model the stable estimate they are meant to converge to.
We propose deep equilibrium (DEQ) flow estimators, an approach that directly solves for the flow as the infinite-level fixed point of an implicit layer.
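The fixed-point formulation can be illustrated with a simple forward solver; the update operator f, tolerance, and iteration cap below are illustrative assumptions (DEQ models additionally use implicit differentiation for training rather than backpropagating through the loop).

```python
# Hypothetical forward fixed-point solver for an implicit flow layer.
import torch

def solve_fixed_point(f, x, z0, tol: float = 1e-4, max_iter: int = 50):
    # Find z* such that z* = f(z*, x), i.e., the equilibrium flow estimate.
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x)                      # one refinement step
        if (z_next - z).norm() < tol * (z.norm() + 1e-8):
            return z_next                     # converged
        z = z_next
    return z
```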
arXiv Detail & Related papers (2022-04-18T17:53:44Z) - Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
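A sketch of the re-scaling intuition named above, assuming ALR simply matches the norms of predicted novel-class weights to the typical norm of the pretrained base-class weights; the paper's exact formulation may differ.

```python
# Hypothetical length re-scaling for predicted novel-class weights.
import torch

def rescale_novel_weights(novel_w: torch.Tensor, base_w: torch.Tensor) -> torch.Tensor:
    # novel_w: (num_novel, dim); base_w: (num_base, dim)
    target_len = base_w.norm(dim=1).mean()            # typical base weight length
    cur_len = novel_w.norm(dim=1, keepdim=True)       # per-row novel weight lengths
    return novel_w * (target_len / cur_len.clamp_min(1e-8))
```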
arXiv Detail & Related papers (2022-03-23T06:24:31Z) - Dropout Q-Functions for Doubly Efficient Reinforcement Learning [12.267045729018653]
We propose a method of improving computational efficiency called Dr.Q.
Dr.Q is a variant of REDQ that uses a small ensemble of dropout Q-functions.
It achieves sample efficiency comparable to REDQ, much better computational efficiency than REDQ, and computational efficiency comparable to that of SAC.
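For illustration, a dropout Q-function might look as follows; hidden sizes and the dropout rate are assumptions, and the layer normalization follows the DroQ variant mentioned in the CrossQ abstract above.

```python
# Hypothetical dropout Q-function in the spirit of Dr.Q/DroQ (illustrative).
import torch
import torch.nn as nn

class DropoutQFunction(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256, p_drop: float = 0.01):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.Dropout(p_drop),   # dropout injects the diversity a large ensemble would provide
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.Dropout(p_drop),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))
```

A small ensemble of such Q-functions then stands in for REDQ's much larger ensemble.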
arXiv Detail & Related papers (2021-10-05T13:28:11Z) - Sample and Computation Redistribution for Efficient Face Detection [137.19388513633484]
Training data sampling and computation distribution strategies are the keys to efficient and accurate face detection.
SCRFD-34GF outperforms the best competitor, TinaFace, by $3.86\%$ (AP at hard set) while being more than $3\times$ faster on GPUs with VGA-resolution images.
arXiv Detail & Related papers (2021-05-10T23:51:14Z) - Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality [131.45028999325797]
We develop a doubly robust off-policy AC (DR-Off-PAC) for discounted MDP.
DR-Off-PAC adopts a single timescale structure, in which both actor and critics are updated simultaneously with constant stepsize.
We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $\epsilon$-accurate optimal policy.
arXiv Detail & Related papers (2021-02-23T18:56:13Z) - Randomized Ensembled Double Q-Learning: Learning Fast Without a Model [8.04816643418952]
We introduce a simple model-free algorithm, Randomized Ensembled Double Q-Learning (REDQ).
We show that REDQ's performance is just as good as, if not better than, a state-of-the-art model-based algorithm for the MuJoCo benchmark.
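REDQ's core mechanism, an in-target minimization over a random subset of an ensemble of Q-functions, can be sketched as follows; the ensemble handling, subset size, and discount factor are illustrative assumptions.

```python
# Hypothetical REDQ-style Bellman target (illustrative, not the authors' code).
import random
import torch

def redq_target(q_ensemble, rew, next_obs, next_act, done,
                subset_size: int = 2, gamma: float = 0.99):
    # Randomly pick a small subset of the Q-ensemble each update.
    subset = random.sample(q_ensemble, subset_size)
    # Min over the subset curbs overestimation, generalizing double Q-learning.
    next_qs = torch.stack([q(next_obs, next_act) for q in subset], dim=0)
    min_next_q = next_qs.min(dim=0).values
    return rew + gamma * (1.0 - done) * min_next_q.detach()
```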
arXiv Detail & Related papers (2021-01-15T06:25:58Z) - Cross-Iteration Batch Normalization [67.83430009388678]
We present Cross-Iteration Batch Normalization (CBN), in which examples from multiple recent iterations are jointly utilized to enhance estimation quality.
CBN is found to outperform the original batch normalization and a direct calculation of statistics over previous iterations without the proposed compensation technique.
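To make the idea concrete, a naive version that pools normalization statistics over a window of recent iterations might look like the sketch below; it deliberately omits CBN's compensation technique for stale statistics, which the summary above notes is what makes the approach work, and the window size is an assumption.

```python
# Naive cross-iteration statistics pooling (omits CBN's compensation; illustrative).
from collections import deque
import torch

class NaiveCrossIterationStats:
    def __init__(self, window: int = 4):
        self.means = deque(maxlen=window)      # per-iteration feature means
        self.sq_means = deque(maxlen=window)   # per-iteration mean squares

    def update(self, x: torch.Tensor) -> None:
        # x: (batch, features) activations from the current iteration
        self.means.append(x.mean(dim=0))
        self.sq_means.append((x * x).mean(dim=0))

    def normalize(self, x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
        mean = torch.stack(list(self.means)).mean(dim=0)
        var = torch.stack(list(self.sq_means)).mean(dim=0) - mean * mean
        return (x - mean) / torch.sqrt(var.clamp_min(0.0) + eps)
```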
arXiv Detail & Related papers (2020-02-13T18:52:57Z)