EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data
- URL: http://arxiv.org/abs/2403.00564v2
- Date: Thu, 12 Sep 2024 08:37:27 GMT
- Title: EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data
- Authors: Shengjie Wang, Shaohuai Liu, Weirui Ye, Jiacheng You, Yang Gao,
- Abstract summary: We introduce EfficientZero V2, a framework designed for sample-efficient Reinforcement Learning (RL) algorithms.
With a series of improvements, EfficientZero V2 outperforms the current state-of-the-art (SOTA) by a significant margin in diverse tasks.
EfficientZero V2 exhibits a notable advancement over the prevailing general algorithm, DreamerV3, achieving superior outcomes in 50 of 66 evaluated tasks.
- Score: 22.621203162457018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sample efficiency remains a crucial challenge in applying Reinforcement Learning (RL) to real-world tasks. While recent algorithms have made significant strides in improving sample efficiency, none have achieved consistently superior performance across diverse domains. In this paper, we introduce EfficientZero V2, a general framework designed for sample-efficient RL algorithms. We have expanded the performance of EfficientZero to multiple domains, encompassing both continuous and discrete actions, as well as visual and low-dimensional inputs. With a series of improvements we propose, EfficientZero V2 outperforms the current state-of-the-art (SOTA) by a significant margin in diverse tasks under the limited data setting. EfficientZero V2 exhibits a notable advancement over the prevailing general algorithm, DreamerV3, achieving superior outcomes in 50 of 66 evaluated tasks across diverse benchmarks, such as Atari 100k, Proprio Control, and Vision Control.
Related papers
- FLARES: Fast and Accurate LiDAR Multi-Range Semantic Segmentation [52.89847760590189]
3D scene understanding is a critical yet challenging task in autonomous driving.
Recent methods leverage the range-view representation to improve processing efficiency.
We re-design the workflow for range-view-based LiDAR semantic segmentation.
arXiv Detail & Related papers (2025-02-13T12:39:26Z) - More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives [50.772462704559345]
We introduce DrICL, a novel optimization method that enhances model performance through Differentiated Learning and advantage-based Reweighting objectives.
Globally, DrICL utilizes differentiated learning to optimize the NLL objective, ensuring that many-shot performance surpasses zero-shot levels.
We develop the Many-Shot ICL Benchmark (ICL-50)-a large-scale benchmark of 50 tasks that cover shot numbers from 1 to 350 within sequences of up to 8,000 tokens-for fine-tuning purposes.
arXiv Detail & Related papers (2025-01-07T14:57:08Z) - Haste Makes Waste: A Simple Approach for Scaling Graph Neural Networks [37.41604955004456]
Graph neural networks (GNNs) have demonstrated remarkable success in graph representation learning.
Various sampling approaches have been proposed to scale GNNs to applications with large-scale graphs.
arXiv Detail & Related papers (2024-10-07T18:29:02Z) - UniZero: Generalized and Efficient Planning with Scalable Latent World Models [29.648382211926364]
UniZero is a novel approach that employs a modular transformer-based world model to effectively learn a shared latent space.
We show that UniZero significantly outperforms existing baselines in benchmarks that require long-term memory.
In standard single-task RL settings, such as Atari and DMControl, UniZero matches or even surpasses the performance of current state-of-the-art methods.
arXiv Detail & Related papers (2024-06-15T15:24:15Z) - YOLOv10: Real-Time End-to-End Object Detection [68.28699631793967]
YOLOs have emerged as the predominant paradigm in the field of real-time object detection.
The reliance on the non-maximum suppression (NMS) for post-processing hampers the end-to-end deployment of YOLOs.
We introduce the holistic efficiency-accuracy driven model design strategy for YOLOs.
arXiv Detail & Related papers (2024-05-23T11:44:29Z) - Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator
for Vision Applications [108.44482683870888]
We introduce Deformable Convolution v4 (DCNv4), a highly efficient and effective operator designed for a broad spectrum of vision applications.
DCNv4 addresses the limitations of its predecessor, DCNv3, with two key enhancements.
It demonstrates exceptional performance across various tasks, including image classification, instance and semantic segmentation, and notably, image generation.
arXiv Detail & Related papers (2024-01-11T14:53:24Z) - MIND: Multi-Task Incremental Network Distillation [45.74830585715129]
In this study, we present MIND, a parameter isolation method that aims to significantly enhance the performance of replay-free solutions.
Our results showcase the superior performance of MIND indicating its potential for addressing the challenges posed by Class-incremental and Domain-Incremental learning.
arXiv Detail & Related papers (2023-12-05T17:46:52Z) - Planning for Sample Efficient Imitation Learning [52.44953015011569]
Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously.
We propose EfficientImitate, a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.
Experimental results show that EI achieves state-of-the-art results in performance and sample efficiency.
arXiv Detail & Related papers (2022-10-18T05:19:26Z) - An Efficiency Study for SPLADE Models [5.725475501578801]
In this paper, we focus on improving the efficiency of the SPLADE model.
We propose several techniques including L1 regularization for queries, a separation of document/ encoders, a FLOPS-regularized middle-training, and the use of faster query encoders.
arXiv Detail & Related papers (2022-07-08T11:42:05Z) - Mastering Atari Games with Limited Data [73.6189496825209]
We propose a sample efficient model-based visual RL algorithm built on MuZero, which we name EfficientZero.
Our method achieves 190.4% mean human performance on the Atari 100k benchmark with only two hours of real-time game experience.
This is the first time an algorithm achieves super-human performance on Atari games with such little data.
arXiv Detail & Related papers (2021-10-30T09:13:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.