Auto-Agent-Distiller: Towards Efficient Deep Reinforcement Learning Agents via Neural Architecture Search
- URL: http://arxiv.org/abs/2012.13091v2
- Date: Fri, 25 Dec 2020 04:52:51 GMT
- Title: Auto-Agent-Distiller: Towards Efficient Deep Reinforcement Learning Agents via Neural Architecture Search
- Authors: Yonggan Fu, Zhongzhi Yu, Yongan Zhang, Yingyan Lin
- Abstract summary: We propose an Auto-Agent-Distiller (A2D) framework to automatically search for the optimal DRL agents for various tasks.
We demonstrate that vanilla NAS can easily fail in searching for the optimal agents due to the high variance it induces in DRL training stability.
We then develop a novel distillation mechanism that distills knowledge from both the teacher agent's actor and critic to stabilize the search process and improve the searched agents' optimality.
- Score: 14.292072505007974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AlphaGo's astonishing performance has ignited an explosive interest in
developing deep reinforcement learning (DRL) for numerous real-world
applications, such as intelligent robotics. However, the often prohibitive
complexity of DRL stands at odds with the real-time control requirements and
constrained resources of many DRL applications, limiting the great potential of
DRL-powered intelligent devices. While substantial efforts have been devoted to
compressing other deep learning models, existing works barely touch the surface
of compressing DRL. In this work, we first identify that there exists an
optimal model size of DRL that can maximize both the test scores and
efficiency, motivating the need for task-specific DRL agents. We therefore
propose an Auto-Agent-Distiller (A2D) framework, which, to the best of our
knowledge, is the first to apply neural architecture search (NAS) to DRL,
automatically searching for optimal DRL agents that maximize both test scores
and efficiency across various tasks. Specifically, we demonstrate that vanilla
NAS can easily fail to find optimal agents due to the high variance it induces
in DRL training stability; we then develop a novel distillation mechanism that
distills knowledge from both the teacher agent's actor and critic to stabilize
the search process and improve the searched agents' optimality.
Extensive experiments and ablation studies consistently validate our findings
and the advantages and general applicability of our A2D, which outperforms
manually designed DRL agents in both test scores and efficiency. All code will
be released upon acceptance.
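The abstract stops short of the distillation objective itself; as a rough illustration, a loss that distills from both a teacher agent's actor and critic, in the spirit of A2D, might look like the sketch below. All module names, signatures, and hyperparameters are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch: actor-critic knowledge distillation for a searched
# student agent. Teacher and student are assumed to return
# (action_logits, value) pairs; all names are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student, teacher, states, alpha=0.5, tau=1.0):
    """Distill both the teacher's policy (actor) and value (critic)."""
    with torch.no_grad():
        t_logits, t_values = teacher(states)   # frozen teacher agent
    s_logits, s_values = student(states)       # searched student agent

    # Actor distillation: match the teacher's softened action distribution.
    actor_loss = F.kl_div(
        F.log_softmax(s_logits / tau, dim=-1),
        F.softmax(t_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2

    # Critic distillation: regress the teacher's value estimates.
    critic_loss = F.mse_loss(s_values, t_values)

    return alpha * actor_loss + (1.0 - alpha) * critic_loss
```

Temperature-softened policy matching plus value regression are standard distillation choices; the paper's actual formulation may differ.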
Related papers
- Generative AI for Deep Reinforcement Learning: Framework, Analysis, and Use Cases [60.30995339585003]
Deep reinforcement learning (DRL) has been widely applied across various fields and has achieved remarkable accomplishments.
However, DRL faces certain limitations, including low sample efficiency and poor generalization.
We present how to leverage generative AI (GAI) to address these issues and enhance the performance of DRL algorithms.
arXiv Detail & Related papers (2024-05-31T01:25:40Z)
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
- Compressing Deep Reinforcement Learning Networks with a Dynamic Structured Pruning Method for Autonomous Driving [63.155562267383864]
Deep reinforcement learning (DRL) has shown remarkable success in complex autonomous driving scenarios.
However, DRL models inevitably incur high memory consumption and computation, hindering their wide deployment on resource-limited autonomous driving devices.
We introduce a novel dynamic structured pruning approach that gradually removes a DRL model's unimportant neurons during the training stage.
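The summary above does not state the importance criterion or schedule; a minimal sketch of structured (neuron-level) pruning that masks the lowest-l1-magnitude neurons of a layer, invoked periodically during training, could look like the following (the criterion and all names are assumptions):

```python
# Hedged sketch: structured pruning of a linear layer's output neurons
# by l1 weight magnitude, to be invoked periodically during DRL training.
import torch

def prune_neurons(layer: torch.nn.Linear, sparsity: float) -> torch.Tensor:
    """Zero out the `sparsity` fraction of output neurons with smallest l1 norm."""
    importance = layer.weight.detach().abs().sum(dim=1)  # one score per neuron
    n_prune = int(sparsity * importance.numel())
    pruned = torch.argsort(importance)[:n_prune]         # least important first
    mask = torch.ones_like(importance)
    mask[pruned] = 0.0
    with torch.no_grad():
        layer.weight.mul_(mask.unsqueeze(1))             # zero whole rows (neurons)
        if layer.bias is not None:
            layer.bias.mul_(mask)
    return mask
```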
arXiv Detail & Related papers (2024-02-07T09:00:30Z)
- Testing of Deep Reinforcement Learning Agents with Surrogate Models [10.243488468625786]
Deep Reinforcement Learning (DRL) has received a lot of attention from the research community in recent years.
In this paper, we propose a search-based approach to test such agents.
arXiv Detail & Related papers (2023-05-22T06:21:39Z)
- A Comparison of Reinforcement Learning Frameworks for Software Testing Tasks [14.22330197686511]
Deep Reinforcement Learning (DRL) has been successfully employed in complex testing tasks such as game testing, regression testing, and test case prioritization.
DRL frameworks offer well-maintained implementations of state-of-the-art DRL algorithms that facilitate and speed up the development of DRL applications.
However, no study has empirically evaluated the effectiveness and performance of the algorithms implemented in these frameworks.
arXiv Detail & Related papers (2022-08-25T14:52:16Z)
- A Search-Based Testing Approach for Deep Reinforcement Learning Agents [1.1580916951856255]
We propose a Search-based Testing Approach of Reinforcement Learning Agents (STARLA) to test the policy of a DRL agent.
We use machine learning models and a dedicated genetic algorithm to narrow the search towards faulty episodes.
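The snippet leaves the episode encoding, fitness model, and operators unspecified; a generic genetic-algorithm loop that steers the population toward episodes a learned model predicts to be faulty, with every callable assumed, might look like:

```python
# Hedged sketch: genetic-algorithm search for likely-faulty episodes.
# `fitness(episode)` is assumed to return a learned model's predicted
# probability that the episode exposes a fault.
import random

def ga_search(init_episodes, fitness, mutate, crossover,
              generations=50, pop_size=100, n_elite=10):
    population = list(init_episodes)[:pop_size]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)     # most suspicious first
        next_gen = population[:n_elite]                # keep elite episodes
        while len(next_gen) < pop_size:
            p1, p2 = random.sample(population[: pop_size // 2], 2)
            next_gen.append(mutate(crossover(p1, p2))) # recombine and perturb
        population = next_gen
    return max(population, key=fitness)
```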
arXiv Detail & Related papers (2022-06-15T20:51:33Z)
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
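As a rough sketch of the retrieval step only (the paper trains the retrieval process end to end; the fixed encoder and cosine-similarity lookup here are illustrative stand-ins):

```python
# Hedged sketch: fetch the k past experiences most similar to the current
# state and hand them to the agent as extra context. The fixed encoder and
# cosine similarity are stand-ins for the paper's learned retrieval.
import torch
import torch.nn.functional as F

def retrieve(query_state, encoder, memory_keys, memory_values, k=8):
    """memory_keys: (N, d) embeddings of past states; memory_values: (N, d')."""
    q = F.normalize(encoder(query_state), dim=-1)  # embed the query state
    keys = F.normalize(memory_keys, dim=-1)
    scores = keys @ q                              # cosine similarities, shape (N,)
    top_k = torch.topk(scores, k).indices
    return memory_values[top_k]                    # retrieved context for the agent
```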
arXiv Detail & Related papers (2022-02-17T02:44:05Z)
- A3C-S: Automated Agent Accelerator Co-Search towards Efficient Deep Reinforcement Learning [16.96187187108041]
We propose an Automated Agent Accelerator Co-Search (A3C-S) framework, which, to the best of our knowledge, is the first to automatically co-search optimally matched DRL agents and accelerators.
Our experiments consistently validate the superiority of our A3C-S over state-of-the-art techniques.
arXiv Detail & Related papers (2021-06-11T18:56:44Z)
- Maximum Entropy RL (Provably) Solves Some Robust RL Problems [94.80212602202518]
We prove theoretically that standard maximum entropy RL is robust to some disturbances in the dynamics and the reward function.
Our results suggest that MaxEnt RL by itself is robust to certain disturbances, without requiring any additional modifications.
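For reference, the standard maximum-entropy RL objective augments the expected return with a policy-entropy bonus weighted by a temperature \alpha:

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
  \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\!\left(\pi(\cdot \mid s_t)\right) \right],
\quad
\mathcal{H}\!\left(\pi(\cdot \mid s)\right)
  = -\mathbb{E}_{a \sim \pi(\cdot \mid s)}\left[\log \pi(a \mid s)\right]
```

The robustness guarantees summarized above concern policies optimizing this objective.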
arXiv Detail & Related papers (2021-03-10T18:45:48Z)
- Efficient Reinforcement Learning Development with RLzoo [21.31425280231093]
Existing Deep Reinforcement Learning (DRL) libraries provide poor support for prototyping DRL agents.
We introduce RLzoo, a new DRL library that aims to make the development of DRL agents efficient.
arXiv Detail & Related papers (2020-09-18T06:18:49Z)
- Distributed Reinforcement Learning for Cooperative Multi-Robot Object Manipulation [53.262360083572005]
We consider solving a cooperative multi-robot object manipulation task using reinforcement learning (RL).
We propose two distributed multi-agent RL approaches: distributed approximate RL (DA-RL) and game-theoretic RL (GT-RL).
Although we focus on a small system of two agents in this paper, both DA-RL and GT-RL apply to general multi-agent systems, and are expected to scale well to large systems.
arXiv Detail & Related papers (2020-03-21T00:43:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.