Multi-Agent Path Finding via Tree LSTM
- URL: http://arxiv.org/abs/2210.12933v1
- Date: Mon, 24 Oct 2022 03:22:20 GMT
- Title: Multi-Agent Path Finding via Tree LSTM
- Authors: Yuhao Jiang, Kunjie Zhang, Qimai Li, Jiaxin Chen, Xiaolong Zhu
- Abstract summary: In the 2021 Flatland3 Challenge, a competition on MAPF, the best RL method scored only 27.9, far below the best OR method.
This paper proposes a new RL solution to the Flatland3 Challenge, which scores 125.3, several times higher than the previous best RL solution.
- Score: 17.938710696964662
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, Multi-Agent Path Finding (MAPF) has attracted attention from
the fields of both Operations Research (OR) and Reinforcement Learning (RL).
However, in the 2021 Flatland3 Challenge, a competition on MAPF, the best RL
method scored only 27.9, far below the best OR method. This paper proposes a
new RL solution to the Flatland3 Challenge, which scores 125.3, several times
higher than the previous best RL solution. Our solution creatively applies a
novel network architecture, TreeLSTM, to MAPF. Together with several other RL
techniques, including reward shaping, multiple-phase training, and centralized
control, it is comparable to the top 2-3 OR methods.
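The abstract does not specify the paper's exact observation encoding, but the core building block is the child-sum Tree-LSTM cell (Tai et al., 2015), which aggregates a variable number of children and therefore fits tree-shaped inputs such as a branching rail map. Below is a minimal PyTorch sketch of that cell, with hypothetical dimension names; it illustrates the architecture family, not the authors' implementation.
```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    """Child-Sum Tree-LSTM cell (Tai et al., 2015): aggregates a variable
    number of children, which suits tree-shaped inputs such as the
    branching observations of a rail network."""

    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.W_iou = nn.Linear(in_dim, 3 * hid_dim)              # input/output/update gates
        self.U_iou = nn.Linear(hid_dim, 3 * hid_dim, bias=False)
        self.W_f = nn.Linear(in_dim, hid_dim)                    # one forget gate per child
        self.U_f = nn.Linear(hid_dim, hid_dim, bias=False)

    def forward(self, x, child_h, child_c):
        # x: (in_dim,); child_h, child_c: (num_children, hid_dim)
        h_sum = child_h.sum(dim=0)
        i, o, u = (self.W_iou(x) + self.U_iou(h_sum)).chunk(3, dim=-1)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.W_f(x) + self.U_f(child_h))      # per-child forget gates
        c = i * u + (f * child_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c

cell = ChildSumTreeLSTMCell(8, 16)
leaf = torch.zeros(0, 16)                       # a leaf has no children
h1, c1 = cell(torch.randn(8), leaf, leaf)
h2, c2 = cell(torch.randn(8), leaf, leaf)
root_h, root_c = cell(torch.randn(8), torch.stack([h1, h2]), torch.stack([c1, c2]))
```
To encode a whole tree, the cell is applied bottom-up: leaves pass zero-row child tensors, and each parent consumes the stacked (h, c) pairs of its children.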
Related papers
- Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning [55.65738319966385]
We propose a novel online algorithm, iterative Nash policy optimization (INPO).
Unlike previous methods, INPO bypasses the need for estimating the expected win rate for individual responses.
With an LLaMA-3-8B-based SFT model, INPO achieves a 42.6% length-controlled win rate on AlpacaEval 2.0 and a 37.8% win rate on Arena-Hard.
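INPO's actual loss operates on LLM policies and is defined in the paper; the summary only names the no-regret principle it builds on. As a grounded illustration of that principle, the sketch below runs Hedge (multiplicative weights) self-play on matching pennies, where the time-averaged strategies of two no-regret learners converge to the Nash equilibrium. All names here are illustrative, not INPO itself.
```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])               # row player's payoff (matching pennies)

def softmax(z):
    z = z - z.max()                        # numerically stable
    e = np.exp(z)
    return e / e.sum()

eta = 0.1
s_row, s_col = np.zeros(2), np.zeros(2)    # cumulative per-action payoffs
avg_row, avg_col = np.zeros(2), np.zeros(2)

T = 5000
for _ in range(T):
    p, q = softmax(eta * s_row), softmax(eta * s_col)
    avg_row, avg_col = avg_row + p, avg_col + q
    s_row += A @ q                         # row player's payoff against q
    s_col -= A.T @ p                       # column player's payoff is the negative

print(avg_row / T, avg_col / T)            # time-averaged play -> Nash (0.5, 0.5)
```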
arXiv Detail & Related papers (2024-06-30T08:00:34Z)
- Scaling Lifelong Multi-Agent Path Finding to More Realistic Settings: Research Challenges and Opportunities [44.292720085661585]
We present our winning approach to the 2023 League of Robot Runners LMAPF competition.
The first challenge is to search for high-quality LMAPF solutions within a limited planning time.
The second challenge is to alleviate congestion and the effect of myopic behaviors in LMAPF algorithms.
The third challenge is to bridge the gaps between the LMAPF models used in the literature and real-world applications.
arXiv Detail & Related papers (2024-04-24T19:37:18Z)
- An Off-Policy Reinforcement Learning Algorithm Customized for Multi-Task Fusion in Large-Scale Recommender Systems [19.443149691831856]
Multi-Task Fusion (MTF) is responsible for combining the multiple scores output by Multi-Task Learning (MTL) into a final score to maximize user satisfaction.
Recently, to optimize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) has been used for MTF in industry.
In this paper, we propose a novel method named IntegratedRL-MTF customized for MTF in large-scale recommender systems (RSs).
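The summary does not describe IntegratedRL-MTF's action space or reward, so the sketch below only illustrates the fusion step itself, assuming the common weighted-product form in which the RL policy's action is the weight vector; `fuse_scores` and its arguments are hypothetical names.
```python
import numpy as np

def fuse_scores(task_scores: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Combine per-task MTL scores into one ranking score per item.

    task_scores: (num_items, num_tasks), e.g. predicted click / like /
                 watch-time probabilities from the MTL model.
    weights:     (num_tasks,) fusion weights; in an RL-based MTF system the
                 policy's action would be this weight vector, per request.
    """
    # Weighted product of scores, i.e. a weighted sum in log space.
    return np.exp(np.log(np.clip(task_scores, 1e-8, 1.0)) @ weights)

scores = np.array([[0.9, 0.2, 0.5],
                   [0.6, 0.7, 0.4]])
w = np.array([1.0, 0.5, 0.8])          # hypothetical action from the RL policy
print(fuse_scores(scores, w))          # higher score means ranked earlier
```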
arXiv Detail & Related papers (2024-04-19T08:43:03Z)
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
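The summary gives only the two-level shape of ArCHer: a high-level algorithm learning a value function over whole utterances and a low-level algorithm updating the token policy. The tabular toy below is a schematic of that shape in a hypothetical 3-state dialogue environment; it mirrors the structure (high-level critic, low-level policy-gradient actor) rather than reproducing the paper's off-policy algorithm.
```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2
V = np.zeros(n_states)                    # high-level turn-level value (critic)
logits = np.zeros((n_states, n_actions))  # low-level policy (stand-in for an LLM)

def step(s, a):
    # hypothetical environment: utterance a in state s yields reward, next state
    return (1.0 if a == s % n_actions else 0.0), (s + 1) % n_states

gamma, alpha, beta = 0.9, 0.1, 0.5
for episode in range(2000):
    s = 0
    for _ in range(4):                    # 4 turns per episode
        p = np.exp(logits[s]) / np.exp(logits[s]).sum()
        a = rng.choice(n_actions, p=p)
        r, s2 = step(s, a)
        # High-level RL: TD(0) update on the turn-level value function.
        td = r + gamma * V[s2] - V[s]
        V[s] += alpha * td
        # Low-level RL: policy-gradient step using the high-level TD error
        # as the per-turn advantage signal.
        grad = -p
        grad[a] += 1.0
        logits[s] += beta * td * grad
        s = s2
```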
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
- Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698]
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients.
Standard meta reinforcement learning (MRL) methods optimize the average return over tasks, but often suffer from poor results on tasks of high risk or difficulty.
In this work, we define a robust MRL objective with a controlled robustness level.
The resulting data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML).
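One standard way to formalize a robust objective with a tunable level, assumed here purely for illustration, is the conditional value at risk (CVaR) over the task distribution: the mean return of the worst alpha-fraction of tasks. A minimal sketch:
```python
import numpy as np

def cvar(returns: np.ndarray, alpha: float) -> float:
    """Mean return over the worst alpha-fraction of tasks (CVaR_alpha).

    alpha controls the robustness level: alpha=1.0 recovers the standard
    average-return objective; smaller alpha focuses on harder tasks.
    """
    k = max(1, int(np.ceil(alpha * len(returns))))
    worst = np.sort(returns)[:k]          # lowest-return tasks
    return float(worst.mean())

task_returns = np.array([9.0, 8.5, 7.9, 2.1, 1.4])
print(cvar(task_returns, alpha=1.0))      # plain average: 5.78
print(cvar(task_returns, alpha=0.4))      # mean of the 2 hardest tasks: 1.75
```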
arXiv Detail & Related papers (2023-01-26T14:54:39Z)
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- Retrospective on the 2021 BASALT Competition on Learning from Human Feedback [92.37243979045817]
The goal of the competition was to promote research towards agents that use learning from human feedback (LfHF) techniques to solve open-world tasks.
Rather than mandating the use of LfHF techniques, we described four tasks in natural language to be accomplished in the video game Minecraft.
Teams developed a diverse range of LfHF algorithms across a variety of possible human feedback types.
arXiv Detail & Related papers (2022-04-14T17:24:54Z)
- Learning to Prune Deep Neural Networks via Reinforcement Learning [64.85939668308966]
PuRL is a deep reinforcement learning-based algorithm for pruning neural networks.
It achieves sparsity and accuracy comparable to current state-of-the-art methods.
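The summary does not specify PuRL's action space or reward shaping, so the sketch below shows only the two ingredients any such agent needs: a magnitude-pruning primitive and a hypothetical reward trading accuracy against a sparsity target. Both function names are illustrative.
```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction `sparsity` of weights."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def reward(accuracy: float, sparsity: float, target_sparsity: float) -> float:
    # Hypothetical shaping: reward accuracy, penalize missing the target.
    return accuracy - abs(sparsity - target_sparsity)

w = np.random.default_rng(0).normal(size=(4, 4))
pruned = magnitude_prune(w, sparsity=0.5)
print((pruned == 0).mean())               # ~0.5 of the weights removed
```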
arXiv Detail & Related papers (2020-07-09T13:06:07Z)
- Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples [31.31937554018045]
Deep reinforcement learning (RL) methods require intensive data from the exploration of the environment to achieve satisfactory performance.
We propose a framework that enables an RL agent to reason over its exploration process and distill high-level knowledge for effectively guiding its future explorations.
Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm.
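A finite reward automaton is a finite-state machine over high-level events whose transitions emit rewards; the paper infers such machines with Angluin's L* algorithm from queries and counterexamples. The toy instance below is hand-written for illustration (the inference step is omitted) and encodes a "fetch the key, then open the door" task.
```python
class RewardAutomaton:
    """Finite reward automaton: a DFA over high-level events whose
    transitions also emit rewards. Reward 1 is given only after
    observing 'key' and then 'door'."""

    def __init__(self):
        self.state = 'start'
        # (state, event) -> (next_state, reward)
        self.delta = {
            ('start', 'key'): ('has_key', 0.0),
            ('start', 'door'): ('start', 0.0),
            ('has_key', 'key'): ('has_key', 0.0),
            ('has_key', 'door'): ('done', 1.0),
            ('done', 'key'): ('done', 0.0),
            ('done', 'door'): ('done', 0.0),
        }

    def step(self, event: str) -> float:
        self.state, r = self.delta[(self.state, event)]
        return r

ra = RewardAutomaton()
print([ra.step(e) for e in ['door', 'key', 'door']])  # [0.0, 0.0, 1.0]
```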
arXiv Detail & Related papers (2020-06-28T21:13:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.