Trajectory-Class-Aware Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2503.01440v1
- Date: Mon, 03 Mar 2025 11:46:44 GMT
- Title: Trajectory-Class-Aware Multi-Agent Reinforcement Learning
- Authors: Hyungho Na, Kwanghyeon Lee, Sumin Lee, Il-Chul Moon,
- Abstract summary: We introduce TRajectory-class-Aware Multi-Agent reinforcement learning (TRAMA)<n>In TRAMA, agents recognize a task type by identifying the class of trajectories they are experiencing through partial observations.<n>We introduce a trajectory-class predictor that performs agent-wise predictions on the trajectory class.
- Score: 10.230156872997874
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the context of multi-agent reinforcement learning, generalization is a challenge to solve various tasks that may require different joint policies or coordination without relying on policies specialized for each task. We refer to this type of problem as a multi-task, and we train agents to be versatile in this multi-task setting through a single training process. To address this challenge, we introduce TRajectory-class-Aware Multi-Agent reinforcement learning (TRAMA). In TRAMA, agents recognize a task type by identifying the class of trajectories they are experiencing through partial observations, and the agents use this trajectory awareness or prediction as additional information for action policy. To this end, we introduce three primary objectives in TRAMA: (a) constructing a quantized latent space to generate trajectory embeddings that reflect key similarities among them; (b) conducting trajectory clustering using these trajectory embeddings; and (c) building a trajectory-class-aware policy. Specifically for (c), we introduce a trajectory-class predictor that performs agent-wise predictions on the trajectory class; and we design a trajectory-class representation model for each trajectory class. Each agent takes actions based on this trajectory-class representation along with its partial observation for task-aware execution. The proposed method is evaluated on various tasks, including multi-task problems built upon StarCraft II. Empirical results show further performance improvements over state-of-the-art baselines.
Related papers
- Making Universal Policies Universal [21.558271405324767]
We build on the universal policy framework, which decouples policy learning into two stages.<n>We propose a method for training the planner on a joint dataset composed of trajectories from all agents.<n>By training on a pooled dataset from multiple agents, our universal policy achieves an improvement of up to $42.20%$ in task completion accuracy.
arXiv Detail & Related papers (2025-02-20T17:59:55Z) - Solving Continual Offline RL through Selective Weights Activation on Aligned Spaces [52.649077293256795]
Continual offline reinforcement learning (CORL) has shown impressive ability in diffusion-based lifelong learning systems.
We propose Vector-Quantized Continual diffuser, named VQ-CD, to break the barrier of different spaces between various tasks.
arXiv Detail & Related papers (2024-10-21T07:13:45Z) - Transfer Reinforcement Learning in Heterogeneous Action Spaces using Subgoal Mapping [9.81076530822611]
We propose a method that learns a subgoal mapping between the expert agent policy and the learner agent policy.
We learn this subgoal mapping by training a Long Short Term Memory (LSTM) network for a distribution of tasks.
We demonstrate that the proposed learning scheme can effectively find the subgoal mapping underlying the given distribution of tasks.
arXiv Detail & Related papers (2024-10-18T14:08:41Z) - Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z) - SeMAIL: Eliminating Distractors in Visual Imitation via Separated Models [22.472167814814448]
We propose a new model-based imitation learning algorithm named Separated Model-based Adversarial Imitation Learning (SeMAIL)
Our method achieves near-expert performance on various visual control tasks with complex observations and the more challenging tasks with different backgrounds from expert observations.
arXiv Detail & Related papers (2023-06-19T04:33:44Z) - Emergent Behaviors in Multi-Agent Target Acquisition [0.0]
We simulate a Multi-Agent System (MAS) using Reinforcement Learning (RL) in a pursuit-evasion game.
We create different adversarial scenarios by replacing RL-trained pursuers' policies with two distinct (non-RL) analytical strategies.
The novelty of our approach entails the creation of an influential feature set that reveals underlying data regularities.
arXiv Detail & Related papers (2022-12-15T15:20:58Z) - Effective Adaptation in Multi-Task Co-Training for Unified Autonomous
Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z) - Fast Inference and Transfer of Compositional Task Structures for
Few-shot Task Generalization [101.72755769194677]
We formulate it as a few-shot reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z) - Continual Object Detection via Prototypical Task Correlation Guided
Gating Mechanism [120.1998866178014]
We present a flexible framework for continual object detection via pRotOtypical taSk corrElaTion guided gaTingAnism (ROSETTA)
Concretely, a unified framework is shared by all tasks while task-aware gates are introduced to automatically select sub-models for specific tasks.
Experiments on COCO-VOC, KITTI-Kitchen, class-incremental detection on VOC and sequential learning of four tasks show that ROSETTA yields state-of-the-art performance.
arXiv Detail & Related papers (2022-05-06T07:31:28Z) - LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent
Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z) - Towards Discriminative Representation: Multi-view Trajectory Contrastive
Learning for Online Multi-object Tracking [1.0474108328884806]
We propose a strategy, namely multi-view trajectory contrastive learning, in which each trajectory is represented as a center vector.
In the inference stage, a similarity-guided feature fusion strategy is developed for further boosting the quality of the trajectory representation.
Our method has surpassed preceding trackers and established new state-of-the-art performance.
arXiv Detail & Related papers (2022-03-27T04:53:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.