Demonstration Guided Multi-Objective Reinforcement Learning
- URL: http://arxiv.org/abs/2404.03997v1
- Date: Fri, 5 Apr 2024 10:19:04 GMT
- Title: Demonstration Guided Multi-Objective Reinforcement Learning
- Authors: Junlin Lu, Patrick Mannion, Karl Mason,
- Abstract summary: We introduce demonstration-guided multi-objective reinforcement learning (DG-MORL)
This novel approach utilizes prior demonstrations, aligns them with user preferences via corner weight support, and incorporates a self-evolving mechanism to refine suboptimal demonstrations.
Our empirical studies demonstrate DG-MORL's superiority over existing MORL algorithms, establishing its robustness and efficacy.
- Score: 2.9845592719739127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-objective reinforcement learning (MORL) is increasingly relevant due to its resemblance to real-world scenarios requiring trade-offs between multiple objectives. Catering to diverse user preferences, traditional reinforcement learning faces amplified challenges in MORL. To address the difficulty of training policies from scratch in MORL, we introduce demonstration-guided multi-objective reinforcement learning (DG-MORL). This novel approach utilizes prior demonstrations, aligns them with user preferences via corner weight support, and incorporates a self-evolving mechanism to refine suboptimal demonstrations. Our empirical studies demonstrate DG-MORL's superiority over existing MORL algorithms, establishing its robustness and efficacy, particularly under challenging conditions. We also provide an upper bound of the algorithm's sample complexity.
Related papers
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z) - Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment [62.05713042908654]
We introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges.
We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals.
Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD.
arXiv Detail & Related papers (2024-05-24T15:13:53Z) - MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z) - Supervised Fine-Tuning as Inverse Reinforcement Learning [8.044033685073003]
The prevailing approach to aligning Large Language Models (LLMs) typically relies on human or AI feedback.
In our work, we question the efficacy of such datasets and explore various scenarios where alignment with expert demonstrations proves more realistic.
arXiv Detail & Related papers (2024-03-18T17:52:57Z) - Model Composition for Multimodal Large Language Models [71.5729418523411]
We propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model.
Our basic implementation, NaiveMC, demonstrates the effectiveness of this paradigm by reusing modality encoders and merging LLM parameters.
arXiv Detail & Related papers (2024-02-20T06:38:10Z) - Text-centric Alignment for Multi-Modality Learning [3.6961400222746748]
We propose the Text-centric Alignment for Multi-Modality Learning (TAMML) approach.
By leveraging the unique properties of text as a unified semantic space, TAMML demonstrates significant improvements in handling unseen, diverse, and unpredictable modality combinations.
This study contributes to the field by offering a flexible, effective solution for real-world applications where modality availability is dynamic and uncertain.
arXiv Detail & Related papers (2024-02-12T22:07:43Z) - Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning [68.94230363140771]
Mixture of Cluster-conditional LoRA Experts (MoCLE)
MoCLE is a novel Mixture of Experts architecture designed to activate the task-customized model parameters based on the instruction clusters.
Experiments on InstructBLIP and LLaVA demonstrate the effectiveness of MoCLE.
arXiv Detail & Related papers (2023-12-19T18:11:19Z) - ESP: Exploiting Symmetry Prior for Multi-Agent Reinforcement Learning [22.733348449818838]
Multi-agent reinforcement learning (MARL) has achieved promising results in recent years.
This paper proposes a framework for exploiting prior knowledge by integrating data augmentation and a well-designed consistency loss.
arXiv Detail & Related papers (2023-07-30T09:49:05Z) - Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs)
Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z) - MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced
Active Learning [14.06682547001011]
State-of-the art methods typically focus on learning a single reward model.
We propose Multi-Objective Reinforced Active Learning (MORAL), a novel method for combining diverse demonstrations of social norms.
Our approach is able to interactively tune a deep RL agent towards a variety of preferences, while eliminating the need for computing multiple policies.
arXiv Detail & Related papers (2021-12-30T19:21:03Z) - How to Sense the World: Leveraging Hierarchy in Multimodal Perception
for Robust Reinforcement Learning Agents [9.840104333194663]
We argue for hierarchy in the design of representation models and contribute with a novel multimodal representation model, MUSE.
MUSE is the sensory representation model of deep reinforcement learning agents provided with multimodal observations in Atari games.
We perform a comparative study over different designs of reinforcement learning agents, showing that MUSE allows agents to perform tasks under incomplete perceptual experience with minimal performance loss.
arXiv Detail & Related papers (2021-10-07T16:35:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.