Regularized Conditional Diffusion Model for Multi-Task Preference Alignment
- URL: http://arxiv.org/abs/2404.04920v1
- Date: Sun, 7 Apr 2024 11:20:32 GMT
- Title: Regularized Conditional Diffusion Model for Multi-Task Preference Alignment
- Authors: Xudong Yu, Chenjia Bai, Haoran He, Changhong Wang, Xuelong Li
- Abstract summary: Sequential decision-making is desired to align with human intents and exhibit versatility across various tasks.
Previous methods formulate it as a conditional generation process, utilizing return-conditioned diffusion models to directly model trajectory distributions.
In this work, we adopt multi-task preferences as a unified condition for both single- and multi-task decision-making.
- Score: 43.86042557447689
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequential decision-making is desired to align with human intents and exhibit versatility across various tasks. Previous methods formulate it as a conditional generation process, utilizing return-conditioned diffusion models to directly model trajectory distributions. Nevertheless, the return-conditioned paradigm relies on pre-defined reward functions, facing challenges when applied in multi-task settings characterized by varying reward functions (versatility) and showing limited controllability concerning human preferences (alignment). In this work, we adopt multi-task preferences as a unified condition for both single- and multi-task decision-making, and propose preference representations aligned with preference labels. The learned representations are used to guide the conditional generation process of diffusion models, and we introduce an auxiliary objective to maximize the mutual information between representations and corresponding generated trajectories, improving alignment between trajectories and preferences. Extensive experiments in D4RL and Meta-World demonstrate that our method presents favorable performance in single- and multi-task scenarios, and exhibits superior alignment with preferences.
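The abstract does not say how the mutual information between preference representations and generated trajectories is estimated. As an illustration only, one common way to maximize a lower bound on mutual information is an InfoNCE-style contrastive loss, sketched below in plain Python. The function name `info_nce`, the dot-product score, the temperature value, and the toy vectors are all assumptions made for this example and are not taken from the paper.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(pref_reps, traj_embs, temperature=0.1):
    """InfoNCE loss for a batch of matched (representation, trajectory) pairs.

    pref_reps[i] and traj_embs[i] form a positive pair; every other
    trajectory embedding in the batch acts as a negative.
    Minimizing the loss maximizes a lower bound on mutual information:
    I(rep; traj) >= log(batch_size) - loss.
    Hypothetical helper, not the paper's implementation.
    """
    n = len(pref_reps)
    total = 0.0
    for i in range(n):
        logits = [dot(pref_reps[i], traj_embs[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_norm - logits[i]
    return total / n


# Toy data (assumed): two orthogonal unit vectors.
aligned = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = info_nce(aligned, aligned)           # pairs agree
loss_mismatched = info_nce(aligned, aligned[::-1])  # pairs swapped
```

In this toy setting the loss is close to zero when each representation is paired with its own trajectory embedding and large when the pairs are swapped, which is the behavior an alignment objective of this kind relies on.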
Related papers
- MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
Model merging is an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model.
Existing model-merging methods focus on enhancing average task accuracy.
We introduce a novel low-compute algorithm, Model Merging with Amortized Pareto Front (MAP).
arXiv Detail & Related papers (2024-06-11T17:55:25Z)
- Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods score and select data independently, sample by sample.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z)
- InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion [53.90516061351706]
We present InterHandGen, a novel framework that learns the generative prior of two-hand interaction.
For sampling, we combine anti-penetration and synthesis-free guidance to enable plausible generation.
Our method significantly outperforms baseline generative models in terms of plausibility and diversity.
arXiv Detail & Related papers (2024-03-26T06:35:55Z)
- Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment [46.44464839353993]
We introduce Rewards-in-Context (RiC), which conditions the response of a foundation model on multiple rewards in its prompt context.
RiC only requires supervised fine-tuning of a single foundation model and supports dynamic adjustment for user preferences during inference time.
arXiv Detail & Related papers (2024-02-15T18:58:31Z)
- AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model [69.12623428463573]
AlignDiff is a novel framework to quantify human preferences, covering abstractness, and guide diffusion planning.
It can accurately match user-customized behaviors and efficiently switch from one to another.
We demonstrate its superior performance on preference matching, switching, and covering compared to other baselines.
arXiv Detail & Related papers (2023-10-03T13:53:08Z)
- Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers [17.09745648221254]
We introduce the FlexiBiT framework, which provides a unified way to specify models which can be trained on many different sequential decision making tasks.
A single FlexiBiT model is simultaneously capable of carrying out many tasks with performance similar to or better than specialized models.
arXiv Detail & Related papers (2022-04-28T07:50:08Z)
- Multi-Order Networks for Action Unit Detection [7.971065005161565]
Multi-Order Network (MONET) is a multi-task learning method with joint task order optimization.
We show that MONET significantly extends state-of-the-art performance in Facial Action Unit detection.
arXiv Detail & Related papers (2022-02-01T14:58:21Z)
- Abstractive Sentence Summarization with Guidance of Selective Multimodal Reference [3.505062507621494]
We propose a Multimodal Hierarchical Selective Transformer (mhsf) model that considers reciprocal relationships among modalities.
We evaluate the generality of the proposed mhsf model with both pre-training plus fine-tuning and training-from-scratch strategies.
arXiv Detail & Related papers (2021-08-11T09:59:34Z)
- Conditional Generative Modeling via Learning the Latent Space [54.620761775441046]
We propose a novel framework for conditional generation in multimodal spaces.
It uses latent variables to model generalizable learning patterns.
At inference, the latent variables are optimized to find optimal solutions corresponding to multiple output modes.
arXiv Detail & Related papers (2020-10-07T03:11:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.