Regularized Conditional Diffusion Model for Multi-Task Preference Alignment
- URL: http://arxiv.org/abs/2404.04920v2
- Date: Thu, 10 Oct 2024 10:05:43 GMT
- Title: Regularized Conditional Diffusion Model for Multi-Task Preference Alignment
- Authors: Xudong Yu, Chenjia Bai, Haoran He, Changhong Wang, Xuelong Li
- Abstract summary: Sequential decision-making is desired to align with human intents and exhibit versatility across various tasks.
Previous methods formulate it as a conditional generation process, utilizing return-conditioned diffusion models to directly model trajectory distributions.
In this work, we adopt multi-task preferences as a unified condition for both single- and multi-task decision-making.
- Score: 43.86042557447689
- Abstract: Sequential decision-making is desired to align with human intents and exhibit versatility across various tasks. Previous methods formulate it as a conditional generation process, utilizing return-conditioned diffusion models to directly model trajectory distributions. Nevertheless, the return-conditioned paradigm relies on pre-defined reward functions, facing challenges when applied in multi-task settings characterized by varying reward functions (versatility) and showing limited controllability concerning human preferences (alignment). In this work, we adopt multi-task preferences as a unified condition for both single- and multi-task decision-making, and propose preference representations aligned with preference labels. The learned representations are used to guide the conditional generation process of diffusion models, and we introduce an auxiliary objective to maximize the mutual information between representations and corresponding generated trajectories, improving alignment between trajectories and preferences. Extensive experiments in D4RL and Meta-World demonstrate that our method presents favorable performance in single- and multi-task scenarios, and exhibits superior alignment with preferences.
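The abstract mentions an auxiliary objective that maximizes the mutual information between preference representations and the trajectories generated from them, but it does not say which estimator is used. As a rough illustration only, the sketch below uses the InfoNCE contrastive bound, a common stand-in for mutual-information maximization; it is an assumption here, not necessarily the authors' objective. The function names (`info_nce`, `mi_lower_bound`), the `temperature` parameter, and the dot-product similarity are all hypothetical choices made for this example.

```python
import math


def info_nce(reps, traj_embs, temperature=0.1):
    """InfoNCE loss for paired (representation, trajectory-embedding) rows.

    Row i of `reps` is treated as the positive match for row i of
    `traj_embs`; every other row in the batch acts as a negative.
    Minimizing this loss maximizes a lower bound on the mutual
    information between the two sets of vectors.
    """
    n = len(reps)
    total = 0.0
    for i in range(n):
        # Dot-product similarity of representation i to every embedding.
        logits = [
            sum(a * b for a, b in zip(reps[i], emb)) / temperature
            for emb in traj_embs
        ]
        # Numerically stable log-sum-exp over the batch.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-softmax probability of the positive pair.
        total += log_norm - logits[i]
    return total / n


def mi_lower_bound(reps, traj_embs, temperature=0.1):
    """InfoNCE bound: I(rep; traj) >= log(N) - loss, capped at log(N)."""
    return math.log(len(reps)) - info_nce(reps, traj_embs, temperature)


# Toy batch: four one-hot "preference representations" (hypothetical data).
reps = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
aligned = [row[:] for row in reps]      # each trajectory matches its condition
shuffled = reps[1:] + reps[:1]          # each trajectory matches a different one

loss_aligned = info_nce(reps, aligned)
loss_shuffled = info_nce(reps, shuffled)
```

In this toy batch, trajectories that match their conditioning representation yield a much lower loss than mismatched ones, which is the behavior an alignment-promoting auxiliary term is meant to reward. In a diffusion setting, such a term would typically be added to the denoising loss with a weighting coefficient; that weighting is not specified in the abstract.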
Related papers
- On-the-fly Preference Alignment via Principle-Guided Decoding [27.50204023448716]
We introduce On-the-fly Preference Alignment via Principle-Guided Decoding (OPAD) to align model outputs with human preferences during inference.
OPAD achieves competitive or superior performance in both general and personalized alignment tasks.
arXiv Detail & Related papers (2025-02-20T02:23:09Z)
- Direct Preference Optimization-Enhanced Multi-Guided Diffusion Model for Traffic Scenario Generation [0.0]
Diffusion-based models are recognized for their effectiveness in using real-world driving data to generate realistic traffic scenarios.
These models employ guided sampling to incorporate specific traffic preferences and enhance scenario realism.
We introduce a multi-guided diffusion model that utilizes a novel training strategy to closely adhere to traffic priors.
arXiv Detail & Related papers (2025-02-14T05:29:43Z)
- Calibrated Multi-Preference Optimization for Aligning Diffusion Models [92.90660301195396]
Calibrated Preference Optimization (CaPO) is a novel method to align text-to-image (T2I) diffusion models.
CaPO incorporates the general preference from multiple reward models without human annotated data.
Experimental results show that CaPO consistently outperforms prior methods.
arXiv Detail & Related papers (2025-02-04T18:59:23Z)
- One Fits All: General Mobility Trajectory Modeling via Masked Conditional Diffusion [11.373845190033297]
Trajectory data play a crucial role in many applications, ranging from network optimization to urban planning.
Existing studies on trajectory data are task-specific, and their applicability is limited to the specific tasks on which they have been trained, such as generation, recovery, or prediction.
We propose a general trajectory modeling framework via conditional diffusion (named GenMove).
Our model significantly outperforms state-of-the-art baselines, with the largest improvement exceeding 13% in generation tasks.
arXiv Detail & Related papers (2025-01-23T03:13:45Z)
- Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains.
Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches.
We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
- Test-time Alignment of Diffusion Models without Reward Over-optimization [8.981605934618349]
Diffusion models excel in generative tasks, but aligning them with specific objectives remains challenging.
We propose a training-free, test-time method based on Sequential Monte Carlo (SMC) to sample from the reward-aligned target distribution.
We demonstrate its effectiveness in single-reward optimization, multi-objective scenarios, and online black-box optimization.
arXiv Detail & Related papers (2025-01-10T09:10:30Z)
- Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications.
Ensuring their alignment with the diverse preferences of individual users has become a critical challenge.
We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z)
- Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods involve independently scoring and selecting data in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z)
- InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion [53.90516061351706]
We present InterHandGen, a novel framework that learns the generative prior of two-hand interaction.
For sampling, we combine anti-penetration and synthesis-free guidance to enable plausible generation.
Our method significantly outperforms baseline generative models in terms of plausibility and diversity.
arXiv Detail & Related papers (2024-03-26T06:35:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.