Related papers: Preference Alignment with Flow Matching

Related papers

PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training [9.093854840532062]
PITA is a novel framework that integrates preference feedback directly into the LLM's token generation. PITA learns a small preference-based guidance policy to modify token probabilities at inference time without fine-tuning. We evaluate PITA across diverse tasks, including mathematical reasoning and sentiment classification.
arXiv Detail & Related papers (2025-07-26T21:46:32Z)
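The PITA abstract says a small guidance policy modifies token probabilities at inference time while the LLM stays frozen. The following is a minimal sketch of that general idea only, not PITA's actual algorithm. It assumes the guidance model emits one score per vocabulary token, and that these scores reweight the base distribution as p'(v) ∝ p_base(v) · exp(g(v) / beta). The names `guided_next_token_probs` and `beta` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guided_next_token_probs(
    base_logits: Sequence[float],
    guidance_scores: Sequence[float],
    beta: float = 1.0,
) -> List[float]:
    """Hypothetical inference-time guidance rule (an assumption, not PITA's exact rule).

    Adding g / beta to the frozen base logits and renormalizing gives
    p'(v) proportional to p_base(v) * exp(g(v) / beta). The base model's
    parameters are never modified; only the next-token distribution changes.
    """
    if len(base_logits) != len(guidance_scores):
        raise ValueError("need one guidance score per vocabulary entry")
    if beta <= 0:
        raise ValueError("beta must be positive")
    adjusted = [l + g / beta for l, g in zip(base_logits, guidance_scores)]
    return softmax(adjusted)


# Toy usage with a 3-token vocabulary (illustrative numbers only).
base = [2.0, 1.0, 0.0]
print(guided_next_token_probs(base, [0.0, 0.0, 0.0]))  # same as softmax(base)
print(guided_next_token_probs(base, [0.0, 0.0, 3.0]))  # more mass on token 2
```

Smaller `beta` makes the guidance scores dominate the base model; larger `beta` keeps generation closer to the unguided distribution.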
Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach [65.6966065843227]
Iterative Reweight-then-Optimize (IRO) is a framework that performs RL-style alignment of a frozen base model without touching its parameters. At test time, the learned value functions are used to guide the base model's generation via a search-based optimization process. Notably, users can apply IRO to align a model on their own dataset, similar to OpenAI's reinforcement fine-tuning (RFT).
arXiv Detail & Related papers (2025-06-21T21:49:02Z)
Smoothed Preference Optimization via ReNoise Inversion for Aligning Diffusion Models with Varied Human Preferences [13.588231827053923]
Direct Preference Optimization (DPO) aligns text-to-image (T2I) generation models with human preferences using pairwise preference data. We propose SmPO-Diffusion, a novel method for modeling preference distributions to improve the DPO objective. Our approach effectively mitigates issues of excessive optimization and objective misalignment present in existing methods.
arXiv Detail & Related papers (2025-06-03T09:47:22Z)
Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes-optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
Self-NPO: Negative Preference Optimization of Diffusion Models by Simply Learning from Itself without Explicit Preference Annotations [60.143658714894336]
Diffusion models have demonstrated remarkable success in various visual generation tasks, including image, video, and 3D content generation. Preference optimization (PO) is a prominent and growing area of research that aims to align these models with human preferences. We introduce Self-NPO, a Negative Preference Optimization approach that learns exclusively from the model itself.
arXiv Detail & Related papers (2025-05-17T01:03:46Z)
Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models [32.586647934400105]
We argue that existing preference alignment methods neglect the critical role of handling unconditional/negative-conditional outputs. We propose a straightforward but versatile and effective approach that involves training a model specifically attuned to negative preferences. Our approach integrates seamlessly with models such as SD1.5, SDXL, video diffusion models, and models that have undergone preference optimization.
arXiv Detail & Related papers (2025-05-16T13:38:23Z)
DPR: Diffusion Preference-based Reward for Offline Reinforcement Learning [30.654668373387214]
We propose a novel preference-based reward acquisition method: Diffusion Preference-based Reward (DPR). DPR uses diffusion models to directly model preference distributions for state-action pairs, allowing rewards to be discriminatively obtained from these distributions. We apply the proposed reward acquisition methods to existing offline reinforcement learning algorithms, and a series of experimental results demonstrates that the diffusion-based reward acquisition approach outperforms previous methods, including Transformer-based ones.
arXiv Detail & Related papers (2025-03-03T03:49:38Z)
Calibrated Multi-Preference Optimization for Aligning Diffusion Models [92.90660301195396]
Calibrated Preference Optimization (CaPO) is a novel method to align text-to-image (T2I) diffusion models. CaPO incorporates general preferences from multiple reward models without human-annotated data. Experimental results show that CaPO consistently outperforms prior methods.
arXiv Detail & Related papers (2025-02-04T18:59:23Z)
Meta-Learning Adaptable Foundation Models [37.458141335750696]
We introduce a meta-learning framework infused with parameter-efficient fine-tuning (PEFT) in an intermediate retraining stage to learn a model that can be easily adapted to unseen tasks. In this setting, we demonstrate that standard retraining is suboptimal for finding an adaptable set of parameters. We then apply these theoretical insights to retraining the RoBERTa model to predict the continuation of conversations within the ConvAI2 dataset.
arXiv Detail & Related papers (2024-10-29T17:24:18Z)
General Preference Modeling with Preference Representations for Aligning Language Models [51.14207112118503]
We introduce preference representation learning, an approach that embeds responses into a latent space to capture intricate preference structures efficiently. We also propose preference-score-based General Preference Optimization (GPO), which generalizes reward-based reinforcement learning from human feedback. Our method may enhance the alignment of foundation models with nuanced human values.
arXiv Detail & Related papers (2024-10-03T04:22:55Z)
Aligning Large Language Models via Fine-grained Supervision [20.35000061196631]
Pre-trained large-scale language models (LLMs) excel at producing coherent articles, yet their outputs may be untruthful or toxic, or may fail to align with user expectations. Current approaches focus on using reinforcement learning from human feedback (RLHF) to improve model alignment. We propose a method to enhance LLM alignment through fine-grained token-level supervision.
arXiv Detail & Related papers (2024-06-04T20:21:45Z)
Preference Learning Algorithms Do Not Learn Preference Rankings [62.335733662381884]
We study the conventional wisdom that preference learning trains models to assign higher likelihoods to more preferred outputs than less preferred outputs. We find that most state-of-the-art preference-tuned models achieve a ranking accuracy of less than 60% on common preference datasets.
arXiv Detail & Related papers (2024-05-29T21:29:44Z)
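The ranking-accuracy metric above can be sketched as follows. This assumes a preference pair counts as correctly ranked when the model's total log-likelihood of the preferred response exceeds that of the less preferred one; the paper may use a variant (for example, length normalization), so treat this as an illustrative definition. The helper names and the toy log-probabilities are hypothetical, not data from the paper.

```python
from typing import List, Sequence, Tuple

# Each pair holds per-token log-probabilities (chosen, rejected) under the model.
Pair = Tuple[Sequence[float], Sequence[float]]


def sequence_logprob(token_logprobs: Sequence[float]) -> float:
    """Total log-likelihood of a response: the sum of its token log-probabilities."""
    return sum(token_logprobs)


def ranking_accuracy(pairs: List[Pair]) -> float:
    """Fraction of pairs where the preferred response is strictly more likely."""
    if not pairs:
        raise ValueError("need at least one preference pair")
    correct = sum(
        1
        for chosen, rejected in pairs
        if sequence_logprob(chosen) > sequence_logprob(rejected)
    )
    return correct / len(pairs)


# Toy pairs (made-up numbers, not results from the paper).
toy_pairs = [
    ([-0.5, -0.2], [-1.0, -0.3]),        # chosen more likely: counted as correct
    ([-2.0], [-0.1]),                    # chosen less likely: counted as wrong
    ([-0.3, -0.3, -0.3], [-0.4, -0.4]),  # longer chosen response loses on total log-prob
]
print(ranking_accuracy(toy_pairs))  # 1 of 3 pairs ranked correctly
```

The third toy pair shows why the choice between total and length-normalized log-likelihood can change the measured accuracy.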
Robust Preference Optimization through Reward Model Distillation [68.65844394615702]
Language model (LM) post-training involves maximizing a reward function that is derived from preference annotations. Direct Preference Optimization (DPO) is a popular offline alignment method that trains a policy directly on preference data without the need to train a reward model or apply reinforcement learning. We analyze the limitations of this approach and propose reward model distillation to obtain a better proxy for the true preference distribution over generation pairs.
arXiv Detail & Related papers (2024-05-29T17:39:48Z)
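Several entries in this list build on DPO. For reference, here is a minimal sketch of the standard per-pair DPO loss, -log σ(β[(log π(y_w) - log π_ref(y_w)) - (log π(y_l) - log π_ref(y_l))]). It is the baseline objective, not the distillation method proposed in the paper above. The inputs are assumed to be summed sequence log-probabilities, and the function names and default `beta` are illustrative choices.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    The implicit reward of a response is beta * (log pi - log pi_ref); the loss
    is the negative log-sigmoid of the chosen-minus-rejected reward margin.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -log_sigmoid(chosen_reward - rejected_reward)


# When the policy equals the reference, the margin is 0 and the loss is log(2), about 0.693.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Raising the chosen response's log-prob relative to the reference lowers the loss.
print(dpo_loss(-8.0, -12.0, -10.0, -12.0))
```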
Direct Preference Optimization With Unobserved Preference Heterogeneity [16.91835461818937]
This paper presents a new method to align generative models with varied human preferences. We propose an Expectation-Maximization adaptation of DPO that generates a mixture of models based on the latent preference types of the annotators. Our algorithms leverage the simplicity of DPO while accommodating diverse preferences.
arXiv Detail & Related papers (2024-05-23T21:25:20Z)
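The Expectation-Maximization idea above can be illustrated with a generic mixture-model E-step and mixing-weight update. This is a sketch under stated assumptions, not the paper's exact algorithm. It assumes `loglik[a][k]` is already available as the log-likelihood of annotator a's labeled pairs under candidate model k (for example, a sum of log-sigmoid DPO margins); how it is computed is out of scope. The M-step that retrains each model on responsibility-weighted data is omitted, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def e_step(
    loglik: Sequence[Sequence[float]], mix_weights: Sequence[float]
) -> List[List[float]]:
    """Posterior probability (responsibility) of each latent type for each annotator.

    r[a][k] is proportional to mix_weights[k] * exp(loglik[a][k]), normalized with
    log-sum-exp for numerical stability.
    """
    if any(w <= 0 for w in mix_weights):
        raise ValueError("mixing weights must be positive")
    resp = []
    for row in loglik:
        if len(row) != len(mix_weights):
            raise ValueError("one log-likelihood per latent type is required")
        scores = [math.log(w) + l for w, l in zip(mix_weights, row)]
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        resp.append([math.exp(s - log_norm) for s in scores])
    return resp


def update_mix_weights(resp: Sequence[Sequence[float]]) -> List[float]:
    """M-step for the mixing weights: average responsibility of each type."""
    n = len(resp)
    k = len(resp[0])
    return [sum(r[j] for r in resp) / n for j in range(k)]


# Toy example: annotator 0 fits type 0 better, annotator 1 fits type 1 better.
resp = e_step([[-1.0, -4.0], [-5.0, -2.0]], [0.5, 0.5])
print(resp)
print(update_mix_weights(resp))
```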
Diffusion Model Alignment Using Direct Preference Optimization [103.2238655827797]
Diffusion-DPO is a method to align diffusion models to human preferences by directly optimizing on human comparison data. We fine-tune the base model of the state-of-the-art Stable Diffusion XL (SDXL)-1.0 model with Diffusion-DPO. We also develop a variant that uses AI feedback and has comparable performance to training on human preferences.
arXiv Detail & Related papers (2023-11-21T15:24:05Z)
FD-Align: Feature Discrimination Alignment for Fine-tuning Pre-Trained Models in Few-Shot Learning [21.693779973263172]
In this paper, we introduce a fine-tuning approach termed Feature Discrimination Alignment (FD-Align). Our method aims to bolster the model's generalizability by preserving the consistency of spurious features. Once fine-tuned, the model can seamlessly integrate with existing methods, leading to performance improvements.
arXiv Detail & Related papers (2023-10-23T17:12:01Z)
Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
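The abstract above describes merging models in parameter space without training data. The simplest instance of that idea is a (weighted) average of parameters across models that share an architecture. The sketch below shows only that simple baseline, not the merging method proposed in the paper. Parameters are represented as plain Python lists keyed by name to keep the example dependency-free, and `average_weights` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical flat representation: parameter name -> list of values.
Params = Dict[str, List[float]]


def average_weights(
    models: Sequence[Params], coeffs: Optional[Sequence[float]] = None
) -> Params:
    """Merge same-architecture models by a weighted average of each parameter.

    No training data is used: only the parameters themselves. Coefficients
    default to uniform and are normalized to sum to 1.
    """
    if not models:
        raise ValueError("need at least one model")
    if coeffs is None:
        coeffs = [1.0] * len(models)
    if len(coeffs) != len(models):
        raise ValueError("one coefficient per model is required")
    total = sum(coeffs)
    if total <= 0:
        raise ValueError("coefficients must have a positive sum")
    coeffs = [c / total for c in coeffs]

    names = models[0].keys()
    for m in models[1:]:
        if m.keys() != names:
            raise ValueError("models must share the same parameter names")

    merged: Params = {}
    for name in names:
        size = len(models[0][name])
        if any(len(m[name]) != size for m in models):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            sum(c * m[name][i] for c, m in zip(coeffs, models)) for i in range(size)
        ]
    return merged


a = {"w": [1.0, 2.0], "b": [0.0]}
b = {"w": [3.0, 4.0], "b": [2.0]}
print(average_weights([a, b]))  # {'w': [2.0, 3.0], 'b': [1.0]}
```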
Mismatched No More: Joint Model-Policy Optimization for Model-Based RL [172.37829823752364]
We propose a single objective for jointly training the model and the policy, such that updates to either component increase a lower bound on expected return. Our objective is a global lower bound on expected return, and this bound becomes tight under certain assumptions. The resulting algorithm (MnM) is conceptually similar to a GAN.
arXiv Detail & Related papers (2021-10-06T13:43:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.