ICDPO: Effectively Borrowing Alignment Capability of Others via
In-context Direct Preference Optimization
- URL: http://arxiv.org/abs/2402.09320v1
- Date: Wed, 14 Feb 2024 17:14:34 GMT
- Title: ICDPO: Effectively Borrowing Alignment Capability of Others via
In-context Direct Preference Optimization
- Authors: Feifan Song, Yuxuan Fan, Xin Zhang, Peiyi Wang, Houfeng Wang
- Abstract summary: Large Language Models rely on Human Preference Alignment to ensure the generation of safe content.
We propose a novel approach called In-Context Direct Preference Optimization (ICDPO)
ICDPO generates well-aligned responses as estimated by the aforementioned instant scorer, thereby enhancing the final performance.
- Score: 24.55845271377532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) rely on Human Preference Alignment (HPA) to
ensure the generation of safe content. Due to the heavy cost associated with
fine-tuning, fine-tuning-free methods have emerged, typically modifying LLM
decoding with external auxiliary methods. However, these methods do not
essentially enhance the LLM itself. In this paper, we rethink the derivation
procedures of DPO, based on which we conversely build an instant scorer using
the states of the LLM before and after In-context Learning (ICL). Accordingly,
we propose a novel approach called In-Context Direct Preference Optimization
(ICDPO). It enables LLMs to borrow the HPA capabilities from superior LLMs with
ICL, generating well-aligned responses as estimated by the aforementioned
instant scorer, thereby enhancing the final performance. ICDPO can be further
enhanced with a two-stage retriever and an upgraded scorer, both offering
benefits. Extensive experiments show its effectiveness, particularly in
outperforming two fine-tuning-free baselines, and it exhibits competitiveness
with SFT + LoRA. We also conduct detailed analyses to offer comprehensive
insights into ICDPO.
Related papers
- Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models [44.38073745307387]
This work studies the challenge of aligning large language models (LLMs) with offline preference data.
We propose SPAC, a new offline preference optimization method with self-play, inspired by the on-average pessimism technique from the offline RL literature.
arXiv Detail & Related papers (2024-06-06T17:23:49Z) - Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z) - Hybrid Preference Optimization: Augmenting Direct Preference Optimization with Auxiliary Objectives [0.5120567378386615]
We propose a hybrid approach to aligning large language models (LLMs)
With a simple augmentation to the implicit reward decomposition of DPO, we allow for tuning LLMs to maximize a set of arbitrary auxiliary rewards.
The proposed method, Hybrid Preference Optimization (HPO), shows the ability to effectively generalize to both user preferences and auxiliary designer objectives.
arXiv Detail & Related papers (2024-05-28T08:35:48Z) - Multi-Reference Preference Optimization for Large Language Models [56.84730239046117]
We introduce a novel closed-form formulation for direct preference optimization using multiple reference models.
The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models.
Our experiments demonstrate that LLMs finetuned with MRPO generalize better in various preference data, regardless of data scarcity or abundance.
arXiv Detail & Related papers (2024-05-26T00:29:04Z) - Self-Play Preference Optimization for Language Model Alignment [75.83359213697854]
Recent advancements suggest that directly working with preference probabilities can yield a more accurate reflection of human preferences.
We propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game.
Our approach, dubbed Self-Play Preference Optimization (SPPO), approximates the Nash equilibrium through iterative policy updates.
arXiv Detail & Related papers (2024-05-01T17:59:20Z) - Weak-to-Strong Extrapolation Expedites Alignment [135.12769233630362]
We propose a method called ExPO to boost models' alignment with human preference.
We demonstrate that ExPO consistently improves off-the-shelf DPO/RLHF models.
We shed light on the essence of ExPO amplifying the reward signal learned during alignment training.
arXiv Detail & Related papers (2024-04-25T17:39:50Z) - Token-level Direct Preference Optimization [8.249403373337024]
Fine-tuning pre-trained Large Language Models is essential to align them with human values and intentions.
We introduce Token-level Direct Preference Optimization (TDPO), a novel approach to align LLMs with human preferences by optimizing policy at the token level.
arXiv Detail & Related papers (2024-04-18T08:49:38Z) - Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model [3.300814846990438]
Large Language Models (LLMs) have become increasingly popular due to their ability to process and generate natural language.
As they are trained on massive datasets of text, LLMs can inherit harmful biases and produce outputs that are not aligned with human values.
This paper studies two main approaches to LLM alignment: Reinforcement Learning with Human Feedback (RLHF) and contrastive learning-based methods like Direct Preference Optimization (DPO)
By analyzing the stability and robustness of RLHF and DPO, we propose MPO, a novel method that mitigates the weaknesses of both approaches.
arXiv Detail & Related papers (2024-03-28T14:15:10Z) - Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z) - FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large
Language Models in Federated Learning [70.38817963253034]
This paper first discusses these challenges of federated fine-tuning LLMs, and introduces our package FS-LLM as a main contribution.
We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios.
We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.