Direct Large Language Model Alignment Through Self-Rewarding Contrastive
Prompt Distillation
- URL: http://arxiv.org/abs/2402.11907v1
- Date: Mon, 19 Feb 2024 07:46:40 GMT
- Title: Direct Large Language Model Alignment Through Self-Rewarding Contrastive
Prompt Distillation
- Authors: Aiwei Liu, Haoping Bai, Zhiyun Lu, Xiang Kong, Simon Wang, Jiulong
Shan, Meng Cao, Lijie Wen
- Abstract summary: We propose a method to evaluate the response preference by using the output probabilities of response pairs under contrastive prompt pairs.
Based on this, we propose an automatic alignment method, Direct Large Model Alignment (DLMA).
In experiments, our DLMA method surpasses the RLHF method without relying on human-annotated preference data.
- Score: 47.16091219929373
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Aligning large language models (LLMs) with human expectations without
human-annotated preference data is an important problem. In this paper, we
propose a method to evaluate response preference using the output
probabilities of response pairs under contrastive prompt pairs, which achieves
better performance on LLaMA2-7B and LLaMA2-13B than RLAIF. Based
on this, we propose an automatic alignment method, Direct Large Model Alignment
(DLMA). First, we use contrastive prompt pairs to automatically generate
preference data. Then, we continue to evaluate the generated preference data
using contrastive prompt pairs and calculate a self-rewarding score. Finally,
we use the DPO algorithm to effectively align LLMs by combining this
self-rewarding score. In our experiments, the DLMA method surpasses the RLHF
method without relying on human-annotated preference data.
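
To make the pipeline above concrete, here is a minimal sketch (not the authors' released code) of the two quantities the abstract describes: a self-rewarding score computed from the output probabilities of a response pair under a contrastive prompt pair, and a DPO-style loss that folds that score in as a margin. The helper names, the margin scaling, and the beta value are illustrative assumptions; the paper's exact formulation may differ.

```python
# Hedged sketch of DLMA-style self-rewarding scoring plus a margin DPO loss.
# All helper names and constants are illustrative, not the paper's code.
import torch
import torch.nn.functional as F


def self_reward(logp_pos_a, logp_neg_a, logp_pos_b, logp_neg_b):
    """Score response A relative to response B.

    logp_pos_x / logp_neg_x are sequence log-probabilities of response x
    under the "positive" and "negative" contrastive prompts. A response that
    is favored under the positive prompt but not the negative one scores higher.
    """
    margin_a = logp_pos_a - logp_neg_a
    margin_b = logp_pos_b - logp_neg_b
    return margin_a - margin_b


def dpo_loss_with_self_reward(policy_chosen, policy_rejected,
                              ref_chosen, ref_rejected,
                              self_reward_score, beta=0.1):
    """DPO-style loss where the self-rewarding score acts as a margin.

    policy_* / ref_* are sequence log-probabilities of the chosen and
    rejected responses under the policy and a frozen reference model.
    """
    logits = beta * ((policy_chosen - ref_chosen)
                     - (policy_rejected - ref_rejected))
    # A larger self-reward demands a larger gap between chosen and rejected.
    return -F.logsigmoid(logits - self_reward_score).mean()


if __name__ == "__main__":
    # Toy numbers standing in for real sequence log-probabilities;
    # response A plays the role of the chosen response, B the rejected one.
    score = self_reward(torch.tensor(-12.0), torch.tensor(-15.0),
                        torch.tensor(-14.0), torch.tensor(-13.5))
    loss = dpo_loss_with_self_reward(
        policy_chosen=torch.tensor([-11.0]), policy_rejected=torch.tensor([-14.0]),
        ref_chosen=torch.tensor([-12.0]), ref_rejected=torch.tensor([-13.0]),
        self_reward_score=0.05 * score, beta=0.1)
    print(f"self-reward score: {score.item():.2f}, loss: {loss.item():.4f}")
```

In practice, the log-probabilities would come from scoring each generated response under the positive and negative contrastive prompts with the model itself, and under the policy and a frozen reference model for the DPO term.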
Related papers
- Aligning Large Language Models with Self-generated Preference Data [72.99676237703099]
We propose a new framework that boosts the alignment of large language models (LLMs) with human preferences.
Our key idea is to leverage the human prior knowledge contained in a small amount of (seed) data.
We introduce a noise-aware preference learning algorithm to mitigate the risk of low-quality generated preference data.
arXiv Detail & Related papers (2024-06-06T18:01:02Z)
- Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z)
- Offline Regularised Reinforcement Learning for Large Language Models Alignment [33.483481840098925]
We propose DRO, or Direct Reward Optimisation, as a framework and associated algorithms.
DRO uses a simple mean-squared objective that can be implemented in various ways.
arXiv Detail & Related papers (2024-05-29T14:11:29Z)
- Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization [105.3612692153615]
A common technique for aligning large language models (LLMs) relies on acquiring human preferences.
We propose a new axis that is based on eliciting preferences jointly over the instruction-response pairs.
We find that joint preferences over instruction and response pairs can significantly enhance the alignment of LLMs.
arXiv Detail & Related papers (2024-03-31T02:05:40Z)
- Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization [25.290462963681257]
Multimodal Large Language Models (MLLMs) excel in generating responses based on visual inputs.
However, they often suffer from a bias towards generating responses similar to their pretraining corpus, overshadowing the importance of visual information.
We treat this bias as a "preference" for pretraining statistics, which hinders the model's grounding in visual input.
arXiv Detail & Related papers (2024-03-13T17:29:45Z)
- CURATRON: Complete Robust Preference Data for Robust Alignment of Large Language Models [1.7849982327883962]
This paper addresses the challenges of aligning large language models (LLMs) with human values via preference learning (PL).
We propose a novel curation method for robustly and completely recalibrating values within these datasets.
Our algorithms handle adversarial noise and unobserved comparisons well in both general and preference dataset settings.
arXiv Detail & Related papers (2024-03-05T07:58:12Z)
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts [95.09994361995389]
Relative Preference Optimization (RPO) is designed to discern between more and less preferred responses derived from both identical and related prompts.
RPO has demonstrated a superior ability to align large language models with user preferences and to improve their adaptability during the training process.
arXiv Detail & Related papers (2024-02-12T22:47:57Z)
- Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback [70.32795295142648]
Linear alignment is a novel algorithm that aligns language models with human preferences in a single inference step.
Experiments on both general and personalized preference datasets demonstrate that linear alignment significantly enhances the performance and efficiency of LLM alignment.
arXiv Detail & Related papers (2024-01-21T10:46:23Z)