ULMA: Unified Language Model Alignment with Human Demonstration and
Point-wise Preference
- URL: http://arxiv.org/abs/2312.02554v2
- Date: Mon, 26 Feb 2024 08:51:03 GMT
- Title: ULMA: Unified Language Model Alignment with Human Demonstration and
Point-wise Preference
- Authors: Tianchi Cai, Xierui Song, Jiyan Jiang, Fei Teng, Jinjie Gu, Guannan
Zhang
- Abstract summary: A typical alignment procedure consists of supervised fine-tuning and preference learning.
We introduce Point-wise Direct Preference Optimization, a novel preference learning method designed to harness point-wise feedback effectively.
Our work also uncovers a novel connection between supervised fine-tuning and point-wise preference learning, culminating in Unified Language Model Alignment.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Aligning language models to human expectations, e.g., being helpful and
harmless, has become a pressing challenge for large language models. A typical
alignment procedure consists of supervised fine-tuning and preference learning.
Most preference learning methods, such as RLHF and DPO, depend on pairwise
preference data, which inadequately address scenarios where human feedback is
point-wise, leading to potential information loss and suboptimal performance.
Addressing this gap, we introduce Point-wise Direct Preference Optimization, a
novel preference learning method designed to harness point-wise feedback
effectively. Our work also uncovers a novel connection between supervised
fine-tuning and point-wise preference learning, culminating in Unified Language
Model Alignment, a single-step method that unifies the alignment with human
demonstrations and point-wise preferences. Extensive experiments on point-wise
preference datasets with binary or continuous labels validate the effectiveness
of our methods. Our code and a new dataset with high-quality demonstration
samples on harmlessness are released.
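As an illustration of the point-wise setting described in the abstract, the following is a minimal sketch of a loss that scores one response at a time against a binary (or soft) label, rather than comparing a pair. It assumes a DPO-style implicit reward, beta times the log-probability ratio between the trained policy and a frozen reference model, passed through a sigmoid and trained with binary cross-entropy. That functional form, the function names, and the default `beta` are assumptions made for this example; the abstract does not state the exact objective, so this should not be read as the authors' implementation.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pointwise_preference_loss(policy_logp: float, ref_logp: float,
                              label: float, beta: float = 0.1) -> float:
    """Binary cross-entropy on a DPO-style implicit reward (assumed form).

    policy_logp, ref_logp: log-probabilities of one whole response under the
    policy being trained and under a frozen reference model.
    label: 1.0 for a response marked good, 0.0 for one marked bad; values in
    between act as soft labels.
    beta: hypothetical scale on the log-ratio, chosen here for illustration.
    """
    reward = beta * (policy_logp - ref_logp)  # implicit reward from the log-ratio
    # log(1 - sigmoid(r)) equals log(sigmoid(-r)), which avoids overflow.
    return -(label * log_sigmoid(reward) + (1.0 - label) * log_sigmoid(-reward))


# Usage: one positively labeled and one negatively labeled response.
good = pointwise_preference_loss(policy_logp=-4.0, ref_logp=-5.0, label=1.0)
bad = pointwise_preference_loss(policy_logp=-4.0, ref_logp=-5.0, label=0.0)
```

Under this sketch, raising the policy's log-probability of a response lowers the loss when the label is 1 and raises it when the label is 0, so no paired comparison is needed.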
Related papers
- Aligning Large Language Models with Self-generated Preference Data [72.99676237703099]
We propose a new framework that boosts the alignment of large language models (LLMs) with human preferences.
Our key idea is leveraging the human prior knowledge within the small (seed) data.
We introduce a noise-aware preference learning algorithm to mitigate the risk of low quality within generated preference data.
arXiv Detail & Related papers (2024-06-06T18:01:02Z)
- Latent Distance Guided Alignment Training for Large Language Models [0.0]
In pursuit of improved alignment without relying on external annotation, we introduce Latent Distance Guided Alignment Training (LD-Align).
This approach seeks to align the model with a high-quality supervised fine-tune dataset using guidance from a latent space.
We utilize the distance between sample pairs in the latent space to guide DPO-based alignment training.
arXiv Detail & Related papers (2024-04-09T15:33:09Z)
- MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.
We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences.
Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
arXiv Detail & Related papers (2024-02-14T03:56:27Z)
- Active Preference Learning for Large Language Models [12.093302163058436]
We develop an active learning strategy for DPO to make better use of preference labels.
We propose a practical acquisition function for prompt/completion pairs based on the predictive entropy of the language model.
We demonstrate how our approach improves both the rate of learning and final performance of fine-tuning on pairwise preference data.
arXiv Detail & Related papers (2024-02-12T23:09:00Z)
- Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback [70.32795295142648]
Linear alignment is a novel algorithm that aligns language models with human preferences in one single inference step.
Experiments on both general and personalized preference datasets demonstrate that linear alignment significantly enhances the performance and efficiency of LLM alignment.
arXiv Detail & Related papers (2024-01-21T10:46:23Z)
- Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment [105.34140537748546]
We propose an improved alignment approach named FIGA. Different from prior methods, we incorporate fine-grained quality signals that are derived by contrasting good and bad responses.
Our approach has made two major contributions. Firstly, we curate a refined alignment dataset that pairs initial responses and the corresponding revised ones.
Secondly, we devise a new loss function that can leverage fine-grained quality signals to instruct the learning of LLMs for alignment.
arXiv Detail & Related papers (2023-11-07T15:36:40Z)
- Constructive Large Language Models Alignment with Diverse Feedback [76.9578950893839]
We introduce Constructive and Diverse Feedback (CDF) as a novel method to enhance large language model alignment.
We exploit critique feedback for easy problems, refinement feedback for medium problems, and preference feedback for hard problems.
By training our model with this diversified feedback, we achieve enhanced alignment performance while using less training data.
arXiv Detail & Related papers (2023-10-10T09:20:14Z)
- Aligning Language Models with Offline Learning from Human Feedback [5.539080592071948]
We propose an offline learning from human feedback framework to align language models without interacting with environments.
Specifically, we explore filtering alignment (FA), reward-weighted regression (RWR), and conditional alignment (CA) to align language models to human preferences.
arXiv Detail & Related papers (2023-08-23T10:41:07Z)
- Chain of Hindsight Aligns Language Models with Feedback [62.68665658130472]
We propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity.
We convert all types of feedback into sequences of sentences, which are then used to fine-tune the model.
By doing so, the model is trained to generate outputs based on feedback, while learning to identify and correct negative attributes or errors.
arXiv Detail & Related papers (2023-02-06T10:28:16Z)