Align on the Fly: Adapting Chatbot Behavior to Established Norms
- URL: http://arxiv.org/abs/2312.15907v1
- Date: Tue, 26 Dec 2023 06:51:09 GMT
- Title: Align on the Fly: Adapting Chatbot Behavior to Established Norms
- Authors: Chunpu Xu, Steffi Chern, Ethan Chern, Ge Zhang, Zekun Wang, Ruibo Liu,
Jing Li, Jie Fu, Pengfei Liu
- Abstract summary: We propose an On-the-fly Preference Optimization (OPO) method, a real-time alignment approach that works in a streaming way.
Experimental results on both human-annotated and auto-generated questions from legal and moral domains indicate the effectiveness of the proposed OPO method.
- Score: 47.34022081652952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we aim to align large language models with the ever-changing,
complex, and diverse human values (e.g., social norms) across time and
locations. This presents a challenge to existing alignment techniques, such as
supervised fine-tuning, which internalize values within model parameters. To
overcome this, we propose an On-the-fly Preference Optimization (OPO) method,
a real-time alignment approach that works in a streaming way. It employs an
external memory to store established rules for alignment, which can constrain
LLMs' behaviors without further training, allowing for convenient updates and
customization of human values. We also introduce a scalable evaluation to
assess the proposed method more effectively. Experimental results on both
human-annotated and auto-generated questions from legal and moral domains
indicate the effectiveness of the proposed OPO method. Our code and data are
released at https://github.com/GAIR-NLP/OPO.
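Since the abstract describes OPO as retrieving established rules from an external memory and using them to constrain the model at inference time, without any further training, a minimal sketch of that retrieve-then-prompt idea is given below. The rule texts, the sentence-embedding retriever, and the prompt template are illustrative assumptions, not the released implementation in the linked repository.

```python
# Minimal sketch of the retrieve-then-constrain idea behind OPO.
# Assumptions: rule texts, the embedding model, and the prompt template below
# are illustrative only; see https://github.com/GAIR-NLP/OPO for the real code.
from sentence_transformers import SentenceTransformer, util

# External memory: established rules (e.g., legal or moral norms) kept as plain
# text, so values can be updated or customized without retraining the model.
RULE_MEMORY = [
    "Drivers must not use a handheld phone while driving.",
    "Personal data may only be processed with the subject's consent.",
    "Borrowed items should be returned to their owner in good condition.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
rule_embeddings = encoder.encode(RULE_MEMORY, convert_to_tensor=True)

def retrieve_rules(question: str, top_k: int = 2) -> list[str]:
    """Return the stored rules most relevant to the incoming question."""
    query_embedding = encoder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, rule_embeddings, top_k=top_k)[0]
    return [RULE_MEMORY[hit["corpus_id"]] for hit in hits]

def build_aligned_prompt(question: str) -> str:
    """Constrain the LLM by prepending retrieved rules; no parameter update needed."""
    rules = "\n".join(f"- {r}" for r in retrieve_rules(question))
    return (
        "Answer the question while strictly following these established rules:\n"
        f"{rules}\n\nQuestion: {question}\nAnswer:"
    )

print(build_aligned_prompt("Can I check a text message at a red light?"))
```
Because the rules live in external memory rather than in model weights, updating a norm only requires editing the stored rule text.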
Related papers
- On-the-fly Preference Alignment via Principle-Guided Decoding [27.50204023448716]
We introduce On-the-fly Preference Alignment via Principle-Guided Decoding (OPAD) to align model outputs with human preferences during inference.
OPAD achieves competitive or superior performance in both general and personalized alignment tasks.
arXiv Detail & Related papers (2025-02-20T02:23:09Z) - Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications.
Ensuring their alignment with the diverse preferences of individual users has become a critical challenge.
We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z) - GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets [19.485572131953937]
We propose a practical application of a diversity-seeking RL algorithm called GFlowNet-DPO (GDPO) in an offline preference alignment setting.
Empirical results show GDPO can generate far more diverse responses than the baseline methods.
arXiv Detail & Related papers (2024-10-19T13:07:52Z) - SAIL: Self-Improving Efficient Online Alignment of Large Language Models [56.59644677997827]
Reinforcement Learning from Human Feedback is a key method for aligning large language models with human preferences.
Recent literature has focused on designing online RLHF methods but still lacks a unified conceptual formulation.
Our approach significantly improves alignment performance on open-sourced datasets with minimal computational overhead.
arXiv Detail & Related papers (2024-06-21T18:05:35Z) - Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z) - Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [88.56809269990625]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions.
Our experimental results demonstrate that when fine-tuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, Self-Exploring Language Models (SELM) significantly boost performance on instruction-following benchmarks.
arXiv Detail & Related papers (2024-05-29T17:59:07Z) - Token-level Direct Preference Optimization [8.249403373337024]
Fine-tuning pre-trained Large Language Models is essential to align them with human values and intentions.
We introduce Token-level Direct Preference Optimization (TDPO), a novel approach to align LLMs with human preferences by optimizing the policy at the token level (see the sequence-level DPO sketch after this list for contrast).
arXiv Detail & Related papers (2024-04-18T08:49:38Z) - Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback [70.32795295142648]
Linear alignment is a novel algorithm that aligns language models with human preferences in a single inference step.
Experiments on both general and personalized preference datasets demonstrate that linear alignment significantly enhances the performance and efficiency of LLM alignment.
arXiv Detail & Related papers (2024-01-21T10:46:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.