Aligning Large Language Models through Synthetic Feedback
- URL: http://arxiv.org/abs/2305.13735v2
- Date: Sat, 21 Oct 2023 01:50:54 GMT
- Title: Aligning Large Language Models through Synthetic Feedback
- Authors: Sungdong Kim, Sanghwan Bae, Jamin Shin, Soyoung Kang, Donghyun Kwak,
Kang Min Yoo, Minjoon Seo
- Abstract summary: We propose a novel alignment learning framework with synthetic feedback that does not depend on extensive human annotations.
In human evaluation, our model is preferred to Alpaca and Dolly-v2, 55.0% and 58.5% of the time, respectively.
- Score: 43.84431341195111
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Aligning large language models (LLMs) to human values has become increasingly
important as it enables sophisticated steering of LLMs. However, it requires
significant human demonstrations and feedback or distillation from proprietary
LLMs such as ChatGPT. In this work, we propose a novel alignment learning
framework with synthetic feedback that does not depend on extensive human
annotations or proprietary LLMs. First, we perform reward modeling (RM) with
synthetic feedback by contrasting responses from vanilla LLMs of various sizes
and prompt configurations. Then, we use the RM to simulate high-quality
demonstrations for training a supervised policy, and we further optimize the
model with reinforcement learning.
Our resulting model, Aligned Language Model with Synthetic Training dataset
(ALMoST), outperforms recent open-sourced models, which are trained on the
outputs of InstructGPT or human-annotated demonstrations, in alignment
benchmarks. In human evaluation, our model is preferred to Alpaca and Dolly-v2,
55.0% and 58.5% of the time, respectively. Further analyses demonstrate the
efficacy and importance of synthetic feedback in our framework. The code is
available at https://github.com/naver-ai/almost.
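To make the pipeline concrete, the following is a minimal Python sketch of the first two stages under the paper's stated assumption that larger vanilla models given more few-shot demonstrations tend to respond better. Everything here (the Assistant class, the model names, fake_generate, and the toy reward) is a hypothetical stand-in for illustration; the actual implementation is in the repository linked above.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Assistant:
    """One response-generating configuration (hypothetical placeholder)."""
    name: str
    params_b: float  # model size in billions of parameters
    n_shots: int     # number of few-shot demonstrations in the prompt

def rank_key(a: Assistant) -> Tuple[float, int]:
    # Core assumption from the paper: responses from larger models prompted
    # with more demonstrations tend to be better, so they outrank the rest.
    return (a.params_b, a.n_shots)

def synthetic_comparisons(
    prompt: str,
    assistants: List[Assistant],
    generate: Callable[[Assistant, str], str],
) -> List[dict]:
    """Build (chosen, rejected) pairs for reward modeling without human labels."""
    responses = {a: generate(a, prompt) for a in assistants}
    ranked = sorted(assistants, key=rank_key, reverse=True)
    return [
        {"prompt": prompt, "chosen": responses[hi], "rejected": responses[lo]}
        for hi, lo in combinations(ranked, 2)  # hi always outranks lo
    ]

def best_of_n(
    prompt: str,
    candidates: List[str],
    reward: Callable[[str, str], float],
) -> str:
    """Stage two: keep the highest-reward sample as a synthetic demonstration."""
    return max(candidates, key=lambda r: reward(prompt, r))

if __name__ == "__main__":
    # Stub generator so the sketch runs without model weights (hypothetical).
    def fake_generate(a: Assistant, prompt: str) -> str:
        return f"[{a.name}, {a.n_shots}-shot] answer to: {prompt}"

    configs = [
        Assistant("vanilla-7b", 7, 1),
        Assistant("vanilla-13b", 13, 3),
        Assistant("vanilla-30b", 30, 5),
    ]
    for pair in synthetic_comparisons("How do I brew good coffee?", configs,
                                      fake_generate):
        print(pair["chosen"], ">", pair["rejected"])

    # A learned RM would replace this toy scorer (length is a stand-in only).
    toy_reward = lambda prompt, resp: float(len(resp))
    demos = [fake_generate(a, "Explain overfitting.") for a in configs]
    print("demo:", best_of_n("Explain overfitting.", demos, toy_reward))
```

In the full framework, the comparison pairs train the reward model, and best-of-n filtering with that RM yields the demonstrations used for supervised fine-tuning before the final reinforcement learning stage.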
Related papers
- Sparse Rewards Can Self-Train Dialogue Agents [22.799506097310008]
We introduce a novel self-improvement paradigm that empowers LLM agents to autonomously enhance their performance without external human feedback.
We present ToolWOZ, a sparse-reward tool-calling simulation environment derived from MultiWOZ.
We demonstrate that both small and frontier models trained with JOSH significantly improve tool-based interactions while preserving general model capabilities across diverse benchmarks.
arXiv Detail & Related papers (2024-09-06T21:00:57Z)
- Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings [16.28853186016663]
We create synthetic image-text pairs for efficient and effective Visual-Language Models (VLMs) training.
Our method employs a pretrained text-to-image model to synthesize image embeddings from captions generated by an LLM.
Our VLM, finetuned on synthetic data, achieves performance comparable to that of models trained solely on human-annotated data.
arXiv Detail & Related papers (2024-03-12T15:36:42Z)
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN).
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling LLMs to achieve human-level performance without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z)
- SALMON: Self-Alignment with Instructable Reward Models [80.83323636730341]
This paper presents a novel approach, namely SALMON, to align base language models with minimal human supervision.
We develop an AI assistant named Dromedary-2 with only 6 exemplars for in-context learning and 31 human-defined principles.
arXiv Detail & Related papers (2023-10-09T17:56:53Z)
- The False Promise of Imitating Proprietary LLMs [158.65692029352584]
An emerging method to cheaply improve a weaker language model is to finetune it on outputs from a stronger model.
This approach seeks to imitate the proprietary model's capabilities using a weaker open-source model.
We first finetune a series of LMs that imitate ChatGPT using varying base model sizes.
We then evaluate the models using crowd raters and canonical NLP benchmarks.
arXiv Detail & Related papers (2023-05-25T05:00:12Z)
- On Learning to Summarize with Large Language Models as References [101.79795027550959]
Summaries generated by large language models (LLMs) are favored by human annotators over the original reference summaries in commonly used summarization datasets.
We study an LLM-as-reference learning setting for smaller text summarization models to investigate whether their performance can be substantially improved.
arXiv Detail & Related papers (2023-05-23T16:56:04Z)
- Benchmarking Large Language Models for News Summarization [79.37850439866938]
Large language models (LLMs) have shown promise for automatic summarization, but the reasons behind their success are poorly understood.
We find that instruction tuning, not model size, is the key to LLMs' zero-shot summarization capability.
arXiv Detail & Related papers (2023-01-31T18:46:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.