Make The Most of Prior Data: A Solution for Interactive Text
Summarization with Preference Feedback
- URL: http://arxiv.org/abs/2204.05512v1
- Date: Tue, 12 Apr 2022 03:56:59 GMT
- Title: Make The Most of Prior Data: A Solution for Interactive Text
Summarization with Preference Feedback
- Authors: Duy-Hung Nguyen and Nguyen Viet Dung Nghiem and Bao-Sinh Nguyen and
Dung Tien Le and Shahab Sabahi and Minh-Tien Nguyen and Hung Le
- Abstract summary: We introduce a new framework to train summarization models with preference feedback interactively.
By properly leveraging offline data and a novel reward model, we improve performance in terms of ROUGE scores and sample efficiency.
- Score: 15.22874706089491
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For summarization, human preference is critical for steering the
summarizer's outputs toward human interests, since ground-truth summaries are
scarce and ambiguous. Practical settings require dynamic exchanges between a
human and an AI agent, in which feedback arrives online, a few examples at a
time. In this paper, we introduce a new framework to train summarization
models interactively with preference feedback. By properly leveraging offline
data and a novel reward model, we improve performance in terms of ROUGE
scores and sample efficiency. Our experiments on three datasets confirm the
benefit of the proposed framework in active, few-shot, and online settings of
preference learning.
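The abstract mentions a reward model learned from preference feedback. As background, the sketch below shows the standard pairwise (Bradley-Terry) objective that such reward models commonly build on; it is a minimal illustration, not the paper's novel reward model, and the class name, layer sizes, and 768-dimensional encodings are assumptions.
```python
# Minimal sketch (assumptions, not the paper's exact model): a standard
# pairwise Bradley-Terry reward model for preference feedback.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores an encoded (document, summary) pair with a single scalar reward."""
    def __init__(self, encoding_dim: int = 768):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(encoding_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, encoding: torch.Tensor) -> torch.Tensor:
        return self.scorer(encoding).squeeze(-1)  # shape: (batch,)

def preference_loss(reward_preferred: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    # Logistic (Bradley-Terry) loss: push the preferred summary's reward
    # above the rejected summary's reward.
    return -torch.nn.functional.logsigmoid(reward_preferred - reward_rejected).mean()

# Toy usage: random vectors stand in for encoded (document, summary) pairs.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
enc_preferred = torch.randn(8, 768)  # summaries the annotator preferred
enc_rejected = torch.randn(8, 768)   # summaries the annotator rejected
loss = preference_loss(model(enc_preferred), model(enc_rejected))
loss.backward()
optimizer.step()
```
Once such a reward model is fit to the available (offline plus newly collected) preference pairs, it can score candidate summaries during interactive training; how the paper's framework does this differs in its reward model design.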
Related papers
- Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback [87.37721254914476]
We introduce a routing framework that combines inputs from humans and LMs to achieve better annotation quality.
We train a performance prediction model to predict a reward model's performance on an arbitrary combination of human and LM annotations.
We show that the selected hybrid mixture achieves better reward model performance compared to using either one exclusively.
arXiv Detail & Related papers (2024-10-24T20:04:15Z)
- Model-based Preference Optimization in Abstractive Summarization without Human Feedback [5.438770095369458]
We introduce Model-based Preference Optimization (MPO) to fine-tune Large Language Models for improved summarization abilities without any human feedback.
Our experiments on standard summarization datasets and various metrics demonstrate that our proposed MPO significantly enhances the quality of generated summaries without relying on human feedback.
arXiv Detail & Related papers (2024-09-27T10:35:45Z)
- Constructive Large Language Models Alignment with Diverse Feedback [76.9578950893839]
We introduce Constructive and Diverse Feedback (CDF) as a novel method to enhance large language models alignment.
We exploit critique feedback for easy problems, refinement feedback for medium problems, and preference feedback for hard problems.
By training our model with this diversified feedback, we achieve enhanced alignment performance while using less training data.
arXiv Detail & Related papers (2023-10-10T09:20:14Z)
- Inverse Reinforcement Learning for Text Summarization [52.765898203824975]
We introduce inverse reinforcement learning (IRL) as an effective paradigm for training abstractive summarization models.
Experimental results across datasets in different domains demonstrate the superiority of our proposed IRL model for summarization over MLE and RL baselines.
arXiv Detail & Related papers (2022-12-19T23:45:05Z)
- Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking [56.80065604034095]
We introduce a kNN approach that re-ranks documents based on their similarity with the query and with the documents the user considers relevant (a minimal sketch of this style of re-ranking follows this list).
To evaluate our different integration strategies, we transform four existing information retrieval datasets into the relevance feedback scenario.
arXiv Detail & Related papers (2022-10-19T16:19:37Z)
- Dialogue Response Ranking Training with Large-Scale Human Feedback Data [52.12342165926226]
We leverage social media feedback data to build a large-scale training dataset for feedback prediction.
We train DialogRPT, a set of GPT-2-based models, on 133M pairs of human feedback data.
Our ranker outperforms the conventional dialog perplexity baseline by a large margin on predicting Reddit feedback.
arXiv Detail & Related papers (2020-09-15T10:50:05Z)
- Learning to summarize from human feedback [18.964548137315333]
We show that it is possible to significantly improve summary quality by training a model to optimize for human preferences.
We apply our method to a version of the TL;DR dataset of Reddit posts and find that our models significantly outperform both human reference summaries and much larger models fine-tuned with supervised learning alone.
Our models also transfer to CNN/DM news articles, producing summaries nearly as good as the human reference without any news-specific fine-tuning.
arXiv Detail & Related papers (2020-09-02T19:54:41Z)
- Leveraging Historical Interaction Data for Improving Conversational Recommender System [105.90963882850265]
We propose a novel pre-training approach to integrate item- and attribute-based preference sequences.
Experimental results on two real-world datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-08-19T03:43:50Z)
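The relevance-feedback re-ranking entry above is concrete enough for a short illustration. Below is a minimal sketch under assumed design choices (dense embeddings, cosine similarity, the k nearest user-marked relevant documents, and a mixing weight alpha); the function names and scoring formula are hypothetical and are not taken from that paper.
```python
# Minimal sketch (assumptions): re-rank candidates by combining similarity to
# the query with similarity to documents the user marked relevant.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-9)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-9)
    return a @ b.T

def rerank(query_emb: np.ndarray,
           candidate_embs: np.ndarray,
           relevant_embs: np.ndarray,
           k: int = 3,
           alpha: float = 0.5) -> np.ndarray:
    """Sort candidate indices by a mix of query similarity and mean
    similarity to the k nearest user-marked relevant documents."""
    query_scores = cosine(candidate_embs, query_emb[None, :]).squeeze(-1)
    sims_to_relevant = cosine(candidate_embs, relevant_embs)   # (n_cand, n_rel)
    k = min(k, relevant_embs.shape[0])
    topk = np.sort(sims_to_relevant, axis=1)[:, -k:]           # k most similar relevant docs
    feedback_scores = topk.mean(axis=1)
    final = alpha * query_scores + (1 - alpha) * feedback_scores
    return np.argsort(-final)

# Toy usage: random vectors stand in for encoder outputs.
rng = np.random.default_rng(0)
order = rerank(rng.normal(size=128),
               rng.normal(size=(20, 128)),
               rng.normal(size=(2, 128)))
print(order[:5])  # indices of the top-5 re-ranked candidates
```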
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.