SimOAP: Improve Coherence and Consistency in Persona-based Dialogue
Generation via Over-sampling and Post-evaluation
- URL: http://arxiv.org/abs/2305.11130v2
- Date: Sat, 20 May 2023 06:30:01 GMT
- Title: SimOAP: Improve Coherence and Consistency in Persona-based Dialogue
Generation via Over-sampling and Post-evaluation
- Authors: Junkai Zhou, Liang Pang, Huawei Shen, Xueqi Cheng
- Abstract summary: Language models trained on large-scale corpora can generate remarkably fluent results in open-domain dialogue.
For the persona-based dialogue generation task, consistency and coherence are great challenges for language models.
A two-stage SimOAP strategy is proposed, i.e., over-sampling and post-evaluation.
- Score: 54.66399120084227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models trained on large-scale corpora can generate remarkably fluent
results in open-domain dialogue. However, for the persona-based dialogue
generation task, consistency and coherence are also key factors, which are
great challenges for language models. Existing works mainly focus on valuable
data filtering, model structure modifying, or objective function designing,
while their improvements are limited and hard to generalize to all types of
pre-trained language models. However, we find that language models can produce
consistent and coherent responses if we consider enough generations. Thus, the
problems lay in large-scale response generation and target response selection.
In this work, a simple but effective two-stage SimOAP strategy is proposed,
i.e., over-sampling and post-evaluation. The over-sampling stage takes
large-scale responses from existing trained models efficiently via
off-the-shelf distilling and compressing methods, and the post-evaluation stage
selects a good response based on multiple well-designed evaluation metrics from
large-scale candidates. Experimental results show that the proposed plug-in
SimOAP strategy improves the backbone models and outperforms the baseline
strategies in both automatic and human evaluations.
Related papers
- Is Crowdsourcing Breaking Your Bank? Cost-Effective Fine-Tuning of
Pre-trained Language Models with Proximal Policy Optimization [18.75866961339424]
ChatGPT has highlighted the potential of reinforcement learning from human feedback.
To reduce labor costs, we propose a self-supervised text ranking approach.
arXiv Detail & Related papers (2024-02-28T12:24:07Z) - Split and Rephrase with Large Language Models [2.499907423888049]
Split and Rephrase (SPRP) task consists in splitting complex sentences into a sequence of shorter grammatical sentences.
We evaluate large language models on the task, showing that they can provide large improvements over the state of the art on the main metrics.
arXiv Detail & Related papers (2023-12-18T10:16:37Z) - Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z) - Differentiable Retrieval Augmentation via Generative Language Modeling
for E-commerce Query Intent Classification [8.59563091603226]
We propose Differentiable Retrieval Augmentation via Generative lANguage modeling(Dragan) to address this problem by a novel differentiable reformulation.
We demonstrate the effectiveness of our proposed method on a challenging NLP task in e-commerce search, namely query intent classification.
arXiv Detail & Related papers (2023-08-18T05:05:35Z) - Learning Evaluation Models from Large Language Models for Sequence
Generation [44.22820310679188]
Large language models achieve state-of-the-art performance on sequence generation evaluation, but typically have a large number of parameters.
We propose textbfECT, an textbfevaluation textbfcapability textbftransfer method, to transfer the evaluation capability from LLMs to relatively lightweight language models.
Based on the proposed ECT, we learn various evaluation models from ChatGPT, and employ them as reward models to improve sequence generation models.
arXiv Detail & Related papers (2023-08-08T16:41:16Z) - mFACE: Multilingual Summarization with Factual Consistency Evaluation [79.60172087719356]
Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-trained language models and the availability of large-scale datasets.
Despite promising results, current models still suffer from generating factually inconsistent summaries.
We leverage factual consistency evaluation models to improve multilingual summarization.
arXiv Detail & Related papers (2022-12-20T19:52:41Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability
Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR)
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - Dynamic Data Selection and Weighting for Iterative Back-Translation [116.14378571769045]
We propose a curriculum learning strategy for iterative back-translation models.
We evaluate our models on domain adaptation, low-resource, and high-resource MT settings.
Experimental results demonstrate that our methods achieve improvements of up to 1.8 BLEU points over competitive baselines.
arXiv Detail & Related papers (2020-04-07T19:49:58Z) - Prototype-to-Style: Dialogue Generation with Style-Aware Editing on
Retrieval Memory [65.98002918470543]
We introduce a new prototype-to-style framework to tackle the challenge of stylistic dialogue generation.
The framework uses an Information Retrieval (IR) system and extracts a response prototype from the retrieved response.
A stylistic response generator then takes the prototype and the desired language style as model input to obtain a high-quality and stylistic response.
arXiv Detail & Related papers (2020-04-05T14:36:15Z) - An Empirical Investigation of Pre-Trained Transformer Language Models
for Open-Domain Dialogue Generation [23.343006562849126]
We present an empirical investigation of pre-trained Transformer-based auto-regressive language models for the task of open-domain dialogue generation.
Training paradigm of pre-training and fine-tuning is employed to conduct learning.
Experiments are conducted on the typical single-turn and multi-turn dialogue corpora such as Weibo, Douban, Reddit, DailyDialog, and Persona-Chat.
arXiv Detail & Related papers (2020-03-09T15:20:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.