Diverse Keyphrase Generation with Neural Unlikelihood Training
- URL: http://arxiv.org/abs/2010.07665v1
- Date: Thu, 15 Oct 2020 11:12:26 GMT
- Title: Diverse Keyphrase Generation with Neural Unlikelihood Training
- Authors: Hareesh Bahuleyan and Layla El Asri
- Abstract summary: We study sequence-to-sequence (S2S) keyphrase generation models from the perspective of diversity.
We first analyze the extent of information redundancy present in the outputs generated by a baseline model trained using maximum likelihood estimation (MLE).
- Score: 6.645227801791013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study sequence-to-sequence (S2S) keyphrase generation
models from the perspective of diversity. Recent advances in neural natural
language generation have made possible remarkable progress on the task of
keyphrase generation, demonstrated through improvements on quality metrics such
as F1-score. However, the importance of diversity in keyphrase generation has
been largely ignored. We first analyze the extent of information redundancy
present in the outputs generated by a baseline model trained using maximum
likelihood estimation (MLE). Our findings show that repetition of keyphrases is
a major issue with MLE training. To alleviate this issue, we adopt a neural
unlikelihood (UL) objective for training the S2S model. Our version of UL
training operates at (1) the target token level, to discourage the generation of
repeating tokens, and (2) the copy token level, to avoid copying repetitive
tokens from the source text. Further, to encourage better model planning during
the decoding process, we incorporate a K-step ahead token prediction objective
that computes both MLE and UL losses on future tokens. Through extensive
experiments on datasets from three different domains, we demonstrate that the
proposed approach attains substantial diversity gains while maintaining
competitive output quality.
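As a point of reference, the sketch below illustrates the general idea of combining an MLE loss with a token-level unlikelihood penalty in the style of Welleck et al. (2019), which the abstract says this paper adapts. It is not the authors' code: tensor shapes, the negative-candidate construction (previously seen target tokens), and the mixing weight `alpha` are assumptions made for illustration.

```python
# Minimal sketch (not the authors' released code) of MLE + token-level
# unlikelihood training. Negative candidates at step t are the tokens already
# seen in the target prefix y_<t, excluding the current gold token and padding.
import torch
import torch.nn.functional as F


def mle_plus_token_unlikelihood(logits, targets, pad_id, alpha=1.0):
    """logits: (batch, seq_len, vocab); targets: (batch, seq_len)."""
    log_probs = F.log_softmax(logits, dim=-1)

    # Likelihood (MLE) term: negative log-likelihood of the gold tokens.
    mle_loss = F.nll_loss(
        log_probs.reshape(-1, log_probs.size(-1)),
        targets.reshape(-1),
        ignore_index=pad_id,
    )

    # Unlikelihood term: penalize probability mass placed on negative candidates.
    probs = log_probs.exp()
    batch, seq_len, vocab = probs.size()
    ul_loss = probs.new_zeros(())
    n_candidates = probs.new_zeros(())
    for t in range(1, seq_len):
        candidates = probs.new_zeros(batch, vocab)
        candidates.scatter_(1, targets[:, :t], 1.0)       # tokens seen so far
        candidates.scatter_(1, targets[:, t:t + 1], 0.0)  # keep the gold token
        candidates[:, pad_id] = 0.0
        valid = (targets[:, t] != pad_id).float().unsqueeze(1)
        penalty = -torch.log(torch.clamp(1.0 - probs[:, t, :], min=1e-6))
        ul_loss = ul_loss + (penalty * candidates * valid).sum()
        n_candidates = n_candidates + (candidates * valid).sum()

    ul_loss = ul_loss / torch.clamp(n_candidates, min=1.0)
    return mle_loss + alpha * ul_loss
```

The paper additionally applies an analogous unlikelihood penalty at the copy-token level (for tokens copied from the source) and extends both losses to K future target tokens; those components are omitted from this sketch for brevity.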
Related papers
- Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning [57.74233319453229]
Large language models (LLMs) have emerged as a groundbreaking technology and their unparalleled text generation capabilities have sparked interest in their application to the fundamental sentence representation learning task.
We propose MultiCSR, a multi-level contrastive sentence representation learning framework that decomposes the process of prompting LLMs to generate a corpus.
Our experiments reveal that MultiCSR enables a less advanced LLM to surpass the performance of ChatGPT, while applying it to ChatGPT improves on the previous state of the art.
arXiv Detail & Related papers (2023-10-17T03:21:43Z) - Joint Repetition Suppression and Content Moderation of Large Language
Models [4.9990392459395725]
Natural language generation (NLG) is one of the most impactful fields in NLP.
In this paper, we apply non-exact repetition suppression using token- and sequence-level unlikelihood losses.
We also explore the unlikelihood training framework to jointly endow the model with the ability to avoid generating offensive words.
arXiv Detail & Related papers (2023-04-20T19:17:49Z) - Pre-trained Language Models for Keyphrase Generation: A Thorough
Empirical Study [76.52997424694767]
We present an in-depth empirical study of keyphrase extraction and keyphrase generation using pre-trained language models.
We show that PLMs have competitive high-resource performance and state-of-the-art low-resource performance.
Further results show that in-domain BERT-like PLMs can be used to build strong and data-efficient keyphrase generation models.
arXiv Detail & Related papers (2022-12-20T13:20:21Z) - A Simple Contrastive Learning Objective for Alleviating Neural Text
Degeneration [56.64703901898937]
We propose a new contrastive token learning objective that inherits the advantages of cross-entropy and unlikelihood training.
Comprehensive experiments on language modeling and open-domain dialogue generation tasks show that the proposed contrastive token objective yields less repetitive texts.
arXiv Detail & Related papers (2022-05-05T08:50:50Z) - Enabling Multimodal Generation on CLIP via Vision-Language Knowledge
Distillation [79.72299298976525]
We propose to augment a vision-language pre-training model with a textual pre-trained language model (PLM) via vision-language knowledge distillation (VLKD).
Experiments show that the resulting model has strong zero-shot performance on multimodal generation tasks, such as open-ended visual question answering and image captioning.
The original textual language understanding and generation ability of the PLM is maintained after VLKD, which makes our model versatile for both multimodal and unimodal tasks.
arXiv Detail & Related papers (2022-03-12T09:33:37Z) - A Contrastive Framework for Neural Text Generation [46.845997620234265]
We show that an underlying reason for model degeneration is the anisotropic distribution of token representations.
We present a contrastive solution: (i) SimCTG, a contrastive training objective to calibrate the model's representation space, and (ii) a decoding method -- contrastive search -- to encourage diversity while maintaining coherence in the generated text.
arXiv Detail & Related papers (2022-02-13T21:46:14Z) - Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
arXiv Detail & Related papers (2021-06-14T18:48:40Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - Informed Sampling for Diversity in Concept-to-Text NLG [8.883733362171034]
We propose an Imitation Learning approach to explore the level of diversity that a language generation model can reliably produce.
Specifically, we augment the decoding process with a meta-classifier trained to distinguish which words at any given timestep will lead to high-quality output.
arXiv Detail & Related papers (2020-04-29T17:43:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.