Pre-trained Language Models for Keyphrase Generation: A Thorough
Empirical Study
- URL: http://arxiv.org/abs/2212.10233v2
- Date: Fri, 23 Feb 2024 04:45:40 GMT
- Title: Pre-trained Language Models for Keyphrase Generation: A Thorough
Empirical Study
- Authors: Di Wu, Wasi Uddin Ahmad, Kai-Wei Chang
- Abstract summary: We present an in-depth empirical study of keyphrase extraction and keyphrase generation using pre-trained language models.
We show that PLMs have competitive high-resource performance and state-of-the-art low-resource performance.
Further results show that in-domain BERT-like PLMs can be used to build strong and data-efficient keyphrase generation models.
- Score: 76.52997424694767
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural models that do not rely on pre-training have excelled in the keyphrase
generation task with large annotated datasets. Meanwhile, new approaches have
incorporated pre-trained language models (PLMs) for their data efficiency.
However, a systematic study of how the two types of approaches compare, and of
how different design choices affect the performance of PLM-based models, is
still lacking. To fill this knowledge gap and facilitate a more informed
use of PLMs for keyphrase extraction and keyphrase generation, we present an
in-depth empirical study. Formulating keyphrase extraction as sequence labeling
and keyphrase generation as sequence-to-sequence generation, we perform
extensive experiments in three domains. After showing that PLMs have
competitive high-resource performance and state-of-the-art low-resource
performance, we investigate important design choices including in-domain PLMs,
PLMs with different pre-training objectives, using PLMs with a parameter
budget, and different formulations for present keyphrases. Further results show
that (1) in-domain BERT-like PLMs can be used to build strong and
data-efficient keyphrase generation models; (2) with a fixed parameter budget,
prioritizing model depth over width and allocating more layers in the encoder
leads to better encoder-decoder models; and (3) introducing four in-domain
PLMs, we achieve a competitive performance in the news domain and the
state-of-the-art performance in the scientific domain.
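As a concrete illustration of the two formulations described above, the following is a minimal sketch using the Hugging Face transformers library. The checkpoints (bert-base-uncased, facebook/bart-base), the BIO tag set, and the ';' keyphrase separator are illustrative assumptions, not necessarily the choices made in the paper.

```python
# Minimal sketch of the two formulations from the abstract.
# Assumptions (not taken from the paper): checkpoints bert-base-uncased and
# facebook/bart-base, BIO tagging for extraction, ';' as keyphrase separator.
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    AutoModelForSeq2SeqLM,
)

document = "Pre-trained language models improve keyphrase generation in low-resource settings."

# 1) Keyphrase extraction as sequence labeling: each sub-word gets a B/I/O tag.
ext_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
ext_model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # O=0, B-KP=1, I-KP=2
)
enc = ext_tok(document, return_tensors="pt", truncation=True)
dummy_tags = torch.zeros_like(enc["input_ids"])  # all-'O' placeholder labels
ext_out = ext_model(**enc, labels=dummy_tags)
print("extraction loss:", ext_out.loss.item())

# 2) Keyphrase generation as sequence-to-sequence generation: the decoder emits
#    present and absent keyphrases joined by a separator.
gen_tok = AutoTokenizer.from_pretrained("facebook/bart-base")
gen_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
inputs = gen_tok(document, return_tensors="pt", truncation=True)
target = "pre-trained language models ; keyphrase generation ; low-resource"
labels = gen_tok(target, return_tensors="pt").input_ids
gen_out = gen_model(**inputs, labels=labels)
print("generation loss:", gen_out.loss.item())

# Decoding with an untrained head only illustrates the interface; a fine-tuned
# model is needed to produce meaningful keyphrases.
pred_ids = gen_model.generate(**inputs, max_new_tokens=32)
print(gen_tok.decode(pred_ids[0], skip_special_tokens=True))
```

Under a fixed parameter budget, the abstract's finding suggests that an encoder-decoder configured this way benefits from being deeper rather than wider, with more of those layers allocated to the encoder.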
Related papers
- Language Models are Graph Learners [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, including Graph Neural Networks (GNNs) and Graph Transformers (GTs).
We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art GNNs on node classification tasks.
arXiv Detail & Related papers (2024-10-03T08:27:54Z)
- Parameter Efficient Diverse Paraphrase Generation Using Sequence-Level Knowledge Distillation [0.0]
The field of Natural Language Generation (NLG) has experienced an exponential surge, largely due to the introduction of Large Language Models (LLMs).
These models have delivered strong performance across a range of Natural Language Processing and Generation tasks.
However, their application in domain-specific tasks, such as paraphrasing, presents significant challenges.
arXiv Detail & Related papers (2024-04-19T02:59:09Z)
- FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
Click-through rate (CTR) prediction serves as a core function module in personalized online services.
Traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of the tabular modality.
Pretrained Language Models (PLMs) have given rise to another paradigm, which takes as inputs sentences of the textual modality.
We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction.
arXiv Detail & Related papers (2023-10-30T11:25:03Z)
- Large Language Models can Contrastively Refine their Generation for Better Sentence Representation Learning [57.74233319453229]
Large language models (LLMs) have emerged as a groundbreaking technology and their unparalleled text generation capabilities have sparked interest in their application to the fundamental sentence representation learning task.
We propose MultiCSR, a multi-level contrastive sentence representation learning framework that decomposes the process of prompting LLMs to generate a corpus.
Our experiments reveal that MultiCSR enables a less advanced LLM to surpass the performance of ChatGPT, while applying it to ChatGPT achieves better state-of-the-art results.
arXiv Detail & Related papers (2023-10-17T03:21:43Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We establish new state-of-the-art results in both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Diverse Keyphrase Generation with Neural Unlikelihood Training [6.645227801791013]
We study sequence-to-sequence (S2S) keyphrase generation models from the perspective of diversity.
We first analyze the extent of information redundancy present in the outputs generated by a baseline model trained using maximum likelihood estimation (MLE); see the sketch of the unlikelihood objective below.
arXiv Detail & Related papers (2020-10-15T11:12:26Z)
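The entry above names neural unlikelihood training in its title, while the summary only describes the MLE baseline. As a rough, hedged reminder of what a token-level unlikelihood term typically looks like (in the spirit of Welleck et al., 2020), rather than that paper's exact formulation:

```python
# Hedged sketch of a token-level unlikelihood term; the paper's exact variant
# and its choice of negative candidates are not given in the summary above,
# so the masking scheme here is illustrative only.
import torch
import torch.nn.functional as F

def unlikelihood_loss(logits, negative_candidates):
    """Penalize probability mass assigned to unwanted tokens.

    logits: (seq_len, vocab_size) decoder scores for one sequence.
    negative_candidates: (seq_len, vocab_size) boolean mask of tokens to avoid
        at each step (e.g. tokens already generated, to reduce redundancy).
    """
    probs = F.softmax(logits, dim=-1)
    # -log(1 - p(c)) for each negative candidate c, clamped for stability.
    penalty = -torch.log(torch.clamp(1.0 - probs, min=1e-6))
    mask = negative_candidates.float()
    return (penalty * mask).sum() / mask.sum().clamp(min=1.0)

# Example: discourage the decoder from re-emitting token id 7 at step 3.
logits = torch.randn(5, 100)
neg = torch.zeros(5, 100, dtype=torch.bool)
neg[3, 7] = True
print(unlikelihood_loss(logits, neg))
```

The mask of negative candidates (for example, tokens the decoder has already emitted) is what steers probability mass away from redundant outputs; it is added to the usual MLE loss rather than replacing it.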
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.