On Leveraging Encoder-only Pre-trained Language Models for Effective
Keyphrase Generation
- URL: http://arxiv.org/abs/2402.14052v1
- Date: Wed, 21 Feb 2024 18:57:54 GMT
- Title: On Leveraging Encoder-only Pre-trained Language Models for Effective
Keyphrase Generation
- Authors: Di Wu, Wasi Uddin Ahmad, Kai-Wei Chang
- Abstract summary: This study addresses the application of encoder-only Pre-trained Language Models (PLMs) in keyphrase generation (KPG).
With encoder-only PLMs, although keyphrase extraction (KPE) with Conditional Random Fields slightly excels in identifying present keyphrases, the KPG formulation renders a broader spectrum of keyphrase predictions.
We also identify a favorable parameter allocation towards model depth rather than width when employing encoder-decoder architectures initialized with encoder-only PLMs.
- Score: 76.52997424694767
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study addresses the application of encoder-only Pre-trained Language
Models (PLMs) in keyphrase generation (KPG) amidst the broader availability of
domain-tailored encoder-only models compared to encoder-decoder models. We
investigate three core inquiries: (1) the efficacy of encoder-only PLMs in KPG,
(2) optimal architectural decisions for employing encoder-only PLMs in KPG, and
(3) a performance comparison between in-domain encoder-only and encoder-decoder
PLMs across varied resource settings. Our findings, derived from extensive
experimentation in two domains, reveal that with encoder-only PLMs, although
keyphrase extraction (KPE) with Conditional Random Fields slightly excels in
identifying present
keyphrases, the KPG formulation renders a broader spectrum of keyphrase
predictions. Additionally, prefix-LM fine-tuning of encoder-only PLMs emerges
as a strong and data-efficient strategy for KPG, outperforming general-domain
seq2seq PLMs. We also identify a favorable parameter allocation towards model
depth rather than width when employing encoder-decoder architectures
initialized with encoder-only PLMs. The study sheds light on the potential of
utilizing encoder-only PLMs for advancing KPG systems and provides a groundwork
for future KPG methods. Our code and pre-trained checkpoints are released at
https://github.com/uclanlp/DeepKPG.
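The prefix-LM fine-tuning strategy highlighted in the abstract can be sketched briefly: the source document attends to itself bidirectionally, while the keyphrase sequence appended after it is decoded causally with full visibility of the document. The snippet below is a minimal illustration of that masking scheme, not the authors' released implementation (see the DeepKPG repository for that); the helper name and the toy lengths are assumptions.

```python
# Minimal sketch of prefix-LM attention masking for keyphrase generation with
# an encoder-only PLM. Helper name and lengths are illustrative assumptions.
import torch

def build_prefix_lm_mask(prefix_len: int, total_len: int) -> torch.Tensor:
    """Boolean (total_len, total_len) mask: prefix (document) positions attend
    bidirectionally within the prefix; target (keyphrase) positions attend to
    the full prefix and causally to earlier target positions."""
    mask = torch.zeros(total_len, total_len, dtype=torch.bool)
    mask[:, :prefix_len] = True                # every position sees the document prefix
    for i in range(prefix_len, total_len):
        mask[i, prefix_len:i + 1] = True       # causal attention over keyphrase tokens
    return mask

# Example: a 6-token document followed by a 4-token keyphrase sequence.
print(build_prefix_lm_mask(prefix_len=6, total_len=10).int())
```

During fine-tuning, a mask of this shape replaces the default fully bidirectional attention of the encoder-only PLM, and the language-modeling loss is computed only on the keyphrase positions.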
Related papers
- Are Decoder-Only Large Language Models the Silver Bullet for Code Search? [32.338318300589776]
This study presents the first systematic exploration of decoder-only large language models for code search.
We evaluate nine state-of-the-art decoder-only models using two fine-tuning methods, two datasets, and three model sizes.
Our findings reveal that fine-tuned CodeGemma significantly outperforms encoder-only models like UniXcoder.
arXiv Detail & Related papers (2024-10-29T17:05:25Z)
- How to get better embeddings with code pre-trained models? An empirical study [6.220333404184779]
We study five different code pre-trained models (PTMs) to generate embeddings for downstream classification tasks.
We find that embeddings obtained through special tokens do not sufficiently aggregate the semantic information of the entire code snippet.
Code embeddings obtained by combining code data and text data in the same way as during PTM pre-training are of poor quality and do not guarantee richer semantic information (a minimal pooling sketch follows this entry).
arXiv Detail & Related papers (2023-11-14T10:44:21Z)
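The observation above that special-token embeddings under-aggregate snippet semantics is commonly probed by comparing them against pooled token states. The sketch below contrasts the first special-token ([CLS]/<s>) vector with attention-mask-aware mean pooling; the checkpoint name and the example snippet are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: special-token embedding vs. mean-pooled embedding from a code PTM.
# Checkpoint and example snippet are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt", truncation=True)

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # (1, seq_len, hidden_size)

cls_embedding = hidden[:, 0]                     # first special-token state

# Mean pooling over real tokens only, weighted by the attention mask.
mask = inputs["attention_mask"].unsqueeze(-1)    # (1, seq_len, 1)
mean_embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

print(cls_embedding.shape, mean_embedding.shape)  # both (1, hidden_size)
```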
- Rethinking Model Selection and Decoding for Keyphrase Generation with Pre-trained Sequence-to-Sequence Models [76.52997424694767]
Keyphrase Generation (KPG) is a longstanding task in NLP with widespread applications.
Seq2seq pre-trained language models (PLMs) have ushered in a transformative era for KPG, yielding promising performance improvements.
This paper undertakes a systematic analysis of the influence of model selection and decoding strategies on PLM-based KPG.
arXiv Detail & Related papers (2023-10-10T07:34:45Z)
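As a companion to the decoding analysis summarized in the entry above, the sketch below shows greedy versus beam-search decoding with a seq2seq PLM for keyphrase generation; the checkpoint (a stand-in, not a KPG fine-tuned model), the input text, and the generation settings are illustrative assumptions rather than the paper's configuration.

```python
# Sketch: greedy vs. beam-search decoding for keyphrase generation with a
# seq2seq PLM. Checkpoint, input, and settings are illustrative assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "facebook/bart-base"   # stand-in for a KPG fine-tuned seq2seq PLM
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

document = "Pre-trained language models have advanced keyphrase generation ..."
inputs = tokenizer(document, return_tensors="pt", truncation=True)

# Greedy decoding: a single, locally optimal keyphrase sequence.
greedy_ids = model.generate(**inputs, max_new_tokens=64)

# Beam search: wider exploration of the output space, often yielding more
# (and more diverse) predicted keyphrases.
beam_ids = model.generate(**inputs, num_beams=5, max_new_tokens=64)

print(tokenizer.decode(greedy_ids[0], skip_special_tokens=True))
print(tokenizer.decode(beam_ids[0], skip_special_tokens=True))
```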
- Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder [75.03283861464365]
The seq2seq task aims at generating the target sequence based on the given input source sequence.
Traditionally, most seq2seq tasks are handled with an encoder that encodes the source sequence and a decoder that generates the target text.
Recently, a number of approaches have emerged that apply decoder-only language models directly to the seq2seq task.
arXiv Detail & Related papers (2023-04-08T15:44:29Z) - Machine Learning-Aided Efficient Decoding of Reed-Muller Subcodes [59.55193427277134]
Reed-Muller (RM) codes achieve the capacity of general binary-input memoryless symmetric channels.
RM codes, however, admit only a limited set of rates.
Efficient decoders are available for RM codes at finite lengths.
arXiv Detail & Related papers (2023-01-16T04:11:14Z) - Pre-trained Language Models for Keyphrase Generation: A Thorough
Empirical Study [76.52997424694767]
We present an in-depth empirical study of keyphrase extraction and keyphrase generation using pre-trained language models.
We show that PLMs have competitive high-resource performance and state-of-the-art low-resource performance.
Further results show that in-domain BERT-like PLMs can be used to build strong and data-efficient keyphrase generation models.
arXiv Detail & Related papers (2022-12-20T13:20:21Z) - CRISP: Curriculum based Sequential Neural Decoders for Polar Code Family [45.74928228858547]
We introduce a novel Curriculum-based Sequential neural decoder for Polar codes (CRISP).
We show that CRISP attains near-optimal reliability performance on the Polar(32,16) and Polar(64,22) codes.
CRISP can be readily extended to Polarization-Adjusted-Convolutional (PAC) codes, where existing SC decoders are significantly less reliable.
arXiv Detail & Related papers (2022-10-01T16:26:24Z) - Non-autoregressive End-to-end Speech Translation with Parallel
Autoregressive Rescoring [83.32560748324667]
This article describes an efficient end-to-end speech translation (E2E-ST) framework based on non-autoregressive (NAR) models.
We propose a unified NAR E2E-ST framework called Orthros, which has an NAR decoder and an auxiliary shallow AR decoder on top of the shared encoder.
arXiv Detail & Related papers (2021-09-09T16:50:16Z)