Retrieval-Augmented Multilingual Keyphrase Generation with
Retriever-Generator Iterative Training
- URL: http://arxiv.org/abs/2205.10471v1
- Date: Sat, 21 May 2022 00:45:21 GMT
- Authors: Yifan Gao, Qingyu Yin, Zheng Li, Rui Meng, Tong Zhao, Bing Yin, Irwin
King, Michael R. Lyu
- Abstract summary: Keyphrase generation is the task of automatically predicting keyphrases given a piece of long text.
We call attention to a new setting named multilingual keyphrase generation.
We propose a retrieval-augmented method for multilingual keyphrase generation to mitigate the data shortage problem in non-English languages.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Keyphrase generation is the task of automatically predicting keyphrases given
a piece of long text. Despite its recent flourishing, keyphrase generation in
non-English languages has not been extensively investigated. In this paper, we call
attention to a new setting named multilingual keyphrase generation and we
contribute two new datasets, EcommerceMKP and AcademicMKP, covering six
languages. Technically, we propose a retrieval-augmented method for
multilingual keyphrase generation to mitigate the data shortage problem in
non-English languages. The retrieval-augmented model leverages keyphrase
annotations in English datasets to facilitate generating keyphrases in
low-resource languages. Given a non-English passage, a cross-lingual dense
passage retrieval module finds relevant English passages. Then the associated
English keyphrases serve as external knowledge for keyphrase generation in the
current language. Moreover, we develop a retriever-generator iterative training
algorithm to mine pseudo parallel passage pairs to strengthen the cross-lingual
passage retriever. Comprehensive experiments and ablations show that the
proposed approach outperforms all baselines.
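The pipeline described in the abstract — embed a non-English passage, retrieve similar English passages with a cross-lingual dense retriever, and feed the retrieved English keyphrases to the generator as external knowledge — can be sketched as follows. This is an illustrative toy, not the paper's implementation: the embeddings are made up, and the `<sep>` concatenation format and both helper functions are assumptions.

```python
import numpy as np

def retrieve_english_passages(query_vec, passage_vecs, k=2):
    """Cross-lingual dense retrieval: rank English passages by dot-product
    similarity to the (shared-space) embedding of a non-English query."""
    scores = passage_vecs @ query_vec
    return np.argsort(-scores)[:k]

def build_augmented_input(passage, retrieved_keyphrases, sep="<sep>"):
    """Append retrieved English keyphrases to the source passage so the
    generator can condition on them as external knowledge."""
    return f"{passage} {sep} " + f" {sep} ".join(retrieved_keyphrases)

# Toy shared-space embeddings; in the paper these would come from the
# trained cross-lingual dense passage retriever.
english_keyphrases = [["earbuds", "bluetooth"], ["dress", "cotton"]]
passage_vecs = np.array([[0.9, 0.1], [0.1, 0.9]])

query = "kabellose Ohrhörer mit Mikrofon"  # German product description
query_vec = np.array([0.8, 0.2])           # pretend encoder output

top = retrieve_english_passages(query_vec, passage_vecs, k=1)
augmented = build_augmented_input(query, english_keyphrases[top[0]])
print(augmented)  # kabellose Ohrhörer mit Mikrofon <sep> earbuds <sep> bluetooth
```

The augmented string would then be fed to a multilingual sequence-to-sequence generator; the retriever-generator iterative training step would additionally use the generator's behavior to mine pseudo parallel passage pairs for retraining the retriever.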
Related papers
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
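Contrastive training of a cross-lingual retriever, as mentioned above, is commonly instantiated with an InfoNCE-style loss where each phrase embedding is matched against its aligned cross-lingual counterpart and the other in-batch pairs act as negatives. The sketch below is a generic NumPy illustration of that objective, not CCPR's actual training code.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Contrastive (InfoNCE) loss over a batch: the i-th anchor phrase
    vector should score highest against its own cross-lingual positive,
    with the other in-batch positives serving as negatives."""
    logits = anchors @ positives.T / temperature          # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # cross-entropy on the diagonal

# Perfectly aligned pairs give a near-zero loss; misaligned pairs a large one.
aligned = np.eye(3)
print(info_nce_loss(aligned, aligned))
```

Minimizing this loss pulls aligned cross-lingual phrase representations together and pushes unrelated ones apart, which is what lets a single dense index serve queries in any language.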
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
- Multilingual Lexical Simplification via Paraphrase Generation [19.275642346073557]
We propose a novel multilingual LS method via paraphrase generation.
We regard paraphrasing as a zero-shot translation task within multilingual neural machine translation.
Our approach significantly surpasses BERT-based methods and a zero-shot GPT-3-based method on English, Spanish, and Portuguese.
arXiv Detail & Related papers (2023-07-28T03:47:44Z)
- ZusammenQA: Data Augmentation with Specialized Models for Cross-lingual Open-retrieval Question Answering System [16.89747171947662]
This paper introduces our proposed system for the MIA Shared Task on Cross-lingual Open-retrieval Question Answering (COQA).
In this challenging scenario, given an input question, the system has to gather evidence documents from a multilingual pool and generate an answer in the language of the question.
We devised several approaches combining different model variants for three main components: Data Augmentation, Passage Retrieval, and Answer Generation.
arXiv Detail & Related papers (2022-05-30T10:31:08Z)
- Cross-Lingual Phrase Retrieval [49.919180978902915]
Cross-lingual retrieval aims to retrieve relevant text across languages.
Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations at the word or sentence level.
We propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences.
arXiv Detail & Related papers (2022-04-19T13:35:50Z)
- Representation Learning for Resource-Constrained Keyphrase Generation [78.02577815973764]
We introduce salient span recovery and salient span prediction as guided denoising language modeling objectives.
We show the effectiveness of the proposed approach for low-resource and zero-shot keyphrase generation.
arXiv Detail & Related papers (2022-03-15T17:48:04Z)
- Towards Document-Level Paraphrase Generation with Sentence Rewriting and Reordering [88.08581016329398]
We propose CoRPG (Coherence Relationship guided Paraphrase Generation) for document-level paraphrase generation.
We use a graph GRU to encode the coherence relationship graph and obtain a coherence-aware representation for each sentence.
Our model can generate document-level paraphrases with greater diversity and semantic preservation.
arXiv Detail & Related papers (2021-09-15T05:53:40Z)
- Paraphrase Generation as Zero-Shot Multilingual Translation: Disentangling Semantic Similarity from Lexical and Syntactic Diversity [11.564158965143418]
We introduce a simple paraphrase generation algorithm which discourages the production of n-grams that are present in the input.
Our approach enables paraphrase generation in many languages from a single multilingual NMT model.
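The idea of discouraging n-grams that appear in the input can be sketched as a decoding-time penalty: before sampling the next token, down-weight any candidate that would complete an n-gram copied from the source. This toy sketch (dict-based scores, a hypothetical `penalty` value) illustrates the mechanism, not the paper's exact algorithm.

```python
def penalize_input_ngrams(input_tokens, prefix, scores, n=3, penalty=5.0):
    """Down-weight any next token that would reproduce an n-gram from the
    input, pushing decoding toward lexically diverse paraphrases."""
    input_ngrams = {tuple(input_tokens[i:i + n])
                    for i in range(len(input_tokens) - n + 1)}
    blocked = set()
    if len(prefix) >= n - 1:
        context = tuple(prefix[-(n - 1):])      # last n-1 generated tokens
        for ng in input_ngrams:
            if ng[:-1] == context:              # token ng[-1] would copy an input n-gram
                blocked.add(ng[-1])
    return {tok: (s - penalty if tok in blocked else s)
            for tok, s in scores.items()}

source = "the cat sat on the mat".split()
adjusted = penalize_input_ngrams(source, ["the", "cat"], {"sat": 1.0, "ran": 1.0})
print(adjusted)  # "sat" is penalized because "the cat sat" occurs in the input
```

Soft penalization (rather than hard blocking) trades off lexical diversity against semantic fidelity, which is the disentanglement the paper's title refers to.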
arXiv Detail & Related papers (2020-08-11T18:05:34Z)
- Pre-training via Paraphrasing [96.79972492585112]
We introduce MARGE, a pre-trained sequence-to-sequence model learned with an unsupervised multi-lingual paraphrasing objective.
We show it is possible to jointly learn to do retrieval and reconstruction, given only a random initialization.
For example, with no additional task-specific training we achieve BLEU scores of up to 35.8 for document translation.
arXiv Detail & Related papers (2020-06-26T14:43:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.