ClueGraphSum: Let Key Clues Guide the Cross-Lingual Abstractive
Summarization
- URL: http://arxiv.org/abs/2203.02797v2
- Date: Wed, 9 Mar 2022 08:01:15 GMT
- Title: ClueGraphSum: Let Key Clues Guide the Cross-Lingual Abstractive
Summarization
- Authors: Shuyu Jiang, Dengbiao Tu, Xingshu Chen, Rui Tang, Wenxian Wang,
Haizhou Wang
- Abstract summary: Cross-Lingual Summarization (CLS) is the task of generating a summary in one language for an article in a different language.
Previous studies on CLS mainly adopt pipeline methods or train end-to-end models on translated parallel data.
We propose a clue-guided cross-lingual abstractive summarization method to improve the quality of cross-lingual summaries.
- Score: 5.873920727236548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-Lingual Summarization (CLS) is the task of generating a summary in one
language for an article in a different language. Previous studies on CLS mainly
adopt pipeline methods or train end-to-end models on translated parallel data.
However, the quality of generated cross-lingual summaries still needs further
improvement, and model performance has never been evaluated on a hand-written
CLS dataset. Therefore, we first propose a clue-guided cross-lingual
abstractive summarization method to improve the quality of cross-lingual
summaries, and then construct a novel hand-written CLS dataset for evaluation.
Specifically, we extract keywords, named entities, etc. from the input article
as key clues for summarization and then design a clue-guided algorithm to
transform an article into a graph with fewer noisy sentences. A graph encoder
is built to learn sentence semantics and article structure, and a clue encoder
is built to encode and translate the key clues, ensuring that the information
of important parts is preserved in the generated summary. The two encoders are
connected to a single decoder to directly learn cross-lingual semantics.
Experimental results show that our method is more robust for longer inputs and
substantially improves performance over the strong baseline, achieving
improvements of 8.55 ROUGE-1 (English-to-Chinese summarization) and 2.13
MoverScore (Chinese-to-English summarization) over the existing SOTA.
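To make the two-encoder, one-decoder layout described in the abstract concrete, here is a minimal PyTorch sketch. The module names, the plain Transformer stack standing in for the paper's graph encoder, and the concatenation-based fusion of the two encoder memories are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of ClueGraphSum's two-encoder / one-decoder layout.
# All module names, sizes, and the concatenation-based fusion below are
# illustrative assumptions, not the authors' released implementation; a
# plain Transformer stack stands in for the paper's graph encoder.
import torch
import torch.nn as nn

class ClueGraphSummarizer(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_heads=8, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # Graph encoder: learns sentence semantics and article structure.
        self.graph_encoder = nn.TransformerEncoder(enc_layer, n_layers)
        # Clue encoder: encodes (and helps translate) the key clues.
        self.clue_encoder = nn.TransformerEncoder(enc_layer, n_layers)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        # One decoder attends over both encoders' outputs.
        self.decoder = nn.TransformerDecoder(dec_layer, n_layers)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, graph_tokens, clue_tokens, summary_tokens):
        graph_mem = self.graph_encoder(self.embed(graph_tokens))
        clue_mem = self.clue_encoder(self.embed(clue_tokens))
        # Concatenate along the sequence axis so the decoder's cross-attention
        # can reach both article structure and key clues at every step.
        memory = torch.cat([graph_mem, clue_mem], dim=1)
        dec_out = self.decoder(self.embed(summary_tokens), memory)
        return self.out_proj(dec_out)

model = ClueGraphSummarizer()
logits = model(torch.randint(0, 32000, (2, 40)),   # graph/sentence tokens
               torch.randint(0, 32000, (2, 10)),   # key-clue tokens
               torch.randint(0, 32000, (2, 20)))   # shifted summary tokens
```

Concatenating the two memories is one simple way to let a single decoder draw on both sources; the paper's actual fusion mechanism may differ.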
Related papers
- ConVerSum: A Contrastive Learning based Approach for Data-Scarce Solution of Cross-Lingual Summarization Beyond Direct Equivalents [3.356903304289716]
Cross-lingual summarization is a sophisticated branch of Natural Language Processing.
There is no feasible solution for CLS when high-quality CLS data are unavailable.
We propose a novel data-efficient approach, ConVerSum, for CLS leveraging the power of contrastive learning.
arXiv Detail & Related papers (2024-08-17T19:03:53Z)
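The ConVerSum entry above leans on contrastive learning for data-scarce CLS. Below is a hedged InfoNCE-style sketch of such an objective; the pairing of article and candidate-summary embeddings and the temperature value are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a contrastive objective for CLS, in the spirit of
# ConVerSum's use of contrastive learning. The article/summary pairing
# and temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def contrastive_loss(article_emb, summary_emb, temperature=0.07):
    """InfoNCE: each article should score highest against its own summary."""
    a = F.normalize(article_emb, dim=-1)   # (batch, dim)
    s = F.normalize(summary_emb, dim=-1)   # (batch, dim)
    logits = a @ s.T / temperature         # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0))      # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Random embeddings stand in for encoder outputs.
loss = contrastive_loss(torch.randn(4, 256), torch.randn(4, 256))
```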
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
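CCPR's reported top-1 accuracy presumes a nearest-neighbour search over cross-lingual phrase embeddings. The brute-force cosine search below is an illustrative stand-in for whatever index the paper actually uses.

```python
# Hedged sketch of the retrieval step behind a top-1 accuracy metric:
# score query phrase embeddings against cross-lingual candidates and
# take the nearest neighbour. Brute-force search is an assumption.
import torch
import torch.nn.functional as F

def top1_retrieve(query_embs, index_embs):
    """Return, for each query, the index of its most similar candidate."""
    q = F.normalize(query_embs, dim=-1)
    c = F.normalize(index_embs, dim=-1)
    return (q @ c.T).argmax(dim=-1)        # (num_queries,)

preds = top1_retrieve(torch.randn(8, 256), torch.randn(100, 256))
```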
- $\mu$PLAN: Summarizing using a Content Plan as Cross-Lingual Bridge [72.64847925450368]
Cross-lingual summarization consists of generating a summary in one language given an input document in a different language.
This work presents $\mu$PLAN, an approach to cross-lingual summarization that uses an intermediate planning step as a cross-lingual bridge.
arXiv Detail & Related papers (2023-05-23T16:25:21Z)
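$\mu$PLAN's intermediate planning step can be pictured as a two-stage pipeline: produce a content plan, then generate the summary conditioned on it. In the sketch below, plan_model and summary_model are hypothetical stand-ins, and the entity-list plan format is an assumption.

```python
# Hedged sketch of plan-then-generate summarization: first a content plan
# in the target language, then a summary grounded in that plan. Both
# callables are hypothetical stand-ins for real models.
def summarize_with_plan(document, plan_model, summary_model):
    plan = plan_model(document)            # cross-lingual content plan
    return summary_model(document, plan)   # summary conditioned on the plan

# Toy stand-ins to show the two-step flow:
print(summarize_with_plan(
    "Le chat dort sur le canapé.",
    lambda doc: ["cat", "sofa"],
    lambda doc, plan: f"A summary mentioning {', '.join(plan)}."))
```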
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
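The ensemble framework above can be illustrated, in heavily simplified form, as majority voting over base models; treating each transfer-learned coreference model as a binary mention-pair classifier is an assumption, not the paper's exact design.

```python
# Hedged sketch of an ensemble over transfer-learned coreference models:
# majority-vote whether two mentions corefer. The binary mention-pair
# framing is an illustrative simplification.
from collections import Counter

def ensemble_predict(models, mention_pair):
    votes = Counter(m(mention_pair) for m in models)
    return votes.most_common(1)[0][0]      # majority label wins

# Toy base "models": fixed-answer classifiers standing in for real systems.
models = [lambda p: True, lambda p: True, lambda p: False]
print(ensemble_predict(models, ("Obama", "he")))  # -> True
```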
- Understanding Translationese in Cross-Lingual Summarization [106.69566000567598]
Cross-lingual summarization (CLS) aims at generating a concise summary in a different target language.
To collect large-scale CLS data, existing datasets typically involve translation in their creation.
In this paper, we first confirm that different approaches of constructing CLS datasets will lead to different degrees of translationese.
arXiv Detail & Related papers (2022-12-14T13:41:49Z)
- CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z)
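CROP's labeled sequence translation can be pictured as marker-based projection: wrap each tagged entity in markers, translate the marked sentence, and read the markers back out on the target side. The translate callable below is a hypothetical stand-in for the paper's multilingual labeled sequence translation model.

```python
# Hedged sketch of marker-based label projection. The marker format and
# the `translate` stand-in are illustrative assumptions.
import re

def project_labels(tokens, spans, translate):
    """spans: (start, end, label) over `tokens`, end exclusive."""
    marked = list(tokens)
    # Insert markers right-to-left so earlier indices stay valid.
    for start, end, label in sorted(spans, reverse=True):
        marked[start:end] = [f"<{label}>"] + tokens[start:end] + [f"</{label}>"]
    translated = translate(" ".join(marked))
    # Recover each labeled span from the translated, still-marked sentence.
    return [(label, text.strip())
            for label, text in re.findall(r"<(\w+)>(.*?)</\1>", translated)]

# Toy run with an identity "translator" standing in for the real model:
print(project_labels(["Obama", "visited", "Paris"],
                     [(0, 1, "PER"), (2, 3, "LOC")], lambda s: s))
# -> [('PER', 'Obama'), ('LOC', 'Paris')]
```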
- Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)
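The mixed-lingual pre-training recipe above can be sketched as one shared model optimized on an alternating mix of a cross-lingual objective (translation) and a monolingual one (masked language modeling). The tiny linear stand-in model and the 50/50 round-robin schedule are assumptions.

```python
# Hedged sketch of mixed-lingual pre-training: one shared model, two
# alternating objectives. The linear "model" stands in for a seq2seq
# Transformer, and the schedule is an illustrative assumption.
import torch
import torch.nn as nn

model = nn.Linear(16, 16)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def translation_loss(batch):               # cross-lingual task
    src, tgt = batch
    return nn.functional.mse_loss(model(src), tgt)

def masked_lm_loss(batch):                 # monolingual task
    masked, original = batch
    return nn.functional.mse_loss(model(masked), original)

for step in range(100):
    batch = (torch.randn(8, 16), torch.randn(8, 16))
    # Alternate objectives so both tasks shape the shared parameters.
    loss = translation_loss(batch) if step % 2 == 0 else masked_lm_loss(batch)
    opt.zero_grad()
    loss.backward()
    opt.step()
```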
- WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization [41.578594261746055]
We introduce WikiLingua, a large-scale, multilingual dataset for the evaluation of cross-lingual abstractive summarization systems.
We extract article and summary pairs in 18 languages from WikiHow, a high quality, collaborative resource of how-to guides on a diverse set of topics written by human authors.
We create gold-standard article-summary alignments across languages by aligning the images that are used to describe each how-to step in an article.
arXiv Detail & Related papers (2020-10-07T00:28:05Z)
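The image-based alignment behind WikiLingua can be sketched as dictionary matching: WikiHow steps in different languages reuse the same step images, so shared image identifiers pair up cross-lingual step texts. The code below is an illustrative reconstruction, not the released alignment script.

```python
# Hedged sketch of WikiLingua-style alignment via shared step images.
def align_by_images(en_steps, xx_steps):
    """Each steps list holds (image_id, step_text) pairs for one article."""
    en_by_image = {img: text for img, text in en_steps}
    return [(en_by_image[img], text)
            for img, text in xx_steps if img in en_by_image]

pairs = align_by_images(
    [("img1.jpg", "Chop the onions."), ("img2.jpg", "Heat the pan.")],
    [("img2.jpg", "Calienta la sartén."), ("img1.jpg", "Pica las cebollas.")])
print(pairs)  # cross-lingual step pairs matched via shared images
```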
- A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with Bilingual Semantic Similarity Rewards [40.17497211507507]
Cross-lingual text summarization is a practically important but under-explored task.
We propose an end-to-end cross-lingual text summarization model.
arXiv Detail & Related papers (2020-06-27T21:51:38Z)
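Finally, the bilingual semantic similarity reward in the zero-shot model above can be sketched as a cosine score in a shared embedding space driving a REINFORCE-style update. The pooled-embedding reward and single-sample policy gradient below are assumptions about the reward design, not the paper's exact training objective.

```python
# Hedged sketch of a bilingual semantic-similarity reward with a
# REINFORCE-style update. Embeddings and log-probs are random stand-ins.
import torch
import torch.nn.functional as F

def similarity_reward(src_emb, gen_emb):
    """Reward: closeness of the generated summary to the source article
    in a shared bilingual embedding space."""
    return F.cosine_similarity(src_emb, gen_emb, dim=-1)

def reinforce_loss(log_probs, reward, baseline=0.0):
    # Scale the sampled summary's log-likelihood by its (centered) reward.
    return -(reward - baseline) * log_probs.sum()

reward = similarity_reward(torch.randn(256), torch.randn(256))
log_probs = torch.randn(12).log_softmax(-1)  # stand-in for token log-probs
loss = reinforce_loss(log_probs, reward)
```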