$\mu$PLAN: Summarizing using a Content Plan as Cross-Lingual Bridge
- URL: http://arxiv.org/abs/2305.14205v2
- Date: Wed, 31 Jan 2024 13:28:58 GMT
- Title: $\mu$PLAN: Summarizing using a Content Plan as Cross-Lingual Bridge
- Authors: Fantine Huot, Joshua Maynez, Chris Alberti, Reinald Kim Amplayo,
Priyanka Agrawal, Constanza Fierro, Shashi Narayan, Mirella Lapata
- Abstract summary: Cross-lingual summarization consists of generating a summary in one language given an input document in a different language.
This work presents $\mu$PLAN, an approach to cross-lingual summarization that uses an intermediate planning step as a cross-lingual bridge.
- Score: 72.64847925450368
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-lingual summarization consists of generating a summary in one language
given an input document in a different language, allowing for the dissemination
of relevant content across speakers of other languages. The task is challenging
mainly due to the paucity of cross-lingual datasets and the compounded
difficulty of summarizing and translating. This work presents $\mu$PLAN, an
approach to cross-lingual summarization that uses an intermediate planning step
as a cross-lingual bridge. We formulate the plan as a sequence of entities
capturing the summary's content and the order in which it should be
communicated. Importantly, our plans abstract from surface form: using a
multilingual knowledge base, we align entities to their canonical designation
across languages and generate the summary conditioned on this cross-lingual
bridge and the input. Automatic and human evaluation on the XWikis dataset
(across four language pairs) demonstrates that our planning objective achieves
state-of-the-art performance in terms of informativeness and faithfulness.
Moreover, $\mu$PLAN models improve the zero-shot transfer to new cross-lingual
language pairs compared to baselines without a planning component.
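As a rough, hypothetical sketch of the plan-as-bridge idea described in the abstract: entity mentions are mapped to canonical designations via a toy multilingual knowledge base, and the resulting plan is prepended to the generation target. The toy KB, function names, and plan format below are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of an entity-based content plan used as a
# cross-lingual bridge (the toy KB and all names are illustrative).

# Toy multilingual knowledge base: surface form -> canonical entity ID.
# A real system would use something like Wikidata identifiers.
KB = {
    "Deutschland": "Q183",   # German surface form
    "Germany": "Q183",       # English surface form
    "Berlin": "Q64",
}

# Canonical (language-independent) label chosen for each entity ID.
CANONICAL = {"Q183": "Germany", "Q64": "Berlin"}


def build_plan(summary_entities: list[str]) -> str:
    """Map entity mentions (in any language) to canonical designations,
    preserving the order in which the summary should mention them."""
    ids = [KB[m] for m in summary_entities if m in KB]
    return " | ".join(CANONICAL[i] for i in ids)


def make_training_target(summary_entities: list[str], summary: str) -> str:
    """Plan-then-summarize target: the model first generates the plan,
    then the summary conditioned on it and the input document."""
    return f"PLAN: {build_plan(summary_entities)} SUMMARY: {summary}"


# Entities extracted from a German source document, summary in English.
print(make_training_target(["Deutschland", "Berlin"],
                           "Germany's capital is Berlin."))
# PLAN: Germany | Berlin SUMMARY: Germany's capital is Berlin.
```

Because the plan abstracts away from surface form, the same entity sequence can condition generation in any target language.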
Related papers
- Automatic Data Retrieval for Cross Lingual Summarization [4.759360739268894]
Cross-lingual summarization involves summarizing text written in one language into a different one.
In this work, we aim to perform cross-lingual summarization from English to Hindi.
arXiv Detail & Related papers (2023-12-22T09:13:24Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task to bridge the gap between the pretraining and finetuning stages.
Our approach can narrow the cross-lingual sentence representation distance and improve low-frequency word translation at trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
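A minimal sketch of a code-switching restore objective of the kind described above, assuming a word-level bilingual lexicon; the lexicon and helper names are illustrative rather than the paper's actual recipe.

```python
import random

# Toy bilingual lexicon (illustrative only; real systems use learned
# or mined word alignments).
LEXICON = {"cat": "chat", "sat": "assis", "mat": "tapis"}


def code_switch(tokens: list[str], p: float = 0.3) -> list[str]:
    """Randomly replace source tokens with their translations, producing
    a code-switched input the model must restore to the original."""
    return [LEXICON[t] if t in LEXICON and random.random() < p else t
            for t in tokens]


sentence = "the cat sat on the mat".split()
corrupted = code_switch(sentence)
# Training pair for the restore task: (corrupted input, original target).
print(corrupted, "->", sentence)
```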
- Models and Datasets for Cross-Lingual Summarisation [78.56238251185214]
We present a cross-lingual summarisation corpus with long documents in a source language associated with multi-sentence summaries in a target language.
The corpus covers twelve language pairs and directions for four European languages, namely Czech, English, French and German.
We derive cross-lingual document-summary instances from Wikipedia by combining lead paragraphs and articles' bodies from language-aligned Wikipedia titles.
arXiv Detail & Related papers (2022-02-19T11:55:40Z)
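The corpus construction described above can be pictured roughly as follows; the field names and alignment format are hypothetical.

```python
# Hypothetical construction of cross-lingual (document, summary) pairs
# from title-aligned Wikipedia articles; field names are illustrative.

def make_pairs(articles_src, articles_tgt, alignments):
    """alignments maps a source-language title to the target-language
    title of the same entity. The source article body is the input
    document; the target lead paragraph serves as the reference summary."""
    pairs = []
    for src_title, tgt_title in alignments.items():
        if src_title in articles_src and tgt_title in articles_tgt:
            pairs.append((articles_src[src_title]["body"],
                          articles_tgt[tgt_title]["lead"]))
    return pairs


articles_en = {"Berlin": {"body": "Berlin is the capital and largest city..."}}
articles_fr = {"Berlin": {"lead": "Berlin est la capitale de l'Allemagne."}}
print(make_pairs(articles_en, articles_fr, {"Berlin": "Berlin"}))
```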
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
This effectively avoids the degenerate case of predicting masked words conditioned only on context in the same language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
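A simplified PyTorch sketch of plugging a cross-attention module into an encoder block, in the spirit of the description above; the dimensions and wiring are assumptions, not the released VECO architecture.

```python
import torch
import torch.nn as nn


class CrossAttentionBlock(nn.Module):
    """Illustrative cross-attention module: hidden states in one language
    attend to the paired sentence in the other language (simplified)."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads,
                                                batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # Queries come from this language's hidden states; keys and values
        # come from the aligned sentence in the other language.
        attended, _ = self.cross_attn(query=x, key=other, value=other)
        return self.norm(x + attended)  # residual connection + layer norm


x = torch.randn(2, 10, 512)      # batch of source-language states
other = torch.randn(2, 12, 512)  # aligned target-language states
print(CrossAttentionBlock()(x, other).shape)  # torch.Size([2, 10, 512])
```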
- Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language modeling.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)
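One way to picture the mixed-lingual task mixture described above; the sampling scheme and mixing weight are assumptions, not the paper's recipe.

```python
import random
from itertools import cycle


def pretraining_batches(translation_batches, mlm_batches, p_translation=0.5):
    """Sample a task per step: cross-lingual translation or monolingual
    masked LM. Both tasks would update the same encoder-decoder, so the
    translation task teaches cross-lingual mapping while masked LM
    teaches language understanding."""
    while True:
        if random.random() < p_translation:
            yield "translation", next(translation_batches)
        else:
            yield "masked_lm", next(mlm_batches)


# Toy usage with placeholder batches.
gen = pretraining_batches(cycle([("zh doc", "en doc")]),
                          cycle([("masked en doc", "en doc")]))
for _, (task, batch) in zip(range(3), gen):
    print(task, batch)
```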
- WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization [41.578594261746055]
We introduce WikiLingua, a large-scale, multilingual dataset for the evaluation of cross-lingual abstractive summarization systems.
We extract article and summary pairs in 18 languages from WikiHow, a high-quality, collaborative resource of how-to guides on a diverse set of topics written by human authors.
We create gold-standard article-summary alignments across languages by aligning the images that are used to describe each how-to step in an article.
arXiv Detail & Related papers (2020-10-07T00:28:05Z)
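The image-based alignment described above might look roughly like this; the step representation and field names are hypothetical.

```python
# Hypothetical alignment of how-to steps across language editions by the
# image illustrating each step (WikiHow reuses the same image file
# across language editions).

def align_steps(steps_src, steps_tgt):
    """Match steps that share the same illustrating image file."""
    by_image = {s["image"]: s for s in steps_src}
    return [(by_image[s["image"]], s) for s in steps_tgt
            if s["image"] in by_image]


steps_en = [{"image": "step1.jpg", "text": "Boil the water."}]
steps_es = [{"image": "step1.jpg", "text": "Hierve el agua."}]
print(align_steps(steps_en, steps_es))
```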
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We further propose a KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
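A minimal sketch of a KL-divergence self-teaching loss of the kind described above; the temperature and the choice to detach the teacher are assumptions.

```python
import torch
import torch.nn.functional as F


def self_teaching_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence pushing the model's predictions on the
    target-language text toward soft pseudo-labels produced for its
    translation (simplified sketch)."""
    teacher_probs = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")


student = torch.randn(4, 3)  # logits for the target-language input
teacher = torch.randn(4, 3)  # logits for the translated input
print(self_teaching_loss(student, teacher))
```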
- A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with Bilingual Semantic Similarity Rewards [40.17497211507507]
Cross-lingual text summarization is a practically important but under-explored task.
We propose an end-to-end cross-lingual text summarization model.
arXiv Detail & Related papers (2020-06-27T21:51:38Z)