SimCSum: Joint Learning of Simplification and Cross-lingual
Summarization for Cross-lingual Science Journalism
- URL: http://arxiv.org/abs/2304.01621v1
- Date: Tue, 4 Apr 2023 08:24:22 GMT
- Title: SimCSum: Joint Learning of Simplification and Cross-lingual
Summarization for Cross-lingual Science Journalism
- Authors: Mehwish Fatima, Tim Kolber, Katja Markert and Michael Strube
- Abstract summary: Cross-lingual science journalism generates popular science stories from scientific articles, in a language different from the source, for a non-expert audience.
We improve cross-lingual summary generation through joint training on two high-level NLP tasks: simplification and cross-lingual summarization.
SimCSum demonstrates statistically significant improvements over the state-of-the-art on two non-synthetic cross-lingual scientific datasets.
- Score: 8.187718963808484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-lingual science journalism generates popular science stories
from scientific articles, in a language different from the source, for a
non-expert audience. Hence, a cross-lingual popular summary must contain the
salient content of the input document, and that content should be coherent,
comprehensible, and written in the local language of the target audience. We
improve these aspects of cross-lingual summary generation through joint
training on two high-level NLP tasks: simplification and cross-lingual
summarization. The former reduces linguistic complexity, and the latter focuses
on cross-lingual abstractive summarization. We propose a novel multi-task
architecture, SimCSum, consisting of one shared encoder and two parallel
decoders that jointly learn simplification and cross-lingual summarization. We
empirically investigate the performance of SimCSum by comparing it against
several strong baselines across multiple evaluation metrics and through human
evaluation.
Overall, SimCSum demonstrates statistically significant improvements over the
state-of-the-art on two non-synthetic cross-lingual scientific datasets.
Furthermore, we conduct an in-depth investigation into the linguistic
properties of generated summaries and an error analysis.
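
The architecture described in the abstract maps naturally onto a shared-encoder, dual-decoder seq2seq model. Below is a minimal PyTorch-style sketch of such a multi-task setup; the layer sizes, padding id, and the weighted joint loss are illustrative assumptions, and the actual SimCSum model would more plausibly be initialized from a pretrained seq2seq model rather than trained from scratch as here.

```python
import torch
import torch.nn as nn

class SimCSumSketch(nn.Module):
    """Minimal sketch of a shared-encoder / dual-decoder multi-task model.

    Illustrative only: hyper-parameters are assumptions, not taken from
    the paper.
    """

    def __init__(self, vocab_size=32000, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.shared_encoder = nn.TransformerEncoder(enc_layer, num_layers)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        # Two parallel decoders read the same shared encoder states.
        self.simplify_decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.summarize_decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, simp_ids, sum_ids):
        memory = self.shared_encoder(self.embed(src_ids))
        # Causal masks so each decoder trains autoregressively.
        simp_mask = nn.Transformer.generate_square_subsequent_mask(simp_ids.size(1))
        sum_mask = nn.Transformer.generate_square_subsequent_mask(sum_ids.size(1))
        simp_states = self.simplify_decoder(self.embed(simp_ids), memory,
                                            tgt_mask=simp_mask)
        sum_states = self.summarize_decoder(self.embed(sum_ids), memory,
                                            tgt_mask=sum_mask)
        return self.lm_head(simp_states), self.lm_head(sum_states)

def joint_loss(simp_logits, sum_logits, simp_tgt, sum_tgt, lam=0.5):
    """Weighted joint objective; lam is an assumed hyper-parameter."""
    ce = nn.CrossEntropyLoss(ignore_index=0)  # assume id 0 is padding
    l_simp = ce(simp_logits.transpose(1, 2), simp_tgt)  # (B, V, L) vs (B, L)
    l_sum = ce(sum_logits.transpose(1, 2), sum_tgt)
    return lam * l_simp + (1.0 - lam) * l_sum
```

At inference time, presumably only the cross-lingual summarization decoder produces output, while the simplification decoder acts as an auxiliary training signal that steers the shared encoder toward less complex language.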
Related papers
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- EMMA-X: An EM-like Multilingual Pre-training Algorithm for Cross-lingual Representation Learning [74.60554112841307]
We propose EMMA-X, an EM-like Multilingual pre-training algorithm, to learn cross-lingual universals.
EMMA-X unifies the cross-lingual representation learning task and an extra semantic relation prediction task within an EM framework.
arXiv Detail & Related papers (2023-10-26T08:31:00Z)
- $\mu$PLAN: Summarizing using a Content Plan as Cross-Lingual Bridge [72.64847925450368]
Cross-lingual summarization consists of generating a summary in one language given an input document in a different language.
This work presents $\mu$PLAN, an approach to cross-lingual summarization that uses an intermediate planning step as a cross-lingual bridge.
arXiv Detail & Related papers (2023-05-23T16:25:21Z)
- Advancing Multilingual Pre-training: TRIP Triangular Document-level Pre-training for Multilingual Language Models [107.83158521848372]
We present Triangular Document-level Pre-training (TRIP), the first method in the field to extend the conventional monolingual and bilingual objectives into a trilingual objective with a novel technique called Grafting.
TRIP achieves several strong state-of-the-art (SOTA) scores on three multilingual document-level machine translation benchmarks and one cross-lingual abstractive summarization benchmark, including consistent improvements of up to 3.11 d-BLEU points and 8.9 ROUGE-L points.
arXiv Detail & Related papers (2022-12-15T12:14:25Z)
- X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents [12.493662336994106]
We present an abstractive cross-lingual summarization dataset for four different languages in the scholarly domain.
We train and evaluate models that process English papers and generate summaries in German, Italian, Chinese and Japanese.
arXiv Detail & Related papers (2022-05-30T12:31:28Z)
- Improving Neural Cross-Lingual Summarization via Employing Optimal Transport Distance for Knowledge Distillation [8.718749742587857]
Cross-lingual summarization models rely on the self-attention mechanism to attend to tokens across two languages.
We propose a novel Knowledge-Distillation-based framework for Cross-Lingual Summarization.
Our method outperforms state-of-the-art models under both high and low-resourced settings.
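
As an illustration of the kind of objective named in the title, here is a generic entropy-regularized optimal transport (Sinkhorn) distance between two sets of token representations, e.g. teacher versus student hidden states in distillation. The cosine cost, uniform marginals, and hyper-parameters are assumptions for this sketch, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sinkhorn_distance(x, y, eps=0.1, n_iters=50):
    """Entropy-regularized OT (Sinkhorn) distance between token sets.

    x: (n, d) and y: (m, d) token representations. Generic sketch only.
    """
    cost = 1.0 - F.normalize(x, dim=-1) @ F.normalize(y, dim=-1).T  # (n, m)
    a = x.new_full((x.size(0),), 1.0 / x.size(0))  # uniform source marginal
    b = y.new_full((y.size(0),), 1.0 / y.size(0))  # uniform target marginal
    K = torch.exp(-cost / eps)                     # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(n_iters):                       # Sinkhorn iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)     # transport plan (n, m)
    return (plan * cost).sum()                     # expected transport cost
```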
arXiv Detail & Related papers (2021-12-07T03:45:02Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- On Learning Universal Representations Across Languages [37.555675157198145]
We extend existing approaches to learn sentence-level representations and show their effectiveness on cross-lingual understanding and generation.
Specifically, we propose a Hierarchical Contrastive Learning (HiCTL) method to learn universal representations for parallel sentences distributed in one or multiple languages.
We conduct evaluations on two challenging cross-lingual tasks, XTREME and machine translation.
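
For context, a flat, batch-level contrastive objective over parallel sentence pairs can be sketched as follows. This is a generic InfoNCE-style loss; the hierarchical structure that gives HiCTL its name (e.g. additional word-level terms) is omitted here.

```python
import torch
import torch.nn.functional as F

def parallel_contrastive_loss(src_emb, tgt_emb, temperature=0.05):
    """Symmetric InfoNCE over a batch of parallel sentence embeddings.

    src_emb, tgt_emb: (B, d); row i of each is a translation pair, so
    aligned pairs sit on the diagonal of the similarity matrix.
    """
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.T / temperature                 # (B, B) similarities
    labels = torch.arange(src.size(0), device=src.device)
    # Contrast in both directions and average.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))
```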
arXiv Detail & Related papers (2020-07-31T10:58:39Z)
- A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with Bilingual Semantic Similarity Rewards [40.17497211507507]
Cross-lingual text summarization is a practically important but under-explored task.
We propose an end-to-end cross-lingual text summarization model.
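
The title points to training with bilingual semantic similarity rewards. A generic self-critical policy-gradient loss around such a reward might look like the sketch below; the reward scorer itself is left abstract, and nothing here beyond the idea of a reward signal is taken from the paper.

```python
import torch

def self_critical_loss(log_probs, sampled_reward, greedy_reward):
    """Generic self-critical policy-gradient loss for summarization.

    log_probs:      (B,) summed token log-probabilities of sampled summaries
    sampled_reward: (B,) reward of each sampled summary, e.g. a bilingual
                    semantic similarity score against the source document
    greedy_reward:  (B,) reward of a greedy-decoded baseline summary
    """
    advantage = (sampled_reward - greedy_reward).detach()  # baseline-corrected
    return -(advantage * log_probs).mean()  # minimizing this maximizes reward
```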
arXiv Detail & Related papers (2020-06-27T21:51:38Z)
- Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity [67.36239720463657]
Multi-SimLex is a large-scale lexical resource and evaluation benchmark covering datasets for 12 diverse languages.
Each language dataset is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs.
Owing to the alignment of concepts across languages, we provide a suite of 66 cross-lingual semantic similarity datasets.
arXiv Detail & Related papers (2020-03-10T17:17:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.