Improving Neural Cross-Lingual Summarization via Employing Optimal
Transport Distance for Knowledge Distillation
- URL: http://arxiv.org/abs/2112.03473v1
- Date: Tue, 7 Dec 2021 03:45:02 GMT
- Title: Improving Neural Cross-Lingual Summarization via Employing Optimal
Transport Distance for Knowledge Distillation
- Authors: Thong Nguyen, Luu Anh Tuan
- Abstract summary: Cross-lingual summarization models rely on the self-attention mechanism to attend across tokens in two languages.
We propose a novel Knowledge-Distillation-based framework for Cross-Lingual Summarization.
Our method outperforms state-of-the-art models in both high- and low-resource settings.
- Score: 8.718749742587857
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current state-of-the-art cross-lingual summarization models employ a
multi-task learning paradigm, which operates on a shared vocabulary module and
relies on the self-attention mechanism to attend across tokens in two
languages. However, the correlations learned by self-attention are often loose
and implicit, and are inefficient at capturing crucial cross-lingual
representations between languages. The problem worsens for languages with
distinct morphological or structural features, which makes cross-lingual
alignment more challenging and results in a performance drop. To overcome this
problem, we propose a novel Knowledge-Distillation-based framework for
Cross-Lingual Summarization that seeks to explicitly construct cross-lingual
correlations by distilling the knowledge of a monolingual summarization
teacher into a cross-lingual summarization student. Since the representations
of the teacher and the student lie in two different vector spaces, we further
propose a Knowledge Distillation loss based on Sinkhorn Divergence, an
Optimal-Transport distance, to estimate the discrepancy between the teacher
and student representations. Owing to the geometric nature of Sinkhorn
Divergence, the student model can effectively learn to align its cross-lingual
hidden states with the teacher's monolingual hidden states, leading to a
strong correlation between distant languages. Experiments on cross-lingual
summarization datasets for pairs of distant languages demonstrate that our
method outperforms state-of-the-art models in both high- and low-resource
settings.
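To make the distillation objective concrete, here is a minimal PyTorch sketch of a Sinkhorn-Divergence loss between teacher and student hidden states, assuming a squared-Euclidean ground cost, uniform marginals over tokens, and log-domain Sinkhorn iterations; the function names and hyperparameters (`eps`, `n_iters`) are illustrative choices of ours, and the paper's exact formulation may differ.

```python
# Minimal sketch of a Sinkhorn-Divergence knowledge-distillation loss
# (illustrative only; not the authors' implementation).
import torch

def entropic_ot_cost(x, y, eps=0.05, n_iters=100):
    """Entropy-regularised OT cost between point clouds x: [n, d] and
    y: [m, d] with uniform marginals, via log-domain Sinkhorn iterations."""
    n, m = x.size(0), y.size(0)
    cost = torch.cdist(x, y, p=2) ** 2                # squared-L2 ground cost
    log_a = torch.log(torch.full((n,), 1.0 / n, device=x.device))
    log_b = torch.log(torch.full((m,), 1.0 / m, device=x.device))
    f = torch.zeros(n, device=x.device)               # dual potentials
    g = torch.zeros(m, device=x.device)
    for _ in range(n_iters):
        f = -eps * torch.logsumexp(log_b[None, :] + (g[None, :] - cost) / eps, dim=1)
        g = -eps * torch.logsumexp(log_a[:, None] + (f[:, None] - cost) / eps, dim=0)
    # Transport plan from the converged potentials, then the cost <P, C>.
    log_plan = log_a[:, None] + log_b[None, :] + (f[:, None] + g[None, :] - cost) / eps
    return (log_plan.exp() * cost).sum()

def sinkhorn_divergence(x, y, eps=0.05, n_iters=100):
    """Debiased divergence: OT(x, y) - (OT(x, x) + OT(y, y)) / 2."""
    return (entropic_ot_cost(x, y, eps, n_iters)
            - 0.5 * entropic_ot_cost(x, x, eps, n_iters)
            - 0.5 * entropic_ot_cost(y, y, eps, n_iters))

# Hypothetical usage: hidden states of one example from the monolingual
# teacher and the cross-lingual student (sequence lengths may differ).
teacher_h = torch.randn(40, 768)
student_h = torch.randn(35, 768)
kd_loss = sinkhorn_divergence(student_h, teacher_h)   # added to the training loss
```

The two self-transport terms debias the entropic OT cost, so the divergence vanishes when the student's hidden states coincide with the teacher's, which is what makes it usable as a direct geometric alignment signal during distillation.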
Related papers
- HC$^2$L: Hybrid and Cooperative Contrastive Learning for Cross-lingual Spoken Language Understanding [45.12153788010354]
The state-of-the-art model for cross-lingual spoken language understanding performs cross-lingual unsupervised contrastive learning.
We propose Hybrid and Cooperative Contrastive Learning to address the limitations of this approach.
arXiv Detail & Related papers (2024-05-10T02:40:49Z)
- Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment [42.624862172666624]
We propose a simple yet effective cross-lingual alignment framework exploiting pairs of translation sentences.
It aligns the internal sentence representations across different languages via multilingual contrastive learning (an illustrative sketch of this kind of alignment objective appears after this list).
Experimental results show that even with less than 0.1‰ of pre-training tokens, our alignment framework significantly boosts the cross-lingual abilities of generative language models.
arXiv Detail & Related papers (2023-11-14T11:24:08Z)
- VECO 2.0: Cross-lingual Language Model Pre-training with Multi-granularity Contrastive Learning [56.47303426167584]
We propose VECO 2.0, a cross-lingual pre-trained model based on contrastive learning with multi-granularity alignments.
Specifically, sequence-to-sequence alignment maximizes the similarity of parallel pairs and minimizes that of non-parallel pairs.
Token-to-token alignment is integrated to draw synonymous tokens, mined via a thesaurus dictionary, closer together while separating them from the other unpaired tokens in a bilingual instance.
arXiv Detail & Related papers (2023-04-17T12:23:41Z)
- Cross-Align: Modeling Deep Cross-lingual Interactions for Word Alignment [63.0407314271459]
Experiments show that the proposed Cross-Align achieves state-of-the-art (SOTA) performance on four out of five language pairs.
arXiv Detail & Related papers (2022-10-09T02:24:35Z)
- Learning Multilingual Representation for Natural Language Understanding with Enhanced Cross-Lingual Supervision [42.724921817550516]
We propose a network named decomposed attention (DA) as a replacement for MA.
DA consists of an intra-lingual attention (IA) and a cross-lingual attention (CA), which model intra-lingual and cross-lingual supervision, respectively.
Experiments on various cross-lingual natural language understanding tasks show that the proposed architecture and learning strategy significantly improve the model's cross-lingual transferability.
arXiv Detail & Related papers (2021-06-09T16:12:13Z)
- Lightweight Cross-Lingual Sentence Representation Learning [57.9365829513914]
We introduce a lightweight dual-transformer architecture with just 2 layers for generating memory-efficient cross-lingual sentence representations.
We propose a novel cross-lingual language model, which combines the existing single-word masked language model with the newly proposed cross-lingual token-level reconstruction task.
arXiv Detail & Related papers (2021-05-28T14:10:48Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual multi-granularity texts.
We propose a new pre-training task based on contrastive learning.
By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models.
arXiv Detail & Related papers (2020-07-15T16:58:01Z)
- A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with Bilingual Semantic Similarity Rewards [40.17497211507507]
Cross-lingual text summarization is a practically important but under-explored task.
We propose an end-to-end cross-lingual text summarization model.
arXiv Detail & Related papers (2020-06-27T21:51:38Z)
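Several of the related papers above (e.g., the cross-lingual alignment framework for in-context learning and VECO 2.0) align representations across languages with contrastive learning over parallel data. As a generic point of reference only, here is a hypothetical sketch of a sentence-level InfoNCE-style alignment loss over translation pairs; the name `contrastive_alignment_loss` and the `temperature` value are ours, and the actual objectives in those papers differ in their details.

```python
# Generic sketch of a sentence-level cross-lingual contrastive alignment
# objective (symmetric InfoNCE over translation pairs); hypothetical code,
# not taken from any of the papers listed above.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(src_emb, tgt_emb, temperature=0.05):
    """src_emb[i] and tgt_emb[i] embed the two sides of a translation pair;
    the other sentences in the batch serve as in-batch negatives."""
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.t() / temperature          # [batch, batch] cosine similarities
    labels = torch.arange(src.size(0), device=src.device)
    # Each source sentence should match its own translation, and vice versa.
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))

# Hypothetical usage with a batch of 8 parallel sentence embeddings.
src_emb = torch.randn(8, 768)
tgt_emb = torch.randn(8, 768)
loss = contrastive_alignment_loss(src_emb, tgt_emb)
```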