ABSent: Cross-Lingual Sentence Representation Mapping with Bidirectional
GANs
- URL: http://arxiv.org/abs/2001.11121v1
- Date: Wed, 29 Jan 2020 22:44:05 GMT
- Title: ABSent: Cross-Lingual Sentence Representation Mapping with Bidirectional
GANs
- Authors: Zuohui Fu, Yikun Xian, Shijie Geng, Yingqiang Ge, Yuting Wang, Xin
Dong, Guang Wang and Gerard de Melo
- Abstract summary: We propose an Adversarial Bi-directional Sentence Embedding Mapping (ABSent) framework, which learns mappings of cross-lingual sentence representations from limited quantities of parallel data.
- Score: 48.287610663358066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A number of cross-lingual transfer learning approaches based on neural
networks have been proposed for the case when large amounts of parallel text
are at our disposal. However, in many real-world settings, the amount of parallel
annotated training data is limited. Additionally, prior cross-lingual
mapping research has mainly focused on the word level. This raises the question
of whether such techniques can also be applied to effortlessly obtain
cross-lingually aligned sentence representations. To this end, we propose an
Adversarial Bi-directional Sentence Embedding Mapping (ABSent) framework, which
learns mappings of cross-lingual sentence representations from limited
quantities of parallel data.
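The abstract only names the framework, so the following is a minimal, hypothetical PyTorch sketch of the general idea: two mappers translate between the source and target sentence embedding spaces, each paired with a discriminator trained adversarially on unaligned embeddings, while a supervised loss anchors the mappers on the small parallel set. All module shapes, names (Mapper, Discriminator, train_step), and loss weights are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

DIM = 512  # hypothetical embedding size; the paper's actual dimensions are not given here

class Mapper(nn.Module):
    """Linear map between the two sentence embedding spaces."""
    def __init__(self, dim=DIM):
        super().__init__()
        self.map = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        return self.map(x)

class Discriminator(nn.Module):
    """Classifies embeddings as native to a space (1) or mapped into it (0)."""
    def __init__(self, dim=DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

    def forward(self, x):
        return self.net(x)

g_xy, g_yx = Mapper(), Mapper()              # source->target and target->source mappers
d_x, d_y = Discriminator(), Discriminator()
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(list(g_xy.parameters()) + list(g_yx.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(list(d_x.parameters()) + list(d_y.parameters()), lr=1e-4)

def train_step(src, tgt, par_src, par_tgt, lam=1.0):
    """src/tgt: unaligned embedding batches; par_src/par_tgt: a small aligned batch."""
    ones = lambda t: torch.ones(t.size(0), 1)
    zeros = lambda t: torch.zeros(t.size(0), 1)

    # Discriminator step: real embeddings vs. mapped ("fake") embeddings.
    opt_d.zero_grad()
    fake_y, fake_x = g_xy(src).detach(), g_yx(tgt).detach()
    loss_d = (bce(d_y(tgt), ones(tgt)) + bce(d_y(fake_y), zeros(fake_y)) +
              bce(d_x(src), ones(src)) + bce(d_x(fake_x), zeros(fake_x)))
    loss_d.backward()
    opt_d.step()

    # Mapper step: fool both discriminators and fit the limited parallel pairs.
    opt_g.zero_grad()
    loss_adv = bce(d_y(g_xy(src)), ones(src)) + bce(d_x(g_yx(tgt)), ones(tgt))
    loss_sup = (nn.functional.mse_loss(g_xy(par_src), par_tgt) +
                nn.functional.mse_loss(g_yx(par_tgt), par_src))
    (loss_adv + lam * loss_sup).backward()
    opt_g.step()
```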
Related papers
- Enhancing Cross-lingual Transfer via Phonemic Transcription Integration [57.109031654219294]
PhoneXL is a framework incorporating phonemic transcriptions as an additional linguistic modality for cross-lingual transfer.
Our pilot study reveals phonemic transcription provides essential information beyond the orthography to enhance cross-lingual transfer.
arXiv Detail & Related papers (2023-07-10T06:17:33Z) - Dual-Alignment Pre-training for Cross-lingual Sentence Embedding [79.98111074307657]
We propose a dual-alignment pre-training (DAP) framework for cross-lingual sentence embedding.
We introduce a novel representation translation learning (RTL) task, where the model learns to use one-side contextualized token representation to reconstruct its translation counterpart.
Our approach significantly improves cross-lingual sentence embeddings.
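The RTL objective is only summarized above; purely as an illustration of what such a reconstruction task could look like, the hypothetical head below regresses the translation side's contextualized token representations from the other side's (the module names, sizes, and MSE choice are all assumptions, not the DAP implementation):

```python
import torch
import torch.nn as nn

class RTLHead(nn.Module):
    """Hypothetical representation translation learning head: reconstructs the
    target side's contextualized token representations from the source side's."""
    def __init__(self, dim=768, heads=12):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, src_token_reps):
        return self.encoder(src_token_reps)

def rtl_loss(head, src_reps, tgt_reps):
    """Regress the translation counterpart's token representations.
    Assumes source and target sequences are length-aligned for simplicity."""
    return nn.functional.mse_loss(head(src_reps), tgt_reps)
```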
arXiv Detail & Related papers (2023-05-16T03:53:30Z) - VECO 2.0: Cross-lingual Language Model Pre-training with
Multi-granularity Contrastive Learning [56.47303426167584]
We propose a cross-lingual pre-trained model VECO2.0 based on contrastive learning with multi-granularity alignments.
Specifically, sequence-to-sequence alignment is induced to maximize the similarity of parallel pairs and minimize that of non-parallel pairs.
Token-to-token alignment is integrated to bridge the gap between synonymous tokens, mined via a thesaurus dictionary, and the other unpaired tokens in a bilingual instance.
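The sequence-to-sequence alignment above follows the usual in-batch contrastive pattern; here is a minimal sketch of that general form (the temperature and cosine-similarity choices are assumptions, not necessarily VECO 2.0's exact loss):

```python
import torch
import torch.nn.functional as F

def sequence_contrastive_loss(src_emb, tgt_emb, temperature=0.05):
    """In-batch contrastive loss: the i-th source and target sentences are a
    parallel pair; every other in-batch combination acts as a non-parallel
    negative. Illustrative InfoNCE form; hyperparameters are assumptions."""
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.t() / temperature   # pairwise cosine similarities
    labels = torch.arange(src.size(0))     # diagonal entries are the parallel pairs
    return F.cross_entropy(logits, labels)
```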
arXiv Detail & Related papers (2023-04-17T12:23:41Z)
- Cross-language Sentence Selection via Data Augmentation and Rationale Training [22.106577427237635]
This approach uses data augmentation and negative sampling techniques on noisy parallel sentence data to learn a cross-lingual embedding-based query relevance model.
Results show that this approach performs as well as or better than multiple state-of-the-art machine translation + monolingual retrieval systems trained on the same parallel data.
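As a rough illustration of the negative-sampling idea (not the paper's actual model), a relevance scorer can be trained to rank the noisy-parallel sentence above randomly sampled ones; all names here are hypothetical:

```python
import torch
import torch.nn.functional as F

def relevance_loss(query_emb, pos_sent_emb, all_sent_embs, k=5):
    """Hypothetical negative-sampling objective: score the aligned
    (noisy-parallel) sentence above k randomly sampled sentences."""
    idx = torch.randint(0, all_sent_embs.size(0), (k,))
    cands = torch.cat([pos_sent_emb.unsqueeze(0), all_sent_embs[idx]], dim=0)  # (k+1, d)
    scores = F.cosine_similarity(query_emb.unsqueeze(0), cands)                # (k+1,)
    return F.cross_entropy(scores.unsqueeze(0), torch.tensor([0]))             # positive is index 0
```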
arXiv Detail & Related papers (2021-06-04T07:08:47Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
- Refinement of Unsupervised Cross-Lingual Word Embeddings [2.4366811507669124]
Cross-lingual word embeddings aim to bridge the gap between high-resource and low-resource languages.
We propose a self-supervised method to refine the alignment of unsupervised bilingual word embeddings.
arXiv Detail & Related papers (2020-02-21T10:39:53Z)
- Robust Cross-lingual Embeddings from Parallel Sentences [65.85468628136927]
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word representations.
Our approach significantly improves cross-lingual sentence retrieval performance over all other approaches.
It also achieves parity with a deep RNN method on a zero-shot cross-lingual document classification task.
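The bilingual CBOW extension is not spelled out here; one plausible sketch (with a joint vocabulary and illustrative names, not necessarily the authors' formulation) predicts a word in the aligned translation from the other sentence's averaged context:

```python
import torch
import torch.nn as nn

class BiCBOW(nn.Module):
    """Illustrative cross-lingual CBOW variant: the averaged context of a
    sentence in one language predicts words in its aligned translation,
    tying both vocabularies to one shared embedding space."""
    def __init__(self, vocab_size, dim=300):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)            # joint bilingual vocabulary
        self.out = nn.Linear(dim, vocab_size, bias=False)

    def forward(self, context_ids):
        return self.out(self.emb(context_ids).mean(dim=1))  # (batch, vocab)

def bicbow_loss(model, src_context_ids, tgt_word_ids):
    """Predict a target-language word from the source sentence's context."""
    return nn.functional.cross_entropy(model(src_context_ids), tgt_word_ids)
```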
arXiv Detail & Related papers (2019-12-28T16:18:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all generated summaries) and is not responsible for any consequences of its use.