Filtered Inner Product Projection for Crosslingual Embedding Alignment
- URL: http://arxiv.org/abs/2006.03652v2
- Date: Tue, 23 Mar 2021 22:00:24 GMT
- Title: Filtered Inner Product Projection for Crosslingual Embedding Alignment
- Authors: Vin Sachidananda, Ziyi Yang, Chenguang Zhu
- Abstract summary: Filtered Inner Product Projection (FIPP) is a method for mapping embeddings to a common representation space.
FIPP is applicable even when the source and target embeddings are of differing dimensionalities.
We show that our approach outperforms existing methods on the MUSE dataset for various language pairs.
- Score: 28.72288652451881
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Due to widespread interest in machine translation and transfer learning,
there are numerous algorithms for mapping multiple embeddings to a shared
representation space. Recently, these algorithms have been studied in the
setting of bilingual dictionary induction where one seeks to align the
embeddings of a source and a target language such that translated word pairs
lie close to one another in a common representation space. In this paper, we
propose a method, Filtered Inner Product Projection (FIPP), for mapping
embeddings to a common representation space and evaluate FIPP in the context of
bilingual dictionary induction. As semantic shifts are pervasive across
languages and domains, FIPP first identifies the common geometric structure in
both embeddings and then, only on the common structure, aligns the Gram
matrices of these embeddings. Unlike previous approaches, FIPP is applicable
even when the source and target embeddings are of differing dimensionalities.
We show that our approach outperforms existing methods on the MUSE dataset for
various language pairs. Furthermore, FIPP provides computational benefits both
in ease of implementation and scalability.
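The abstract's core idea, identifying inner products on which the two embedding geometries agree and aligning Gram matrices only on that common structure, can be illustrated with a toy sketch. This is not the authors' implementation: the function name, the threshold filter, and the gradient-descent objective are simplified stand-ins chosen to mirror the description above, including the case where source and target dimensionalities differ.

```python
import numpy as np

def fipp_style_alignment(X_src, X_tgt, tau=0.5, lr=1e-3, steps=100):
    """Toy sketch of filtered Gram-matrix alignment (illustrative only).

    X_src: (n, d_src) source embeddings for n dictionary words.
    X_tgt: (n, d_tgt) target embeddings for the same words; d_src may
    differ from d_tgt, mirroring the dimension-mismatch setting.
    Returns aligned source embeddings of shape (n, d_tgt).
    """
    G_src = X_src @ X_src.T          # source Gram (inner-product) matrix
    G_tgt = X_tgt @ X_tgt.T          # target Gram matrix
    # "Filter": keep only inner products where the two geometries roughly
    # agree, i.e. the common structure shared by both embedding spaces.
    mask = (np.abs(G_src - G_tgt) < tau).astype(float)
    # Initialize with a least-squares projection into the target dimension.
    W, *_ = np.linalg.lstsq(X_src, X_tgt, rcond=None)
    Z = X_src @ W
    # Gradient descent on ||mask * (Z Z^T - G_tgt)||_F^2: align the Gram
    # matrix of the projected source only on the filtered entries.
    for _ in range(steps):
        R = mask * (Z @ Z.T - G_tgt)
        Z -= lr * 4.0 * (R @ Z)      # gradient for symmetric mask/residual
    return Z
```

The masked objective is what distinguishes this from plain Procrustes-style fitting: disagreeing entries of the Gram matrices, which the abstract attributes to semantic shifts across languages, simply do not contribute to the alignment loss.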
Related papers
- Domain Embeddings for Generating Complex Descriptions of Concepts in
Italian Language [65.268245109828]
We propose a Distributional Semantic resource enriched with linguistic and lexical information extracted from electronic dictionaries.
The resource comprises 21 domain-specific matrices, one comprehensive matrix, and a Graphical User Interface.
Our model facilitates the generation of reasoned semantic descriptions of concepts by selecting matrices directly associated with concrete conceptual knowledge.
arXiv Detail & Related papers (2024-02-26T15:04:35Z)
- Robust Unsupervised Cross-Lingual Word Embedding using Domain Flow Interpolation [48.32604585839687]
Previous adversarial approaches have shown promising results in inducing cross-lingual word embedding without parallel data.
We propose to make use of a sequence of intermediate spaces for smooth bridging.
arXiv Detail & Related papers (2022-10-07T04:37:47Z)
- Cross-Lingual BERT Contextual Embedding Space Mapping with Isotropic and Isometric Conditions [7.615096161060399]
We investigate a context-aware and dictionary-free mapping approach by leveraging parallel corpora.
Our findings unfold the tight relationship between isotropy, isometry, and isomorphism in normalized contextual embedding spaces.
arXiv Detail & Related papers (2021-07-19T22:57:36Z)
- Multi-view Subword Regularization [111.04350390045705]
Multi-view Subword Regularization (MVR) enforces consistency between predictions made from inputs tokenized by the standard segmentation and by probabilistic segmentations.
Results on the XTREME multilingual benchmark show that MVR brings consistent improvements of up to 2.5 points over using standard segmentation algorithms.
arXiv Detail & Related papers (2021-03-15T16:07:42Z)
- Multi-Adversarial Learning for Cross-Lingual Word Embeddings [19.407717032782863]
We propose a novel method for inducing cross-lingual word embeddings.
It induces the seed cross-lingual dictionary through multiple mappings, each induced to fit the mapping for one subspace.
Our experiments on unsupervised bilingual lexicon induction show that this method improves performance over previous single-mapping methods.
arXiv Detail & Related papers (2020-10-16T14:54:28Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Rethinking Positional Encoding in Language Pre-training [111.2320727291926]
We show that in absolute positional encoding, the addition operation applied to positional embeddings and word embeddings brings mixed correlations.
We propose a new positional encoding method called Transformer with Untied Positional Encoding (TUPE).
arXiv Detail & Related papers (2020-06-28T13:11:02Z)
- Data Augmentation with Unsupervised Machine Translation Improves the Structural Similarity of Cross-lingual Word Embeddings [29.467158098595924]
Cross-lingual word embedding methods learn a linear transformation matrix that maps one monolingual embedding space onto another.
We argue that using a pseudo-parallel corpus generated by an unsupervised machine translation model facilitates the structural similarity of the two embedding spaces.
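The linear-transformation setup described in this entry is classically solved with orthogonal Procrustes over a seed dictionary. A minimal sketch of that standard solver, not the paper's own code, assuming equal source and target dimensions:

```python
import numpy as np

def procrustes_map(X_src, X_tgt):
    """Solve min_W ||X_src @ W - X_tgt||_F over orthogonal W.

    X_src, X_tgt: (n, d) embeddings of n translation pairs from a seed
    dictionary (this classic variant assumes equal dimensions).
    """
    # Closed-form solution via SVD of the cross-covariance matrix.
    U, _, Vt = np.linalg.svd(X_src.T @ X_tgt)
    return U @ Vt
```

Because the closed-form fit depends entirely on how similar the two spaces' geometries are, a pseudo-parallel corpus that increases structural similarity, as the entry argues, directly improves the quality of the recovered map.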
arXiv Detail & Related papers (2020-05-30T13:28:03Z)
- Geometry-aware Domain Adaptation for Unsupervised Alignment of Word Embeddings [15.963615360741356]
We propose a novel manifold-based geometric approach for learning unsupervised alignment of word embeddings between the source and target languages.
Our approach formulates alignment learning as a domain adaptation problem over the manifold of doubly stochastic matrices.
Empirically, the proposed approach outperforms a state-of-the-art optimal transport based approach on the bilingual lexicon induction task across several language pairs.
arXiv Detail & Related papers (2020-04-06T04:41:06Z)
- Refinement of Unsupervised Cross-Lingual Word Embeddings [2.4366811507669124]
Cross-lingual word embeddings aim to bridge the gap between high-resource and low-resource languages.
We propose a self-supervised method to refine the alignment of unsupervised bilingual word embeddings.
arXiv Detail & Related papers (2020-02-21T10:39:53Z)
- Robust Cross-lingual Embeddings from Parallel Sentences [65.85468628136927]
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word representations.
Our approach significantly improves cross-lingual sentence retrieval performance over all other approaches.
It also achieves parity with a deep RNN method on a zero-shot cross-lingual document classification task.
arXiv Detail & Related papers (2019-12-28T16:18:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.