Zero-Shot Cross-Lingual Dependency Parsing through Contextual Embedding Transformation
- URL: http://arxiv.org/abs/2103.02212v1
- Date: Wed, 3 Mar 2021 06:50:43 GMT
- Title: Zero-Shot Cross-Lingual Dependency Parsing through Contextual Embedding Transformation
- Authors: Haoran Xu and Philipp Koehn
- Abstract summary: Cross-lingual embedding space mapping has usually been studied with static word-level embeddings.
We investigate a contextual embedding alignment approach that is sense-level and dictionary-free.
In experiments on zero-shot dependency parsing, the concept-shared space built by our embedding transformation substantially outperforms state-of-the-art methods using multilingual embeddings.
- Score: 7.615096161060399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linear embedding transformation has been shown to be effective for zero-shot cross-lingual transfer tasks and to achieve surprisingly promising results. However, cross-lingual embedding space mapping is usually studied with static word-level embeddings, where the space transformation is derived by aligning representations of translation pairs taken from dictionaries. We move beyond this line of work and investigate a contextual embedding alignment approach that is sense-level and dictionary-free. To enhance the quality of the mapping, we also provide a detailed analysis of the properties of contextual embeddings, i.e., the anisotropy problem and its solution. In experiments on zero-shot dependency parsing, the concept-shared space built by our embedding transformation substantially outperforms state-of-the-art methods that use multilingual embeddings.
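For readers unfamiliar with the linear-transformation setup the abstract refers to, the sketch below shows the standard static-embedding baseline: an orthogonal map is fit on aligned vector pairs (orthogonal Procrustes), with mean-centering and length normalization applied first as a simple, commonly used mitigation for anisotropy. This is a minimal NumPy illustration on hypothetical toy matrices, not the authors' exact procedure; all array names are invented for the example.

```python
import numpy as np

def center_and_normalize(X):
    """Simple anisotropy mitigation: remove the mean vector, then length-normalize.
    (A common post-processing step; the paper's actual treatment may differ.)"""
    X = X - X.mean(axis=0, keepdims=True)
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def procrustes_map(src, tgt):
    """Orthogonal Procrustes: the orthogonal W minimizing ||src @ W - tgt||_F."""
    U, _, Vt = np.linalg.svd(src.T @ tgt)
    return U @ Vt

# Hypothetical data: 1,000 aligned pairs of 300-dimensional embeddings.
rng = np.random.default_rng(0)
src_vecs = rng.normal(size=(1000, 300))   # source-language vectors
tgt_vecs = rng.normal(size=(1000, 300))   # aligned target-language vectors

src_n, tgt_n = center_and_normalize(src_vecs), center_and_normalize(tgt_vecs)
W = procrustes_map(src_n, tgt_n)
mapped = src_n @ W   # source vectors expressed in the target space
```

In the dictionary-free, sense-level setting described in the abstract, the aligned pairs would come from contextual vectors of parallel text rather than from bilingual dictionary entries.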
Related papers
- Spatial Semantic Recurrent Mining for Referring Image Segmentation [63.34997546393106]
We propose S²RM to achieve high-quality cross-modality fusion.
It follows a three-stage strategy: language feature distribution, spatial semantic recurrent co-parsing, and parsed-semantic balancing.
Our proposed method performs favorably against other state-of-the-art algorithms.
arXiv Detail & Related papers (2024-05-15T00:17:48Z)
- Robust Unsupervised Cross-Lingual Word Embedding using Domain Flow Interpolation [48.32604585839687]
Previous adversarial approaches have shown promising results in inducing cross-lingual word embeddings without parallel data.
We propose to make use of a sequence of intermediate spaces for smooth bridging.
arXiv Detail & Related papers (2022-10-07T04:37:47Z)
- Multilingual Extraction and Categorization of Lexical Collocations with Graph-aware Transformers [86.64972552583941]
We put forward a BERT-based sequence tagging model enhanced with a graph-aware transformer architecture, which we evaluate on the task of collocation recognition in context.
Our results suggest that explicitly encoding syntactic dependencies in the model architecture is helpful, and provide insights on differences in collocation typification in English, Spanish and French.
arXiv Detail & Related papers (2022-05-23T16:47:37Z)
- Cross-Lingual BERT Contextual Embedding Space Mapping with Isotropic and Isometric Conditions [7.615096161060399]
We investigate a context-aware and dictionary-free mapping approach by leveraging parallel corpora.
Our findings reveal the tight relationship between isotropy, isometry, and isomorphism in normalized contextual embedding spaces (a minimal isotropy check is sketched after this list).
arXiv Detail & Related papers (2021-07-19T22:57:36Z)
- Unsupervised Word Translation Pairing using Refinement based Point Set Registration [8.568050813210823]
Cross-lingual alignment of word embeddings plays an important role in knowledge transfer across languages.
Current unsupervised approaches rely on similarities in the geometric structure of word embedding spaces across languages.
This paper proposes BioSpere, a novel framework for unsupervised mapping of bi-lingual word embeddings onto a shared vector space.
arXiv Detail & Related papers (2020-11-26T09:51:29Z)
- Unsupervised Distillation of Syntactic Information from Contextualized Word Representations [62.230491683411536]
We tackle the task of unsupervised disentanglement between semantics and structure in neural language representations.
To this end, we automatically generate groups of sentences which are structurally similar but semantically different.
We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics.
arXiv Detail & Related papers (2020-10-11T15:13:18Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Refinement of Unsupervised Cross-Lingual Word Embeddings [2.4366811507669124]
Cross-lingual word embeddings aim to bridge the gap between high-resource and low-resource languages.
We propose a self-supervised method to refine the alignment of unsupervised bilingual word embeddings.
arXiv Detail & Related papers (2020-02-21T10:39:53Z)
- A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings [10.871587311621974]
This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings.
Existing word vectors are projected to a common semantic space using linear transformations and averaging.
The resulting cross-lingual meta-embeddings also exhibit excellent cross-lingual transfer learning capabilities.
arXiv Detail & Related papers (2020-01-17T15:42:29Z)
- Robust Cross-lingual Embeddings from Parallel Sentences [65.85468628136927]
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word representations.
Our approach significantly improves cross-lingual sentence retrieval performance over all other approaches.
It also achieves parity with a deep RNN method on a zero-shot cross-lingual document classification task.
arXiv Detail & Related papers (2019-12-28T16:18:33Z)
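As a minimal illustration of the isotropy issue raised in the abstract and in the BERT space-mapping entry above, a common diagnostic is the average pairwise cosine similarity of a set of vectors: values well above zero indicate an anisotropic, cone-shaped space, and simple mean-centering already pushes the measure toward zero. The snippet below runs this check on hypothetical vectors; it is an assumption-level sketch, not evaluation code from any of the papers listed here.

```python
import numpy as np

def avg_cosine(X):
    """Average pairwise cosine similarity (self-pairs excluded).
    Near 0 suggests isotropy; values near 1 suggest a narrow cone (anisotropy)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    n = len(X)
    return (sims.sum() - n) / (n * (n - 1))

rng = np.random.default_rng(0)
# Hypothetical "contextual" vectors sharing a strong common component,
# mimicking the anisotropy often reported for BERT-style embeddings.
shared = rng.normal(size=(1, 768))
X = shared + 0.3 * rng.normal(size=(500, 768))

print(f"raw space:     {avg_cosine(X):.3f}")                   # close to 1
print(f"mean-centered: {avg_cosine(X - X.mean(axis=0)):.3f}")  # near 0
```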