Substructure Distribution Projection for Zero-Shot Cross-Lingual
Dependency Parsing
- URL: http://arxiv.org/abs/2110.08538v1
- Date: Sat, 16 Oct 2021 10:12:28 GMT
- Title: Substructure Distribution Projection for Zero-Shot Cross-Lingual
Dependency Parsing
- Authors: Haoyue Shi, Kevin Gimpel, Karen Livescu
- Abstract summary: SubDP is a technique that projects a distribution over structures in one domain to another, by projecting substructure distributions separately.
We evaluate SubDP on zero-shot cross-lingual dependency parsing, taking dependency arcs as substructures.
- Score: 55.69800855705232
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present substructure distribution projection (SubDP), a technique that
projects a distribution over structures in one domain to another, by projecting
substructure distributions separately. Models for the target domains can be
then trained, using the projected distributions as soft silver labels. We
evaluate SubDP on zero-shot cross-lingual dependency parsing, taking dependency
arcs as substructures: we project the predicted dependency arc distributions in
the source language(s) to target language(s), and train a target language
parser to fit the resulting distributions. When an English treebank is the only
annotation that involves human effort, SubDP achieves better unlabeled
attachment score than all prior work on the Universal Dependencies v2.2 (Nivre
et al., 2020) test set across eight diverse target languages, as well as the
best labeled attachment score on six out of eight languages. In addition, SubDP
improves zero-shot cross-lingual dependency parsing with very few (e.g., 50)
supervised bitext pairs, across a broader range of target languages.
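The projection step the abstract describes can be sketched as transferring arc probabilities through a soft word alignment, then training the target parser against the projected distribution as a soft label. This is a minimal illustrative sketch, not the paper's exact procedure: the function names, the bilinear projection form, and the row renormalization are assumptions, and labeled arcs and alignment filtering are omitted.

```python
import numpy as np

def project_arc_distribution(p_src, align):
    """Project a source-language dependency-arc distribution onto target tokens.

    p_src: (n_src, n_src) matrix; p_src[d, h] = P(head of source word d is h).
    align: (n_tgt, n_src) soft word-alignment matrix (rows sum to 1).
    Returns an (n_tgt, n_tgt) projected arc distribution, renormalized per row.
    """
    # For each target (dependent, head) pair, accumulate probability mass from
    # every aligned source (dependent, head) pair: align @ p_src @ align.T
    p_tgt = align @ p_src @ align.T
    # Renormalize each row so it is a proper distribution over candidate heads.
    row_sums = p_tgt.sum(axis=1, keepdims=True)
    return p_tgt / np.maximum(row_sums, 1e-12)

def soft_label_loss(p_model, p_proj):
    """Cross-entropy of the target parser's arc distribution against the
    projected soft silver labels (summed over dependents)."""
    return -np.sum(p_proj * np.log(np.maximum(p_model, 1e-12)))
```

With an identity alignment the projection is the identity map, so the projected distribution equals the source one; with a genuinely soft alignment, each target word's head distribution becomes a mixture over the heads of its aligned source words.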
Related papers
- Cross-lingual Back-Parsing: Utterance Synthesis from Meaning Representation for Zero-Resource Semantic Parsing [6.074150063191985]
Cross-Lingual Back-Parsing is a novel data augmentation methodology designed to enhance cross-lingual transfer for semantic parsing.
Our methodology effectively performs cross-lingual data augmentation in challenging zero-resource settings.
arXiv Detail & Related papers (2024-10-01T08:53:38Z)
- Multilingual Word Embeddings for Low-Resource Languages using Anchors and a Chain of Related Languages [54.832599498774464]
We propose to build multilingual word embeddings (MWEs) via a novel language chain-based approach.
We build MWEs one language at a time by starting from the resource rich source and sequentially adding each language in the chain till we reach the target.
We evaluate our method on bilingual lexicon induction for 4 language families, involving 4 very low-resource (5M tokens) and 4 moderately low-resource (50M) target languages.
arXiv Detail & Related papers (2023-11-21T09:59:29Z)
- CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z)
- DT-grams: Structured Dependency Grammar Stylometry for Cross-Language Authorship Attribution [0.20305676256390934]
We present a novel language-independent feature for authorship analysis based on dependency graphs and universal part of speech tags, called DT-grams.
We evaluate DT-grams by performing cross-language authorship attribution on untranslated datasets of bilingual authors.
arXiv Detail & Related papers (2021-06-10T11:50:07Z)
- X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented Compositional Semantic Parsing [51.81533991497547]
Task-oriented compositional semantic parsing (TCSP) handles complex nested user queries.
We present X2Parser, a transferable Cross-lingual and Cross-domain Parser for TCSP.
We propose to predict flattened intents and slots representations separately and cast both prediction tasks into sequence labeling problems.
arXiv Detail & Related papers (2021-06-07T16:40:05Z)
- A Practical Chinese Dependency Parser Based on A Large-scale Dataset [21.359679124869402]
Dependency parsing is a longstanding natural language processing task, with its outputs crucial to various downstream tasks.
Recently, neural-network-based (NN-based) dependency parsing has achieved significant progress and obtained state-of-the-art results.
However, NN-based approaches require massive amounts of labeled training data, which is very expensive because it requires human annotation by experts.
arXiv Detail & Related papers (2020-09-02T08:41:46Z)
- Inducing Language-Agnostic Multilingual Representations [61.97381112847459]
Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world.
We examine three approaches for this: (i) re-aligning the vector spaces of target languages to a pivot source language; (ii) removing language-specific means and variances, which yields better discriminativeness of embeddings as a by-product; and (iii) increasing input similarity across languages by removing morphological contractions and sentence reordering.
arXiv Detail & Related papers (2020-08-20T17:58:56Z)
- Towards Instance-Level Parser Selection for Cross-Lingual Transfer of Dependency Parsers [59.345145623931636]
We argue for a novel cross-lingual transfer paradigm: instance-level parser selection (ILPS).
We present a proof-of-concept study focused on instance-level selection in the framework of delexicalized transfer.
arXiv Detail & Related papers (2020-04-16T13:18:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.