Cross-lingual Dependency Parsing as Domain Adaptation
- URL: http://arxiv.org/abs/2012.13163v1
- Date: Thu, 24 Dec 2020 08:14:36 GMT
- Title: Cross-lingual Dependency Parsing as Domain Adaptation
- Authors: Kailai Sun, Zuchao Li, Hai Zhao
- Abstract summary: Cross-lingual transfer learning is as essential as in-domain learning.
We exploit the ability of pre-training tasks to extract universal features without supervision.
We combine traditional self-training with the two pre-training tasks.
- Score: 48.69930912510414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In natural language processing (NLP), cross-lingual transfer learning is as
essential as in-domain learning due to the unavailability of annotated
resources for low-resource languages. In this paper, we exploit the ability of
pre-training tasks to extract universal features without supervision. We add
two pre-training tasks as auxiliary tasks to dependency parsing in a
multi-task setup, which improves the performance of the model in both in-domain
and cross-lingual settings. Moreover, inspired by the usefulness of
self-training in cross-domain learning, we combine traditional
self-training with the two pre-training tasks. In this way, we can continuously
extract universal features not only from the training corpus but also from extra
unannotated data and gain further improvement.
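The abstract does not spell out the training procedure, so the following is only a minimal sketch of a generic self-training loop, not the authors' method. It assumes a toy one-dimensional centroid classifier in place of a dependency parser; the names `fit`, `predict`, `self_train`, the confidence measure, and the `threshold` value are all hypothetical. The auxiliary pre-training tasks the paper combines with self-training are not modelled here.

```python
from typing import Dict, List, Tuple

Labeled = Tuple[float, int]  # toy stand-in for an (input, annotation) pair


def fit(data: List[Labeled]) -> Dict[int, float]:
    """Toy 'training' step: compute one centroid per label."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model: Dict[int, float], x: float) -> Tuple[int, float]:
    """Return (label, confidence); confidence lies in [0.5, 1.0]."""
    ranked = sorted(model.items(), key=lambda item: abs(x - item[1]))
    label, nearest = ranked[0]
    if len(ranked) == 1:
        return label, 1.0
    d1, d2 = abs(x - nearest), abs(x - ranked[1][1])
    return label, (1.0 if d1 + d2 == 0 else d2 / (d1 + d2))


def self_train(
    labeled: List[Labeled],
    unlabeled: List[float],
    threshold: float = 0.8,  # assumed cut-off for accepting a pseudo-label
    max_rounds: int = 5,
) -> Tuple[Dict[int, float], List[Labeled], List[float]]:
    """Repeatedly pseudo-label confident unannotated inputs and retrain."""
    train = list(labeled)
    pool = list(unlabeled)
    model = fit(train)
    for _ in range(max_rounds):
        scored = [(x, *predict(model, x)) for x in pool]
        confident = [(x, y) for x, y, c in scored if c >= threshold]
        if not confident:
            break  # nothing new to learn from; stop early
        train.extend(confident)
        pool = [x for x, _, c in scored if c < threshold]
        model = fit(train)  # retrain on gold plus pseudo-labeled data
    return model, train, pool


labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
unlabeled = [0.5, 9.5, 5.0, 2.0, 8.0]
model, train, pool = self_train(labeled, unlabeled)
print(len(train), pool)
```

Inputs the model cannot label confidently stay in the unannotated pool, which is the usual safeguard against reinforcing the model's own errors.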
Related papers
- UniPSDA: Unsupervised Pseudo Semantic Data Augmentation for Zero-Shot Cross-Lingual Natural Language Understanding [31.272603877215733]
Cross-lingual representation learning transfers knowledge from resource-rich data to resource-scarce ones to improve the semantic understanding abilities of different languages.
We propose an Unsupervised Pseudo Semantic Data Augmentation (UniPSDA) mechanism for cross-lingual natural language understanding to enrich the training data without human interventions.
arXiv Detail & Related papers (2024-06-24T07:27:01Z)
- AAdaM at SemEval-2024 Task 1: Augmentation and Adaptation for Multilingual Semantic Textual Relatedness [16.896143197472114]
This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian languages.
We propose using machine translation for data augmentation to address the low-resource challenge of limited training data.
We achieve competitive results in the shared task: our system performs the best among all ranked teams in both subtask A (supervised learning) and subtask C (cross-lingual transfer).
arXiv Detail & Related papers (2024-04-01T21:21:15Z)
- Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training [56.74440457571821]
We analyze tasks covering syntax, semantics and reasoning, across 2M pre-training steps and five seeds.
We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize.
Our findings have implications for model interpretability, multi-task learning, and learning from limited data.
arXiv Detail & Related papers (2023-10-25T09:09:55Z)
- Effective Transfer Learning for Low-Resource Natural Language Understanding [15.752309656576129]
We focus on developing cross-lingual and cross-domain methods to tackle the low-resource issues.
First, we propose to improve the model's cross-lingual ability by focusing on the task-related keywords.
Second, we present Order-Reduced Modeling methods for the cross-lingual adaptation.
Third, we propose to leverage different levels of domain-related corpora and additional masking of data in the pre-training for the cross-domain adaptation.
arXiv Detail & Related papers (2022-08-19T06:59:00Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
- Cross-Lingual Language Model Meta-Pretraining [21.591492094502424]
We propose a cross-lingual language model meta-pretraining, which learns the two abilities in different training phases.
Our method improves both generalization and cross-lingual transfer, and produces better-aligned representations across different languages.
arXiv Detail & Related papers (2021-09-23T03:47:44Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- Mutlitask Learning for Cross-Lingual Transfer of Semantic Dependencies [21.503766432869437]
We develop broad-coverage semantic dependency parsers for languages with no semantically annotated resources.
We leverage a multitask learning framework coupled with an annotation projection method.
We show that our best multitask model improves the labeled F1 score over the single-task baseline by 1.8 in the in-domain SemEval data.
arXiv Detail & Related papers (2020-04-30T17:09:51Z)
- Robust Cross-lingual Embeddings from Parallel Sentences [65.85468628136927]
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word representations.
Our approach significantly improves crosslingual sentence retrieval performance over all other approaches.
It also achieves parity with a deep RNN method on a zero-shot cross-lingual document classification task.
arXiv Detail & Related papers (2019-12-28T16:18:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.