Implicit Word Reordering with Knowledge Distillation for Cross-Lingual Dependency Parsing
- URL: http://arxiv.org/abs/2502.17308v2
- Date: Fri, 14 Mar 2025 14:32:01 GMT
- Title: Implicit Word Reordering with Knowledge Distillation for Cross-Lingual Dependency Parsing
- Authors: Zhuoran Li, Chunming Hu, Junfan Chen, Zhijun Chen, Richong Zhang,
- Abstract summary: We propose an Implicit Word Reordering framework with Knowledge Distillation (IWR-KD). The framework is inspired by the observation that deep networks are good at learning feature linearizations corresponding to meaningful data transformations. We verify our proposed method on Universal Dependency Treebanks across 31 different languages and show it outperforms a series of competitors.
- Score: 35.40851210603478
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Word order differences between source and target languages are a major obstacle to cross-lingual transfer, especially in the dependency parsing task. Current work mostly relies on order-agnostic models or word reordering to mitigate this problem. However, such methods either fail to leverage the grammatical information naturally carried by word order or are computationally expensive, since the permutation space grows exponentially with sentence length. Moreover, a reordered source sentence with an unnatural word order may act as a form of noise that harms model learning. To this end, we propose an Implicit Word Reordering framework with Knowledge Distillation (IWR-KD). The framework is inspired by the observation that deep networks are good at learning feature linearizations corresponding to meaningful data transformations, e.g., word reordering. To realize this idea, we introduce a knowledge distillation framework composed of a word-reordering teacher model and a dependency parsing student model. We verify our proposed method on Universal Dependency Treebanks across 31 different languages and show that it outperforms a series of competitors, together with experimental analysis illustrating how our method trains a robust parser.
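The abstract names only the two components of IWR-KD (a word-reordering teacher and a dependency parsing student) without implementation details. As a rough, hypothetical sketch of how such a teacher-student objective is often wired up, the snippet below combines a supervised arc-prediction loss with a soft-target distillation term; the function name, the choice of distilling arc-score distributions, and the `alpha`/`temperature` weighting are all assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of a teacher-student distillation step for
# cross-lingual dependency parsing. The names, the loss weighting, and the
# decision to distill arc-score distributions are illustrative assumptions;
# the paper does not specify these details here.
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, batch, alpha=0.5, temperature=2.0):
    """One training step: supervised parsing loss + KD loss from the
    (frozen) word-reordering teacher's soft targets."""
    # Student predicts head scores for each token: (batch, seq, seq).
    student_arc_logits = student(batch["tokens"])

    # Teacher is kept frozen; it supplies the word-order signal.
    with torch.no_grad():
        teacher_arc_logits = teacher(batch["tokens"])

    # Supervised loss against gold heads from the source-language treebank.
    parse_loss = F.cross_entropy(
        student_arc_logits.flatten(0, 1),   # (batch*seq, seq)
        batch["gold_heads"].flatten(),      # (batch*seq,)
        ignore_index=-1,                    # padding positions
    )

    # Soft-target distillation: KL divergence between softened distributions.
    kd_loss = F.kl_div(
        F.log_softmax(student_arc_logits / temperature, dim=-1),
        F.softmax(teacher_arc_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    return (1 - alpha) * parse_loss + alpha * kd_loss
```

In a sketch like this, the student parser still sees gold source-language trees, while the teacher's soft targets carry word-order knowledge implicitly rather than through explicit reordering of the input, which is the general intuition the abstract describes.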
Related papers
- CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages [10.441585970299547]
We propose a contrastive self-supervised learning method to make the model robust to word order variations.
Our proposed modification demonstrates a substantial average gain of 3.03/2.95 points in 7 relatively free word order languages.
arXiv Detail & Related papers (2024-10-09T14:38:49Z)
- Improving Cross-Lingual Transfer through Subtree-Aware Word Reordering [17.166996956587155]
One obstacle for effective cross-lingual transfer is variability in word-order patterns.
We present a new powerful reordering method, defined in terms of Universal Dependencies.
We show that our method consistently outperforms strong baselines over different language pairs and model architectures.
arXiv Detail & Related papers (2023-10-20T15:25:53Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- EntityCS: Improving Zero-Shot Cross-lingual Transfer with Entity-Centric Code Switching [15.884119564193924]
Code-Switching offers language alignment at word- or phrase-level.
Existing approaches either use dictionaries or parallel sentences with word-alignment to generate CS data.
We propose EntityCS to capture fine-grained cross-lingual semantics without corrupting syntax.
arXiv Detail & Related papers (2022-10-22T20:05:40Z)
- Discovering Non-monotonic Autoregressive Orderings with Variational Inference [67.27561153666211]
We develop an unsupervised parallelizable learner that discovers high-quality generation orders purely from training data.
We implement the encoder as a Transformer with non-causal attention that outputs permutations in one forward pass.
Empirical results in language modeling tasks demonstrate that our method is context-aware and discovers orderings that are competitive with or even better than fixed orders.
arXiv Detail & Related papers (2021-10-27T16:08:09Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)