BhashaSetu: Cross-Lingual Knowledge Transfer from High-Resource to Extreme Low-Resource Languages
- URL: http://arxiv.org/abs/2602.05599v1
- Date: Thu, 05 Feb 2026 12:33:30 GMT
- Title: BhashaSetu: Cross-Lingual Knowledge Transfer from High-Resource to Extreme Low-Resource Languages
- Authors: Subhadip Maji, Arnab Bhattacharya,
- Abstract summary: Cross-lingual knowledge transfer has emerged as a promising approach to address this challenge.<n>We introduce a novel method for cross-lingual knowledge transfer along with two adopted baselines.<n> Experimental results demonstrate that our GNN-based approach significantly outperforms existing multilingual and cross-lingual baseline methods.
- Score: 7.883895869179052
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite remarkable advances in natural language processing, developing effective systems for low-resource languages remains a formidable challenge, with performances typically lagging far behind high-resource counterparts due to data scarcity and insufficient linguistic resources. Cross-lingual knowledge transfer has emerged as a promising approach to address this challenge by leveraging resources from high-resource languages. In this paper, we investigate methods for transferring linguistic knowledge from high-resource languages to low-resource languages, where the number of labeled training instances is in hundreds. We focus on sentence-level and word-level tasks. We introduce a novel method, GETR (Graph-Enhanced Token Representation) for cross-lingual knowledge transfer along with two adopted baselines (a) augmentation in hidden layers and (b) token embedding transfer through token translation. Experimental results demonstrate that our GNN-based approach significantly outperforms existing multilingual and cross-lingual baseline methods, achieving 13 percentage point improvements on truly low-resource languages (Mizo, Khasi) for POS tagging, and 20 and 27 percentage point improvements in macro-F1 on simulated low-resource languages (Marathi, Bangla, Malayalam) across sentiment classification and NER tasks respectively. We also present a detailed analysis of the transfer mechanisms and identify key factors that contribute to successful knowledge transfer in this linguistic context.
Related papers
- Bridging Language Gaps: Enhancing Few-Shot Language Adaptation [32.157041759856]
The disparity in language resources poses a challenge in multilingual NLP.<n>High-resource languages benefit from extensive data, while low-resource languages lack sufficient data for effective training.<n>Our Contrastive Language Alignment with Prompting (CoLAP) method addresses this gap by integrating contrastive learning with cross-lingual representations.
arXiv Detail & Related papers (2025-08-26T22:49:17Z) - Cross-Lingual Transfer for Low-Resource Natural Language Processing [0.32634122554914]
Cross-lingual transfer learning is a research area aimed at leveraging data and models from high-resource languages to improve NLP performance.<n>This thesis presents a new method to improve data-based transfer with T-Projection, a state-of-the-art annotation projection method.<n>For model-based transfer, we introduce a constrained decoding algorithm that enhances cross-lingual Sequence Labeling in zero-shot settings.<n>Finally, we develop Medical mT5, the first multilingual text-to-text medical model.
arXiv Detail & Related papers (2025-02-04T21:17:46Z) - Revisiting Projection-based Data Transfer for Cross-Lingual Named Entity Recognition in Low-Resource Languages [8.612181075294327]
We show that the data-based cross-lingual transfer method is an effective technique for crosslingual NER.<n>We present a novel formalized projection approach of matching source entities with extracted target candidates.<n>These findings highlight the robustness of projection-based data transfer as an alternative to model-based methods for crosslingual named entity recognition in lowresource languages.
arXiv Detail & Related papers (2025-01-30T21:00:47Z) - Low-Resource Named Entity Recognition with Cross-Lingual, Character-Level Neural Conditional Random Fields [68.17213992395041]
Low-resource named entity recognition is still an open problem in NLP.
We present a transfer learning scheme, whereby we train character-level neural CRFs to predict named entities for both high-resource languages and low resource languages jointly.
arXiv Detail & Related papers (2024-04-14T23:44:49Z) - Cross-Lingual Transfer Robustness to Lower-Resource Languages on Adversarial Datasets [4.653113033432781]
Cross-lingual transfer capabilities of Multilingual Language Models (MLLMs) are investigated.
Our research provides valuable insights into cross-lingual transfer and its implications for NLP applications.
arXiv Detail & Related papers (2024-03-29T08:47:15Z) - High-resource Language-specific Training for Multilingual Neural Machine
Translation [109.31892935605192]
We propose the multilingual translation model with the high-resource language-specific training (HLT-MT) to alleviate the negative interference.
Specifically, we first train the multilingual model only with the high-resource pairs and select the language-specific modules at the top of the decoder.
HLT-MT is further trained on all available corpora to transfer knowledge from high-resource languages to low-resource languages.
arXiv Detail & Related papers (2022-07-11T14:33:13Z) - Cross-lingual Transfer for Speech Processing using Acoustic Language
Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge this digital divide.
Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-resource languages.
We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
arXiv Detail & Related papers (2021-11-02T01:55:17Z) - MetaXL: Meta Representation Transformation for Low-resource
Cross-lingual Learning [91.5426763812547]
Cross-lingual transfer learning is one of the most effective methods for building functional NLP systems for low-resource languages.
We propose MetaXL, a meta-learning based framework that learns to transform representations judiciously from auxiliary languages to a target one.
arXiv Detail & Related papers (2021-04-16T06:15:52Z) - Enhancing Answer Boundary Detection for Multilingual Machine Reading
Comprehension [86.1617182312817]
We propose two auxiliary tasks in the fine-tuning stage to create additional phrase boundary supervision.
A mixed Machine Reading task, which translates the question or passage to other languages and builds cross-lingual question-passage pairs.
A language-agnostic knowledge masking task by leveraging knowledge phrases mined from web.
arXiv Detail & Related papers (2020-04-29T10:44:00Z) - Cross-lingual, Character-Level Neural Morphological Tagging [57.0020906265213]
We train character-level recurrent neural taggers to predict morphological taggings for high-resource languages and low-resource languages together.<n>Learning joint character representations among multiple related languages successfully enables knowledge transfer from the high-resource languages to the low-resource ones, improving accuracy by up to 30% over a monolingual model.
arXiv Detail & Related papers (2017-08-30T08:14:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.