Related papers: BhashaSetu: Cross-Lingual Knowledge Transfer from High-Resource to Extreme Low-Resource Languages

URL: http://arxiv.org/abs/2602.05599v1
Date: Thu, 05 Feb 2026 12:33:30 GMT
Title: BhashaSetu: Cross-Lingual Knowledge Transfer from High-Resource to Extreme Low-Resource Languages
Authors: Subhadip Maji, Arnab Bhattacharya
Abstract summary: Cross-lingual knowledge transfer has emerged as a promising approach to the challenge of developing NLP systems for low-resource languages. We introduce a novel method for cross-lingual knowledge transfer along with two adopted baselines. Experimental results demonstrate that our GNN-based approach significantly outperforms existing multilingual and cross-lingual baseline methods.
Score: 7.883895869179052
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite remarkable advances in natural language processing, developing effective systems for low-resource languages remains a formidable challenge, with performance typically lagging far behind high-resource counterparts due to data scarcity and insufficient linguistic resources. Cross-lingual knowledge transfer has emerged as a promising approach to address this challenge by leveraging resources from high-resource languages. In this paper, we investigate methods for transferring linguistic knowledge from high-resource languages to low-resource languages, where the number of labeled training instances is in the hundreds. We focus on sentence-level and word-level tasks. We introduce a novel method, GETR (Graph-Enhanced Token Representation), for cross-lingual knowledge transfer, along with two adopted baselines: (a) augmentation in hidden layers and (b) token embedding transfer through token translation. Experimental results demonstrate that our GNN-based approach significantly outperforms existing multilingual and cross-lingual baseline methods, achieving improvements of 13 percentage points on truly low-resource languages (Mizo, Khasi) for POS tagging, and of 20 and 27 percentage points in macro-F1 on simulated low-resource languages (Marathi, Bangla, Malayalam) for sentiment classification and NER, respectively. We also present a detailed analysis of the transfer mechanisms and identify key factors that contribute to successful knowledge transfer in this linguistic context.

Related papers

Bridging Language Gaps: Enhancing Few-Shot Language Adaptation [32.157041759856]
The disparity in language resources poses a challenge in multilingual NLP. High-resource languages benefit from extensive data, while low-resource languages lack sufficient data for effective training. Our Contrastive Language Alignment with Prompting (CoLAP) method addresses this gap by integrating contrastive learning with cross-lingual representations.
arXiv Detail & Related papers (2025-08-26T22:49:17Z)
Cross-Lingual Transfer for Low-Resource Natural Language Processing [0.32634122554914]
Cross-lingual transfer learning is a research area aimed at leveraging data and models from high-resource languages to improve NLP performance. This thesis presents a new method to improve data-based transfer with T-Projection, a state-of-the-art annotation projection method. For model-based transfer, we introduce a constrained decoding algorithm that enhances cross-lingual Sequence Labeling in zero-shot settings. Finally, we develop Medical mT5, the first multilingual text-to-text medical model.
arXiv Detail & Related papers (2025-02-04T21:17:46Z)
Revisiting Projection-based Data Transfer for Cross-Lingual Named Entity Recognition in Low-Resource Languages [8.612181075294327]
We show that the data-based cross-lingual transfer method is an effective technique for cross-lingual NER. We present a novel formalized projection approach of matching source entities with extracted target candidates. These findings highlight the robustness of projection-based data transfer as an alternative to model-based methods for cross-lingual named entity recognition in low-resource languages.
arXiv Detail & Related papers (2025-01-30T21:00:47Z)
Low-Resource Named Entity Recognition with Cross-Lingual, Character-Level Neural Conditional Random Fields [68.17213992395041]
Low-resource named entity recognition is still an open problem in NLP. We present a transfer learning scheme, whereby we train character-level neural CRFs to predict named entities for both high-resource languages and low-resource languages jointly.
arXiv Detail & Related papers (2024-04-14T23:44:49Z)
Cross-Lingual Transfer Robustness to Lower-Resource Languages on Adversarial Datasets [4.653113033432781]
Cross-lingual transfer capabilities of Multilingual Language Models (MLLMs) are investigated. Our research provides valuable insights into cross-lingual transfer and its implications for NLP applications.
arXiv Detail & Related papers (2024-03-29T08:47:15Z)
High-resource Language-specific Training for Multilingual Neural Machine Translation [109.31892935605192]
We propose a multilingual translation model with high-resource language-specific training (HLT-MT) to alleviate negative interference. Specifically, we first train the multilingual model only with the high-resource pairs and select the language-specific modules at the top of the decoder. HLT-MT is further trained on all available corpora to transfer knowledge from high-resource languages to low-resource languages.
arXiv Detail & Related papers (2022-07-11T14:33:13Z)
Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity [81.51206991542242]
Cross-lingual transfer offers a compelling way to help bridge this digital divide. Current cross-lingual algorithms have shown success in text-based tasks and speech-related tasks over some low-resource languages. We propose a language similarity approach that can efficiently identify acoustic cross-lingual transfer pairs across hundreds of languages.
arXiv Detail & Related papers (2021-11-02T01:55:17Z)
MetaXL: Meta Representation Transformation for Low-resource Cross-lingual Learning [91.5426763812547]
Cross-lingual transfer learning is one of the most effective methods for building functional NLP systems for low-resource languages. We propose MetaXL, a meta-learning based framework that learns to transform representations judiciously from auxiliary languages to a target one.
arXiv Detail & Related papers (2021-04-16T06:15:52Z)
Enhancing Answer Boundary Detection for Multilingual Machine Reading Comprehension [86.1617182312817]
We propose two auxiliary tasks in the fine-tuning stage to create additional phrase boundary supervision: a mixed Machine Reading task, which translates the question or passage to other languages and builds cross-lingual question-passage pairs, and a language-agnostic knowledge masking task, which leverages knowledge phrases mined from the web.
arXiv Detail & Related papers (2020-04-29T10:44:00Z)
Cross-lingual, Character-Level Neural Morphological Tagging [57.0020906265213]
We train character-level recurrent neural taggers to predict morphological taggings for high-resource languages and low-resource languages together. Learning joint character representations among multiple related languages successfully enables knowledge transfer from the high-resource languages to the low-resource ones, improving accuracy by up to 30% over a monolingual model.
arXiv Detail & Related papers (2017-08-30T08:14:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.