Continual-learning for Modelling Low-Resource Languages from Large Language Models
- URL: http://arxiv.org/abs/2601.05874v1
- Date: Fri, 09 Jan 2026 15:51:12 GMT
- Title: Continual-learning for Modelling Low-Resource Languages from Large Language Models
- Authors: Santosh Srinath K, Mudit Somani, Varun Reddy Padala, Prajna Devi Upadhyay, Abhijit Das
- Abstract summary: Small language models (SLMs) built for low-resource languages pose the challenge of catastrophic forgetting. This work proposes to employ a continual learning strategy using parts-of-speech (POS)-based code-switching. Experiments conducted on vision-language tasks such as visual question answering and on a language modelling task demonstrate the success of the proposed architecture.
- Score: 1.462912591880424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building a language model for a multilingual scenario involves several potential challenges, of which catastrophic forgetting is the major one. For example, small language models (SLMs) built for low-resource languages by adapting large language models (LLMs) pose the challenge of catastrophic forgetting. This work proposes to employ a continual learning strategy using parts-of-speech (POS)-based code-switching along with a replay adapter strategy to mitigate catastrophic forgetting when training an SLM from an LLM. Experiments conducted on vision-language tasks such as visual question answering and on a language modelling task demonstrate the success of the proposed architecture.
Related papers
- Improving Multilingual Math Reasoning for African Languages [49.27985213689457]
We conduct experiments to evaluate different combinations of data types (translated versus synthetically generated), training stages (pre-training versus post-training), and other model adaptation configurations. Our experiments focus on mathematical reasoning tasks, using the Llama 3.1 model family as our base model.
arXiv Detail & Related papers (2025-05-26T11:35:01Z) - LLMic: Romanian Foundation Language Model [76.09455151754062]
We present LLMic, a foundation language model designed specifically for the Romanian language. We show that fine-tuning LLMic for language translation after the initial pretraining phase outperforms existing solutions in English-to-Romanian translation tasks.
arXiv Detail & Related papers (2025-01-13T22:14:45Z) - SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition [55.2480439325792]
Speech Meta In-Context LEarning (SMILE) is an innovative framework that combines meta-learning with speech in-context learning (SICL). We show that SMILE consistently outperforms baseline methods in training-free few-shot multilingual ASR tasks.
arXiv Detail & Related papers (2024-09-16T16:04:16Z) - Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z) - MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting [53.77590764277568]
We introduce a novel MoE-CT architecture that separates the base model's learning from the multilingual expansion process.
Our design freezes the original LLM parameters, thus safeguarding its performance in high-resource languages, while an appended MoE module, trained on diverse language datasets, augments low-resource language proficiency.
arXiv Detail & Related papers (2024-06-25T11:03:45Z) - ColBERT-XM: A Modular Multi-Vector Representation Model for Zero-Shot Multilingual Information Retrieval [10.664434993386523]
Current approaches circumvent the lack of high-quality labeled data in non-English languages.
We present a novel modular dense retrieval model that learns from the rich data of a single high-resource language.
arXiv Detail & Related papers (2024-02-23T02:21:24Z) - Generalizing Multimodal Pre-training into Multilingual via Language Acquisition [54.69707237195554]
English-based Vision-Language Pre-training has achieved great success in various downstream tasks.
Some efforts have been made to generalize this success to non-English languages through Multilingual Vision-Language Pre-training.
We propose a MultiLingual Acquisition (MLA) framework that can easily generalize a monolingual Vision-Language Pre-training model into a multilingual one.
arXiv Detail & Related papers (2022-05-29T08:53:22Z)