Domain-adaptative Continual Learning for Low-resource Tasks: Evaluation on Nepali
- URL: http://arxiv.org/abs/2412.13860v1
- Date: Wed, 18 Dec 2024 13:53:59 GMT
- Title: Domain-adaptative Continual Learning for Low-resource Tasks: Evaluation on Nepali
- Authors: Sharad Duwal, Suraj Prasai, Suresh Manandhar
- Abstract summary: Domain-adaptive pre-training (DAPT) focuses on continually training a pre-trained language model to adapt it to a domain it was not originally trained on. We use synthetic data to continue training Llama 3 8B to adapt it to the Nepali language in a 4-bit QLoRA setting. We evaluate the adapted model on its performance, forgetting, and knowledge acquisition.
- Score: 0.20999222360659603
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Continual learning has emerged as an important research direction due to the infeasibility of retraining large language models (LLMs) from scratch whenever new data becomes available. Of great interest is the domain-adaptive pre-training (DAPT) paradigm, which focuses on continually training a pre-trained language model to adapt it to a domain it was not originally trained on. In this work, we evaluate the feasibility of DAPT in a low-resource setting, namely the Nepali language. We use synthetic data to continue training Llama 3 8B to adapt it to the Nepali language in a 4-bit QLoRA setting. We evaluate the adapted model on its performance, forgetting, and knowledge acquisition. We compare the base model and the final model on their Nepali generation abilities and their performance on popular benchmarks, and we run case studies to probe their linguistic knowledge in Nepali. We see some unsurprising forgetting in the final model, but also surprisingly find that increasing the number of shots during evaluation yields larger percentage increases in the final model (as high as a 19.29% increase) than in the base model (4.98%), suggesting latent retention. We also explore layer-head self-attention heatmaps to establish the dependency resolution abilities of the final model in Nepali.
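The shot-scaling comparison in the abstract can be illustrated with a small sketch. It assumes the reported figures are relative increases over the lower-shot score, which the abstract does not state explicitly. The function name `percent_increase` and all scores below are hypothetical and are not taken from the paper.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100.0


# Hypothetical benchmark accuracies (NOT results from the paper).
base_0shot, base_5shot = 0.50, 0.52    # base model
final_0shot, final_5shot = 0.40, 0.46  # adapted model, lower after forgetting

base_gain = percent_increase(base_0shot, base_5shot)
final_gain = percent_increase(final_0shot, final_5shot)

# A larger relative gain for the adapted model is the pattern the abstract
# reads as a sign of latent retention.
print(f"base: {base_gain:.2f}%  final: {final_gain:.2f}%")
```

Under this reading, a model that starts lower at zero shots can still show the larger relative gain, so the percentage figures alone do not say which model scores higher in absolute terms.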
Related papers
- Advancing Large Language Models for Tibetan with Curated Data and Continual Pre-Training [43.57169338795754]
We create the largest Tibetan pre-training corpus to date, aggregating data from diverse sources. With curated data, we continue pre/post-training a multilingual base model to enhance its generative capabilities in Tibetan. We create new high-quality Tibetan benchmarks, and complement them with existing public benchmarks.
arXiv Detail & Related papers (2025-07-12T08:54:05Z)
- Investigating the Impact of Language-Adaptive Fine-Tuning on Sentiment Analysis in Hausa Language Using AfriBERTa [2.5055584842618175]
Sentiment analysis (SA) plays a vital role in Natural Language Processing (NLP) by identifying sentiments expressed in text.
This study investigates the effectiveness of Language-Adaptive Fine-Tuning (LAFT) to improve SA performance in Hausa.
arXiv Detail & Related papers (2025-01-19T11:52:46Z)
- Development of Pre-Trained Transformer-based Models for the Nepali Language [0.0]
The Nepali language, spoken by approximately 32 million people worldwide, remains significantly underrepresented in this domain.
We have collected 27.5 GB of Nepali text data, approximately 2.4x larger than any previously available Nepali language corpus.
Our models outperformed the existing best model by 2 points on the Nep-gLUE benchmark, scoring 95.60, and also outperformed existing models on text generation tasks.
arXiv Detail & Related papers (2024-11-24T06:38:24Z)
- Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models [51.20499954955646]
Large language models (LLMs) acquire vast amounts of knowledge from extensive text corpora during the pretraining phase.
In later stages such as fine-tuning and inference, the model may encounter knowledge not covered in the initial training.
We propose a two-stage fine-tuning strategy to improve the model's overall test accuracy and knowledge retention.
arXiv Detail & Related papers (2024-10-08T08:35:16Z)
- Abstractive Summarization of Low resourced Nepali language using Multilingual Transformers [0.0]
The research addresses key challenges associated with summarizing texts in Nepali by first creating a summarization dataset through web scraping.
The performance of the fine-tuned models was then assessed using ROUGE scores and human evaluation.
The 4-bit quantized mBART with LoRA model was found to be effective in generating better Nepali news headlines.
arXiv Detail & Related papers (2024-09-29T05:58:27Z)
- FairPIVARA: Reducing and Assessing Biases in CLIP-Based Multimodal Models [5.748694060126043]
We evaluate four different types of discriminatory practices within visual-language models.
We introduce FairPIVARA, a method to reduce them by removing the most affected dimensions of feature embeddings.
The application of FairPIVARA has led to a significant reduction of up to 98% in observed biases.
arXiv Detail & Related papers (2024-09-28T22:49:22Z)
- Accelerating Large Language Model Pretraining via LFR Pedagogy: Learn, Focus, and Review [50.78587571704713]
Learn-Focus-Review (LFR) is a dynamic training approach that adapts to the model's learning progress.
LFR tracks the model's learning performance across data blocks (sequences of tokens) and prioritizes revisiting challenging regions of the dataset.
Compared to baseline models trained on the full datasets, LFR consistently achieved lower perplexity and higher accuracy.
arXiv Detail & Related papers (2024-09-10T00:59:18Z)
- Towards Better Instruction Following Language Models for Chinese: Investigating the Impact of Training Data and Evaluation [12.86275938443485]
We examine the influence of training data factors, including quantity, quality, and linguistic distribution, on model performance.
We assess various models using an evaluation set of 1,000 samples, encompassing nine real-world scenarios.
We extend the vocabulary of LLaMA - the model with the closest open-source performance to proprietary language models like GPT-3.
arXiv Detail & Related papers (2023-04-16T18:37:39Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- A Kernel-Based View of Language Model Fine-Tuning [94.75146965041131]
We investigate whether the Neural Tangent Kernel (NTK) describes fine-tuning of pre-trained LMs.
We show that formulating the downstream task as a masked word prediction problem through prompting often induces kernel-based dynamics during fine-tuning.
arXiv Detail & Related papers (2022-10-11T17:34:32Z)
- NoiER: An Approach for Training more Reliable Fine-Tuned Downstream Task Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z)
- AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages [75.08199398141744]
We present AmericasNLI, an extension of XNLI (Conneau et al.), to 10 indigenous languages of the Americas.
We conduct experiments with XLM-R, testing multiple zero-shot and translation-based approaches.
We find that XLM-R's zero-shot performance is poor for all 10 languages, with an average performance of 38.62%.
arXiv Detail & Related papers (2021-04-18T05:32:28Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)