TigerLLM -- A Family of Bangla Large Language Models
- URL: http://arxiv.org/abs/2503.10995v1
- Date: Fri, 14 Mar 2025 01:41:16 GMT
- Title: TigerLLM -- A Family of Bangla Large Language Models
- Authors: Nishat Raihan, Marcos Zampieri
- Abstract summary: We introduce TigerLLM - a family of Bangla language models. Our results demonstrate that these models surpass all open-source alternatives and also outperform larger proprietary models like GPT-3.5.
- Score: 8.258559455995917
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The development of Large Language Models (LLMs) remains heavily skewed towards English and a few other high-resource languages. This linguistic disparity is particularly evident for Bangla - the 5th most spoken language. A few initiatives have attempted to create open-source Bangla LLMs, but their performance still lags behind that of high-resource languages and they offer limited reproducibility. To address this gap, we introduce TigerLLM - a family of Bangla LLMs. Our results demonstrate that these models surpass all open-source alternatives and also outperform larger proprietary models like GPT-3.5 across standard benchmarks, establishing TigerLLM as the new baseline for future Bangla language modeling.
Related papers
- LLMic: Romanian Foundation Language Model [76.09455151754062]
We present LLMic, a foundation language model designed specifically for the Romanian Language.
We show that fine-tuning LLMic for language translation after the initial pretraining phase outperforms existing solutions in English-to-Romanian translation tasks.
arXiv Detail & Related papers (2025-01-13T22:14:45Z)
- MERaLiON-TextLLM: Cross-Lingual Understanding of Large Language Models in Chinese, Indonesian, Malay, and Singlish [17.36441080071885]
This report presents MERaLiON-TextLLM, a series of open-source language models specifically tailored to improve understanding and generation in Chinese, Indonesian, Malay, and Singlish.
Our approach achieves performance improvements across benchmarks in these languages, exceeding the capabilities of the official Llama-3 models.
arXiv Detail & Related papers (2024-12-21T05:50:48Z)
- BongLLaMA: LLaMA for Bangla Language [0.0]
BongLLaMA is an open-source large language model fine-tuned exclusively on large Bangla corpora and instruction-tuning datasets.
We present our methodology, data augmentation techniques, fine-tuning details, and comprehensive benchmarking results showcasing the utility of BongLLaMA on BLP tasks.
arXiv Detail & Related papers (2024-10-28T16:44:02Z)
- Performance of Recent Large Language Models for a Low-Resourced Language [0.0]
Large Language Models (LLMs) have shown significant advances in the past year.
Claude and GPT-4o perform well out of the box and do significantly better than previous versions.
Llama and Mistral perform poorly but show some promise of improvement with fine-tuning.
arXiv Detail & Related papers (2024-07-31T04:38:07Z)
- TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese [0.0]
Large language models (LLMs) have significantly advanced natural language processing, but their progress has not been equal across languages.
In this study, we document the development of open-foundation models tailored for use in low-resource settings.
The result is the TeenyTinyLlama pair: two compact models for Brazilian Portuguese text generation.
arXiv Detail & Related papers (2024-01-30T00:25:54Z)
- MaLA-500: Massive Language Adaptation of Large Language Models [61.440556436524]
MaLA-500 is a novel large language model designed to cover an extensive range of 534 languages.
Our intrinsic evaluation demonstrates that MaLA-500 is better at predicting the given texts of low-resource languages than existing multilingual LLMs.
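A minimal sketch of this kind of intrinsic evaluation, i.e. measuring how well a causal LM predicts held-out text via perplexity; the Hugging Face checkpoint id and the sample sentence are assumptions for illustration only:

```python
# Hedged sketch: perplexity of a text under a causal LM (lower = the model
# predicts the text better). Checkpoint id and sample text are illustrative.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text: str) -> float:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean cross-entropy loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

model_id = "MaLA-LM/mala-500"  # assumed identifier; replace with the real checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
lm = AutoModelForCausalLM.from_pretrained(model_id)
print(perplexity(lm, tok, "আমি বাংলায় গান গাই।"))  # a Bangla test sentence
```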
arXiv Detail & Related papers (2024-01-24T08:57:39Z)
- SeaLLMs -- Large Language Models for Southeast Asia [76.50157503379086]
We introduce SeaLLMs, an innovative series of language models that specifically focuses on Southeast Asian (SEA) languages.
SeaLLMs are built upon the Llama-2 model and further advanced through continued pre-training with an extended vocabulary, specialized instruction and alignment tuning.
Our comprehensive evaluation demonstrates that SeaLLM-13b models exhibit superior performance across a wide spectrum of linguistic tasks and assistant-style instruction-following capabilities.
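As a rough illustration of the vocabulary-extension step that precedes continued pre-training, the sketch below adds new subword tokens to a Llama-2 tokenizer and resizes the embedding matrix; the token list is purely illustrative and not SeaLLMs' actual vocabulary:

```python
# Hedged sketch of vocabulary extension before continued pre-training.
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-2-7b-hf"  # the base model SeaLLMs start from
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Illustrative new pieces for Southeast Asian scripts (not the real vocabulary).
new_tokens = ["ตัวอย่าง", "ví dụ", "contoh"]
num_added = tokenizer.add_tokens(new_tokens)

# Give the new token ids trainable embedding rows, then continue pre-training
# on SEA-language text with the usual causal LM objective.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; vocabulary size is now {len(tokenizer)}")
```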
arXiv Detail & Related papers (2023-12-01T17:17:56Z)
- Crosslingual Retrieval Augmented In-context Learning for Bangla [8.065775937617417]
This paper presents a pioneering approach that utilizes cross-lingual retrieval augmented in-context learning.
By strategically sourcing semantically similar prompts from a high-resource language, we enable multilingual pretrained language models (MPLMs) to boost performance on Bangla tasks.
Our evaluation highlights that the cross-lingual retrieval augmented prompts bring steady improvements to MPLMs over the zero-shot performance.
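A hedged sketch of the retrieval step described above: embed the Bangla input with a multilingual sentence encoder, pick the most similar English demonstrations by cosine similarity, and prepend them to the prompt. The encoder choice, example pool, and prompt template are assumptions, not the paper's exact setup:

```python
# Illustrative cross-lingual retrieval of in-context demonstrations.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("sentence-transformers/LaBSE")  # multilingual encoder

english_pool = [
    "Review: The movie was wonderful. Sentiment: positive",
    "Review: The food was awful. Sentiment: negative",
    "Review: Service was okay, nothing special. Sentiment: neutral",
]
pool_emb = encoder.encode(english_pool, convert_to_tensor=True)

def build_prompt(bangla_input: str, k: int = 2) -> str:
    # Retrieve the k most similar English examples and use them as demonstrations.
    query_emb = encoder.encode(bangla_input, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, pool_emb)[0]
    top = scores.topk(k).indices.tolist()
    demos = "\n".join(english_pool[i] for i in top)
    return f"{demos}\nReview: {bangla_input} Sentiment:"

print(build_prompt("খাবারটা খুব ভালো ছিল।"))  # "The food was very good."
```

The assembled prompt would then be passed to a multilingual pretrained LM for completion.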
arXiv Detail & Related papers (2023-11-01T15:32:50Z)
- Baichuan 2: Open Large-scale Language Models [51.56361715162972]
We present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch on 2.6 trillion tokens.
Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval.
arXiv Detail & Related papers (2023-09-19T04:13:22Z)
- PolyLM: An Open Source Polyglot Large Language Model [57.64420154135178]
We present PolyLM, a multilingual large language model (LLM) trained on 640 billion tokens, available in two model sizes: 1.7B and 13B.
To enhance its multilingual capabilities, we 1) integrate bilingual data into training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training.
Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning.
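The curriculum data-mixing idea above can be sketched as a simple stage-based sampling schedule; the stage boundary and the sampling loop are assumptions, since the abstract only gives the two ratios:

```python
# Hedged sketch: raise the share of non-English pre-training data from 30%
# in the first stage to 60% in the final stage. The stage split is assumed.
import random

def non_english_ratio(step: int, total_steps: int, stage1_frac: float = 0.5) -> float:
    """Target share of non-English samples at a given training step."""
    return 0.30 if step < stage1_frac * total_steps else 0.60

def sample_language(step: int, total_steps: int) -> str:
    """Pick which data pool the next training sample comes from."""
    p = non_english_ratio(step, total_steps)
    return "non-english" if random.random() < p else "english"

total = 100_000
for step in (0, 49_999, 50_000, total - 1):
    print(step, non_english_ratio(step, total))  # 0.3, 0.3, 0.6, 0.6
```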
arXiv Detail & Related papers (2023-07-12T09:00:37Z)
- Massively Multilingual Shallow Fusion with Large Language Models [62.76735265311028]
We train a single multilingual language model (LM) for shallow fusion in multiple languages.
Compared to a dense LM of similar computation during inference, GLaM reduces the WER of an English long-tail test set by 4.4% relative.
In a multilingual shallow fusion task, GLaM improves 41 out of 50 languages with an average relative WER reduction of 3.85%, and a maximum reduction of 10%.
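For readers unfamiliar with shallow fusion, the sketch below shows the core idea: rescore each ASR hypothesis by adding a weighted LM log-probability to the acoustic-model score. The hypotheses, scores, and weight are made up for illustration; GLaM itself is not used here:

```python
# Minimal illustration of shallow fusion during ASR rescoring.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    am_logprob: float  # log-score from the ASR (acoustic) model
    lm_logprob: float  # log P(text) from the external language model

def fused_score(h: Hypothesis, lm_weight: float = 0.3) -> float:
    # Shallow fusion: interpolate the two scores in log space.
    return h.am_logprob + lm_weight * h.lm_logprob

beam = [
    Hypothesis("recognize speech", am_logprob=-4.1, lm_logprob=-7.2),
    Hypothesis("wreck a nice beach", am_logprob=-3.9, lm_logprob=-12.5),
]
best = max(beam, key=fused_score)
print(best.text)  # the LM term favors the more fluent transcript
```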
arXiv Detail & Related papers (2023-02-17T14:46:38Z)