Bielik 11B v3: Multilingual Large Language Model for European Languages
- URL: http://arxiv.org/abs/2601.11579v1
- Date: Tue, 30 Dec 2025 18:35:15 GMT
- Title: Bielik 11B v3: Multilingual Large Language Model for European Languages
- Authors: Krzysztof Ociepa, Łukasz Flis, Remigiusz Kinas, Krzysztof Wróbel, Adrian Gwoździej, et al.
- Abstract summary: Bielik 11B v3 is a state-of-the-art language model optimized for the Polish language. It is scaled to 11B parameters via depth up-scaling. It significantly surpasses other specialized Polish language models.
- Score: 1.7741953323822284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Bielik 11B v3, a state-of-the-art language model highly optimized for the Polish language, while also maintaining strong capabilities in other European languages. This model extends the Mistral 7B v0.2 architecture, scaled to 11B parameters via depth up-scaling. Its development involved a comprehensive four-stage training pipeline: continuous pre-training, supervised fine-tuning (SFT), Direct Preference Optimization (DPO), and reinforcement learning. Comprehensive evaluations demonstrate that Bielik 11B v3 achieves exceptional performance. It significantly surpasses other specialized Polish language models and outperforms many larger models (with 2-6 times more parameters) on a wide range of tasks, from basic linguistic understanding to complex reasoning. The model's parameter efficiency, combined with extensive quantization options, allows for effective deployment across diverse hardware configurations. Bielik 11B v3 not only advances AI capabilities for the Polish language but also establishes a new benchmark for developing resource-efficient, high-performance models for less-represented languages.
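The abstract does not spell out the up-scaling procedure, but depth up-scaling is typically done by duplicating a contiguous block of trained transformer layers and then continuing pre-training. A minimal PyTorch sketch under that assumption (the layer counts and split point below are illustrative, not Bielik's actual recipe):

```python
# Illustrative depth up-scaling sketch: grow a 32-layer stack to 48 layers by
# keeping the first 24 layers and appending copies of the last 24.
# (Hypothetical numbers -- not the actual Bielik 11B v3 configuration.)
import copy
import torch.nn as nn

def depth_up_scale(layers: nn.ModuleList, keep: int) -> nn.ModuleList:
    """Concatenate the first `keep` layers with copies of the last `keep`
    layers, yielding a deeper stack initialized from trained weights."""
    n = len(layers)
    assert keep <= n
    front = [layers[i] for i in range(keep)]
    back = [copy.deepcopy(layers[i]) for i in range(n - keep, n)]
    return nn.ModuleList(front + back)

src = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
    for _ in range(32))
deep = depth_up_scale(src, keep=24)
print(len(deep))  # 48
```

Starting the added depth from trained weights keeps the deeper model close to the source model's behavior, so the subsequent continuous pre-training stage has far less to relearn than training new layers from scratch.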
Related papers
- Sparse Subnetwork Enhancement for Underrepresented Languages in Large Language Models [11.719190735841407]
Large language models exhibit uneven performance across languages. We present a framework for enhancing monolingual capabilities of LLMs in underrepresented languages. Our approach identifies language-specific neurons using Language Activation Probability Entropy and fine-tunes only the weights associated with these neurons.
arXiv Detail & Related papers (2025-10-15T14:14:49Z)
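The summary names Language Activation Probability Entropy (LAPE) as the selection criterion. A rough reconstruction from the abstract alone (not the authors' code): a neuron whose activation probability concentrates on a single language has low entropy and is a candidate for targeted fine-tuning.

```python
# LAPE-style neuron selection sketch (reconstructed from the abstract; the
# statistics and threshold here are toy values, not the paper's).
import torch

def lape(act_prob: torch.Tensor) -> torch.Tensor:
    """act_prob: [num_neurons, num_languages], probability that each neuron
    activates on text of each language. Returns per-neuron entropy; low
    entropy means the neuron fires mostly for one language."""
    p = act_prob / act_prob.sum(dim=-1, keepdim=True).clamp_min(1e-9)
    return -(p * p.clamp_min(1e-9).log()).sum(dim=-1)

act_prob = torch.rand(4096, 10)                      # toy stats, 10 languages
entropy = lape(act_prob)
selected = entropy < torch.quantile(entropy, 0.05)   # bottom 5% by entropy
print(int(selected.sum()), "language-specific neurons to fine-tune")
```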
- Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters [53.59868121093848]
We introduce Seed-X, a family of open-source large language models (LLMs) with 7B parameters. The base model is pre-trained on a diverse, high-quality dataset encompassing both monolingual and bilingual content across 28 languages. The instruct model is then fine-tuned to translate via Chain-of-Thought (CoT) reasoning and further enhanced through reinforcement learning (RL) to achieve better generalization across diverse language pairs.
arXiv Detail & Related papers (2025-07-18T03:19:43Z)
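As a hedged illustration of what "fine-tuned to translate via Chain-of-Thought reasoning" could look like at inference time, a prompt of the following shape asks the model to reason before emitting the final translation. The template is an assumption for illustration, not Seed-X's actual format.

```python
# Hypothetical CoT translation prompt; the wording is not from the paper.
def cot_translation_prompt(src_lang: str, tgt_lang: str, text: str) -> str:
    return (
        f"Translate the following {src_lang} text into {tgt_lang}.\n"
        "First reason step by step about idioms, named entities, and\n"
        "ambiguous words, then give the result after 'Translation:'.\n\n"
        f"Text: {text}\n"
        "Reasoning:"
    )

print(cot_translation_prompt("Polish", "English",
                             "Nie mój cyrk, nie moje małpy."))
```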
- Bielik v3 Small: Technical Report [0.0]
We introduce Bielik v3, a series of parameter-efficient generative text models (1.5B and 4.5B) optimized for Polish language processing. These models demonstrate that smaller, well-optimized architectures can achieve performance comparable to much larger counterparts.
arXiv Detail & Related papers (2025-05-05T10:39:51Z)
- Bielik 11B v2 Technical Report [0.0]
Bielik 11B v2 is a state-of-the-art language model optimized for Polish text processing. It is built on the Mistral 7B v0.2 architecture and scaled to 11B parameters using depth up-scaling. We introduce two key technical innovations: Weighted Instruction Cross-Entropy Loss and Adaptive Learning Rate.
arXiv Detail & Related papers (2025-05-05T07:03:41Z)
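The report's Weighted Instruction Cross-Entropy Loss is not reproduced here, but a plausible sketch is a per-example quality weight that scales the usual token-level loss; the weights and masking below are illustrative assumptions, not the exact formulation from the Bielik reports.

```python
# Sketch of a weighted instruction cross-entropy loss: each example carries a
# quality weight that scales its mean token loss. (Illustrative only.)
import torch
import torch.nn.functional as F

def weighted_instruction_ce(logits, labels, example_weights, ignore_index=-100):
    """logits: [B, T, V]; labels: [B, T] with prompt tokens set to
    ignore_index; example_weights: [B] per-example quality scores."""
    B, T, V = logits.shape
    per_token = F.cross_entropy(
        logits.reshape(B * T, V), labels.reshape(B * T),
        ignore_index=ignore_index, reduction="none").reshape(B, T)
    mask = (labels != ignore_index).float()
    per_example = (per_token * mask).sum(-1) / mask.sum(-1).clamp_min(1.0)
    return (per_example * example_weights).mean()

logits = torch.randn(2, 8, 100)
labels = torch.randint(0, 100, (2, 8))
labels[:, :3] = -100                      # mask the prompt tokens
loss = weighted_instruction_ce(logits, labels, torch.tensor([1.0, 0.3]))
print(loss.item())
```

Down-weighting low-quality instructions this way lets noisy data contribute to training without dominating the gradient.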
- LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models [89.13128402847943]
We present LUSIFER, a novel zero-shot approach that adapts LLM-based embedding models for multilingual tasks without requiring multilingual supervision. LUSIFER's architecture combines a multilingual encoder, serving as a language-universal learner, with an LLM-based embedding model optimized for embedding-specific tasks. We introduce a new benchmark encompassing 5 primary embedding tasks, 123 diverse datasets, and coverage across 14 languages.
arXiv Detail & Related papers (2025-01-01T15:43:07Z)
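A minimal sketch of the described combination, assuming a frozen multilingual encoder whose hidden states a small trained connector projects into the LLM's embedding space (dimensions and module choices are placeholders, not LUSIFER's actual configuration):

```python
# LUSIFER-style bridge sketch: only the connector is trained; the encoder
# and the LLM stay frozen. Shapes are illustrative.
import torch
import torch.nn as nn

class EncoderToLLMBridge(nn.Module):
    def __init__(self, enc_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        self.connector = nn.Linear(enc_dim, llm_dim)  # the only trained part

    def forward(self, enc_hidden: torch.Tensor) -> torch.Tensor:
        # enc_hidden: [B, T, enc_dim] from a frozen multilingual encoder;
        # the output is fed to the LLM-based embedding model as soft tokens.
        return self.connector(enc_hidden)

bridge = EncoderToLLMBridge()
print(bridge(torch.randn(2, 16, 768)).shape)  # torch.Size([2, 16, 4096])
```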
- Pipeline Analysis for Developing Instruct LLMs in Low-Resource Languages: A Case Study on Basque [2.867517731896504]
Large language models (LLMs) are typically optimized for resource-rich languages like English, exacerbating the gap between high-resource and underrepresented languages. This work presents a detailed analysis of strategies for developing a model capable of following instructions in a low-resource language, specifically Basque, by focusing on three key stages: pre-training, instruction tuning, and alignment with human preferences.
arXiv Detail & Related papers (2024-12-18T15:05:59Z)
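The three stages the paper analyzes can be pictured as a staged recipe like the sketch below; every name and hyperparameter in it is a placeholder, not a value from the study.

```python
# Placeholder three-stage recipe mirroring the stages the paper analyzes:
# continued pre-training -> instruction tuning -> preference alignment.
PIPELINE = [
    {"stage": "continued_pretraining", "data": "basque_corpus",
     "objective": "causal_lm", "lr": 2e-5},
    {"stage": "instruction_tuning", "data": "instruction_dataset",
     "objective": "sft", "lr": 1e-5},
    {"stage": "preference_alignment", "data": "preference_pairs",
     "objective": "dpo", "lr": 5e-7},
]
for cfg in PIPELINE:
    print(f"{cfg['stage']}: {cfg['objective']} on {cfg['data']} at lr={cfg['lr']}")
```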
- Benchmarking the Performance of Pre-trained LLMs across Urdu NLP Tasks [0.9786690381850356]
This study presents an in-depth examination of 7 prominent Large Language Models (LLMs) across 17 tasks using 22 datasets and 13.8 hours of speech in a zero-shot setting, comparing their performance against state-of-the-art (SOTA) models. Our results emphasize that models with fewer parameters but richer language-specific data, like Llama 3.1-8B, often outperform larger models with lower language diversity, such as GPT-3.5, on several tasks.
arXiv Detail & Related papers (2024-05-24T11:30:37Z)
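A zero-shot benchmark of this kind boils down to a loop like the following sketch; `model_fn` and the toy dataset are stand-ins, not the study's actual harness.

```python
# Generic zero-shot evaluation loop (illustrative harness, not the paper's).
def evaluate_zero_shot(model_fn, dataset):
    correct = 0
    for ex in dataset:
        prompt = f"{ex['instruction']}\n{ex['input']}\nAnswer:"
        correct += model_fn(prompt).strip() == ex["target"]
    return correct / len(dataset)

toy = [{"instruction": "Translate to Urdu.", "input": "water", "target": "پانی"}]
print(evaluate_zero_shot(lambda prompt: "پانی", toy))  # 1.0
```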
- What Is Missing in Multilingual Visual Reasoning and How to Fix It [57.37123046817781]
We evaluate NLP models' multilingual, multimodal capabilities by testing on a visual reasoning task. Proprietary systems like GPT-4V currently obtain the best performance on this task, but open models lag in comparison. Our interventions achieve the best open-model performance on this task in a zero-shot setting, boosting LLaVA-v1.5-13B by 13.4%, LLaVA-v1.6-34B by 20.3%, and Qwen-VL by 16.7%.
arXiv Detail & Related papers (2024-03-03T05:45:27Z)
- Few-shot Learning with Multilingual Language Models [66.49496434282564]
We train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages.
Our largest model sets a new state of the art in few-shot learning in more than 20 representative languages.
We present a detailed analysis of where the model succeeds and fails, showing in particular that it enables cross-lingual in-context learning.
arXiv Detail & Related papers (2021-12-20T16:52:35Z)
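Cross-lingual in-context learning means demonstrations in one language steer the model on queries in another. A made-up sentiment example of such a prompt (the data is invented for illustration):

```python
# Cross-lingual few-shot prompt sketch: English demonstrations, Polish query.
demos = [
    ("The movie was wonderful.", "positive"),
    ("The service was terrible.", "negative"),
]
query = "Jedzenie było znakomite."  # Polish: "The food was excellent."

prompt = "\n".join(f"Review: {t}\nSentiment: {s}" for t, s in demos)
prompt += f"\nReview: {query}\nSentiment:"
print(prompt)
```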
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts [84.33607245023049]
We propose and develop a family of language models named GLaM (Generalist Language Model).
GLaM uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants.
It consumes only 1/3 of the energy used to train GPT-3 and requires half the FLOPs for inference, while still achieving better overall zero-shot and one-shot performance across 29 NLP tasks.
arXiv Detail & Related papers (2021-12-13T18:58:19Z)
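A toy top-2 mixture-of-experts layer illustrating the sparse activation described above; this is a sketch, not GLaM's implementation, whose routing, expert sizes, and load balancing differ.

```python
# Toy sparsely activated MoE layer: a router picks the top-2 experts per
# token, so only a fraction of the parameters is active for any input.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model=256, d_ff=1024, num_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts))
        self.k = k

    def forward(self, x):                        # x: [tokens, d_model]
        gate = self.router(x).softmax(dim=-1)    # [tokens, num_experts]
        weights, idx = gate.topk(self.k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.k):               # dispatch tokens to experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(10, 256)).shape)  # torch.Size([10, 256])
```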
- Multilingual Speech Translation with Efficient Finetuning of Pretrained Models [82.22294901727933]
A minimalistic LNA (LayerNorm and Attention) finetuning can achieve zero-shot crosslingual and cross-modality transfer ability.
Our approach demonstrates strong zero-shot performance in a many-to-many multilingual model.
arXiv Detail & Related papers (2020-10-24T08:15:08Z)
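The LNA recipe can be sketched as freezing the whole model and unfreezing only LayerNorm and attention modules. The module matching below assumes standard PyTorch classes and would need adapting to a real pretrained checkpoint.

```python
# LNA (LayerNorm + Attention) finetuning sketch: freeze everything, then
# re-enable gradients only for LayerNorm and attention parameters.
import torch.nn as nn

def apply_lna_finetuning(model: nn.Module) -> int:
    for p in model.parameters():
        p.requires_grad = False
    trainable = 0
    for module in model.modules():
        if isinstance(module, (nn.LayerNorm, nn.MultiheadAttention)):
            for p in module.parameters():
                p.requires_grad = True
                trainable += p.numel()
    return trainable

model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
    num_layers=2)
print(apply_lna_finetuning(model), "trainable parameters")
```

Because only a small slice of the parameters is updated, the rest of the pretrained model is preserved, which is what enables the zero-shot cross-lingual and cross-modality transfer the summary mentions.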
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.