Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation
- URL: http://arxiv.org/abs/2410.18565v1
- Date: Thu, 24 Oct 2024 09:16:09 GMT
- Title: Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation
- Authors: Krzysztof Ociepa, Łukasz Flis, Krzysztof Wróbel, Adrian Gwoździej, Remigiusz Kinas
- Abstract summary: Bielik 7B v0.1 is a generative text model for Polish language processing.
It addresses key challenges in language model development through innovative techniques.
It demonstrates significant improvements, achieving a 9 percentage point increase in average score compared to Mistral-7B-v0.1 on the RAG Reader task.
It also excels in the Polish MT-Bench, particularly in Reasoning (6.15/10) and Role-playing (7.83/10) categories.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Bielik 7B v0.1, a 7-billion-parameter generative text model for Polish language processing. Trained on curated Polish corpora, this model addresses key challenges in language model development through innovative techniques. These include Weighted Instruction Cross-Entropy Loss, which balances the learning of different instruction types, and Adaptive Learning Rate, which dynamically adjusts the learning rate based on training progress. To evaluate performance, we created the Open PL LLM Leaderboard and Polish MT-Bench, novel frameworks assessing various NLP tasks and conversational abilities. Bielik 7B v0.1 demonstrates significant improvements, achieving a 9 percentage point increase in average score compared to Mistral-7B-v0.1 on the RAG Reader task. It also excels in the Polish MT-Bench, particularly in Reasoning (6.15/10) and Role-playing (7.83/10) categories. This model represents a substantial advancement in Polish language AI, offering a powerful tool for diverse linguistic applications and setting new benchmarks in the field.
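The two named techniques are described only at a high level in the abstract. As one concrete reading of the first, here is a minimal PyTorch sketch of a weighted instruction cross-entropy loss, assuming each training example carries an instruction-type id mapped to a relative weight; the function, shapes, and weighting scheme are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def weighted_instruction_ce_loss(logits, labels, instr_type_ids, type_weights):
    """Per-example cross-entropy, reweighted by instruction type (hypothetical).

    logits:         (batch, seq, vocab) model outputs
    labels:         (batch, seq) target ids, with -100 at ignored positions
    instr_type_ids: (batch,) instruction-type id of each example
    type_weights:   (num_types,) relative weight for each instruction type
    """
    batch, seq, vocab = logits.shape
    # Token-level loss without reduction so it can be reweighted per example.
    token_nll = F.cross_entropy(
        logits.reshape(-1, vocab), labels.reshape(-1),
        ignore_index=-100, reduction="none",
    ).reshape(batch, seq)
    mask = (labels != -100).float()
    # Mean loss per example over its supervised tokens only.
    per_example = (token_nll * mask).sum(1) / mask.sum(1).clamp(min=1.0)
    # Weight each example by its instruction type, then normalize.
    w = type_weights[instr_type_ids]
    return (per_example * w).sum() / w.sum()

# Toy call with made-up shapes: type 1 counts twice as much as type 0.
loss = weighted_instruction_ce_loss(
    torch.randn(2, 5, 100), torch.randint(0, 100, (2, 5)),
    instr_type_ids=torch.tensor([0, 1]), type_weights=torch.tensor([1.0, 2.0]),
)
```

Under this reading, under-represented or harder instruction types can be up-weighted so that frequent ones do not dominate the gradient. The Adaptive Learning Rate component is not sketched, since the abstract does not specify its schedule.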
Related papers
- Bielik v3 Small: Technical Report [0.0]
We introduce Bielik v3, a series of parameter-efficient generative text models (1.5B and 4.5B) optimized for Polish language processing. These models demonstrate that smaller, well-optimized architectures can achieve performance comparable to much larger counterparts.
arXiv Detail & Related papers (2025-05-05T10:39:51Z)
- Bielik 11B v2 Technical Report [0.0]
Bielik 11B v2 is a state-of-the-art language model optimized for Polish text processing. It is built on the Mistral 7B v0.2 architecture and scaled to 11B parameters using depth up-scaling (a sketch of this recipe follows the entry). We introduce two key technical innovations: Weighted Instruction Cross-Entropy Loss and Adaptive Learning Rate.
arXiv Detail & Related papers (2025-05-05T07:03:41Z)
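Depth up-scaling is only named in the entry above. For orientation, here is a minimal sketch of the general recipe in the style popularized by SOLAR 10.7B, not the report's exact procedure: duplicate overlapping slices of a pretrained decoder stack and continue pretraining the deeper model. The helper and layer counts are illustrative.

```python
import copy
import torch.nn as nn

def depth_up_scale(layers: nn.ModuleList, keep: int) -> nn.ModuleList:
    """Deepen a pretrained decoder by stacking its first `keep` blocks on
    top of its last `keep` blocks, so the middle blocks appear twice.
    The result is then pretrained further to heal the duplication seam."""
    n = len(layers)                       # e.g. 32 decoder blocks in Mistral 7B
    head = [copy.deepcopy(b) for b in layers[:keep]]
    tail = [copy.deepcopy(b) for b in layers[n - keep:]]
    return nn.ModuleList(head + tail)     # 2 * keep blocks total
```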
- Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities [2.047424180164312]
Training large language models (LLMs) in low-resource languages such as Hebrew poses unique challenges.
We introduce DictaLM2.0 and DictaLM2.0-Instruct, two LLMs trained on a substantial corpus of approximately 200 billion tokens in both Hebrew and English.
arXiv Detail & Related papers (2024-07-09T17:51:37Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze the representation space, generated responses, and data scales, revealing how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- Evaluation of Few-Shot Learning for Classification Tasks in the Polish Language [0.1534667887016089]
We introduce a few-shot benchmark consisting of 7 different classification tasks native to the Polish language.
We conducted an empirical comparison with 0 and 16 shots between fine-tuning, linear probing, SetFit, and in-context learning (ICL) using various pre-trained commercial and open-source models.
ICL achieves the best results overall, with commercial models such as GPT-3.5 and GPT-4 leading (see the prompt sketch after this entry).
arXiv Detail & Related papers (2024-04-27T08:53:58Z)
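To make the in-context learning setting from the entry above concrete, here is a hypothetical k-shot prompt builder for a Polish classification task; the template and field names (`Tekst`, `Etykieta`) are invented for illustration.

```python
def build_icl_prompt(shots, test_text, n_shots=16):
    """Concatenate labeled demonstrations, then the unlabeled query; the
    model is expected to continue the prompt with the correct label.

    shots: list of (text, label) pairs drawn from the training split.
    """
    blocks = [f"Tekst: {text}\nEtykieta: {label}" for text, label in shots[:n_shots]]
    blocks.append(f"Tekst: {test_text}\nEtykieta:")
    return "\n\n".join(blocks)
```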
- Efficient Language Adaptive Pre-training: Extending State-of-the-Art Large Language Models for Polish [0.0]
This study explores the potential of fine-tuning foundational English Large Language Models (LLMs) for generating Polish text.
The first step involves Language Adaptive Pre-training (LAPT) on a high-quality dataset of 3.11 GB, consisting of 276 million Polish tokens.
Our trained model, Curie-7B-v1, not only generates Polish text with the lowest perplexity (3.02) among decoder-based Polish models but also closely rivals the performance of the best Polish encoder-decoder models (a note on the perplexity metric follows this entry).
arXiv Detail & Related papers (2024-02-15T07:17:10Z)
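As context for the perplexity figure above: perplexity is the exponential of the mean token-level negative log-likelihood, so a value of 3.02 means the model is, on average, about as uncertain as a uniform choice among three tokens at each step. A minimal computation, assuming the usual shapes and the -100 ignore convention:

```python
import math
import torch.nn.functional as F

def perplexity(logits, labels):
    """exp(mean cross-entropy) over supervised tokens; positions labeled
    -100 are excluded from the average."""
    nll = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), labels.reshape(-1),
        ignore_index=-100, reduction="mean",
    )
    return math.exp(nll.item())
```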
- Pre-Training to Learn in Context [138.0745138788142]
The ability of in-context learning is not fully exploited because language models are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- MasakhaNEWS: News Topic Classification for African languages [15.487928928173098]
African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks.
We develop MasakhaNEWS -- a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa.
arXiv Detail & Related papers (2023-04-19T21:12:23Z)
- Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE [203.65227947509933]
This report describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard.
SuperGLUE is more challenging than the widely used general language understanding evaluation (GLUE) benchmark, containing eight difficult language understanding tasks.
arXiv Detail & Related papers (2022-12-04T15:36:18Z)
- Evaluation of Transfer Learning for Polish with a Text-to-Text Model [54.81823151748415]
We introduce a new benchmark for assessing the quality of text-to-text models for Polish.
The benchmark consists of diverse tasks and datasets: KLEJ benchmark adapted for text-to-text, en-pl translation, summarization, and question answering.
We present plT5 - a general-purpose text-to-text model for Polish that can be fine-tuned on various Natural Language Processing (NLP) tasks with a single training objective (a sketch of this task casting follows this entry).
arXiv Detail & Related papers (2022-05-18T09:17:14Z)
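The "single training objective" above refers to casting every task as text-in/text-out generation, so one sequence-to-sequence cross-entropy covers them all. Here is a hypothetical serialization of the benchmark's four task types; the prefixes and field names are invented, not plT5's actual formats.

```python
def to_text2text(task, ex):
    """Map heterogeneous (input, target) pairs into plain text pairs so a
    single seq2seq objective covers all tasks (illustrative prefixes)."""
    if task == "classification":   # generate the label word itself
        return f"klasyfikacja: {ex['text']}", ex["label"]
    if task == "translation":      # en -> pl, generate the target sentence
        return f"przetłumacz en-pl: {ex['en']}", ex["pl"]
    if task == "summarization":    # generate the abstractive summary
        return f"streszczenie: {ex['document']}", ex["summary"]
    if task == "qa":               # generate the answer as free text
        return (f"pytanie: {ex['question']} kontekst: {ex['context']}",
                ex["answer"])
    raise ValueError(f"unknown task: {task}")
```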
- Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP models [53.95094814056337]
This paper presents Russian SuperGLUE 1.1, an updated benchmark styled after GLUE for Russian NLP models.
The new version includes a number of technical, user experience and methodological improvements.
We integrate Russian SuperGLUE with MOROCCO, a framework for the industrial evaluation of open-source models.
arXiv Detail & Related papers (2022-02-15T23:45:30Z)
- DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing [117.41016786835452]
This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model.
We show that vanilla embedding sharing in ELECTRA hurts training efficiency and model performance, and propose a new gradient-disentangled embedding sharing method that avoids this tug-of-war dynamic (sketched after this entry).
arXiv Detail & Related papers (2021-11-18T06:48:00Z)
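A sketch of the gradient-disentangled sharing idea from the entry above, assuming the usual ELECTRA setup of a masked-language-model generator and a replaced-token-detection (RTD) discriminator; the module and method names are illustrative.

```python
import torch.nn as nn

class GDESEmbeddings(nn.Module):
    """The discriminator reuses the generator's token embeddings, but only
    through a detached copy plus its own zero-initialized delta, so the RTD
    loss cannot drag the shared table away from what the MLM loss learns."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.shared = nn.Embedding(vocab_size, dim)  # updated by the MLM loss only
        self.delta = nn.Embedding(vocab_size, dim)   # discriminator-only residual
        nn.init.zeros_(self.delta.weight)            # start as pure sharing

    def generator_embed(self, ids):
        return self.shared(ids)

    def discriminator_embed(self, ids):
        # detach() blocks RTD gradients from reaching the shared table,
        # removing the "tug-of-war" between the two objectives.
        return self.shared(ids).detach() + self.delta(ids)
```

Only the generator's MLM loss updates `shared`, while the discriminator trains the zero-initialized `delta`, which removes the conflicting gradient directions that plain sharing creates.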
- Pre-training Polish Transformer-based Language Models at Scale [1.0312968200748118]
We present two language models for Polish based on the popular BERT architecture.
We describe our methodology for collecting the data, preparing the corpus, and pre-training the model.
We then evaluate our models on thirteen Polish linguistic tasks, and demonstrate improvements in eleven of them.
arXiv Detail & Related papers (2020-06-07T18:48:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.