Fugu-MT Paper Translation (Abstract): EuroLLM-22B: Technical Report

Paper overview: EuroLLM-22B: Technical Report

arxiv url: http://arxiv.org/abs/2602.05879v1
Date: Thu, 05 Feb 2026 16:53:47 GMT
Status: Translation complete
Last updated in system: 2026-02-06 18:49:09.062357
Title: EuroLLM-22B: Technical Report
Title (reference translation): EuroLLM-22B Technical Report
Authors: Miguel Moura Ramos, Duarte M. Alves, Hippolyte Gisserot-Boukhlef, João Alves, Pedro Henrique Martins, Patrick Fernandes, José Pombal, Nuno M. Guerreiro, Ricardo Rei, Nicolas Boizard, Amin Farajian, Mateusz Klimaszewski, José G. C. de Souza, Barry Haddow, François Yvon, Pierre Colombo, Alexandra Birch, André F. T. Martins
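Abstract summary: EuroLLM-22B is a large language model trained from scratch to address the needs of European citizens. It covers all 24 official languages of the European Union and 11 additional languages.
Reference score (independently calculated attention level): 84.29719676524947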
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This report presents EuroLLM-22B, a large language model trained from scratch to support the needs of European citizens by covering all 24 official European Union languages and 11 additional languages. EuroLLM addresses the issue of European languages being underrepresented and underserved in existing open large language models. We provide a comprehensive overview of EuroLLM-22B's development, including tokenizer design, architectural specifications, data filtering, and training procedures. Across a broad set of multilingual benchmarks, EuroLLM-22B demonstrates strong performance in reasoning, instruction following, and translation, achieving results competitive with models of comparable size. To support future research, we release our base and instruction-tuned models, our multilingual web pretraining data and updated EuroBlocks instruction datasets, as well as our pre-training and evaluation codebases.
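Abstract (reference translation): This report presents EuroLLM-22B, a large language model trained from scratch to support the needs of European citizens by covering the 24 official languages of the EU and 11 additional languages. EuroLLM addresses the problem that European languages are underrepresented and underserved in existing open large language models. We give an overview of the development of EuroLLM-22B, including tokenizer design, architectural specifications, data filtering, and training procedures. Across a broad range of multilingual benchmarks, EuroLLM-22B shows strong performance in reasoning, instruction following, and translation, achieving results competitive with models of comparable size. To support future research, we release the base and instruction-tuned models, the multilingual web pretraining data, the updated EuroBlocks instruction datasets, and the pre-training and evaluation codebases.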

Related papers