Fugu-MT paper translation (abstract): An Empirical Study of Many-Shot In-Context Learning for Machine Translation of Low-Resource Languages

Paper overview: An Empirical Study of Many-Shot In-Context Learning for Machine Translation of Low-Resource Languages

arxiv url: http://arxiv.org/abs/2604.02596v1
Date: Fri, 03 Apr 2026 00:13:34 GMT
Status: translation complete
Last updated in system: 2026-04-06 17:20:24.253254
Title: An Empirical Study of Many-Shot In-Context Learning for Machine Translation of Low-Resource Languages
Title (reference translation): An Empirical Study of Many-Shot In-Context Learning in Machine Translation of Low-Resource Languages
Authors: Yinhan Lu, Gaganpreet Jhajj, Chen Zhang, Anietie Andy, David Ifeoluwa Adelani
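Abstract summary: In-context learning (ICL) allows large language models to adapt to new tasks from a few examples. Recent work on many-shot ICL suggests that modern LLMs can benefit further from larger sets of ICL examples made possible by long context windows. This paper presents an empirical study of many-shot ICL for machine translation into ten truly low-resource languages.
Reference score (independently computed attention score): 10.94905796367051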
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In-context learning (ICL) allows large language models (LLMs) to adapt to new tasks from a few examples, making it promising for languages underrepresented in pre-training. Recent work on many-shot ICL suggests that modern LLMs can further benefit from larger ICL examples enabled by their long context windows. However, such gains depend on careful example selection, and the inference cost can be prohibitive for low-resource language communities. In this paper, we present an empirical study of many-shot ICL for machine translation from English into ten truly low-resource languages recently added to FLORES+. We analyze the effects of retrieving more informative examples, using out-of-domain data, and ordering examples by length. Our findings show that many-shot ICL becomes more effective as the number of examples increases. More importantly, we show that BM25-based retrieval substantially improves data efficiency: 50 retrieved examples roughly match 250 many-shot examples, while 250 retrieved examples perform similarly to 1,000 many-shot examples.
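Abstract (reference translation): In-context learning (ICL) allows large language models (LLMs) to adapt to new tasks from a few examples. Recent work on many-shot ICL suggests that modern LLMs can benefit further from larger sets of ICL examples made possible by long context windows. However, such gains depend on careful example selection, and the inference cost can be prohibitive for low-resource language communities. This paper presents an empirical study of many-shot ICL for machine translation into ten truly low-resource languages recently added to FLORES+. It analyzes the effects of retrieving more informative examples, using out-of-domain data, and ordering examples by length. The results indicate that many-shot ICL becomes more effective as the number of examples increases. In addition, BM25-based retrieval substantially improves data efficiency: 50 retrieved examples roughly match 250 many-shot examples, and 250 retrieved examples perform similarly to 1,000 many-shot examples.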
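The abstract credits BM25-based retrieval for the gain in data efficiency but does not describe the implementation. The following is a minimal sketch of the general idea, not the authors' code: it scores each example in a pool of parallel sentence pairs against the sentence to be translated, keeps the top k, and places them in a prompt. The whitespace tokenizer, the `k1` and `b` defaults, the prompt layout, and the names `BM25`, `retrieve_examples`, and `build_prompt` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper may tokenize differently.
    return text.lower().split()


class BM25:
    """Plain Okapi BM25 over a list of tokenized documents."""

    def __init__(self, corpus_tokens, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(doc) for doc in corpus_tokens]
        self.lens = [len(doc) for doc in corpus_tokens]
        self.avgdl = sum(self.lens) / len(self.lens)
        n = len(corpus_tokens)
        # Document frequency: number of documents that contain each term.
        df = Counter(term for doc in self.docs for term in doc)
        # Smoothed IDF that stays non-negative for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query_tokens, i):
        tf, dl = self.docs[i], self.lens[i]
        total = 0.0
        for term in query_tokens:
            f = tf.get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total


def retrieve_examples(source_sentence, pool, k):
    """Return the k (source, target) pairs whose source side best matches the query."""
    bm25 = BM25([tokenize(src) for src, _ in pool])
    query = tokenize(source_sentence)
    ranked = sorted(range(len(pool)), key=lambda i: bm25.score(query, i), reverse=True)
    return [pool[i] for i in ranked[:k]]


def build_prompt(source_sentence, examples, target_language):
    # Hypothetical prompt layout: one "English / target" pair per example,
    # followed by the sentence to translate.
    parts = [f"English: {src}\n{target_language}: {tgt}" for src, tgt in examples]
    parts.append(f"English: {source_sentence}\n{target_language}:")
    return "\n\n".join(parts)
```

A call such as `build_prompt(sentence, retrieve_examples(sentence, pool, 50), "TargetLanguage")` would correspond to the 50-example retrieved setting mentioned in the abstract, with `pool` standing in for whatever parallel data is available.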

Related papers