Fugu-MT Paper Translation (Abstract): LaCy: What Small Language Models Can and Should Learn is Not Just a Question of Loss

Paper overview: LaCy: What Small Language Models Can and Should Learn is Not Just a Question of Loss

arxiv url: http://arxiv.org/abs/2602.12005v1
Date: Thu, 12 Feb 2026 14:37:25 GMT
Status: Translation complete
System update date: 2026-02-13 21:07:25.870404
Title: LaCy: What Small Language Models Can and Should Learn is Not Just a Question of Loss
Authors: Szilvia Ujváry, Louis Béthune, Pierre Ablin, João Monteiro, Marco Cuturi, Michael Kirchhof
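Abstract summary: We study which tokens an SLM can learn and which tokens it should delegate. We propose LaCy, a novel pretraining method based on this token selection philosophy. Our experiments show that LaCy models successfully learn which tokens to predict and which to delegate for help.
Reference score (attention score computed independently by this site): 34.02891591167747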
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Language models have consistently grown to compress more world knowledge into their parameters, but the knowledge that can be pretrained into them is upper-bounded by their parameter size. Especially the capacity of Small Language Models (SLMs) is limited, leading to factually incorrect generations. This problem is often mitigated by giving the SLM access to an outside source: the ability to query a larger model, documents, or a database. Under this setting, we study the fundamental question of which tokens an SLM can and should learn during pretraining, versus which ones it should delegate via a <CALL> token. We find that this is not simply a question of loss: although the loss is predictive of whether a predicted token mismatches the ground-truth, some tokens are acceptable in that they are truthful alternative continuations of a pretraining document, and should not trigger a <CALL> even if their loss is high. We find that a spaCy grammar parser can help augment the loss signal to decide which tokens the SLM should learn to delegate to prevent factual errors and which are safe to learn and predict even under high losses. We propose LaCy, a novel pretraining method based on this token selection philosophy. Our experiments demonstrate that LaCy models successfully learn which tokens to predict and where to delegate for help. This results in higher FactScores when generating in a cascade with a bigger model and outperforms Rho or LLM-judge trained SLMs, while being simpler and cheaper.
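The abstract describes combining a per-token loss with a grammatical signal to decide which pretraining targets become <CALL>, but it does not state the actual rule. The following is a minimal sketch of that idea only, not the authors' method. The threshold value, the set of fact-bearing tags, and the names `ScoredToken` and `select_targets` are all assumptions made for illustration, and the losses in the example are invented. The tags follow the coarse part-of-speech labels that a spaCy pipeline exposes as `token.pos_`, supplied by hand here so the sketch runs without spaCy installed.

```python
from dataclasses import dataclass

CALL = "<CALL>"

# Assumption: tags treated as likely to carry factual content.
# The abstract does not say which grammatical categories LaCy uses.
FACTUAL_TAGS = frozenset({"PROPN", "NUM"})


@dataclass
class ScoredToken:
    text: str    # surface form of the ground-truth token
    loss: float  # per-token loss of the small model (invented values below)
    tag: str     # coarse part-of-speech tag, e.g. spaCy's token.pos_


def select_targets(tokens, loss_threshold=2.0, factual_tags=FACTUAL_TAGS):
    """Return training targets, replacing risky tokens with <CALL>.

    A token is delegated only if its loss is high AND its tag suggests
    factual content. High-loss tokens with other tags are kept, which
    mirrors the abstract's point that high loss alone is not enough.
    """
    targets = []
    for tok in tokens:
        delegate = tok.loss > loss_threshold and tok.tag in factual_tags
        targets.append(CALL if delegate else tok.text)
    return targets


example = [
    ScoredToken("She", 0.3, "PRON"),
    ScoredToken("was", 0.2, "AUX"),
    ScoredToken("born", 3.1, "VERB"),     # high loss, but not fact-bearing
    ScoredToken("in", 0.1, "ADP"),
    ScoredToken("Warsaw", 4.2, "PROPN"),  # high loss and fact-bearing
    ScoredToken("in", 0.2, "ADP"),
    ScoredToken("1867", 5.0, "NUM"),      # high loss and fact-bearing
]

print(select_targets(example))
```

In this toy example "born" is kept as a target despite its high loss, while "Warsaw" and "1867" become <CALL>. A loss-only rule with the same threshold would delegate all three.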


Related papers