Fugu-MT paper translation (abstract): Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size

Paper overview: Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size

arxiv url: http://arxiv.org/abs/2604.13275v1
Date: Tue, 14 Apr 2026 20:12:05 GMT
Status: Translation complete
Last updated in system: 2026-04-16 20:38:32.28401
Title: Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size
Title (reference translation): Improvement and Deterioration with Scale: Diversity of Context Adaptation by Model Size
Authors: Dikshant Kukreja, Kshitij Sah, Gautam Gupta, Avinash Anand, Rajiv Ratn Shah, Zhengkui Wang, Aik Beng Ng, Erik Cambria
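Abstract summary: We formalize this apparent paradox through the first scaling laws for contextual entrainment. Entrainment follows predictable power-law scaling, but with opposite trends depending on context type. Concretely, the largest models are four times more resistant to counterfactual misinformation than the smallest.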
Reference score (independently computed attention score): 44.634649562117744
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Larger language models become simultaneously better and worse at handling contextual information -- better at ignoring false claims, worse at ignoring irrelevant tokens. We formalize this apparent paradox through the first scaling laws for contextual entrainment, the tendency of models to favor tokens that appeared in context regardless of relevance. Analyzing the Cerebras-GPT (111M-13B) and Pythia (410M-12B) model families, we find entrainment follows predictable power-law scaling, but with opposite trends depending on context type: semantic contexts show decreasing entrainment with scale, while non-semantic contexts show increasing entrainment. Concretely, the largest models are four times more resistant to counterfactual misinformation than the smallest, yet simultaneously twice as prone to copying arbitrary tokens. These diverging trends, which replicate across model families, suggest that semantic filtering and mechanical copying are functionally distinct behaviors that scale in opposition -- scaling alone does not resolve context sensitivity, it reshapes it.
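Abstract (reference translation): Larger language models become simultaneously better and worse at handling contextual information: better at ignoring false claims, worse at ignoring irrelevant tokens. We formalize this apparent paradox through the first scaling laws for contextual entrainment, the tendency of models to favor tokens that appeared in context regardless of relevance. Analyzing the Cerebras-GPT (111M-13B) and Pythia (410M-12B) model families, we find that entrainment follows predictable power-law scaling, but with opposite trends depending on context type. Concretely, the largest models are four times more resistant to counterfactual misinformation than the smallest, yet twice as prone to copying arbitrary tokens. These diverging trends, which replicate across model families, suggest that semantic filtering and mechanical copying are functionally distinct behaviors that scale in opposition.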

Related papers