Fugu-MT Paper Translation (Abstract): Robustness is Important: Limitations of LLMs for Data Fitting

Paper overview: Robustness is Important: Limitations of LLMs for Data Fitting

arxiv url: http://arxiv.org/abs/2508.19563v2
Date: Fri, 29 Aug 2025 13:46:29 GMT
Status: Translation complete
Last updated in system: 2025-09-01 13:41:09.936872
Title: Robustness is Important: Limitations of LLMs for Data Fitting
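Title (reference translation): Robustness: Limitations of LLMs for Data Fitting
Authors: Hejia Liu, Mochen Yang, Gediminas Adomavicius
Abstract summary: Large language models (LLMs) are being applied in a wide array of settings. The authors identify a critical vulnerability of using LLMs for data fitting. Simply changing variable names can sway the size of the prediction error by as much as 82% in certain settings.
Reference score (independently computed attention metric): 0.0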
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are being applied in a wide array of settings, well beyond the typical language-oriented use cases. In particular, LLMs are increasingly used as a plug-and-play method for fitting data and generating predictions. Prior work has shown that LLMs, via in-context learning or supervised fine-tuning, can perform competitively with many tabular supervised learning techniques in terms of predictive performance. However, we identify a critical vulnerability of using LLMs for data fitting -- making changes to data representation that are completely irrelevant to the underlying learning task can drastically alter LLMs' predictions on the same data. For example, simply changing variable names can sway the size of prediction error by as much as 82% in certain settings. Such prediction sensitivity with respect to task-irrelevant variations manifests under both in-context learning and supervised fine-tuning, for both closed-weight and open-weight general-purpose LLMs. Moreover, by examining the attention scores of an open-weight LLM, we discover a non-uniform attention pattern: training examples and variable names/values which happen to occupy certain positions in the prompt receive more attention when output tokens are generated, even though different positions are expected to receive roughly the same attention. This partially explains the sensitivity in the presence of task-irrelevant variations. We also consider a state-of-the-art tabular foundation model (TabPFN) trained specifically for data fitting. Despite being explicitly designed to achieve prediction robustness, TabPFN is still not immune to task-irrelevant variations. Overall, despite LLMs' impressive predictive capabilities, currently they lack even the basic level of robustness to be used as a principled data-fitting tool.
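The kind of task-irrelevant variation the abstract describes can be illustrated with a minimal sketch. It serializes the same numeric data into two in-context prompts that differ only in variable names, and defines a relative change in error for comparing predictions obtained from each. Everything here is an assumption for illustration: the prompt layout, the helper names (`build_prompt`, `mean_absolute_error`, `relative_error_change`), and the error metric are hypothetical and are not taken from the paper. No LLM is called.

```python
from typing import Sequence


def build_prompt(
    train_X: Sequence[Sequence[float]],
    train_y: Sequence[float],
    test_x: Sequence[float],
    feature_names: Sequence[str],
    target_name: str,
) -> str:
    """Serialize training rows and one test row as 'name = value' text.

    The layout is a hypothetical one chosen for illustration only.
    """
    def render(x: Sequence[float]) -> str:
        return ", ".join(f"{name} = {value}" for name, value in zip(feature_names, x))

    lines = [f"{render(x)}, {target_name} = {y}" for x, y in zip(train_X, train_y)]
    # The final line leaves the target blank for the model to complete.
    lines.append(f"{render(test_x)}, {target_name} =")
    return "\n".join(lines)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_error_change(baseline_error: float, variant_error: float) -> float:
    """Relative change in error between two prompt variants (assumed metric)."""
    return (variant_error - baseline_error) / baseline_error


# Same data, two naming schemes: only the variable names differ.
train_X = [(1.0, 2.0), (3.0, 4.0)]
train_y = [5.0, 11.0]
test_x = (2.0, 3.0)

prompt_a = build_prompt(train_X, train_y, test_x, ["x1", "x2"], "y")
prompt_b = build_prompt(train_X, train_y, test_x, ["height", "width"], "area")
```

A robust data-fitting tool would be expected to return the same prediction for `prompt_a` and `prompt_b`, since the numeric content is identical; feeding each prompt to a model and comparing the resulting errors with `relative_error_change` is one way to quantify any gap.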
