Fugu-MT paper translation (abstract): Causal methods for LLM development and evaluation

Paper overview: Causal methods for LLM development and evaluation

arxiv url: http://arxiv.org/abs/2605.25998v1
Date: Mon, 25 May 2026 16:15:44 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:20.461361
Title: Causal methods for LLM development and evaluation
Title (reference translation): Causal methods for the development and evaluation of LLMs
Authors: Dennis Frauen, Marie Brockschmidt, Konstantin Hess, Haorui Ma, Yuchen Ma, Abdurahman Maarouf, Maresa Schröder, Jonas Schweisthal, Yuxin Wang, Athiya Deviyani, Sonali Parbhoo, Rahul G. Krishnan, Stefan Feuerriegel
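Abstract summary: Large language model (LLM) development is currently driven by large-scale empirical iteration over data mixtures, reward models, routing strategies, and evaluation pipelines. The authors argue that many central questions in LLM development and evaluation are inherently causal, and that causal methods are potentially underutilized in the LLM development and evaluation pipeline.
Reference score (independently computed attention score): 49.64304126945395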
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language model (LLM) development is currently driven by large-scale empirical iteration over data mixtures, reward models, routing strategies, and evaluation pipelines. Here, we argue that many central questions in LLM development and evaluation are inherently causal: What is the effect of adding a data domain during pretraining? How do annotator preferences change when LLMs generate text in a different style? Should a prompt be routed to a larger or smaller model given inference cost constraints? In general, causal methods are well-suited to such settings where interventions change outcomes but, surprisingly, are underrepresented in LLM development. Our contribution is threefold: (1) We explain how causal methods can help develop modern LLM development and evaluation: LLM development relies heavily on logged data, which are often subject to confounding and distribution shifts; evaluation uses learned but potentially biased judges; and deployment environments are non-stationary. These conditions make purely predictive approaches fragile and create opportunities for principled identification and estimation methods from causal inference. (2) We further map opportunities for causal methods in the entire LLM development pipeline, including pretraining, alignment, routing, agentic workflows, and evaluation. (3) We discuss new research opportunities around leveraging causal methods for LLM development and evaluation. Overall, we argue that causal methods are potentially underutilized for the LLM development and evaluation pipeline, despite the fact that such methods can ensure a reliable and scientifically grounded design.
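Abstract (reference translation): Large language model (LLM) development is currently driven by large-scale empirical iteration over data mixtures, reward models, routing strategies, and evaluation pipelines. Here, we argue that many central questions in LLM development and evaluation are inherently causal: How do annotator preferences change when LLMs generate text in a different style? Should a prompt be routed to a larger or smaller model given inference cost constraints? In general, causal methods are well-suited to settings where interventions change outcomes but, surprisingly, are underrepresented in LLM development. (1) We explain how causal methods can help modern LLM development and evaluation: LLM development relies heavily on logged data, which are often subject to confounding and distribution shifts. These conditions make purely predictive approaches fragile and create opportunities for principled identification and estimation methods from causal inference. (2) We further map opportunities for causal methods across the entire LLM development pipeline, including pretraining, alignment, routing, agentic workflows, and evaluation. (3) We discuss new research opportunities around leveraging causal methods for LLM development and evaluation. Overall, we argue that causal methods are potentially underutilized in the LLM development and evaluation pipeline, even though such methods can ensure a reliable and scientifically grounded design.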

Related papers