Fugu-MT paper translation (abstract): Capabilities Ain't All You Need: Measuring Propensities in AI

Paper overview: Capabilities Ain't All You Need: Measuring Propensities in AI

arxiv url: http://arxiv.org/abs/2602.18182v1
Date: Fri, 20 Feb 2026 12:40:18 GMT
Status: translation complete
Last updated in system: 2026-02-23 18:01:41.326385
Title: Capabilities Ain't All You Need: Measuring Propensities in AI
Title (reference translation): Capabilities are not all you need: measuring propensities in AI
Authors: Daniel Romero-Alvarado, Fernando Martínez-Plumed, Lorenzo Pacchiardi, Hugo Save, Siddhesh Milind Pawar, Behzad Mehrbakhsh, Pablo Antonio Moreno Casares, Ben Slater, Paolo Bova, Peter Romero, Zachary R. Tyler, Jonathan Prunty, Luning Sun, Jose Hernandez-Orallo
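Abstract summary: This paper introduces the first formal framework for measuring AI propensities, using a bilogistic formulation for model success. The authors find that they can measure how much a propensity is shifted and what effect this has on tasks. Combining propensities and capabilities gives stronger predictive power than either gives separately.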
Reference score (independently computed attention score): 32.960519634809145
License: http://creativecommons.org/licenses/by/4.0/
Abstract: AI evaluation has primarily focused on measuring capabilities, with formal approaches inspired by Item Response Theory (IRT) being increasingly applied. Yet propensities - the tendencies of models to exhibit particular behaviours - play a central role in determining both performance and safety outcomes. Traditional IRT describes a model's success on a task as a monotonic function of model capabilities and task demands, an approach unsuited to propensities, where both excess and deficiency can be problematic. Here, we introduce the first formal framework for measuring AI propensities by using a bilogistic formulation for model success, which attributes high success probability when the model's propensity is within an "ideal band". Further, we estimate the limits of the ideal band using LLMs equipped with newly developed task-agnostic rubrics. Applying our framework to six families of LLMs whose propensities are incited in either direction, we find that we can measure how much the propensity is shifted and what effect this has on the tasks. Critically, propensities estimated using one benchmark successfully predict behaviour on held-out tasks. Moreover, we obtain stronger predictive power when combining propensities and capabilities than when using either separately. More broadly, our framework showcases how rigorous propensity measurements can be conducted and how they yield gains over solely using capability evaluations to predict AI behaviour.
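Abstract (reference translation): AI evaluation has focused mainly on measuring capabilities, and formal approaches inspired by Item Response Theory (IRT) are increasingly applied. However, propensities, the tendencies of models to exhibit particular behaviours, play a central role in determining both performance and safety outcomes. Traditional IRT describes a model's success on a task as a monotonic function of model capabilities and task demands. This paper introduces the first formal framework for measuring AI propensities, using a bilogistic formulation for model success. The limits of the ideal band are estimated using LLMs equipped with newly developed task-agnostic rubrics. Applying the framework to six families of LLMs whose propensities are induced in either direction shows that it can measure how much a propensity shifts and what effect this has on tasks. Critically, propensities estimated on one benchmark successfully predict behaviour on held-out tasks. Moreover, combining propensities and capabilities gives stronger predictive power than either gives alone. More broadly, the framework shows how rigorous propensity measurements can be conducted and how they yield gains over using capability evaluations alone to predict AI behaviour.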
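
The abstract contrasts a monotonic IRT success curve with a bilogistic one that peaks inside an "ideal band", but it does not give the paper's parameterization. The following is a minimal sketch only, assuming the bilogistic curve can be written as the product of a rising logistic at the band's lower limit and a falling logistic at its upper limit. The function names and the `lower`, `upper`, and `slope` parameters are hypothetical and are not taken from the paper.

```python
import math


def logistic(x: float) -> float:
    """Numerically stable standard logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def monotonic_success(ability: float, demand: float, slope: float = 1.0) -> float:
    """IRT-style curve: success probability only increases with ability."""
    return logistic(slope * (ability - demand))


def bilogistic_success(propensity: float, lower: float, upper: float,
                       slope: float = 4.0) -> float:
    """Assumed bilogistic curve: high inside [lower, upper], low on both sides.

    This product form is an illustrative assumption, not the paper's formula.
    """
    rising = logistic(slope * (propensity - lower))   # penalizes deficiency
    falling = logistic(slope * (upper - propensity))  # penalizes excess
    return rising * falling


if __name__ == "__main__":
    # Sweep the propensity across a hypothetical ideal band of [-1, 1].
    for p in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"propensity={p:+.1f}  success={bilogistic_success(p, -1.0, 1.0):.3f}")
```

Under this assumed form, success probability falls off when the propensity is either too low or too high, which is the behaviour the abstract says a monotonic curve cannot express.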

Related papers