Fugu-MT Paper Translation (Abstract): Aligning Model Properties via Conformal Risk Control

Paper overview: Aligning Model Properties via Conformal Risk Control

arxiv url: http://arxiv.org/abs/2406.18777v1
Date: Wed, 26 Jun 2024 22:24:46 GMT
Status: Translation complete
Last updated in system: 2024-06-28 15:47:01.270098
Title: Aligning Model Properties via Conformal Risk Control
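Authors: William Overman, Jacqueline Jil Vallon, Mohsen Bayati
Abstract summary: AI model alignment is crucial because of inadvertent biases in training data and the underspecified pipeline in modern machine learning. Recent advances show that post-training model alignment via human feedback can address some of these challenges. The paper interprets model alignment through property testing, defining an aligned model $f$ as one belonging to a subset $\mathcal{P}$ of functions.
Reference score (independently computed attention score): 4.710921988115686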
License: http://creativecommons.org/licenses/by/4.0/
Abstract: AI model alignment is crucial due to inadvertent biases in training data and the underspecified pipeline in modern machine learning, where numerous models with excellent test set metrics can be produced, yet they may not meet end-user requirements. Recent advances demonstrate that post-training model alignment via human feedback can address some of these challenges. However, these methods are often confined to settings (such as generative AI) where humans can interpret model outputs and provide feedback. In traditional non-generative settings, where model outputs are numerical values or classes, detecting misalignment through single-sample outputs is highly challenging. In this paper we consider an alternative strategy. We propose interpreting model alignment through property testing, defining an aligned model $f$ as one belonging to a subset $\mathcal{P}$ of functions that exhibit specific desired behaviors. We focus on post-processing a pre-trained model $f$ to better align with $\mathcal{P}$ using conformal risk control. Specifically, we develop a general procedure for converting queries for a given property $\mathcal{P}$ to a collection of loss functions suitable for use in a conformal risk control algorithm. We prove a probabilistic guarantee that the resulting conformal interval around $f$ contains a function approximately satisfying $\mathcal{P}$. Given the capabilities of modern AI models with extensive parameters and training data, one might assume alignment issues will resolve naturally. However, increasing training data or parameters in a random feature model doesn't eliminate the need for alignment techniques when pre-training data is biased. We demonstrate our alignment methodology on supervised learning datasets for properties like monotonicity and concavity. Our flexible procedure can be applied to various desired properties.
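The abstract describes turning property queries into loss functions for a conformal risk control algorithm. A minimal sketch of that idea for monotonicity follows. It uses the standard conformal risk control threshold rule (pick the smallest $\lambda$ with $\frac{n}{n+1}\hat{R}_n(\lambda) + \frac{B}{n+1} \le \alpha$), but the loss itself is an illustrative assumption rather than the paper's construction: for a calibration pair $x \le x'$, the loss is 1 when no non-decreasing function fits inside the bands $[f(x)-\lambda, f(x)+\lambda]$ and $[f(x')-\lambda, f(x')+\lambda]$. The names `crc_threshold` and `monotonicity_losses` are hypothetical.

```python
import numpy as np

def monotonicity_losses(f, x_lo, x_hi, lambdas):
    """Hypothetical 0/1 loss per calibration pair (x_lo[i] <= x_hi[i]).

    A non-decreasing function fits inside the +/- lambda bands around f
    at both points iff f(x_lo) - lambda <= f(x_hi) + lambda, so the loss
    is 1 exactly when f(x_lo) - f(x_hi) > 2 * lambda.
    Returns an (n, m) array that is non-increasing in lambda along each row.
    """
    gap = f(x_lo) - f(x_hi)  # positive gap means a monotonicity violation
    return (gap[:, None] > 2 * lambdas[None, :]).astype(float)

def crc_threshold(losses, lambdas, alpha, bound=1.0):
    """Standard conformal risk control rule on a grid of lambdas.

    losses[i, j] is the loss of calibration item i at lambdas[j]; lambdas
    must be sorted ascending and losses bounded above by `bound`.
    Returns the smallest grid value meeting the risk target, or None.
    """
    n = losses.shape[0]
    adjusted = (n / (n + 1)) * losses.mean(axis=0) + bound / (n + 1)
    ok = np.nonzero(adjusted <= alpha)[0]
    return None if ok.size == 0 else float(lambdas[ok[0]])

# Toy model that decreases on [0, 1) and increases afterwards.
f = lambda x: np.where(x < 1, -x, x)
x_lo = np.array([0.0] + [2.0] * 8)
x_hi = np.array([0.5] + [3.0] * 8)
lambdas = np.array([0.0, 0.1, 0.2, 0.3])

L = monotonicity_losses(f, x_lo, x_hi, lambdas)
lam_hat = crc_threshold(L, lambdas, alpha=0.15)
```

With nine calibration pairs, the finite-sample correction `bound / (n + 1)` alone equals 0.1, so any target `alpha` below 0.1 is unattainable on this toy data and the function returns `None`.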
