Fugu-MT Paper Translation (Abstract): AsymmetryZero: A Framework for Operationalizing Human Expert Preferences as Semantic Evals

Paper overview: AsymmetryZero: A Framework for Operationalizing Human Expert Preferences as Semantic Evals

arxiv url: http://arxiv.org/abs/2605.04083v1
Date: Wed, 15 Apr 2026 17:35:57 GMT
Status: Translation complete
System update date: 2026-05-11 06:56:26.583011
Title: AsymmetryZero: A Framework for Operationalizing Human Expert Preferences as Semantic Evals
Authors: Tadhg Looram, Lucas Nuzzi, Kyle Waters, Steven Dillmann,
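Abstract summary: AsymmetryZero is a framework for operationalizing human expert preferences as semantic evals. It represents each task as a stable evaluation contract that makes grading criteria explicit. The paper uses Harbor to compare a five-model frontier jury against a five-model compact jury.
Reference score (independently computed attention score): 0.0044302156879028705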
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Much of the focus in RL today is on evaluation design: building meaningful evals that serve simultaneously as benchmarks and as well-defined reward signals for post-training. Yet, many real-world tasks are governed by subjective, procedural, and domain-specific requirements that are difficult to encode as exact-match targets or open-ended preference judgments frequently used in RL pipelines today. In this work, we present AsymmetryZero, a framework for operationalizing human expert preferences as semantic evals. AsymmetryZero represents each task as a stable evaluation contract that makes grading criteria explicit: what is being graded, how each criterion is judged, and how criterion-level decisions are aggregated into a task outcome. The same contract can be executed using Inspect for model-only evaluations, as well as the Harbor Framework for agentic evaluations, enabling comparable scores and shared audit artifacts across both settings. We argue that the central challenge in post-training today is the faithful encoding of expert requirements into the evaluation itself. To that end, we present a study using Harbor that holds task contracts fixed and compares a five-model frontier jury against a five-model compact jury across four frontier-class solvers (Claude Opus 4.6, GPT-5.4, Grok-4.20, Gemini-3.1-Pro). We find that criterion-level frontier-vs-compact agreement ranges from $75.9\%$ to $89.6\%$ (strict common-subset agreement: $77.8\%$ to $92.1\%$), while compact juries exhibit substantially higher internal dissent (3--2 split rate $28.7\%$--$32.4\%$) than frontier juries ($6.1\%$--$11.5\%$). Verifier traces further show that compact juries reduce per-criterion judging cost to roughly $4.2\%$--$5.6\%$ of frontier and latency to roughly $21.7\%$--$27.1\%$, even as aggregated task-level outcomes often remain comparatively stable.
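To make the jury terminology in the abstract concrete, here is a minimal Python sketch of per-criterion majority voting, narrow-split detection, and jury-to-jury agreement. It is an illustration only, not the AsymmetryZero, Inspect, or Harbor API: the class `CriterionVerdict`, the helper functions, the criterion names, and the vote data are all hypothetical. The abstract does not say how criteria are aggregated into a task outcome, so the "all criteria must pass" rule below is an assumption, as is computing agreement over the criteria both juries scored.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CriterionVerdict:
    """Votes from one jury on one criterion (hypothetical structure)."""
    criterion_id: str
    votes: List[bool]  # one pass/fail vote per juror

    @property
    def passed(self) -> bool:
        # Simple majority of jurors voting "pass".
        return sum(self.votes) * 2 > len(self.votes)

    @property
    def is_narrow_split(self) -> bool:
        # For a five-juror panel this is exactly a 3-2 split.
        yes = sum(self.votes)
        return abs(yes - (len(self.votes) - yes)) == 1


def judge(votes_by_criterion: Dict[str, List[bool]]) -> Dict[str, CriterionVerdict]:
    return {cid: CriterionVerdict(cid, votes)
            for cid, votes in votes_by_criterion.items()}


def task_passes(verdicts: Dict[str, CriterionVerdict]) -> bool:
    # ASSUMPTION: a task passes only if every criterion passes.
    return all(v.passed for v in verdicts.values())


def agreement_rate(a: Dict[str, CriterionVerdict],
                   b: Dict[str, CriterionVerdict]) -> float:
    # Fraction of shared criteria on which both juries reach the same decision.
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("juries share no criteria")
    return sum(a[c].passed == b[c].passed for c in shared) / len(shared)


def split_rate(verdicts: Dict[str, CriterionVerdict]) -> float:
    # Fraction of criteria decided by a one-vote margin.
    return sum(v.is_narrow_split for v in verdicts.values()) / len(verdicts)


# Made-up votes for one task with three hypothetical criteria.
frontier = judge({
    "cites_source":      [True, True, True, True, True],
    "correct_units":     [True, True, True, True, False],
    "follows_procedure": [False, False, False, False, True],
})
compact = judge({
    "cites_source":      [True, True, True, False, False],
    "correct_units":     [True, True, False, True, True],
    "follows_procedure": [True, True, False, False, True],
})

print("criterion-level agreement:", agreement_rate(frontier, compact))
print("frontier split rate:", split_rate(frontier))
print("compact split rate:", split_rate(compact))
print("task outcomes:", task_passes(frontier), task_passes(compact))
```

In this toy data the two juries disagree on one of three criteria, and both of the compact jury's narrow splits land on criteria where the frontier jury was nearly unanimous. That is the kind of pattern the reported split rates describe in aggregate.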


Related papers