Fugu-MT Paper Translation (Summary): Socio-Conformal Calibration in Complex Survey Data: Marginal Validity Is Not Enough for Subgroup Reliability

Paper summary: Socio-Conformal Calibration in Complex Survey Data: Marginal Validity Is Not Enough for Subgroup Reliability

arxiv url: http://arxiv.org/abs/2605.05562v1
Date: Thu, 07 May 2026 01:10:48 GMT
Status: Translation complete
Last updated (system): 2026-05-08 22:27:11.471967
Title: Socio-Conformal Calibration in Complex Survey Data: Marginal Validity Is Not Enough for Subgroup Reliability
Authors: Amir Rafe, Subasish Das
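Abstract summary: We study ordinal conformal prediction for five-level AI-attitude forecasting on the Pew American Trends Panel. Standard conformal prediction achieves nominal marginal coverage for all four base predictors but leaves weighted subgroup coverage gaps of about 13 percentage points. For the strongest predictor (XGBoost), Mondrian conformal prediction worsens the fairness-efficiency trade-off. A regularized comparator that shrinks group thresholds toward the global quantile mitigates this instability.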
Reference score (site-computed attention score): 1.089614199781423
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Machine-learning systems used in survey-based social measurement require uncertainty estimates that are reliable across population subgroups, not merely valid in aggregate. We study ordinal conformal prediction for five-level AI-attitude forecasting on the Pew American Trends Panel (Wave 152; n = 4,591; 12 race × education subgroups), comparing standard split conformal, Mondrian (group-specific) conformal, and a regularized Mondrian comparator across 100 respondent-disjoint splits with survey-weighted evaluation. Standard conformal achieves nominal marginal coverage for all four base predictors but leaves weighted subgroup gaps of ~13 percentage points. For the strongest predictor (XGBoost), Mondrian worsens the fairness-efficiency trade-off: weighted set size rises by +0.036 (d_z = 1.66) while the weighted subgroup gap grows by +0.013 (d_z = 0.30). A regularized comparator that shrinks group thresholds toward the global quantile mitigates this instability (Δgap = -0.001, Δsize = +0.012) but does not yield a decisive fairness gain. Failure analysis traces the mechanism to calibration-cell fragmentation interacting with group-specific confidence mismatch. The negative result persists across alternate outcome codings and subgroup granularities, demonstrating that nominal marginal validity is insufficient for subgroup reliability and that naive group-specific calibration is not a dependable fairness remedy in complex survey settings.
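The three calibration schemes compared in the abstract (split, Mondrian, and regularized Mondrian) can be illustrated with a short Python sketch. It assumes a nonconformity score of one minus the predicted probability of the observed attitude level and builds unrestricted label sets rather than contiguous ordinal intervals; the paper's actual ordinal score is not stated here. The shrinkage rule inside the hypothetical `group_thresholds` function (parameter `lam`) is only one illustrative way to pull group thresholds toward the global quantile, not the authors' regularizer, and the data are synthetic, not Pew ATP responses.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    s = np.sort(np.asarray(scores, dtype=float))
    n = s.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # Too few calibration points: the finite-sample quantile is +inf,
    # meaning the prediction set must contain every label.
    return np.inf if k > n else s[k - 1]


def nonconformity(probs, labels):
    """Assumed score: 1 - predicted probability of the observed level."""
    return 1.0 - probs[np.arange(len(labels)), labels]


def group_thresholds(cal_probs, cal_labels, cal_groups, alpha=0.1, lam=None):
    """Per-group thresholds: Mondrian if lam is None, shrunken Mondrian if lam > 0.

    Hypothetical shrinkage rule: (n_g * q_g + lam * q_global) / (n_g + lam),
    which pulls small calibration cells toward the global quantile.
    """
    scores = nonconformity(cal_probs, cal_labels)
    # Scores lie in [0, 1], so capping at 1.0 keeps the "include all labels" meaning.
    q_global = min(conformal_quantile(scores, alpha), 1.0)
    out = {}
    for g in np.unique(cal_groups):
        s_g = scores[cal_groups == g]
        q_g = min(conformal_quantile(s_g, alpha), 1.0)
        if lam is None:
            out[g] = q_g
        else:
            out[g] = (s_g.size * q_g + lam * q_global) / (s_g.size + lam)
    return q_global, out


def prediction_sets(probs, thresholds):
    """Label sets {y : 1 - p(y|x) <= q}, with one threshold q per row."""
    return [np.flatnonzero(1.0 - p <= q) for p, q in zip(probs, thresholds)]


def coverage(labels, sets):
    """Fraction of rows whose true label falls inside its prediction set."""
    return np.mean([y in s for y, s in zip(labels, sets)])


# Synthetic example: 5 ordinal levels, 12 groups; labels are drawn from the
# model's own probabilities, so the base model is calibrated by construction.
rng = np.random.default_rng(0)
n, K, G, alpha = 8000, 5, 12, 0.1
probs = rng.dirichlet(np.ones(K), size=n)
labels = np.array([rng.choice(K, p=p) for p in probs])
groups = rng.integers(0, G, size=n)
cal, test = np.arange(n) < n // 2, np.arange(n) >= n // 2

q_global, q_mondrian = group_thresholds(probs[cal], labels[cal], groups[cal], alpha)
_, q_shrunk = group_thresholds(probs[cal], labels[cal], groups[cal], alpha, lam=50.0)

sets_split = prediction_sets(probs[test], np.full(test.sum(), q_global))
sets_mond = prediction_sets(probs[test], [q_mondrian[g] for g in groups[test]])
print(f"split coverage={coverage(labels[test], sets_split):.3f}, "
      f"mondrian coverage={coverage(labels[test], sets_mond):.3f}")
```

With `lam=None`, each group's threshold comes only from its own calibration cell, so small race × education cells give noisy thresholds; a finite `lam` trades some group specificity for stability, in the spirit of the calibration-cell fragmentation the abstract identifies.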
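The survey-weighted evaluation quantities reported in the abstract can be sketched in the same way. The definitions below are assumptions for illustration: the subgroup gap is taken as the largest minus the smallest survey-weighted coverage across groups, set size is a survey-weighted mean, and d_z is the paired Cohen's d (mean of per-split differences divided by their standard deviation), as one might compute over repeated splits such as the 100 used in the paper. All function names are hypothetical.

```python
import numpy as np


def weighted_subgroup_gap(covered, weights, groups):
    """Assumed gap: max minus min of survey-weighted coverage across groups."""
    covered = np.asarray(covered, dtype=float)
    weights = np.asarray(weights, dtype=float)
    groups = np.asarray(groups)
    rates = {
        g: np.average(covered[groups == g], weights=weights[groups == g])
        for g in np.unique(groups)
    }
    return max(rates.values()) - min(rates.values()), rates


def weighted_mean_size(set_sizes, weights):
    """Survey-weighted average prediction-set size."""
    return np.average(np.asarray(set_sizes, dtype=float), weights=weights)


def paired_dz(a, b):
    """Paired Cohen's d_z: mean of per-split differences (b - a) over their SD."""
    diff = np.asarray(b, dtype=float) - np.asarray(a, dtype=float)
    return diff.mean() / diff.std(ddof=1)


# Toy inputs: 1 = true level inside the prediction set; weights are survey weights.
covered = [1, 1, 0, 1, 0, 1]
weights = [1, 3, 1, 1, 2, 2]
groups = ["A", "A", "A", "B", "B", "B"]
gap, rates = weighted_subgroup_gap(covered, weights, groups)
size = weighted_mean_size([1, 2, 3, 1, 2, 3], weights)
dz = paired_dz([0, 0, 0, 0], [2, 1, 2, 1])  # e.g. a metric under two methods, 4 splits
print(f"gap={gap:.2f}, size={size:.2f}, d_z={dz:.2f}")  # gap=0.20, size=2.10, d_z=2.60
```

Because d_z divides by the spread of paired differences, a small but very consistent increase (such as the +0.036 set-size change) can produce a large d_z, while a larger but noisier change yields a smaller one.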

