Fugu-MT paper translation (abstract): Reproducibility study on how to find Spurious Correlations, Shortcut Learning, Clever Hans or Group-Distributional non-robustness and how to fix them

Paper overview: Reproducibility study on how to find Spurious Correlations, Shortcut Learning, Clever Hans or Group-Distributional non-robustness and how to fix them

arxiv url: http://arxiv.org/abs/2604.04518v1
Date: Mon, 06 Apr 2026 08:29:33 GMT
Status: translation complete
Last updated in system: 2026-04-07 15:49:19.146603
Title: Reproducibility study on how to find Spurious Correlations, Shortcut Learning, Clever Hans or Group-Distributional non-robustness and how to fix them
Title (reference translation): A study on how to find and fix spurious correlations, Shortcut Learning, Clever Hans or Group-Distributional non-robustness
Authors: Ole Delzer, Sidney Bender
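Abstract summary: We evaluated correction methods based on explainable artificial intelligence (XAI) techniques alongside popular non-XAI baselines. XAI-based methods generally outperformed non-XAI methods. The experiments also revealed that the practical application of many methods is hindered by a dependency on group labels.
Reference score (independently computed attention score): 0.8594140167290097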
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep Neural Networks (DNNs) are increasingly utilized in high-stakes domains like medical diagnostics and autonomous driving where model reliability is critical. However, the research landscape for ensuring this reliability is terminologically fractured across communities that pursue the same goal of ensuring models rely on causally relevant features rather than confounding signals. While frameworks such as distributionally robust optimization (DRO), invariant risk minimization (IRM), shortcut learning, simplicity bias, and the Clever Hans effect all address model failure due to spurious correlations, researchers typically only reference work within their own domains. This reproducibility study unifies these perspectives through a comparative analysis of correction methods under challenging constraints like limited data availability and severe subgroup imbalance. We evaluate recently proposed correction methods based on explainable artificial intelligence (XAI) techniques alongside popular non-XAI baselines using both synthetic and real-world datasets. Findings show that XAI-based methods generally outperform non-XAI approaches, with Counterfactual Knowledge Distillation (CFKD) proving most consistently effective at improving generalization. Our experiments also reveal that the practical application of many methods is hindered by a dependency on group labels, as manual annotation is often infeasible and automated tools like Spectral Relevance Analysis (SpRAy) struggle with complex features and severe imbalance. Furthermore, the scarcity of minority group samples in validation sets renders model selection and hyperparameter tuning unreliable, posing a significant obstacle to the deployment of robust and trustworthy models in safety-critical areas.
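Abstract (reference translation): Deep neural networks (DNNs) are increasingly used in high-stakes domains such as medical diagnostics and autonomous driving, where model reliability is essential. However, research on ensuring this reliability is terminologically fragmented across communities that pursue the same goal: making sure models rely on causally relevant features rather than confounding signals. Frameworks such as distributionally robust optimization (DRO), invariant risk minimization (IRM), shortcut learning, simplicity bias, and the Clever Hans effect all address model failure caused by spurious correlations, yet researchers typically reference only work within their own domain. This reproducibility study unifies these perspectives through a comparative analysis of correction methods under demanding constraints such as limited data availability and severe subgroup imbalance. We evaluate recently proposed correction methods based on explainable artificial intelligence (XAI) techniques alongside popular non-XAI baselines, using both synthetic and real-world datasets. The results show that XAI-based methods generally outperform non-XAI methods, and that Counterfactual Knowledge Distillation (CFKD) is the most consistently effective at improving generalization. The experiments also reveal that the practical application of many methods is hindered by a dependency on group labels, because manual annotation is often infeasible and automated tools such as Spectral Relevance Analysis (SpRAy) struggle with complex features and severe imbalance. Furthermore, the scarcity of minority-group samples in validation sets makes model selection and hyperparameter tuning unreliable, which poses a significant obstacle to deploying robust and trustworthy models in safety-critical areas.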
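
Illustrative note: the abstract's point that scarce minority-group validation samples make model selection unreliable can be seen in a small sketch. The abstract does not say which selection metric the authors use; worst-group accuracy is assumed here because it is common in the group-robustness literature. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def group_accuracies(y_true, y_pred, groups):
    """Return {group: (accuracy, n_samples)} for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: (correct[g] / total[g], total[g]) for g in total}


def worst_group_accuracy(y_true, y_pred, groups):
    """Lowest per-group accuracy (assumed selection metric, see lead-in)."""
    accs = group_accuracies(y_true, y_pred, groups)
    return min(acc for acc, _ in accs.values())


def binomial_std_error(acc, n):
    """Standard error of an accuracy estimated from n independent samples."""
    return math.sqrt(acc * (1.0 - acc) / n)


# Hypothetical validation set: 96 majority-group and 4 minority-group samples.
y_true = [1] * 100
groups = ["majority"] * 96 + ["minority"] * 4
y_pred = [1] * 90 + [0] * 6 + [1, 1, 0, 0]

per_group = group_accuracies(y_true, y_pred, groups)
for name, (acc, n) in per_group.items():
    se = binomial_std_error(acc, n)
    print(f"{name}: acc={acc:.3f} n={n} std_err={se:.3f}")
print("worst-group accuracy:", worst_group_accuracy(y_true, y_pred, groups))
```

With only four minority samples, the standard error of the minority-group accuracy is 0.25, roughly ten times that of the majority group. A change in a single minority prediction moves the worst-group accuracy by 25 percentage points, so rankings of checkpoints or hyperparameters based on it can flip by chance.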

Related papers