Fugu-MT paper translation (abstract): Architecture-agnostic Lipschitz-constant Bayesian header and its application to resolve semantically proximal classification errors with vision transformers

Paper overview: Architecture-agnostic Lipschitz-constant Bayesian header and its application to resolve semantically proximal classification errors with vision transformers

arxiv url: http://arxiv.org/abs/2605.05908v1
Date: Thu, 07 May 2026 09:18:06 GMT
Status: Translation complete
Last updated in system: 2026-05-08 22:27:11.657604
Title: Architecture-agnostic Lipschitz-constant Bayesian header and its application to resolve semantically proximal classification errors with vision transformers
Authors: Frederik Schäfer, Luis Mandl, Lars Kälber, Tim Ricken
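Abstract summary: This work presents an architecture-agnostic Lipschitz-constant Bayesian header that can be integrated into feature extractors such as vision transformers. It also proposes a novel metric to jointly capture uncertainty and confidence across misclassification rates, together with an adaptive arithmetic-mean fusion scheme. Monte Carlo sampling increases the computational cost, but the method offers plug-and-play compatibility with pre-trained backbones.
Reference score (site-computed attention score): 0.0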
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Label noise remains a critical bottleneck for the generalization of supervised deep learning models, particularly when errors are structured rather than random. Standard robust training methods often fail in the presence of such semantically proximal classification errors. This work presents an architecture-agnostic Lipschitz-constant Bayesian header that can be integrated into feature extractors such as vision transformers, yielding the bi-Lipschitz-constrained Bayesian Vision Transformer (LipB-ViT). In contrast to conventional Bayesian layers, our approach enforces spectral normalization on both the mean and log-variance of the variational weights, which promotes calibrated predictive uncertainty and mitigates noise amplification. We further propose a novel metric to jointly capture uncertainty and confidence across misclassification rates, as well as an adaptive arithmetic-mean fusion scheme that combines feature-space proximity with predictive uncertainty to detect corrupted labels. This scheme outperforms state-of-the-art k-nearest-neighbor-based identification methods by more than 7%, reaching a recall of more than 0.93 at 15% semantically misclassified labels. Although computational costs increase due to Monte Carlo sampling, the method offers plug-and-play compatibility with pre-trained backbones and consistent hyperparameters across domains, suggesting strong utility for high-stakes applications with variable annotation reliability. The stabilized confidence estimates serve as the foundation for an analysis pipeline that jointly assesses dataset quality and label noise, yielding a second novel metric for their combined quantification. Lastly, we systematically evaluate LipB-ViT under both structured (adversarial) and unstructured noise at inference time, demonstrating its robustness in realistic high-noise and attack scenarios, and compare its performance against baseline methods.
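The abstract names three mechanisms without giving formulas: spectral normalization of the variational mean and log-variance, Monte Carlo sampling for predictive uncertainty, and an arithmetic-mean fusion of feature-space proximity with that uncertainty. The NumPy sketch below illustrates one plausible reading of each step and is not the authors' implementation. All function names, the clipped form of the normalization, the normalized-entropy uncertainty, the k-nearest-neighbor disagreement score, and the fixed fusion weight are assumptions made for illustration; in particular, the "adaptive" weighting described in the abstract is not reproduced here.

```python
import numpy as np

def spectral_normalize(w, target=1.0):
    """Shrink w so its largest singular value is at most `target` (clipped variant, assumed)."""
    sigma = np.linalg.norm(w, 2)  # matrix 2-norm = largest singular value
    return w * min(1.0, target / max(sigma, 1e-12))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_predict(feats, mu, log_var, rng, n_samples=32):
    """Monte Carlo prediction with a mean-field Gaussian linear head (hypothetical).

    Both mu and log_var are spectrally normalized, following the abstract.
    Each entry of a matrix is bounded by its spectral norm, so with target=1
    every per-weight standard deviation lies in [exp(-0.5), exp(0.5)].
    """
    mu = spectral_normalize(mu)
    log_var = spectral_normalize(log_var)
    std = np.exp(0.5 * log_var)
    probs = np.stack([
        softmax(feats @ (mu + std * rng.standard_normal(mu.shape)))
        for _ in range(n_samples)
    ])
    mean_p = probs.mean(axis=0)
    entropy = -(mean_p * np.log(mean_p + 1e-12)).sum(axis=1)
    # Normalized predictive entropy in [0, 1] (assumed uncertainty measure).
    return mean_p, np.clip(entropy / np.log(mean_p.shape[1]), 0.0, 1.0)

def knn_disagreement(feats, labels, k=3):
    """Fraction of each sample's k nearest neighbors that carry a different label."""
    d = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]
    return (labels[nn] != labels[:, None]).mean(axis=1)

def fused_suspicion(knn_score, uncertainty, weight=0.5):
    """Weighted arithmetic mean of the two scores; the fixed weight is an assumption."""
    return weight * knn_score + (1.0 - weight) * uncertainty
```

In use, `feats` would be the features from a frozen pre-trained backbone, and samples with the highest `fused_suspicion` values would be reviewed as candidate label errors. How the paper adapts the weight between the two terms is not stated in the abstract.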
