Fugu-MT Paper Translation (Summary): Disentangling Content from Style to Overcome Shortcut Learning: A Hybrid Generative-Discriminative Learning Framework

Paper summary: Disentangling Content from Style to Overcome Shortcut Learning: A Hybrid Generative-Discriminative Learning Framework

arxiv url: http://arxiv.org/abs/2509.11598v2
Date: Tue, 16 Sep 2025 02:52:25 GMT
Status: Translation complete
Last updated in system: 2025-09-17 13:40:22.8764
Title: Disentangling Content from Style to Overcome Shortcut Learning: A Hybrid Generative-Discriminative Learning Framework
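Title (reference translation): Separating Content from Style to Overcome Shortcut Learning: A Hybrid Generative-Discriminative Learning Framework
Authors: Siming Fu, Sijun Dong, Xiaoliang Meng
Abstract summary: In shortcut learning, models exploit superficial features such as texture instead of intrinsic structure. This paper proposes HyGDL, a hybrid framework that achieves explicit content disentanglement. Unlike prior methods, this principled disentanglement allows HyGDL to learn truly robust representations.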
Reference score (independently computed attention score): 4.7403081236484335
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite the remarkable success of Self-Supervised Learning (SSL), its generalization is fundamentally hindered by Shortcut Learning, where models exploit superficial features like texture instead of intrinsic structure. We experimentally verify this flaw within the generative paradigm (e.g., MAE) and argue it is a systemic issue also affecting discriminative methods, identifying it as the root cause of their failure on unseen domains. While existing methods often tackle this at a surface level by aligning or separating domain-specific features, they fail to alter the underlying learning mechanism that fosters shortcut dependency. To address this at its core, we propose HyGDL (Hybrid Generative-Discriminative Learning Framework), a hybrid framework that achieves explicit content-style disentanglement. Our approach is guided by the Invariance Pre-training Principle: forcing a model to learn an invariant essence by systematically varying a bias (e.g., style) at the input while keeping the supervision signal constant. HyGDL operates on a single encoder and analytically defines style as the component of a representation that is orthogonal to its style-invariant content, derived via vector projection. This is operationalized through a synergistic design: (1) a self-distillation objective learns a stable, style-invariant content direction; (2) an analytical projection then decomposes the representation into orthogonal content and style vectors; and (3) a style-conditioned reconstruction objective uses these vectors to restore the image, providing end-to-end supervision. Unlike prior methods that rely on implicit heuristics, this principled disentanglement allows HyGDL to learn truly robust representations, demonstrating superior performance on benchmarks designed to diagnose shortcut learning.
Abstract (reference translation): Despite the remarkable success of self-supervised learning (SSL), its generalization is fundamentally hindered by shortcut learning, in which models exploit superficial features such as texture rather than intrinsic structure. We experimentally verify this flaw in the generative paradigm (e.g., MAE), argue that it is a systemic problem that also affects discriminative methods, and regard it as the root cause of their failure on unseen domains. Existing methods often address the problem at a surface level by aligning or separating domain-specific features, but they do not change the underlying learning mechanism that fosters shortcut dependency. To address this at its core, we propose HyGDL (Hybrid Generative-Discriminative Learning Framework), a hybrid framework that achieves explicit content-style disentanglement. Our approach is guided by the Invariance Pre-training Principle: the model is forced to learn an invariant essence by systematically varying a bias (e.g., style) at the input while the supervision signal is held constant. HyGDL operates on a single encoder and analytically defines style as the component of a representation that is orthogonal to its style-invariant content, derived via vector projection. This is realized through a synergistic design in which (1) a self-distillation objective learns a stable, style-invariant content direction, (2) an analytical projection decomposes the representation into orthogonal content and style vectors, and (3) a style-conditioned reconstruction objective uses these vectors to restore the image, providing end-to-end supervision. Unlike prior methods that rely on implicit heuristics, this principled disentanglement lets HyGDL learn truly robust representations, and it shows superior performance on benchmarks designed to diagnose shortcut learning.
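The analytical projection in step (2) can be illustrated with a minimal NumPy sketch. It assumes the content direction c is already given (in HyGDL it would come from the self-distillation objective, which is not modeled here), and the function name `decompose` is hypothetical rather than taken from the authors' code. Content is taken as the projection of the representation z onto c, i.e. (z·c / ‖c‖²) c, and style as the residual z - content, which is orthogonal to c by construction.

```python
import numpy as np


def decompose(z: np.ndarray, c: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split representation z into content (along c) and style (orthogonal to c).

    Hypothetical helper: c stands in for the learned style-invariant content
    direction; here it is simply passed in as a fixed vector.
    """
    c_unit = c / np.linalg.norm(c)          # normalize the content direction
    content = np.dot(z, c_unit) * c_unit    # vector projection of z onto c
    style = z - content                     # residual, orthogonal to c
    return content, style


if __name__ == "__main__":
    z = np.array([3.0, 4.0, 0.0])
    c = np.array([1.0, 0.0, 0.0])
    content, style = decompose(z, c)
    print(content, style)                   # content along c, style in the orthogonal subspace
    print(np.dot(content, style))           # orthogonality check: 0.0
```

Because the split is a fixed linear operation, content + style always reconstructs z exactly; in the framework as described, it is the style-conditioned reconstruction objective of step (3) that would give this decomposition its end-to-end training signal.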


Related papers