Fugu-MT paper translation (abstract): Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing

Paper abstract: Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing

arxiv url: http://arxiv.org/abs/2505.09484v1
Date: Wed, 14 May 2025 15:36:44 GMT
Status: translation complete
Last updated in system: 2025-05-15 21:44:09.513291
Title: Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing
Authors: Yingjie Ma, Xun Lin, Zitong Yu, Xin Liu, Xiaochen Yuan, Weicheng Xie, Linlin Shen
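Abstract summary: Face Anti-Spoofing (FAS) is essential for the security of facial recognition systems in diverse scenarios. We introduce the Multimodal Denoising and Alignment (MMDA) framework. By leveraging the zero-shot generalization capability of CLIP, the MMDA framework effectively suppresses noise in multimodal data.
Reference score (attention score computed independently by this site): 47.24147617685829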
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Face Anti-Spoofing (FAS) is essential for the security of facial recognition systems in diverse scenarios such as payment processing and surveillance. Current multimodal FAS methods often struggle with effective generalization, mainly due to modality-specific biases and domain shifts. To address these challenges, we introduce the Multimodal Denoising and Alignment (MMDA) framework. By leveraging the zero-shot generalization capability of CLIP, the MMDA framework effectively suppresses noise in multimodal data through denoising and alignment mechanisms, thereby significantly enhancing the generalization performance of cross-modal alignment. The Modality-Domain Joint Differential Attention (MD2A) module in MMDA concurrently mitigates the impacts of domain and modality noise by refining the attention mechanism based on extracted common noise features. Furthermore, the Representation Space Soft (RS2) Alignment strategy utilizes the pre-trained CLIP model to align multi-domain multimodal data into a generalized representation space in a flexible manner, preserving intricate representations and enhancing the model's adaptability to various unseen conditions. We also design a U-shaped Dual Space Adaptation (U-DSA) module to enhance the adaptability of representations while maintaining generalization performance. These improvements not only enhance the framework's generalization capabilities but also boost its ability to represent complex representations. Our experimental results on four benchmark datasets under different evaluation protocols demonstrate that the MMDA framework outperforms existing state-of-the-art methods in terms of cross-domain generalization and multimodal detection accuracy. The code will be released soon.
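The abstract describes MD2A as refining attention using extracted common noise features. The Python sketch below illustrates one generic form of differential attention under that reading: a "noise" attention map is subtracted from a "signal" attention map before the values are mixed. It is not the authors' implementation (their code has not been released). The function names, the separate noise query/key inputs, and the weight `lam` are all hypothetical choices made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Row-wise softmax of scaled dot-product scores between queries and keys."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


def differential_attention(q_sig, k_sig, q_noise, k_noise, values, lam=0.5):
    """Hypothetical sketch: subtract a lam-weighted noise attention map from
    the signal attention map, then mix the value vectors with the difference."""
    sig = attention_map(q_sig, k_sig)
    noise = attention_map(q_noise, k_noise)
    diff = [[s - lam * n for s, n in zip(rs, rn)] for rs, rn in zip(sig, noise)]
    dim = len(values[0])
    out = [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in diff
    ]
    return diff, out
```

Because each softmax row sums to 1, every row of the difference map sums to `1 - lam`. If the noise inputs are identical to the signal inputs and `lam` is 1, the two maps cancel and the output is all zeros, which is a convenient sanity check for the subtraction.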

Related papers

Adapformer: Adaptive Channel Management for Multivariate Time Series Forecasting [49.40321003932633]
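Adapformer is an advanced Transformer-based framework that merges the benefits of channel-independent (CI) and channel-dependent (CD) methodologies through effective channel management. Adapformer outperforms existing models, improving both forecasting accuracy and computational efficiency.
Paper reference translation (metadata) (2025-11-18T16:24:05Z)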
Learning Representation and Synergy Invariances: A Provable Framework for Generalized Multimodal Face Anti-Spoofing [85.00865662325954]
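Multimodal Face Anti-Spoofing (FAS) methods, which integrate multiple visual modalities, often suffer even more severe performance degradation when deployed in unseen domains. This is mainly due to two overlooked risks that affect cross-domain multimodal generalization. We propose a provable framework, namely Multimodal Representation and Synergy Invariance Learning (RiSe).
Paper reference translation (metadata) (2025-11-18T05:37:06Z)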
UniMRSeg: Unified Modality-Relax Segmentation via Hierarchical Self-Supervised Compensation [104.59740403500132]
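Multimodal image segmentation faces real-world deployment challenges caused by degradation from incomplete or corrupted modalities. We propose a unified modality-relax segmentation network (UniMRSeg) with hierarchical self-supervised compensation (HSSC). Our approach hierarchically bridges the representation gap between complete and incomplete modalities across the input, feature, and output levels.
Paper reference translation (metadata) (2025-09-19T17:29:25Z)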
Domain Generalized Stereo Matching with Uncertainty-guided Data Augmentation [11.938635624781313]
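State-of-the-art stereo matching (SM) models often fail to generalize to real data domains because of domain differences. We leverage data augmentation to expand the training domain and encourage the model to acquire robust cross-domain feature representations. Our approach is simple and architecture-agnostic, and can be integrated into any SM network.
Paper reference translation (metadata) (2025-08-02T10:26:53Z)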
FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [50.438552588818]
We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation). A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multi-modal features and ID streams. A cross-modal expert routing mechanism adaptively filters and combines multi-modal features based on their contextual relevance.
Paper reference translation (metadata) (2025-07-07T04:09:45Z)
BiXFormer: A Robust Framework for Maximizing Modality Effectiveness in Multi-Modal Semantic Segmentation [55.486872677160015]
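We reformulate multi-modal semantic segmentation as a mask-level classification task. We propose BiXFormer, which integrates Unified Modality Matching (UMM) and Cross Modality Alignment (CMA). Experiments on synthetic and real-world multi-modal benchmarks demonstrate the effectiveness of our method.
Paper reference translation (metadata) (2025-06-04T08:04:58Z)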
Activation-Guided Consensus Merging for Large Language Models [25.68958388022476]
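Activation-Guided Consensus Merging (ACM) is a plug-and-play merging framework that determines layer-specific merging coefficients. Experiments on Long-to-Short (L2S) and general merging tasks show that ACM consistently outperforms all baseline methods.
Paper reference translation (metadata) (2025-05-20T07:04:01Z)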
Noise Optimized Conditional Diffusion for Domain Adaptation [7.414646586981638]
Pseudo-labeling is a cornerstone of Unsupervised Domain Adaptation (UDA). We propose Noise Optimized Conditional Diffusion for Domain Adaptation (NOCDDA).
Paper reference translation (metadata) (2025-05-12T13:28:31Z)
DADM: Dual Alignment of Domain and Modality for Face Anti-spoofing [58.62312400472865]
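Multimodal face anti-spoofing (FAS) has become a prominent research focus. We propose an alignment module between modalities based on mutual information. We employ a dual alignment optimization method that aligns both sub-domain hyperplanes and modality angle margins.
Paper reference translation (metadata) (2025-03-01T10:12:00Z)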
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models [58.936893810674896]
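Face Anti-Spoofing (FAS) is essential for ensuring the security and reliability of face recognition systems. We propose a multimodal large language model framework for FAS, termed Interpretable Face Anti-Spoofing (I-FAS). We propose a Spoof-Aware Captioning and Filtering (SCF) strategy to generate high-quality captions for FAS images.
Paper reference translation (metadata) (2025-01-03T09:25:04Z)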
Towards Robust Online Domain Adaptive Semantic Segmentation under Adverse Weather Conditions [43.58583290714884]
A Robust Online Domain Adaptive Semantic Segmentation framework. The proposed method outperforms state-of-the-art methods on widely used OnDA benchmarks while maintaining approximately 40 frames per second (FPS).
Paper reference translation (metadata) (2024-09-02T08:53:08Z)
Enhancing Multimodal Unified Representations for Cross Modal Generalization [52.16653133604068]
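We propose Training-free Optimization of Codebook (TOC) and Fine and Coarse Cross-modal Information Disentangling (FCID). These methods refine the unified discrete representations obtained from pretraining and disentangle fine-grained and coarse-grained information according to the specific characteristics of each modality.
Paper reference translation (metadata) (2024-03-08T09:16:47Z)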
TeG-DG: Textually Guided Domain Generalization for Face Anti-Spoofing [8.830873674673828]
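Existing methods aim to extract domain-invariant features from various training domains. The extracted features inevitably contain residual style feature bias, which results in inferior generalization performance. We propose a textually guided domain generalization framework that effectively leverages text information for cross-domain alignment.
Paper reference translation (metadata) (2023-11-30T10:13:46Z)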
Posterior Differential Regularization with f-divergence for Improving Model Robustness [95.05725916287376]
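We focus on methods that regularize the difference in model posteriors between clean and noisy inputs. We generalize posterior differential regularization to the family of $f$-divergences. Experiments show that regularizing the posterior differential with an $f$-divergence can improve model robustness.
Paper reference translation (metadata) (2020-10-23T19:58:01Z)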
Patch-level Neighborhood Interpolation: A General and Effective Graph-based Regularization Strategy [77.34280933613226]
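We propose a general regularizer called Patch-level Neighborhood Interpolation (Pani) that conducts non-local representation in the computation of networks. The proposed method explicitly constructs patch-level graphs in different layers and then linearly interpolates neighboring patch features, serving as a general and effective regularization strategy.
Paper reference translation (metadata) (2019-11-21T06:31:59Z)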

The related papers list is generated automatically from the titles and abstracts of papers on this site.

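This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from the use of this site (including all information and translations).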