Fugu-MT Paper Translation (Abstract): Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT)

Paper overview: Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT)

arxiv url: http://arxiv.org/abs/2307.01225v1
Date: Mon, 3 Jul 2023 03:17:20 GMT
Status: translation complete
Last updated in system: 2023-07-06 19:33:27.210638
Title: Interpretability and Transparency-Driven Detection and Transformation of Textual Adversarial Examples (IT-DT)
Authors: Bushra Sabir, M. Ali Babar, Sharif Abuadbba
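Abstract summary: This paper proposes the Interpretability and Transparency-Driven Detection and Transformation (IT-DT) framework, which focuses on interpretability and transparency in detecting and transforming textual adversarial examples. IT-DT significantly improves the resilience and trustworthiness of transformer-based text classifiers against adversarial attacks.
Reference score (attention score computed by this site): 0.5729426778193399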
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformer-based text classifiers like BERT, RoBERTa, T5, and GPT-3 have shown impressive performance in NLP. However, their vulnerability to adversarial examples poses a security risk. Existing defense methods lack interpretability, making it hard to understand adversarial classifications and identify model vulnerabilities. To address this, we propose the Interpretability and Transparency-Driven Detection and Transformation (IT-DT) framework. It focuses on interpretability and transparency in detecting and transforming textual adversarial examples. IT-DT utilizes techniques like attention maps, integrated gradients, and model feedback for interpretability during detection. This helps identify salient features and perturbed words contributing to adversarial classifications. In the transformation phase, IT-DT uses pre-trained embeddings and model feedback to generate optimal replacements for perturbed words. By finding suitable substitutions, we aim to convert adversarial examples into non-adversarial counterparts that align with the model's intended behavior while preserving the text's meaning. Transparency is emphasized through human expert involvement. Experts review and provide feedback on detection and transformation results, enhancing decision-making, especially in complex scenarios. The framework generates insights and threat intelligence empowering analysts to identify vulnerabilities and improve model robustness. Comprehensive experiments demonstrate the effectiveness of IT-DT in detecting and transforming adversarial examples. The approach enhances interpretability, provides transparency, and enables accurate identification and successful transformation of adversarial inputs. By combining technical analysis and human expertise, IT-DT significantly improves the resilience and trustworthiness of transformer-based text classifiers against adversarial attacks.
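The detect-then-transform idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the classifier is a toy bag-of-words logistic model rather than a transformer, the `WEIGHTS` and `EMBED` tables are hypothetical, integrated gradients is approximated with a midpoint Riemann sum against an empty-input baseline, and the input is assumed to have already been flagged as adversarial. Attention maps and human expert review are not modeled.

```python
import math

# Hypothetical per-word logit weights of a toy positive-sentiment classifier.
WEIGHTS = {"the": 0.0, "film": 0.2, "was": 0.0, "great": 2.5,
           "good": 1.8, "fine": 0.9, "grate": -3.0}
# Hypothetical 2-D stand-ins for pre-trained word embeddings.
EMBED = {"the": (-1.0, 0.0), "was": (-0.9, -0.1), "film": (0.0, 1.0),
         "great": (1.0, 0.2), "good": (0.9, 0.3), "fine": (0.7, 0.5),
         "grate": (0.95, 0.25)}


def predict(tokens, scale=None):
    """Return P(positive); scale[i] fades token i toward the empty baseline."""
    if scale is None:
        scale = [1.0] * len(tokens)
    z = sum(s * WEIGHTS.get(t, 0.0) for s, t in zip(scale, tokens))
    return 1.0 / (1.0 + math.exp(-z))


def integrated_gradients(tokens, steps=50):
    """Midpoint Riemann-sum approximation of integrated gradients per token."""
    attributions = [0.0] * len(tokens)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        p = predict(tokens, [alpha] * len(tokens))
        slope = p * (1.0 - p)  # derivative of the sigmoid w.r.t. the logit
        for i, token in enumerate(tokens):
            # (input - baseline) is 1 for every token, so it drops out.
            attributions[i] += WEIGHTS.get(token, 0.0) * slope / steps
    return attributions


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def detect_and_transform(tokens, top_k=3):
    """Find the most suspicious token and swap it using model feedback."""
    p_before = predict(tokens)
    sign = 1.0 if p_before >= 0.5 else -1.0
    attributions = integrated_gradients(tokens)
    # Detection: the token pushing hardest toward the current label.
    idx = max(range(len(tokens)), key=lambda i: sign * attributions[i])
    suspect = tokens[idx]
    # Candidate replacements: nearest neighbours in embedding space.
    candidates = sorted((w for w in EMBED if w != suspect),
                        key=lambda w: cosine(EMBED[w], EMBED[suspect]),
                        reverse=True)[:top_k]

    def swapped(word):
        return tokens[:idx] + [word] + tokens[idx + 1:]

    # Model feedback: keep the candidate that moves the score furthest
    # away from the current (presumed adversarial) label.
    best = max(candidates, key=lambda w: sign * (p_before - predict(swapped(w))))
    return suspect, swapped(best), predict(swapped(best))


suspect, repaired, p_after = detect_and_transform(["the", "film", "was", "grate"])
print(suspect, repaired, round(p_after, 3))
```

A useful sanity check on the attribution step is the completeness property of integrated gradients: the per-token attributions should sum to approximately the model output on the input minus its output on the baseline. A real system would replace the toy model with a transformer classifier and an attribution library, and would need a separate detector to decide whether an input is adversarial in the first place.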

Related papers

On the Mechanisms of Adversarial Data Augmentation for Robust and Adaptive Transfer Learning [0.0]
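We investigate the role of adversarial data augmentation (ADA) in achieving both robustness and adaptability in transfer learning settings. We propose a unified framework that integrates ADA with consistency regularization and domain-invariant representation learning. The study emphasizes a constructive view of adversarial learning, in which perturbations from a destructive attack are turned into a regularizing force for cross-domain transferability.
Paper reference translation (metadata) (2025-05-19T03:56:51Z)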
Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack [51.16384207202798]
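Vision-language pre-training models are vulnerable to multimodal adversarial examples (AEs). Previous approaches augment image-text pairs to increase diversity within the adversarial example generation process. We propose sampling from an adversarial evolution triangle composed of clean, historical, and current adversarial examples to enhance adversarial diversity.
Paper reference translation (metadata) (2024-11-04T23:07:51Z)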
Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding [118.75567341513897]
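Existing methods typically analyze the target text in isolation or only with non-member contexts. Con-ReCall is a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
Paper reference translation (metadata) (2024-09-05T09:10:38Z)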
SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
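In contrast to white-box adversarial attacks, transfer attacks better reflect real-world scenarios. We propose a self-augmentation-based transfer attack method called SA-Attack.
Paper reference translation (metadata) (2023-12-08T09:08:50Z)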
Lost In Translation: Generating Adversarial Examples Robust to Round-Trip Translation [66.33340583035374]
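This work is a comprehensive study of the robustness of current textual adversarial attacks to round-trip translation. The authors demonstrate that six state-of-the-art text-based adversarial attacks do not maintain their efficacy after round-trip translation. They propose an intervention-based solution to this problem that integrates machine translation into the adversarial example generation process.
Paper reference translation (metadata) (2023-07-24T04:29:43Z)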
In and Out-of-Domain Text Adversarial Robustness via Label Smoothing [64.66809713499576]
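We study the adversarial robustness provided by label smoothing strategies in foundation models for diverse NLP tasks. Experiments show that label smoothing significantly improves adversarial robustness against various attacks in pre-trained models such as BERT. We also analyze the relationship between prediction confidence and robustness, showing that label smoothing reduces over-confident errors on adversarial examples.
Paper reference translation (metadata) (2022-12-20T14:06:50Z)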
Estimating the Adversarial Robustness of Attributions in Text with Transformers [44.745873282080346]
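We establish a novel definition of attribution robustness (AR) in text classification, based on Lipschitz continuity. We then propose TransformerExplanationAttack (TEA), a strong adversary that provides a tight estimate of attribution robustness in text classification.
Paper reference translation (metadata) (2022-12-18T20:18:59Z)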
Disentangled Text Representation Learning with Information-Theoretic Perspective for Adversarial Robustness [17.5771010094384]
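Adversarial vulnerability is a major obstacle to building reliable NLP systems. Recent work argues that a model's adversarial vulnerability is caused by non-robust features learned in supervised training. This paper tackles the adversarial challenge from the perspective of disentangled representation learning.
Paper reference translation (metadata) (2022-10-26T18:14:39Z)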
Beyond Model Interpretability: On the Faithfulness and Adversarial Robustness of Contrastive Textual Explanations [2.543865489517869]
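This work lays the groundwork for a new evaluation scheme inspired by the faithfulness of explanations, motivating textual counterfactuals. Experiments on sentiment analysis data show that, for both models, the connectedness of counterfactuals is not obvious.
Paper reference translation (metadata) (2022-10-17T09:50:02Z)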
Semantically Distributed Robust Optimization for Vision-and-Language Inference [34.83271008148651]
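We propose SDRO, a model-agnostic method that uses linguistic transformations in a distributionally robust optimization setting. Experiments on benchmark datasets with images and video show robustness to adversarial attacks in addition to performance improvements.
Paper reference translation (metadata) (2021-10-14T06:02:46Z)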

The related papers list is generated automatically from the titles and abstracts of papers on this site.
