Fugu-MT Paper Translation (Abstract): Adversarial Semantic and Label Perturbation Attack for Pedestrian Attribute Recognition

Paper summary: Adversarial Semantic and Label Perturbation Attack for Pedestrian Attribute Recognition

arxiv url: http://arxiv.org/abs/2505.23313v1
Date: Thu, 29 May 2025 10:17:17 GMT
Status: Translation complete
Last updated in system: 2025-05-30 18:14:07.807264
Title: Adversarial Semantic and Label Perturbation Attack for Pedestrian Attribute Recognition
Authors: Weizhe Kong, Xiao Wang, Ruichong Gao, Chenglong Li, Yu Zhang, Xing Yang, Yaowei Wang, Jin Tang
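Abstract summary: This paper proposes the first adversarial attack and defense framework for pedestrian attribute recognition. Building on a pre-trained CLIP-based PAR framework, it applies both global- and patch-level attacks to pedestrian images. It also designs a semantic offset defense strategy to suppress the influence of adversarial attacks.
Reference score (independently computed attention score): 42.36333049201237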
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Pedestrian Attribute Recognition (PAR) is an indispensable task in human-centered research and has made great progress in recent years with the development of deep neural networks. However, the potential vulnerability and anti-interference ability have still not been fully explored. To bridge this gap, this paper proposes the first adversarial attack and defense framework for pedestrian attribute recognition. Specifically, we exploit both global- and patch-level attacks on the pedestrian images, based on the pre-trained CLIP-based PAR framework. It first divides the input pedestrian image into non-overlapping patches and embeds them into feature embeddings using a projection layer. Meanwhile, the attribute set is expanded into sentences using prompts and embedded into attribute features using a pre-trained CLIP text encoder. A multi-modal Transformer is adopted to fuse the obtained vision and text tokens, and a feed-forward network is utilized for attribute recognition. Based on the aforementioned PAR framework, we adopt the adversarial semantic and label-perturbation to generate the adversarial noise, termed ASL-PAR. We also design a semantic offset defense strategy to suppress the influence of adversarial attacks. Extensive experiments conducted on both digital domains (i.e., PETA, PA100K, MSP60K, RAPv2) and physical domains fully validated the effectiveness of our proposed adversarial attack and defense strategies for the pedestrian attribute recognition. The source code of this paper will be released on https://github.com/Event-AHU/OpenPAR.
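The first two steps described in the abstract (splitting the image into non-overlapping patches and projecting them, and expanding the attribute set into prompt sentences) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single-channel toy image, the random projection matrix, and the prompt template "a pedestrian with {}" are all assumptions made for illustration, and the CLIP text encoder, the multi-modal Transformer, and the attack itself are omitted.

```python
import random


def split_into_patches(image, patch):
    """Split an H x W grid (list of rows) into non-overlapping patch x patch
    tiles, each flattened row-major into a 1-D list."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            patches.append(tile)
    return patches


def project(patches, weight):
    """Linear projection layer: multiply each flattened patch (length P)
    by a P x D weight matrix to obtain a D-dimensional embedding."""
    dim = len(weight[0])
    return [[sum(x * w_row[d] for x, w_row in zip(p, weight))
             for d in range(dim)]
            for p in patches]


def attributes_to_prompts(attributes, template="a pedestrian with {}"):
    """Expand attribute names into sentences (hypothetical template)."""
    return [template.format(a) for a in attributes]


# Toy 4 x 4 single-channel "image" with pixel values 0..15 (assumption).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)

# Random 4 x 3 projection matrix standing in for a learned layer.
rng = random.Random(0)
weight = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
tokens = project(patches, weight)

prompts = attributes_to_prompts(["backpack", "long hair"])
```

In a real pipeline the patch embeddings would become vision tokens and the prompt sentences would be passed through a text encoder before fusion; here they are left as plain lists and strings.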

Related papers