Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes
- URL: http://arxiv.org/abs/2108.08421v1
- Date: Thu, 19 Aug 2021 00:52:10 GMT
- Title: Exploiting Multi-Object Relationships for Detecting Adversarial Attacks in Complex Scenes
- Authors: Mingjun Yin, Shasha Li, Zikui Cai, Chengyu Song, M. Salman Asif, Amit
K. Roy-Chowdhury, and Srikanth V. Krishnamurthy
- Abstract summary: Vision systems that deploy Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples.
Recent research has shown that checking the intrinsic consistencies in the input data is a promising way to detect adversarial attacks.
We develop a novel approach to perform context consistency checks using language models.
- Score: 51.65308857232767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision systems that deploy Deep Neural Networks (DNNs) are known to be
vulnerable to adversarial examples. Recent research has shown that checking the
intrinsic consistencies in the input data is a promising way to detect
adversarial attacks (e.g., by checking the object co-occurrence relationships
in complex scenes). However, existing approaches are tied to specific models
and do not offer generalizability. Motivated by the observation that language
descriptions of natural scene images have already captured the object
co-occurrence relationships that can be learned by a language model, we develop
a novel approach to perform context consistency checks using such language
models. The distinguishing aspect of our approach is that it is independent of
the deployed object detector and yet offers very high accuracy in terms of
detecting adversarial examples in practical scenes with multiple objects.
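The core intuition above can be illustrated with a toy sketch (not the authors' model, which uses a trained language model): learn object co-occurrence statistics from natural-language scene descriptions, then flag a set of detections whose object combinations are unlikely under those statistics. The captions, vocabulary, and scoring rule below are all hypothetical stand-ins.

```python
from collections import Counter
from itertools import combinations

# Hypothetical training captions standing in for natural-scene descriptions.
captions = [
    "a car and a truck on a road next to a traffic light",
    "a person riding a bicycle beside a car on a street",
    "a boat and a person on the water near a dock",
    "a dining table with a fork a knife and a cup",
]

vocab = {"car", "truck", "road", "traffic light", "person",
         "bicycle", "street", "boat", "water", "dock",
         "dining table", "fork", "knife", "cup"}

def objects_in(text):
    """Objects from the vocabulary mentioned in a caption."""
    return sorted(o for o in vocab if o in text)

# Count pairwise object co-occurrences across the captions.
pair_counts = Counter()
for c in captions:
    for a, b in combinations(objects_in(c), 2):
        pair_counts[(a, b)] += 1

def consistency_score(detected):
    """Fraction of detected object pairs ever seen together in training text."""
    pairs = list(combinations(sorted(detected), 2))
    if not pairs:
        return 1.0
    seen = sum(1 for p in pairs if pair_counts[p] > 0)
    return seen / len(pairs)

# A plausible scene scores high; a scene where an attack has induced an
# out-of-context detection (a "boat" among kitchen objects) scores low.
clean = consistency_score(["fork", "knife", "cup"])        # 1.0
attacked = consistency_score(["fork", "knife", "boat"])    # ~0.33
```

Because the statistics come from text alone, this check is independent of whichever object detector produced the detections, which is the property the paper emphasizes; the actual approach replaces these raw pair counts with a language model over scene descriptions.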
Related papers
- Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection [37.57355457749918]
We introduce a novel framework for zero-shot HOI detection using Conditional Multi-Modal Prompts, namely CMMP.
Unlike traditional prompt-learning methods, we propose learning decoupled vision and language prompts for interactiveness-aware visual feature extraction.
Experiments demonstrate the efficacy of our detector with conditional multi-modal prompts, outperforming previous state-of-the-art on unseen classes of various zero-shot settings.
arXiv Detail & Related papers (2024-08-05T14:05:25Z)
- Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection [14.22646492640906]
We propose a simple and highly efficient decoder-free architecture for open-vocabulary visual relationship detection.
Our model consists of a Transformer-based image encoder that represents objects as tokens and models their relationships implicitly.
Our approach achieves state-of-the-art relationship detection performance on Visual Genome and on the large-vocabulary GQA benchmark at real-time inference speeds.
arXiv Detail & Related papers (2024-03-21T10:15:57Z)
- Contextual Object Detection with Multimodal Large Language Models [66.15566719178327]
We introduce a novel research problem of contextual object detection.
Three representative scenarios are investigated, including the language cloze test, visual captioning, and question answering.
We present ContextDET, a unified multimodal model that is capable of end-to-end differentiable modeling of visual-language contexts.
arXiv Detail & Related papers (2023-05-29T17:50:33Z)
- Contextual information integration for stance detection via cross-attention [59.662413798388485]
Stance detection deals with identifying an author's stance towards a target.
Most existing stance detection models are limited because they do not consider relevant contextual information.
We propose an approach to integrate contextual information as text.
arXiv Detail & Related papers (2022-11-03T15:04:29Z)
- ADC: Adversarial attacks against object Detection that evade Context consistency checks [55.8459119462263]
We show that even context consistency checks can be brittle to properly crafted adversarial examples.
We propose an adaptive framework to generate examples that subvert such defenses.
Our results suggest that robustly modeling context and checking its consistency remains an open problem.
arXiv Detail & Related papers (2021-10-24T00:25:09Z)
- Synthesizing the Unseen for Zero-shot Object Detection [72.38031440014463]
We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain.
We use a novel generative model that uses class-semantics to not only generate the features but also to discriminatively separate them.
arXiv Detail & Related papers (2020-10-19T12:36:11Z)
- Few-shot Object Detection with Self-adaptive Attention Network for Remote Sensing Images [11.938537194408669]
We propose a few-shot object detector which is designed for detecting novel objects provided with only a few examples.
To fit the object detection setting, our proposed few-shot detector concentrates on relations at the level of objects rather than the full image.
The experiments demonstrate the effectiveness of the proposed method in few-shot scenes.
arXiv Detail & Related papers (2020-09-26T13:44:58Z)
- Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations [103.00383924074585]
Visual relationship detection aims to reason over relationships among salient objects in images.
We propose a novel approach named Visual-Linguistic Representations from Transformers (RVL-BERT).
RVL-BERT performs spatial reasoning with both visual and language commonsense knowledge learned via self-supervised pre-training.
arXiv Detail & Related papers (2020-09-10T16:15:09Z)
- Connecting the Dots: Detecting Adversarial Perturbations Using Context Inconsistency [25.039201331256372]
We augment the Deep Neural Network with a system that learns context consistency rules during training and checks for the violations of the same during testing.
Our approach builds a set of auto-encoders, one for each object class, appropriately trained so as to output a discrepancy between the input and output if an added adversarial perturbation violates context consistency rules.
Experiments on PASCAL VOC and MS COCO show that our method effectively detects various adversarial attacks and achieves high ROC-AUC (over 0.95 in most cases).
arXiv Detail & Related papers (2020-07-19T19:46:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.