Adversarially Robust Video Perception by Seeing Motion
- URL: http://arxiv.org/abs/2212.07815v1
- Date: Tue, 13 Dec 2022 02:25:33 GMT
- Title: Adversarially Robust Video Perception by Seeing Motion
- Authors: Lingyu Zhang, Chengzhi Mao, Junfeng Yang, Carl Vondrick
- Abstract summary: We find that one reason for video models' vulnerability is that they fail to perceive the correct motion under adversarial perturbations.
Inspired by the extensive evidence that motion is a key factor for the human visual system, we propose to correct what the model sees by restoring the perceived motion information.
Our work provides new insight into robust video perception algorithms by using intrinsic structures from the data.
- Score: 29.814393563282753
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite their excellent performance, state-of-the-art computer vision models
often fail when they encounter adversarial examples. Video perception models
tend to be more fragile under attacks, because the adversary has more places to
manipulate in high-dimensional data. In this paper, we find that one reason for
video models' vulnerability is that they fail to perceive the correct motion
under adversarial perturbations. Inspired by the extensive evidence that motion
is a key factor for the human visual system, we propose to correct what the
model sees by restoring the perceived motion information. Since motion
information is an intrinsic structure of the video data, recovering motion
signals can be done at inference time without any human annotation, which
allows the model to adapt to unforeseen, worst-case inputs. Visualizations and
empirical experiments on UCF-101 and HMDB-51 datasets show that restoring
motion information in deep vision models improves adversarial robustness. Even
under adaptive attacks where the adversary knows our defense, our algorithm is
still effective. Our work provides new insight into robust video perception
algorithms by using intrinsic structures from the data. Our webpage is
available at https://motion4robust.cs.columbia.edu.
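To make the abstract's idea concrete: because motion is an intrinsic structure of video data, a self-supervised motion objective can be optimized at inference time, without labels, to correct what the model sees before classification. The sketch below is only a hypothetical illustration of that general recipe, not the authors' implementation; the pairwise flow estimator `flow_net`, the photometric warping loss, the bounded correction `delta`, and all hyperparameters are assumptions.

```python
# Hypothetical sketch: inference-time "motion restoration" before video classification.
# flow_net, the warping loss, and the hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def warp(frame, flow):
    """Backward-warp `frame` (B, C, H, W) with optical flow `flow` (B, 2, H, W)."""
    _, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device, dtype=frame.dtype),
        torch.arange(w, device=frame.device, dtype=frame.dtype),
        indexing="ij",
    )
    grid_x = (xs + flow[:, 0]) / (w - 1) * 2 - 1   # normalize to [-1, 1]
    grid_y = (ys + flow[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack((grid_x, grid_y), dim=-1)   # (B, H, W, 2) in (x, y) order
    return F.grid_sample(frame, grid, align_corners=True)

def motion_consistency_loss(video, flow_net):
    """Self-supervised motion objective: photometric error after warping each
    frame t+1 back onto frame t with the estimated flow. video: (B, C, T, H, W)."""
    loss = 0.0
    for t in range(video.shape[2] - 1):
        f_t, f_t1 = video[:, :, t], video[:, :, t + 1]
        flow = flow_net(f_t, f_t1)                 # assumed pairwise optical-flow network
        loss = loss + F.l1_loss(warp(f_t1, flow), f_t)
    return loss

def restore_and_classify(video, video_model, flow_net, steps=20, lr=1e-2, budget=8 / 255):
    """Optimize a small correction `delta` at inference time, then classify."""
    delta = torch.zeros_like(video, requires_grad=True)
    optimizer = torch.optim.SGD([delta], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        motion_consistency_loss((video + delta).clamp(0, 1), flow_net).backward()
        optimizer.step()
        with torch.no_grad():
            delta.clamp_(-budget, budget)          # keep the correction small
    with torch.no_grad():
        logits = video_model((video + delta).clamp(0, 1))
    return logits.argmax(dim=1)
```

In this reading, `delta` is the inference-time correction that restores the perceived motion signal; the paper and webpage above describe the authors' actual objective and optimization.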
Related papers
- Hawk: Learning to Understand Open-World Video Anomalies [76.9631436818573]
Video Anomaly Detection (VAD) systems can autonomously monitor and identify disturbances, reducing the need for manual labor and associated costs.
We introduce Hawk, a novel framework that leverages interactive large Visual Language Models (VLMs) to interpret video anomalies precisely.
We have annotated over 8,000 anomaly videos with language descriptions, enabling effective training across diverse open-world scenarios, and also created 8,000 question-answering pairs for users' open-world questions.
arXiv Detail & Related papers (2024-05-27T07:08:58Z)
- Any-point Trajectory Modeling for Policy Learning [64.23861308947852]
We introduce Any-point Trajectory Modeling (ATM) to predict future trajectories of arbitrary points within a video frame.
ATM outperforms strong video pre-training baselines by 80% on average.
We show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology.
arXiv Detail & Related papers (2023-12-28T23:34:43Z)
- Exploring Human Crowd Patterns and Categorization in Video Footage for Enhanced Security and Surveillance using Computer Vision and Machine Learning [0.0]
This paper explores computer vision's potential in security and surveillance, presenting a novel approach to track motion in videos.
By categorizing motion into Arcs, Lanes, Converging/Diverging, and Random/Block motions, the paper examines different optical flow techniques, CNN models, and machine learning models.
The results can train anomaly-detection models, provide behavioral insights based on motion, and enhance scene comprehension.
arXiv Detail & Related papers (2023-08-26T16:09:20Z)
- Adversarial Self-Attack Defense and Spatial-Temporal Relation Mining for Visible-Infrared Video Person Re-Identification [24.9205771457704]
The paper proposes a new visible-infrared video person re-ID method from a novel perspective, i.e., adversarial self-attack defense and spatial-temporal relation mining.
The proposed method exhibits compelling performance on large-scale cross-modality video datasets.
arXiv Detail & Related papers (2023-07-08T05:03:10Z)
- Why is the video analytics accuracy fluctuating, and what can we do about it? [2.0741583844039915]
It is a common practice to think of a video as a sequence of images (frames), and re-use deep neural network models that are trained only on images for similar analytics tasks on videos.
In this paper, we show that this leap of faith, that deep learning models which work well on images will also work well on videos, is actually flawed.
We show that even when a video camera is viewing a scene that is not changing in any human-perceptible way, the accuracy of video analytics applications fluctuates noticeably.
arXiv Detail & Related papers (2022-08-23T23:16:24Z)
- Temporal Shuffling for Defending Deep Action Recognition Models against Adversarial Attacks [67.58887471137436]
We develop a novel defense method using temporal shuffling of input videos against adversarial attacks for action recognition models (see the sketch after this entry).
To the best of our knowledge, this is the first attempt to design a defense method without additional training for 3D CNN-based video action recognition models.
arXiv Detail & Related papers (2021-12-15T06:57:01Z)
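As a rough illustration of the temporal-shuffling idea in the entry above, an inference-time variant could permute the frame order of each incoming clip several times and average the resulting class probabilities. The `(B, C, T, H, W)` clip layout, the number of shuffles, and the probability averaging are illustrative assumptions rather than the paper's exact procedure.

```python
# Illustrative sketch of an inference-time temporal-shuffling defense.
# The number of shuffles and the score averaging are assumptions.
import torch

@torch.no_grad()
def predict_with_temporal_shuffling(model, clip, n_shuffles=8):
    """clip: (B, C, T, H, W). Returns class probabilities averaged over shuffles."""
    t = clip.shape[2]
    probs = 0.0
    for _ in range(n_shuffles):
        perm = torch.randperm(t, device=clip.device)   # random frame order
        shuffled = clip[:, :, perm]                    # permute the temporal axis
        probs = probs + torch.softmax(model(shuffled), dim=1)
    return probs / n_shuffles
```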
- Boosting the Transferability of Video Adversarial Examples via Temporal Translation [82.0745476838865]
Adversarial examples are transferable, which makes them feasible for black-box attacks in real-world applications.
We introduce a temporal translation attack method, which optimizes the adversarial perturbations over a set of temporally translated video clips (see the sketch after this entry).
Experiments on the Kinetics-400 dataset and the UCF-101 dataset demonstrate that our method can significantly boost the transferability of video adversarial examples.
arXiv Detail & Related papers (2021-10-18T07:52:17Z)
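A minimal sketch of the temporal-translation idea in the entry above: accumulate gradients over temporally shifted copies of the clip (here `torch.roll` along the time axis) before each perturbation update. The FGSM-style sign step, the shift range, and the step sizes are assumptions, not the paper's exact attack.

```python
# Illustrative sketch: averaging gradients over temporally translated clips
# to craft a more transferable perturbation. Hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def temporal_translation_attack(model, clip, label, eps=8 / 255, alpha=1 / 255,
                                steps=10, max_shift=3):
    """clip: (B, C, T, H, W); label: (B,) ground-truth classes. Returns an adversarial clip."""
    adv = clip.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        grad = torch.zeros_like(adv)
        for shift in range(-max_shift, max_shift + 1):
            shifted = torch.roll(adv, shifts=shift, dims=2)   # translate along the time axis
            loss = F.cross_entropy(model(shifted), label)
            grad = grad + torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv + alpha * grad.sign()                   # FGSM-style step (assumption)
            adv = clip + (adv - clip).clamp(-eps, eps)        # project back into the eps-ball
            adv = adv.clamp(0, 1).detach()
    return adv
```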
- Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion [86.56202610716504]
Action categories are highly correlated with the scene where the action happens, which makes the model tend to degrade to a solution where only the scene information is encoded.
We propose to decouple the scene and the motion (DSM) with two simple operations, so that the model pays better attention to the motion information.
arXiv Detail & Related papers (2020-09-12T09:54:11Z)
- Hindsight for Foresight: Unsupervised Structured Dynamics Models from Physical Interaction [24.72947291987545]
A key challenge for an agent learning to interact with the world is to reason about the physical properties of objects.
We propose a novel approach for modeling the dynamics of a robot's interactions directly from unlabeled 3D point clouds and images.
arXiv Detail & Related papers (2020-08-02T11:04:49Z)
- Motion-Excited Sampler: Video Adversarial Attack with Sparked Prior [63.11478060678794]
We propose an effective motion-excited sampler to obtain a motion-aware noise prior (see the sketch after this entry).
By using the sparked prior in gradient estimation, we can successfully attack a variety of video classification models with fewer queries.
arXiv Detail & Related papers (2020-03-17T10:54:12Z)
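The entry above only names the high-level idea of a motion-aware noise prior; a hypothetical stand-in is sketched below, where a crude frame-difference motion map weights the random directions of a query-based finite-difference gradient estimate. The frame-difference prior, the antithetic estimator, and all hyperparameters are assumptions, not the paper's motion-excited sampler.

```python
# Hypothetical sketch of a motion-weighted black-box gradient estimate.
# The frame-difference prior and the finite-difference estimator are assumptions.
import torch

def motion_prior(clip):
    """Crude motion map from frame differences; clip: (B, C, T, H, W)."""
    diff = (clip[:, :, 1:] - clip[:, :, :-1]).abs()
    diff = torch.cat([diff, diff[:, :, -1:]], dim=2)                  # pad back to T frames
    return diff / (diff.amax(dim=(1, 2, 3, 4), keepdim=True) + 1e-8)  # normalize per video

def estimate_gradient(loss_fn, clip, n_samples=16, sigma=1e-3):
    """Query-based gradient estimate whose random directions are weighted by the
    motion prior. loss_fn(clip) is assumed to return a scalar attack loss obtained
    from black-box queries to the victim model."""
    prior = motion_prior(clip)
    grad = torch.zeros_like(clip)
    for _ in range(n_samples):
        noise = torch.randn_like(clip) * prior        # concentrate noise on moving regions
        delta = (loss_fn(clip + sigma * noise) - loss_fn(clip - sigma * noise)) / (2 * sigma)
        grad = grad + delta * noise                    # antithetic finite-difference estimate
    return grad / n_samples
```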
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.