Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions
- URL: http://arxiv.org/abs/2403.20254v1
- Date: Fri, 29 Mar 2024 16:01:00 GMT
- Title: Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions
- Authors: Runhao Zeng, Xiaoyong Chen, Jiaming Liang, Huisi Wu, Guangzhong Cao, Yong Guo,
- Abstract summary: Temporal action detection (TAD) aims to locate action positions and recognize action categories in long-term untrimmed videos.
We observe that temporal information in videos can be occasionally corrupted, such as missing or blurred frames.
In this paper, we extensively analyze the robustness of seven leading TAD methods and obtain some interesting findings.
- Score: 21.031546310537692
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Temporal action detection (TAD) aims to locate action positions and recognize action categories in long-term untrimmed videos. Although many methods have achieved promising results, their robustness has not been thoroughly studied. In practice, we observe that temporal information in videos can be occasionally corrupted, such as missing or blurred frames. Interestingly, existing methods often incur a significant performance drop even if only one frame is affected. To formally evaluate the robustness, we establish two temporal corruption robustness benchmarks, namely THUMOS14-C and ActivityNet-v1.3-C. In this paper, we extensively analyze the robustness of seven leading TAD methods and obtain some interesting findings: 1) Existing methods are particularly vulnerable to temporal corruptions, and end-to-end methods are often more susceptible than those with a pre-trained feature extractor; 2) Vulnerability mainly comes from localization error rather than classification error; 3) When corruptions occur in the middle of an action instance, TAD models tend to yield the largest performance drop. Besides building a benchmark, we further develop a simple but effective robust training method to defend against temporal corruptions, through the FrameDrop augmentation and Temporal-Robust Consistency loss. Remarkably, our approach not only improves robustness but also yields promising improvements on clean data. We believe that this study will serve as a benchmark for future research in robust video analysis. Source code and models are available at https://github.com/Alvin-Zeng/temporal-robustness-benchmark.
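The abstract names a FrameDrop augmentation and a Temporal-Robust Consistency loss but gives no implementation details here; below is only a minimal PyTorch-style sketch of how such a robust-training recipe might be wired up. The function names (`frame_drop`, `temporal_robust_consistency`), the choice of copying the previous frame to stand in for a missing/blurred frame, and the MSE consistency term are illustrative assumptions, not the authors' implementation (see the linked repository for the actual code).

```python
# Illustrative sketch only; the real FrameDrop augmentation and
# Temporal-Robust Consistency loss live in the authors' repo:
# https://github.com/Alvin-Zeng/temporal-robustness-benchmark
import torch
import torch.nn.functional as F


def frame_drop(clip: torch.Tensor, drop_prob: float = 0.1) -> torch.Tensor:
    """Randomly corrupt frames of a (B, T, C, H, W) clip.

    Dropped frames are filled with the previous frame, a simple
    stand-in for a missing or blurred frame.
    """
    b, t = clip.shape[:2]
    corrupted = clip.clone()
    drop_mask = torch.rand(b, t, device=clip.device) < drop_prob
    drop_mask[:, 0] = False  # keep the first frame as an anchor
    for i in range(1, t):
        # Copy the previous frame forward wherever this frame is "dropped".
        corrupted[:, i][drop_mask[:, i]] = corrupted[:, i - 1][drop_mask[:, i]]
    return corrupted


def temporal_robust_consistency(model, clip, targets, det_loss_fn, weight=1.0):
    """One training step: task loss plus a consistency term that pulls
    predictions on clean and frame-dropped clips together.

    Assumes `model` returns a plain tensor of per-frame detection logits.
    """
    clean_out = model(clip)
    corrupt_out = model(frame_drop(clip))
    task_loss = det_loss_fn(clean_out, targets)  # standard TAD training loss
    consistency = F.mse_loss(corrupt_out, clean_out.detach())
    return task_loss + weight * consistency
```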
Related papers
- Benchmarking the Robustness of Optical Flow Estimation to Corruptions [25.789811424859554]
We introduce 7 temporal corruptions specifically designed for the benchmarking of optical flow models.
Two robustness benchmarks, KITTI-FC and GoPro-FC, are established as the first corruption benchmark for optical flow estimation.
29 model variants from 15 optical flow methods are evaluated, yielding 10 intriguing observations.
arXiv Detail & Related papers (2024-11-22T11:31:01Z)
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z)
- Robo3D: Towards Robust and Reliable 3D Perception against Corruptions [58.306694836881235]
We present Robo3D, the first comprehensive benchmark for probing the robustness of 3D detectors and segmentors under out-of-distribution scenarios.
We consider eight corruption types stemming from severe weather conditions, external disturbances, and internal sensor failure.
We propose a density-insensitive training framework along with a simple flexible voxelization strategy to enhance the model resiliency.
arXiv Detail & Related papers (2023-03-30T17:59:17Z)
- A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking [54.89987482509155]
The robustness of deep neural networks is usually lacking under adversarial examples, common corruptions, and distribution shifts.
We establish a comprehensive robustness benchmark called ARES-Bench on the image classification task.
By designing the training settings accordingly, we achieve the new state-of-the-art adversarial robustness.
arXiv Detail & Related papers (2023-02-28T04:26:20Z)
- Real-Time Driver Monitoring Systems through Modality and View Analysis [28.18784311981388]
Driver distractions are known to be the dominant cause of road accidents.
State-of-the-art methods prioritize accuracy while ignoring latency.
We propose time-effective detection models by neglecting the temporal relation between video frames.
arXiv Detail & Related papers (2022-10-17T21:22:41Z)
- ReAct: Temporal Action Detection with Relational Queries [84.76646044604055]
This work aims at advancing temporal action detection (TAD) using an encoder-decoder framework with action queries.
We first propose a relational attention mechanism in the decoder, which guides the attention among queries based on their relations.
Lastly, we propose to predict the localization quality of each action query at inference in order to distinguish high-quality queries.
arXiv Detail & Related papers (2022-07-14T17:46:37Z)
- Benchmarking and Analyzing Point Cloud Classification under Corruptions [33.252032774949356]
We benchmark and analyze point cloud classification under corruptions.
Based on the obtained observations, we propose several effective techniques to enhance point cloud robustness.
arXiv Detail & Related papers (2022-02-07T17:50:21Z)
- Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions [32.821121530785504]
We establish a corruption robustness benchmark, Mini Kinetics-C and Mini SSV2-C, which considers temporal corruptions beyond spatial corruptions in images.
We make the first attempt to conduct an exhaustive study on the corruption robustness of established CNN-based and Transformer-based spatial-temporal models.
arXiv Detail & Related papers (2021-10-13T05:59:39Z)
- Adversarial Robustness under Long-Tailed Distribution [93.50792075460336]
Adversarial robustness has attracted extensive studies recently by revealing the vulnerability and intrinsic characteristics of deep networks.
In this work we investigate the adversarial vulnerability as well as defense under long-tailed distributions.
We propose a clean yet effective framework, RoBal, which consists of two dedicated modules: a scale-invariant classifier and data re-balancing.
arXiv Detail & Related papers (2021-04-06T17:53:08Z)
- Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present a spatio-temporal action recognition model that is trained with only video-level labels.
Our method builds on person detectors that have been trained on large image datasets, within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.