Evidential Deep Learning for Open Set Action Recognition
- URL: http://arxiv.org/abs/2107.10161v1
- Date: Wed, 21 Jul 2021 15:45:37 GMT
- Title: Evidential Deep Learning for Open Set Action Recognition
- Authors: Wentao Bao, Qi Yu, Yu Kong
- Abstract summary: We formulate the action recognition problem from the evidential deep learning (EDL) perspective.
We propose a plug-and-play module to debias the learned representation through contrastive learning.
- Score: 36.350348194248014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a real-world scenario, human actions typically fall outside the distribution of the training data, which requires a model to both recognize the known actions and reject the unknown. Unlike image data, video actions are more challenging to recognize in an open-set setting due to the uncertain temporal dynamics and the static bias of human actions. In this paper, we propose a Deep Evidential Action Recognition (DEAR) method to recognize actions in an open testing set. Specifically, we formulate the action recognition problem from the evidential deep learning (EDL) perspective and propose a novel model calibration method to regularize the EDL training. In addition, to mitigate the static bias of video representations, we propose a plug-and-play module to debias the learned representation through contrastive learning. Experimental results show that our DEAR method achieves consistent performance gains on multiple mainstream action recognition models and benchmarks. Code and pre-trained weights will be made available upon paper acceptance.
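The EDL formulation referenced in the abstract follows the standard evidential deep learning recipe of Sensoy et al. (2018): the network predicts non-negative class evidence that parameterizes a Dirichlet distribution, from which expected class probabilities and a vacuity-style uncertainty are derived, and unknown actions can be rejected by thresholding that uncertainty. The sketch below is a minimal, assumption-based illustration of that recipe in PyTorch; the backbone, shapes, threshold `tau`, and helper names are hypothetical, and the paper's proposed calibration regularizer and debiasing module are not reproduced here.

```python
# Minimal sketch of an evidential classification head in the spirit of EDL
# (Sensoy et al., 2018). Shapes, names, and the threshold are hypothetical;
# this is not the authors' DEAR implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidentialHead(nn.Module):
    """Maps clip features to non-negative per-class evidence."""
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Evidence must be non-negative; softplus (or exp/ReLU) is a common choice.
        return F.softplus(self.fc(feats))

def dirichlet_outputs(evidence: torch.Tensor):
    """Dirichlet parameters, expected class probabilities, and vacuity uncertainty."""
    alpha = evidence + 1.0                       # alpha_k = e_k + 1
    strength = alpha.sum(dim=-1, keepdim=True)   # S = sum_k alpha_k
    probs = alpha / strength                     # E[p_k] = alpha_k / S
    uncertainty = evidence.shape[-1] / strength.squeeze(-1)  # u = K / S
    return alpha, probs, uncertainty

def edl_mse_loss(alpha: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """MSE-style EDL loss from Sensoy et al. (KL/calibration terms omitted)."""
    strength = alpha.sum(dim=-1, keepdim=True)
    probs = alpha / strength
    onehot = F.one_hot(targets, num_classes=alpha.shape[-1]).float()
    err = (onehot - probs).pow(2)                   # squared error to one-hot labels
    var = probs * (1.0 - probs) / (strength + 1.0)  # Dirichlet variance term
    return (err + var).sum(dim=-1).mean()

@torch.no_grad()
def predict_open_set(backbone, head, clip, tau: float = 0.5):
    """Open-set inference: reject a clip whose vacuity exceeds the threshold `tau`."""
    evidence = head(backbone(clip))
    _, probs, u = dirichlet_outputs(evidence)
    is_known = u < tau
    return probs.argmax(dim=-1), is_known  # label is meaningful only where is_known
```

Here `backbone` stands in for any of the mainstream video recognition models the abstract refers to; the paper's calibration regularizer and contrastive debiasing module would be added on top of this basic objective but are not sketched here.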
Related papers
- The impact of Compositionality in Zero-shot Multi-label action recognition for Object-based tasks [4.971065912401385]
We propose Dual-VCLIP, a unified approach for zero-shot multi-label action recognition.
Dual-VCLIP enhances VCLIP, a zero-shot action recognition method, with the DualCoOp method for multi-label image classification.
We validate our method on the Charades dataset that includes a majority of object-based actions.
arXiv Detail & Related papers (2024-05-14T15:28:48Z) - SOAR: Scene-debiasing Open-set Action Recognition [81.8198917049666]
We propose Scene-debiasing Open-set Action Recognition (SOAR), which features an adversarial scene reconstruction module and an adaptive adversarial scene classification module.
The former prevents the decoder from reconstructing the video background given video features, and thus helps reduce the background information in feature learning.
The latter aims to confuse scene-type classification given video features, with a specific emphasis on the action foreground, which helps the model learn scene-invariant information; a generic gradient-reversal sketch of this idea follows the entry.
arXiv Detail & Related papers (2023-09-03T20:20:48Z) - Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
- Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
We show that passive data can still be used to learn features that accelerate downstream RL.
Our approach learns from passive data by modeling intentions.
Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
arXiv Detail & Related papers (2023-04-10T17:59:05Z) - DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z) - REST: REtrieve & Self-Train for generative action recognition [54.90704746573636]
We propose to adapt a pre-trained generative Vision & Language (V&L) Foundation Model for video/action recognition.
We show that direct fine-tuning of a generative model to produce action classes suffers from severe overfitting.
We introduce REST, a training framework consisting of two key components.
arXiv Detail & Related papers (2022-09-29T17:57:01Z) - Zero-Shot Action Recognition with Transformer-based Video Semantic
Embedding [36.24563211765782]
We take a new comprehensive look at the inductive zero-shot action recognition problem from a realistic standpoint.
Specifically, we advocate for a concrete formulation for zero-shot action recognition that avoids an exact overlap between the training and testing classes.
We propose a novel end-to-end trained transformer model which is capable of capturing long-range temporal dependencies efficiently.
arXiv Detail & Related papers (2022-03-10T05:03:58Z) - Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed
Videos [82.02074241700728]
In this paper, we present a spatio-temporal action recognition model that is trained with only video-level labels.
Our method leverages per-person detectors that have been trained on large image datasets within a Multiple Instance Learning framework.
We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid; a generic sketch of that assumption follows the entry.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.