Open Set Action Recognition via Multi-Label Evidential Learning
- URL: http://arxiv.org/abs/2303.12698v1
- Date: Mon, 27 Feb 2023 18:34:18 GMT
- Title: Open Set Action Recognition via Multi-Label Evidential Learning
- Authors: Chen Zhao, Dawei Du, Anthony Hoogs, Christopher Funk
- Abstract summary: We propose a new method for open set action recognition and novelty detection via MUlti-Label Evidential learning (MULE).
Our Beta Evidential Neural Network estimates multi-action uncertainty with Beta densities based on actor-context-object relation representations.
Our proposed approach achieves promising performance in single/multi-actor, single/multi-action settings.
- Score: 25.15753429188536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing methods for open-set action recognition focus on novelty detection
that assumes video clips show a single action, which is unrealistic in the real
world. We propose a new method for open set action recognition and novelty
detection via MUlti-Label Evidential learning (MULE), which goes beyond previous
novel action detection methods by addressing the more general problems of
single or multiple actors in the same scene, with simultaneous action(s) by any
actor. Our Beta Evidential Neural Network estimates multi-action uncertainty
with Beta densities based on actor-context-object relation representations. An
evidence debiasing constraint is added to the objective function for
optimization to reduce the static bias of video representations, which can
incorrectly correlate predictions and static cues. We develop a learning
algorithm based on a primal-dual average scheme update to optimize the proposed
problem. Theoretical analysis of the optimization algorithm demonstrates the
convergence of the primal solution sequence and bounds for both the loss
function and the debiasing constraint. Uncertainty and belief-based novelty
estimation mechanisms are formulated to detect novel actions. Extensive
experiments on two real-world video datasets show that our proposed approach
achieves promising performance in single/multi-actor, single/multi-action
settings.
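As a rough illustration of the evidential formulation sketched in the abstract (not the authors' implementation): in binomial evidential learning, each action class receives non-negative positive and negative evidence, which parameterizes a Beta density with alpha = e+ + 1 and beta = e- + 1; subjective-logic belief, disbelief, and uncertainty then follow from the evidence strength, and a clip can be flagged as novel when uncertainty stays high and belief stays low across all classes. The function names and thresholds below are illustrative assumptions.

```python
import numpy as np

def beta_evidential_outputs(pos_evidence, neg_evidence):
    """Per-class belief/disbelief/uncertainty from non-negative evidence.

    Sketch of binomial (Beta) evidential learning, not the paper's exact
    implementation. Inputs are arrays of shape (num_classes,).
    """
    alpha = pos_evidence + 1.0           # Beta parameters from evidence
    beta = neg_evidence + 1.0
    strength = alpha + beta              # evidence strength S_k per class
    belief = pos_evidence / strength     # b_k = e+_k / S_k
    disbelief = neg_evidence / strength  # d_k = e-_k / S_k
    uncertainty = 2.0 / strength         # u_k = W / S_k, prior weight W = 2
    return belief, disbelief, uncertainty

def is_novel(belief, uncertainty, u_thresh=0.7, b_thresh=0.5):
    """Flag a clip as novel when every class remains highly uncertain and
    no class reaches confident belief (thresholds are arbitrary)."""
    return bool(uncertainty.min() > u_thresh and belief.max() < b_thresh)
```

By construction b_k + d_k + u_k = 1 for every class, so an unseen (zero-evidence) action yields maximal uncertainty u_k = 1, which is what drives the uncertainty-based novelty score.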
Related papers
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
arXiv Detail & Related papers (2024-08-30T15:39:34Z) - Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy [12.257725479880458]
Action recognition has become one of the popular research topics in computer vision.
We propose a multi-view attention consistency method that computes the similarity between two attentions from two different views of the action videos.
Our approach applies the idea of Neural Radiance Field to implicitly render the features from novel views when training on single-view datasets.
arXiv Detail & Related papers (2024-05-02T14:43:21Z) - Patch Spatio-Temporal Relation Prediction for Video Anomaly Detection [19.643936110623653]
Video Anomaly Detection (VAD) aims to identify abnormalities within a specific context and timeframe.
Recent deep learning-based VAD models have shown promising results by generating high-resolution frames.
We propose a self-supervised learning approach for VAD through an inter-patch relationship prediction task.
arXiv Detail & Related papers (2024-03-28T03:07:16Z) - Algorithmic Recourse with Missing Values [11.401006371457436]
This paper proposes a new framework of algorithmic recourse (AR) that works even in the presence of missing values.
AR aims to provide a recourse action for altering the undesired prediction result given by a classifier.
Experimental results demonstrated the efficacy of our method in the presence of missing values compared to the baselines.
arXiv Detail & Related papers (2023-04-28T03:22:48Z) - DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z) - Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization [73.04187954213471]
We introduce a unified learning approach to simultaneously modeling the coarse- and fine-grained retrieval.
The proposed method has achieved +4.03%, +3.38%, and +2.40% Recall@50 accuracy over a strong baseline.
arXiv Detail & Related papers (2022-11-14T14:25:40Z) - CDN-MEDAL: Two-stage Density and Difference Approximation Framework for Motion Analysis [3.337126420148156]
We propose a novel, two-stage method of change detection with two convolutional neural networks.
Our two-stage framework contains approximately 3.5K parameters in total but still maintains rapid convergence to intricate motion patterns.
arXiv Detail & Related papers (2021-06-07T16:39:42Z) - Learning Salient Boundary Feature for Anchor-free Temporal Action Localization [81.55295042558409]
Temporal action localization is an important yet challenging task in video understanding.
We propose the first purely anchor-free temporal localization method.
Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module, and (iii) several consistency constraints.
arXiv Detail & Related papers (2021-03-24T12:28:32Z) - Robust Unsupervised Video Anomaly Detection by Multi-Path Frame Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z) - Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos [82.02074241700728]
In this paper, we present an action recognition model that is trained with only video-frame labels.
Our per-person detectors are trained on large image datasets within a Multiple Instance Learning framework.
We show how to apply our method in cases where the standard Multiple Instance Learning assumption (that each bag contains at least one instance with the specified label) is invalid.
arXiv Detail & Related papers (2020-07-21T10:45:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.