Equivalent Classification Mapping for Weakly Supervised Temporal Action
Localization
- URL: http://arxiv.org/abs/2008.07728v2
- Date: Tue, 6 Oct 2020 02:17:18 GMT
- Title: Equivalent Classification Mapping for Weakly Supervised Temporal Action
Localization
- Authors: Tao Zhao, Junwei Han, Le Yang, Dingwen Zhang
- Abstract summary: Weakly supervised temporal action localization is a newly emerging yet widely studied topic in recent years.
The pre-classification pipeline first performs classification on each video snippet and then aggregates the snippet-level classification scores to obtain the video-level classification score.
The post-classification pipeline aggregates the snippet-level features first and then predicts the video-level classification score based on the aggregated feature.
- Score: 92.58946210982411
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised temporal action localization is a newly emerging yet widely
studied topic in recent years. The existing methods can be categorized into two
localization-by-classification pipelines, i.e., the pre-classification pipeline
and the post-classification pipeline. The pre-classification pipeline first
performs classification on each video snippet and then aggregates the
snippet-level classification scores to obtain the video-level classification
score. In contrast, the post-classification pipeline aggregates the
snippet-level features first and then predicts the video-level classification
score based on the aggregated feature. Although the classifiers in these two
pipelines are used in different ways, the role they play is exactly the
same---to classify the given features to identify the corresponding action
categories. Consequently, an ideal classifier can make both pipelines work. This
inspires us to simultaneously learn these two pipelines in a unified framework
to obtain an effective classifier. Specifically, in the proposed learning
framework, we implement two parallel network streams to model the two
localization-by-classification pipelines simultaneously and make the two
network streams share the same classifier. This achieves the novel Equivalent
Classification Mapping (ECM) mechanism. Moreover, we discover that an ideal
classifier may possess two characteristics: 1) The frame-level classification
scores obtained from the pre-classification stream and the feature aggregation
weights in the post-classification stream should be consistent; 2) The
classification results of these two streams should be identical. Based on these
two characteristics, we further introduce a weight-transition module and an
equivalent training strategy into the proposed learning framework, which
help to thoroughly exploit the equivalence mechanism.
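The two pipelines described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the linear classifier, the temporal softmax used to turn snippet scores into aggregation weights, and all names (`pre_classification`, `post_classification`, `W`, `b`) are assumptions made only for illustration.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def pre_classification(features, W, b):
    """Classify every snippet first, then aggregate the scores over time."""
    snippet_scores = features @ W + b            # (T, C)
    # Assumption: aggregation weights come from a softmax over time,
    # so each class column of `weights` sums to 1.
    weights = softmax(snippet_scores, axis=0)    # (T, C)
    video_scores = (weights * snippet_scores).sum(axis=0)  # (C,)
    return video_scores, weights


def post_classification(features, W, b, weights):
    """Aggregate snippet features first, then classify the aggregate."""
    aggregated = weights.T @ features            # (C, D), one feature per class
    # Class c is scored from its own aggregated feature with the SAME
    # classifier parameters used by the pre-classification stream.
    return np.einsum("cd,dc->c", aggregated, W) + b


# Hypothetical toy input: T snippets, D feature dimensions, C classes.
rng = np.random.default_rng(0)
T, D, C = 8, 16, 4
features = rng.normal(size=(T, D))
W = rng.normal(size=(D, C))
b = rng.normal(size=C)

pre_scores, weights = pre_classification(features, W, b)
post_scores = post_classification(features, W, b, weights)
max_gap = float(np.abs(pre_scores - post_scores).max())
print("largest gap between the two streams:", max_gap)
```

In this toy setting the classifier is linear and the weights sum to 1 over time, so the two video-level scores agree up to floating-point error. With a non-linear classifier the two streams would generally disagree, and some explicit consistency term would be needed to pull them together; the abstract does not specify how the paper does this.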
Related papers
- Revisiting Foreground and Background Separation in Weakly-supervised
Temporal Action Localization: A Clustering-based Approach [48.684550829098534]
Weakly-supervised temporal action localization aims to localize action instances in videos with only video-level action labels.
We propose a novel clustering-based F&B separation algorithm.
We evaluate our method on three benchmarks: THUMOS14, ActivityNet v1.2 and v1.3.
arXiv Detail & Related papers (2023-12-21T18:57:12Z)
- GAN for Vision, KG for Relation: a Two-stage Deep Network for Zero-shot
Action Recognition [33.23662792742078]
We propose a two-stage deep neural network for zero-shot action recognition.
In the sampling stage, we utilize a generative adversarial network (GAN) trained on action features and word vectors of seen classes.
In the classification stage, we construct a knowledge graph based on the relationship between word vectors of action classes and related objects.
arXiv Detail & Related papers (2021-05-25T09:34:42Z)
- Hierarchical Modeling for Out-of-Scope Domain and Intent Classification [55.23920796595698]
This paper focuses on out-of-scope intent classification in dialog systems.
We propose a hierarchical multi-task learning approach based on a joint model to classify domain and intent simultaneously.
Experiments show that the model outperforms existing methods in terms of accuracy, out-of-scope recall and F1.
arXiv Detail & Related papers (2021-04-30T06:38:23Z)
- Inducing a hierarchy for multi-class classification problems [11.58041597483471]
In applications where categorical labels follow a natural hierarchy, classification methods that exploit the label structure often outperform those that do not.
In this paper, we investigate a class of methods that induce a hierarchy that can similarly improve classification performance over flat classifiers.
We demonstrate the effectiveness of the class of methods both for discovering a latent hierarchy and for improving accuracy in principled simulation settings and three real data applications.
arXiv Detail & Related papers (2021-02-20T05:40:42Z)
- A Multiple Classifier Approach for Concatenate-Designed Neural Networks [13.017053017670467]
We give the design of the classifiers, which collects the features produced between the network sets.
We use the L2 normalization method to obtain the classification score instead of the Softmax Dense.
As a result, the proposed classifiers are able to improve the accuracy in the experimental cases.
arXiv Detail & Related papers (2021-01-14T04:32:40Z)
- Learning and Evaluating Representations for Deep One-class
Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
- Fine-Grained Visual Classification with Efficient End-to-end
Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z)
- Learning Class Regularized Features for Action Recognition [68.90994813947405]
We introduce a novel method named Class Regularization that performs class-based regularization of layer activations.
We show that using Class Regularization blocks in state-of-the-art CNN architectures for action recognition leads to systematic improvement gains of 1.8%, 1.2% and 1.4% on the Kinetics, UCF-101 and HMDB-51 datasets, respectively.
arXiv Detail & Related papers (2020-02-07T07:27:49Z)
- DNNs as Layers of Cooperating Classifiers [5.746505534720594]
A robust theoretical framework that can describe and predict the generalization ability of deep neural networks (DNNs) in general circumstances remains elusive.
We demonstrate intriguing regularities in the activation patterns of the hidden nodes within fully-connected feedforward networks.
We describe how these two systems arise naturally from the gradient-based optimization process, and demonstrate the classification ability of the two systems, individually and in collaboration.
arXiv Detail & Related papers (2020-01-17T07:45:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.