Efficient Action Recognition Using Confidence Distillation
- URL: http://arxiv.org/abs/2109.02137v1
- Date: Sun, 5 Sep 2021 18:25:49 GMT
- Title: Efficient Action Recognition Using Confidence Distillation
- Authors: Shervin Manzuri Shalmani, Fei Chiang, Rong Zheng
- Abstract summary: We propose a confidence distillation framework to teach a representation of uncertainty of the teacher to the student sampler.
We conduct extensive experiments on three action recognition datasets and demonstrate that our framework achieves significant improvements in action recognition accuracy (up to 20%) and computational efficiency (more than 40%)
- Score: 9.028144245738247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern neural networks are powerful predictive models. However, when it comes
to recognizing that they may be wrong about their predictions, they perform
poorly. For example, for one of the most common activation functions, the ReLU
and its variants, even a well-calibrated model can produce incorrect but high
confidence predictions. In the related task of action recognition, most current
classification methods are based on clip-level classifiers that densely sample
a given video for non-overlapping, same-sized clips and aggregate the results
using an aggregation function - typically averaging - to achieve video level
predictions. While this approach has shown to be effective, it is sub-optimal
in recognition accuracy and has a high computational overhead. To mitigate both
these issues, we propose the confidence distillation framework to teach a
representation of uncertainty of the teacher to the student sampler and divide
the task of full video prediction between the student and the teacher models.
We conduct extensive experiments on three action recognition datasets and
demonstrate that our framework achieves significant improvements in action
recognition accuracy (up to 20%) and computational efficiency (more than 40%).
Related papers
- Next Best View For Point-Cloud Model Acquisition: Bayesian Approximation and Uncertainty Analysis [2.07180164747172]
This work adapts the point-net-based neural network for Next-Best-View (PC-NBV)
It incorporates dropout layers into the model's architecture, thus allowing the computation of the uncertainty estimate associated with its predictions.
The aim of the work is to improve the network's accuracy in correctly predicting the next best viewpoint.
arXiv Detail & Related papers (2024-11-04T01:32:09Z) - Adversarial Augmentation Training Makes Action Recognition Models More
Robust to Realistic Video Distribution Shifts [13.752169303624147]
Action recognition models often lack robustness when faced with natural distribution shifts between training and test data.
We propose two novel evaluation methods to assess model resilience to such distribution disparity.
We experimentally demonstrate the superior performance of the proposed adversarial augmentation approach over baselines across three state-of-the-art action recognition models.
arXiv Detail & Related papers (2024-01-21T05:50:39Z) - ASPEST: Bridging the Gap Between Active Learning and Selective
Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z) - Revisiting Classifier: Transferring Vision-Language Models for Video
Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize the well-pretrained language model to generate good semantic target for efficient transferring learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z) - Value-Consistent Representation Learning for Data-Efficient
Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning this imagined state with a real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values.
It has been demonstrated that our methods achieve new state-of-the-art performance for search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation [87.54604263202941]
We propose a tiny deep neural network of which partial layers are iteratively exploited for refining its previous estimations.
We employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model.
Our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency for widely used benchmarks.
arXiv Detail & Related papers (2021-11-11T23:31:34Z) - No One Representation to Rule Them All: Overlapping Features of Training
Methods [12.58238785151714]
High-performing models tend to make similar predictions regardless of training methodology.
Recent work has made very different training techniques, such as large-scale contrastive learning, yield competitively-high accuracy.
We show these models specialize in generalization of the data, leading to higher ensemble performance.
arXiv Detail & Related papers (2021-10-20T21:29:49Z) - Uncertainty-sensitive Activity Recognition: a Reliability Benchmark and
the CARING Models [37.60817779613977]
We present the first study of how welthe confidence values of modern action recognition architectures indeed reflect the probability of the correct outcome.
We introduce a new approach which learns to transform the model output into realistic confidence estimates through an additional calibration network.
arXiv Detail & Related papers (2021-01-02T15:41:21Z) - Modeling Score Distributions and Continuous Covariates: A Bayesian
Approach [8.772459063453285]
We develop a generative model of the match and non-match score distributions over continuous covariates.
We use mixture models to capture arbitrary distributions and local basis functions.
Three experiments demonstrate the accuracy and effectiveness of our approach.
arXiv Detail & Related papers (2020-09-21T02:41:20Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.