Bridging the gap between Human Action Recognition and Online Action
Detection
- URL: http://arxiv.org/abs/2101.08851v1
- Date: Thu, 21 Jan 2021 21:01:46 GMT
- Title: Bridging the gap between Human Action Recognition and Online Action
Detection
- Authors: Alban Main de Boissiere, Rita Noumeir
- Abstract summary: Action recognition, early prediction, and online action detection are complementary disciplines that are often studied independently.
We address the task-specific feature extraction with a teacher-student framework between the aforementioned disciplines.
Our network embeds online early prediction and online temporal segment proposal subnetworks in parallel.
- Score: 0.456877715768796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Action recognition, early prediction, and online action detection are
complementary disciplines that are often studied independently. Most online
action detection networks use a pre-trained feature extractor, which might not
be optimal for its new task. We address the task-specific feature extraction
with a teacher-student framework between the aforementioned disciplines, and a
novel training strategy. Our network, Online Knowledge Distillation Action
Detection network (OKDAD), embeds online early prediction and online temporal
segment proposal subnetworks in parallel. Low interclass and high intraclass
similarity are encouraged during teacher training. Knowledge distillation to
the OKDAD network is ensured via layer reuse and cosine similarity between
teacher-student feature vectors. Layer reuse and similarity learning
significantly improve our baseline which uses a generic feature extractor. We
evaluate our framework on infrared videos from two popular datasets, NTU RGB+D
(action recognition, early prediction) and PKU MMD (action detection). Unlike
previous attempts on those datasets, our student networks perform without any
knowledge of the future. Even with this added difficulty, we achieve
state-of-the-art results on both datasets. Moreover, our networks use infrared
from RGB-D cameras, which we are the first to use for online action detection,
to our knowledge.
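The abstract states that knowledge distillation relies in part on cosine similarity between teacher and student feature vectors. A minimal sketch of such a term follows, assuming a loss of the form 1 - cos(teacher, student) averaged over paired vectors. That exact form, along with the function names `cosine_similarity` and `cosine_distillation_loss`, is an illustrative assumption and is not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_distillation_loss(teacher_feats, student_feats):
    """Average (1 - cosine similarity) over paired feature vectors.

    Assumed loss form for illustration only: 0 when every student vector
    points in the same direction as its teacher vector, 2 when opposite.
    """
    if not teacher_feats or len(teacher_feats) != len(student_feats):
        raise ValueError("need the same non-zero number of feature vectors")
    losses = [
        1.0 - cosine_similarity(t, s)
        for t, s in zip(teacher_feats, student_feats)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical feature vectors, e.g. one per video clip.
    teacher = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.0]]
    student = [[0.9, 0.1, 2.1], [0.4, 0.6, 0.1]]
    print(cosine_distillation_loss(teacher, student))
```

Because cosine similarity ignores vector magnitude, a term like this only pulls the student's features toward the teacher's direction in feature space, which is one reason it is a common choice when the two networks produce features at different scales.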
Related papers
- Direct Distillation between Different Domains [97.39470334253163]
We propose a new one-stage method dubbed "Direct Distillation between Different Domains" (4Ds).
We first design a learnable adapter based on the Fourier transform to separate the domain-invariant knowledge from the domain-specific knowledge.
We then build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network.
arXiv Detail & Related papers (2024-01-12T02:48:51Z)
- Prior Knowledge Guided Network for Video Anomaly Detection [1.389970629097429]
Video Anomaly Detection (VAD) involves detecting anomalous events in videos.
We propose a Prior Knowledge Guided Network (PKG-Net) for the VAD task.
arXiv Detail & Related papers (2023-09-04T15:57:07Z)
- Backdoor Attack Detection in Computer Vision by Applying Matrix
Factorization on the Weights of Deep Networks [6.44397009982949]
We introduce a novel method for backdoor detection that extracts features from the weights of pre-trained DNNs.
In comparison to other detection techniques, this has a number of benefits, such as not requiring any training data.
Our method outperforms the competing algorithms in terms of efficiency and is more accurate, helping to ensure the safe application of deep learning and AI.
arXiv Detail & Related papers (2022-12-15T20:20:18Z)
- itKD: Interchange Transfer-based Knowledge Distillation for 3D Object
Detection [3.735965959270874]
We propose an autoencoder-style framework comprising channel-wise compression and decompression.
To learn the map-view feature of a teacher network, the features from teacher and student networks are independently passed through the shared autoencoder.
We present a head attention loss to match the 3D object detection information drawn by the multi-head self-attention mechanism.
arXiv Detail & Related papers (2022-05-31T04:25:37Z)
- DIAL: Deep Interactive and Active Learning for Semantic Segmentation in
Remote Sensing [34.209686918341475]
We propose to build up a collaboration between a deep neural network and a human in the loop.
In a nutshell, the agent iteratively interacts with the network to correct its initially flawed predictions.
We show that active learning based on uncertainty estimation makes it possible to quickly guide the user towards mistakes.
arXiv Detail & Related papers (2022-01-04T09:11:58Z)
- Triggering Failures: Out-Of-Distribution detection by learning from
local adversarial attacks in Semantic Segmentation [76.2621758731288]
We tackle the detection of out-of-distribution (OOD) objects in semantic segmentation.
Our main contribution is a new OOD detection architecture called ObsNet, associated with a dedicated training scheme based on Local Adversarial Attacks (LAA).
We show that it achieves top performance in both speed and accuracy when compared to ten recent methods from the literature on three different datasets.
arXiv Detail & Related papers (2021-08-03T17:09:56Z)
- Graph Consistency based Mean-Teaching for Unsupervised Domain Adaptive
Person Re-Identification [54.58165777717885]
This paper proposes a Graph Consistency based Mean-Teaching (GCMT) method that constructs a Graph Consistency Constraint (GCC) between teacher and student networks.
Experiments on three datasets, i.e., Market-1501, DukeMTMCreID, and MSMT17, show that the proposed GCMT outperforms state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2021-05-11T04:09:49Z)
- Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Neural Network (CNN) to another by utilizing sparse representation.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
arXiv Detail & Related papers (2021-03-31T11:47:47Z)
- Privileged Knowledge Distillation for Online Action Detection [114.5213840651675]
Online Action Detection (OAD) in videos is formulated as a per-frame labeling task to address real-time prediction.
This paper presents a novel framework for online action detection based on learning with privileged information, where future frames, observable only at the training stage, are treated as a form of privileged information.
arXiv Detail & Related papers (2020-11-18T08:52:15Z)
- Semantics-aware Adaptive Knowledge Distillation for Sensor-to-Vision
Action Recognition [131.6328804788164]
We propose a framework, named Semantics-aware Adaptive Knowledge Distillation Networks (SAKDN), to enhance action recognition in the vision-sensor modality (videos).
SAKDN uses multiple wearable sensors as teacher modalities and RGB videos as the student modality.
arXiv Detail & Related papers (2020-09-01T03:38:31Z)