Knowledge Distillation for Action Anticipation via Label Smoothing
- URL: http://arxiv.org/abs/2004.07711v2
- Date: Fri, 18 Dec 2020 13:28:40 GMT
- Title: Knowledge Distillation for Action Anticipation via Label Smoothing
- Authors: Guglielmo Camporese, Pasquale Coscia, Antonino Furnari, Giovanni Maria Farinella, Lamberto Ballan
- Abstract summary: The human capability to anticipate the near future from visual observations and non-verbal cues is essential for developing intelligent systems.
We implement a multi-modal framework based on long short-term memory (LSTM) networks to summarize past observations and make predictions at different time steps.
Experiments show that label smoothing systematically improves the performance of state-of-the-art models for action anticipation.
- Score: 21.457069042129138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The human capability to anticipate the near future from visual observations and
non-verbal cues is essential for developing intelligent systems that need to
interact with people. Several research areas, such as human-robot interaction
(HRI), assisted living, or autonomous driving, need to foresee future events to
avoid crashes or help people. Egocentric scenarios are classic examples where
action anticipation is applied due to their numerous applications. This
challenging task requires capturing and modeling the domain's hidden structure to
reduce prediction uncertainty. Since multiple actions may equally occur in the
future, we treat action anticipation as a multi-label problem with missing
labels, extending the concept of label smoothing. This idea resembles the
knowledge distillation process since useful information is injected into the
model during training. We implement a multi-modal framework based on long
short-term memory (LSTM) networks to summarize past observations and make
predictions at different time steps. We perform extensive experiments on
the EPIC-Kitchens and EGTEA Gaze+ datasets, which include more than 2500 and 100 action
classes, respectively. The experiments show that label smoothing systematically
improves the performance of state-of-the-art models for action anticipation.
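As a rough illustration of the label smoothing idea mentioned in the abstract, the sketch below shows the standard uniform variant only: the one-hot target is mixed with a uniform distribution before the cross-entropy is computed. It is not the authors' extended formulation, whose details the abstract does not give. The function names, the smoothing factor `alpha`, and the example logits are all assumptions chosen for illustration.

```python
import math


def smooth_labels(target_index, num_classes, alpha=0.1):
    """Uniform label smoothing: (1 - alpha) * one_hot + alpha / num_classes.

    `alpha` is a hypothetical smoothing factor, not a value from the paper.
    """
    uniform = alpha / num_classes
    soft = [uniform] * num_classes
    soft[target_index] += 1.0 - alpha
    return soft


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(soft_targets, logits):
    """Cross-entropy between a soft target distribution and model logits."""
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(soft_targets, log_probs))


# Toy example with 4 action classes; class 2 is the annotated next action.
soft = smooth_labels(target_index=2, num_classes=4, alpha=0.1)
loss = cross_entropy(soft, logits=[0.5, -1.0, 2.0, 0.1])
print(soft)  # roughly [0.025, 0.025, 0.925, 0.025]
print(loss)
```

Because some probability mass is moved to the non-annotated classes, the model is penalized less for assigning probability to other plausible future actions. A non-uniform smoothing distribution could be substituted for `uniform` in the same place.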
Related papers
- From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation [30.161471749050833]
We propose a novel end-to-end video modeling architecture that utilizes attention mechanisms, named Anticipation via Recognition and Reasoning (ARR).
ARR decomposes the action anticipation task into action recognition and reasoning tasks, and effectively learns the statistical relationship between actions by next action prediction (NAP).
In addition, to address the challenge of relationship modeling that requires extensive training data, we propose an innovative approach for the unsupervised pre-training of the decoder.
arXiv Detail & Related papers (2024-08-05T18:38:29Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions [62.510951695174604]
"Hypothesize, Simulate, Act, Update, and Repeat" (H-SAUR) is a probabilistic generative framework that generates hypotheses about how objects articulate given input observations.
We show that the proposed model significantly outperforms the current state-of-the-art articulated object manipulation framework.
We further improve the test-time efficiency of H-SAUR by integrating a learned prior from learning-based vision models.
arXiv Detail & Related papers (2022-10-22T18:39:33Z)
- A-ACT: Action Anticipation through Cycle Transformations [89.83027919085289]
We take a step back to analyze how the human capability to anticipate the future can be transferred to machine learning algorithms.
A recent study on human psychology explains that, in anticipating an occurrence, the human brain counts on both systems.
In this work, we study the impact of each system for the task of action anticipation and introduce a paradigm to integrate them in a learning framework.
arXiv Detail & Related papers (2022-04-02T21:50:45Z)
- Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representation consecutively to produce a representation that emphasizes the novel information in the frame of the current time-stamp.
SRL sharply outperforms the existing state of the art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z)
- A Framework for Multisensory Foresight for Embodied Agents [11.351546861334292]
Predicting future sensory states is crucial for learning agents such as robots, drones, and autonomous vehicles.
In this paper, we couple multiple sensory modalities with exploratory actions and propose a predictive neural network architecture to address this problem.
The framework was tested and validated with a dataset containing 4 sensory modalities (vision, haptic, audio, and tactile) on a humanoid robot performing 9 behaviors multiple times on a large set of objects.
arXiv Detail & Related papers (2021-09-15T20:20:04Z)
- Multi-level Motion Attention for Human Motion Prediction [132.29963836262394]
We study the use of different types of attention, computed at joint, body part, and full pose levels.
Our experiments on Human3.6M, AMASS and 3DPW validate the benefits of our approach for both periodical and non-periodical actions.
arXiv Detail & Related papers (2021-06-17T08:08:11Z)
- AC-VRNN: Attentive Conditional-VRNN for Multi-Future Trajectory Prediction [30.61190086847564]
We propose a generative architecture for multi-future trajectory predictions based on Conditional Variational Recurrent Neural Networks (C-VRNNs).
Human interactions are modeled with a graph-based attention mechanism enabling an online attentive hidden state refinement of the recurrent estimation.
arXiv Detail & Related papers (2020-05-17T17:21:23Z)
- A Spatial-Temporal Attentive Network with Spatial Continuity for Trajectory Prediction [74.00750936752418]
We propose a novel model named spatial-temporal attentive network with spatial continuity (STAN-SC).
First, a spatial-temporal attention mechanism is presented to explore the most useful and important information.
Second, we conduct a joint feature sequence based on the sequence and instant state information to make the generative trajectories keep spatial continuity.
arXiv Detail & Related papers (2020-03-13T04:35:50Z)
- A Novel Graph based Trajectory Predictor with Pseudo Oracle [15.108410951760131]
GTPPO is an encoder-decoder-based method conditioned on pedestrians' future behaviors.
It is evaluated on ETH, UCY and Stanford Drone datasets.
arXiv Detail & Related papers (2020-02-02T13:40:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.