Interpretable Deep Feature Propagation for Early Action Recognition
- URL: http://arxiv.org/abs/2107.05122v1
- Date: Sun, 11 Jul 2021 19:40:19 GMT
- Title: Interpretable Deep Feature Propagation for Early Action Recognition
- Authors: He Zhao, Richard P. Wildes
- Abstract summary: In this study, we address action prediction by investigating how action patterns evolve over time in a spatial feature space.
We work with intermediate-layer ConvNet features, which allow for abstraction from raw data, while retaining spatial layout.
We employ a Kalman filter to combat error build-up and unify across prediction start times.
- Score: 39.966828592322315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Early action recognition (action prediction) from limited preliminary
observations plays a critical role for streaming vision systems that demand
real-time inference, as video actions often possess elongated temporal spans
which cause undesired latency. In this study, we address action prediction by
investigating how action patterns evolve over time in a spatial feature space.
There are three key components to our system. First, we work with
intermediate-layer ConvNet features, which allow for abstraction from raw data,
while retaining spatial layout. Second, instead of propagating features per se,
we propagate their residuals across time, which allows for a compact
representation that reduces redundancy. Third, we employ a Kalman filter to
combat error build-up and unify across prediction start times. Extensive
experimental results on multiple benchmarks show that our approach leads to
competitive performance in action prediction. Notably, we investigate the
learned components of our system to shed light on their otherwise opaque
natures in two ways. First, we document that our learned feature propagation
module works as a spatial shifting mechanism under convolution to propagate
current observations into the future. Thus, it captures flow-based image motion
information. Second, the learned Kalman filter adaptively updates prior
estimation to aid the sequence learning process.
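As a rough sketch of how the second and third components might interact, the following Python snippet propagates a feature map forward by adding predicted residuals and corrects the rollout with an elementwise Kalman-style update. It is a minimal illustration under stated assumptions, not the authors' implementation: predict_residual here is a fixed one-pixel spatial shift (echoing the finding that the learned propagation acts as a flow-like shifting mechanism), and the scalar-gain Kalman step, noise settings, and toy shapes are all assumptions.

    import numpy as np

    def predict_residual(feat):
        # Stand-in for the learned propagation module: a fixed one-pixel
        # spatial shift, mimicking the flow-like behavior reported above.
        return np.roll(feat, shift=1, axis=-1) - feat

    def kalman_step(prior, measurement, p_prior, r=0.1, q=0.01):
        # Elementwise scalar-gain Kalman update to combat error build-up.
        k = p_prior / (p_prior + r)           # Kalman gain
        posterior = prior + k * (measurement - prior)
        p_post = (1.0 - k) * p_prior + q      # update, then inflate covariance
        return posterior, p_post

    def propagate(observed_feats, n_future):
        # Observed frames correct the estimate; future frames are rolled
        # out open-loop from accumulated residuals.
        est, p = observed_feats[0].copy(), 1.0
        for obs in observed_feats[1:]:
            est = est + predict_residual(est)   # predict
            est, p = kalman_step(est, obs, p)   # correct with observation
        future = []
        for _ in range(n_future):
            est = est + predict_residual(est)
            future.append(est.copy())
        return future

    # Toy usage: four observed 8x8 single-channel "feature maps", three predicted.
    rng = np.random.default_rng(0)
    obs = [rng.standard_normal((8, 8)) for _ in range(4)]
    assert len(propagate(obs, 3)) == 3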
Related papers
- Exploring Temporally-Aware Features for Point Tracking [58.63091479730935]
Chrono is a feature backbone specifically designed for point tracking with built-in temporal awareness.
Chrono achieves state-of-the-art performance in a refiner-free setting on the TAP-Vid-DAVIS and TAP-Vid-Kinetics datasets.
arXiv Detail & Related papers (2025-01-21T15:39:40Z)
- Learning Temporal Cues by Predicting Objects Move for Multi-camera 3D Object Detection [9.053936905556204]
We propose a model called DAP (Detection After Prediction), consisting of a two-branch network.
The features predicting the current objects from branch (i) are fused into branch (ii) to transfer predictive knowledge.
Our model can be used plug-and-play, showing consistent performance gain.
arXiv Detail & Related papers (2024-04-02T02:20:47Z)
- STARFlow: Spatial Temporal Feature Re-embedding with Attentive Learning for Real-world Scene Flow [5.476991379461233]
We propose a global attentive flow embedding to match all-to-all point pairs in both Euclidean and feature space.
We leverage novel domain-adaptive losses to bridge the gap in motion inference from synthetic to real-world data.
Our approach achieves state-of-the-art performance across various datasets, with particularly outstanding results on real-world LiDAR-scanned datasets.
arXiv Detail & Related papers (2024-03-11T04:56:10Z)
- Triplet Attention Transformer for Spatiotemporal Predictive Learning [9.059462850026216]
We propose an innovative triplet attention transformer designed to capture both inter-frame dynamics and intra-frame static features.
The model incorporates the Triplet Attention Module (TAM), which replaces traditional recurrent units by applying self-attention along the temporal, spatial, and channel dimensions (a minimal sketch follows this list).
arXiv Detail & Related papers (2023-10-28T12:49:33Z)
- Uncovering the Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction [60.60223171143206]
Trajectory prediction is a crucial undertaking in understanding entity movement or human behavior from observed sequences.
Current methods often assume that the observed sequences are complete while ignoring the potential for missing values.
This paper presents a unified framework, the Graph-based Conditional Variational Recurrent Neural Network (GC-VRNN), which can perform trajectory imputation and prediction simultaneously.
arXiv Detail & Related papers (2023-03-28T14:27:27Z)
- Learning Appearance-motion Normality for Video Anomaly Detection [11.658792932975652]
We propose a spatial-temporal memories-augmented two-stream auto-encoder framework.
It learns the appearance normality and motion normality independently and explores the correlations via adversarial learning.
Our framework outperforms the state-of-the-art methods, achieving AUCs of 98.1% and 89.8% on UCSD Ped2 and CUHK Avenue datasets.
arXiv Detail & Related papers (2022-07-27T08:30:19Z)
- ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning [132.20119288212376]
We propose a spatial-temporal feature learning scheme towards a set of more representative features for perception, prediction and planning tasks simultaneously.
To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system.
arXiv Detail & Related papers (2022-07-15T16:57:43Z)
- Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that the current fixed-size temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how to better handle variations between classes of actions by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z)
- A Prospective Study on Sequence-Driven Temporal Sampling and Ego-Motion Compensation for Action Recognition in the EPIC-Kitchens Dataset [68.8204255655161]
Action recognition is one of the most challenging research fields in computer vision.
Sequences recorded under ego-motion have become especially relevant.
The proposed method copes with this by estimating the ego-motion, i.e., the camera motion.
arXiv Detail & Related papers (2020-08-26T14:44:45Z)
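To make the TAM idea from the Triplet Attention Transformer entry above concrete, here is a loose sketch that applies plain scaled dot-product self-attention along the temporal, spatial, and channel axes of a video feature tensor in turn. The absence of learned projections, the simple averaging of the three branches, and the toy shapes are simplifying assumptions, not that paper's architecture.

    import numpy as np

    def self_attention(x):
        # x: (tokens, dim); scaled dot-product attention without learned
        # projections (an intentional simplification).
        scores = x @ x.T / np.sqrt(x.shape[-1])
        scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ x

    def triplet_attention(feats):
        # feats: (T, S, C) = (time, spatial positions, channels).
        t, s, c = feats.shape
        # Temporal branch: attend across time at each spatial position.
        temporal = np.stack([self_attention(feats[:, i, :]) for i in range(s)], axis=1)
        # Spatial branch: attend across positions within each frame.
        spatial = np.stack([self_attention(feats[i]) for i in range(t)], axis=0)
        # Channel branch: treat channels as tokens within each frame.
        channel = np.stack([self_attention(feats[i].T).T for i in range(t)], axis=0)
        return (temporal + spatial + channel) / 3.0

    # Toy usage: 4 frames, 16 spatial positions, 8 channels.
    x = np.random.default_rng(1).standard_normal((4, 16, 8))
    assert triplet_attention(x).shape == (4, 16, 8)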