LAP-Net: Adaptive Features Sampling via Learning Action Progression for
Online Action Detection
- URL: http://arxiv.org/abs/2011.07915v1
- Date: Mon, 16 Nov 2020 13:08:47 GMT
- Title: LAP-Net: Adaptive Features Sampling via Learning Action Progression for
Online Action Detection
- Authors: Sanqing Qu, Guang Chen, Dan Xu, Jinhu Dong, Fan Lu, Alois Knoll
- Abstract summary: Online action detection aims to identify ongoing actions in streaming videos without any side information or access to future frames.
Recent methods aggregate fixed temporal ranges of unseen but anticipated future frame representations as supplementary features.
We introduce an adaptive feature sampling strategy to handle the variable ranges of optimal supplementary features.
- Score: 13.205827952845201
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online action detection is a task with the aim of identifying ongoing actions
from streaming videos without any side information or access to future frames.
Recent methods proposed to aggregate fixed temporal ranges of invisible but
anticipated future frame representations as supplementary features and
achieved promising performance. They are based on the observation that human
beings often detect ongoing actions by contemplating the future vision
simultaneously. However, we observed that at different action progressions, the
optimal supplementary features should be obtained from distinct temporal ranges
instead of simply fixed future temporal ranges. To this end, we introduce an
adaptive features sampling strategy to overcome the mentioned variable-ranges
of optimal supplementary features. Specifically, in this paper, we propose a
novel Learning Action Progression Network termed LAP-Net, which integrates an
adaptive features sampling strategy. At each time step, this sampling strategy
first estimates the current action progression and then decides what temporal ranges
should be used to aggregate the optimal supplementary features. We evaluated
our LAP-Net on three benchmark datasets, TVSeries, THUMOS-14 and HDD. The
extensive experiments demonstrate that with our adaptive feature sampling
strategy, the proposed LAP-Net outperforms current
state-of-the-art methods by a large margin.
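The sampling idea described in the abstract (estimate action progression, then choose the temporal range to aggregate) can be illustrated with a toy sketch. This is not LAP-Net itself: the function names `select_range` and `aggregate`, the linear mapping from progression to window, and the mean pooling are all assumptions made for illustration. The abstract does not specify how progression maps to ranges. The `features` list stands in for a timeline in which entries after the current step would be anticipated, not observed, features.

```python
from typing import List, Sequence, Tuple


def select_range(progress: float, max_offset: int) -> Tuple[int, int]:
    """Map an estimated action progression in [0, 1] to a window of frame offsets.

    Hypothetical rule (not from the paper): early in an action, look further
    into the anticipated future; late in an action, lean on recent past frames.
    """
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    future = round((1.0 - progress) * max_offset)
    past = max_offset - future
    return -past, future


def aggregate(
    features: Sequence[Sequence[float]], t: int, window: Tuple[int, int]
) -> List[float]:
    """Average the feature vectors at indices t+lo .. t+hi, clipped to the sequence."""
    lo, hi = window
    start = max(0, t + lo)
    stop = min(len(features) - 1, t + hi)
    chosen = features[start:stop + 1]
    dim = len(features[0])
    return [sum(vec[d] for vec in chosen) / len(chosen) for d in range(dim)]


# Toy timeline of 1-D features; index 2 is the "current" step.
timeline = [[0.0], [1.0], [2.0], [3.0], [4.0]]
early = aggregate(timeline, 2, select_range(0.0, 2))  # window reaches forward
late = aggregate(timeline, 2, select_range(1.0, 2))   # window reaches backward
```

In a learned system, both the progression estimate and the mapping to a temporal range would come from trained modules. The fixed rule here only shows how a per-step window changes which features get pooled.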
Related papers
- Technical Report for ActivityNet Challenge 2022 -- Temporal Action Localization [20.268572246761895]
We propose to locate the temporal boundaries of each action and predict action class in untrimmed videos.
Faster-TAD simplifies the pipeline of TAD and gets remarkable performance.
arXiv Detail & Related papers (2024-10-31T14:16:56Z)
- Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition [14.97527336050901]
We propose a novel Temporal Sequence-Aware Model (TSAM) for few-shot action recognition (FSAR).
It incorporates a sequential perceiver adapter into the pre-training framework, to integrate both the spatial information and the sequential temporal dynamics into the feature embeddings.
Experimental results on five FSAR datasets demonstrate that our method sets a new benchmark, beating the second-best competitors by large margins.
arXiv Detail & Related papers (2024-08-22T15:13:27Z)
- From Recognition to Prediction: Leveraging Sequence Reasoning for Action Anticipation [30.161471749050833]
We propose a novel end-to-end video modeling architecture that utilizes attention mechanisms, named Anticipation via Recognition and Reasoning (ARR).
ARR decomposes the action anticipation task into action recognition and reasoning tasks, and effectively learns the statistical relationship between actions by next action prediction (NAP).
In addition, to address the challenge of relationship modeling that requires extensive training data, we propose an innovative approach for the unsupervised pre-training of the decoder.
arXiv Detail & Related papers (2024-08-05T18:38:29Z)
- Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence [60.37934652213881]
Domain Adaptation (DA) facilitates knowledge transfer from a source domain to a related target domain.
This paper investigates a practical DA paradigm, namely Source data-Free Active Domain Adaptation (SFADA), where source data becomes inaccessible during adaptation.
We present learn from the learnt (LFTL), a novel paradigm for SFADA to leverage the learnt knowledge from the source pretrained model and actively iterated models without extra overhead.
arXiv Detail & Related papers (2024-07-26T17:51:58Z)
- Enhancing End-to-End Autonomous Driving with Latent World Model [78.22157677787239]
We propose a novel self-supervised method to enhance end-to-end driving without the need for costly labels.
Our framework LAW uses a LAtent World model to predict future latent features based on the predicted ego actions and the latent feature of the current frame.
As a result, our approach achieves state-of-the-art performance in both open-loop and closed-loop benchmarks without costly annotations.
arXiv Detail & Related papers (2024-06-12T17:59:21Z)
- EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks [14.046487518350792]
Spiking Neural Networks (SNNs) operate in an event-driven manner through sparse spike communication.
We introduce Residual Potential Dropout (RPD) and Spike-Aware Training (SAT) to regulate potential distribution.
Our method yields a 4.4% mAP improvement on the Gen1 dataset, while requiring 38% fewer parameters and only three time steps.
arXiv Detail & Related papers (2024-03-19T09:34:11Z)
- Embedded feature selection in LSTM networks with multi-objective
evolutionary ensemble learning for time series forecasting [49.1574468325115]
We present a novel feature selection method embedded in Long Short-Term Memory networks.
Our approach optimizes the weights and biases of the LSTM in a partitioned manner.
Experimental evaluations on air quality time series data from Italy and southeast Spain demonstrate that our method substantially improves the generalization ability of conventional LSTMs.
arXiv Detail & Related papers (2023-12-29T08:42:10Z)
- Temporal Context Aggregation Network for Temporal Action Proposal
Refinement [93.03730692520999]
Temporal action proposal generation is a challenging yet important task in the video understanding field.
Current methods still suffer from inaccurate temporal boundaries and inferior confidence used for retrieval.
We propose TCANet to generate high-quality action proposals through "local and global" temporal context aggregation.
arXiv Detail & Related papers (2021-03-24T12:34:49Z)
- Instance-Aware Predictive Navigation in Multi-Agent Environments [93.15055834395304]
We propose an Instance-Aware Predictive Control (IPC) approach, which forecasts interactions between agents as well as future scene structures.
We adopt a novel multi-instance event prediction module to estimate the possible interaction among agents in the ego-centric view.
We design a sequential action sampling strategy to better leverage predicted states on both scene-level and instance-level.
arXiv Detail & Related papers (2021-01-14T22:21:25Z)
- TTPP: Temporal Transformer with Progressive Prediction for Efficient
Action Anticipation [46.28067541184604]
Video action anticipation aims to predict future action categories from observed frames.
Current state-of-the-art approaches mainly resort to recurrent neural networks to encode history information into hidden states.
This paper proposes a simple yet efficient Temporal Transformer with Progressive Prediction framework.
arXiv Detail & Related papers (2020-03-07T07:59:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.