Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action
Recognition
- URL: http://arxiv.org/abs/2209.01425v1
- Date: Sat, 3 Sep 2022 13:59:49 GMT
- Title: Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action
Recognition
- Authors: Tianjiao Li, Lin Geng Foo, Qiuhong Ke, Hossein Rahmani, Anran Wang,
Jinghua Wang, Jun Liu
- Abstract summary: We derive inspiration from the human visual system which contains specialized regions that are dedicated towards handling specific tasks.
We design a novel Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of specialized neurons that are only activated for a subset of samples that are highly similar.
We design an Upstream-Downstream Learning algorithm to optimize our model's dynamic decisions during training, improving the performance of our DSTS module.
- Score: 19.562218963941227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of fine-grained action recognition is to successfully discriminate
between action categories with subtle differences. To tackle this, we derive
inspiration from the human visual system which contains specialized regions in
the brain that are dedicated towards handling specific tasks. We design a novel
Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of
specialized neurons that are only activated for a subset of samples that are
highly similar. During training, the loss forces the specialized neurons to
learn discriminative fine-grained differences to distinguish between these
similar samples, improving fine-grained recognition. Moreover, a
spatio-temporal specialization method further optimizes the architectures of
the specialized neurons to capture either more spatial or temporal fine-grained
information, to better tackle the large range of spatio-temporal variations in
the videos. Lastly, we design an Upstream-Downstream Learning algorithm to
optimize our model's dynamic decisions during training, improving the
performance of our DSTS module. We obtain state-of-the-art performance on two
widely-used fine-grained action recognition datasets.
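The idea of "specialized neurons that are only activated for a subset of samples" can be pictured with a small gating sketch. This is not the authors' implementation: the function name `gated_forward`, the sigmoid gate, the 0.5 threshold, and all weights below are hypothetical choices made only to illustrate sample-dependent activation of an extra branch on top of a shared one.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def gated_forward(features, shared_w, specialized_w, gate_w, threshold=0.5):
    """Illustrative hard gating for one sample (hypothetical, not from the paper).

    The shared branch always runs. The specialized branch contributes
    only when the sigmoid gate score for this sample exceeds `threshold`.
    Returns (output, activated).
    """
    shared_out = dot(features, shared_w)
    gate_score = 1.0 / (1.0 + math.exp(-dot(features, gate_w)))
    activated = gate_score > threshold
    if activated:
        return shared_out + dot(features, specialized_w), True
    return shared_out, False


# Toy weights: the gate fires for inputs dominated by the first feature.
shared_w = [1.0, 1.0]
specialized_w = [2.0, 0.0]
gate_w = [4.0, -4.0]

print(gated_forward([1.0, 0.0], shared_w, specialized_w, gate_w))  # specialized branch used
print(gated_forward([0.0, 1.0], shared_w, specialized_w, gate_w))  # shared branch only
```

In this toy setup only the first sample passes through the specialized weights, so any loss computed on the output would update those weights using that subset of samples alone.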
Related papers
- USDRL: Unified Skeleton-Based Dense Representation Learning with Multi-Grained Feature Decorrelation [24.90512145836643]
We introduce a Unified Skeleton-based Dense Representation Learning framework based on feature decorrelation.
We show that our approach significantly outperforms the current state-of-the-art (SOTA) approaches.
arXiv Detail & Related papers (2024-12-12T12:20:27Z)
- Precise Facial Landmark Detection by Dynamic Semantic Aggregation Transformer [29.484887366344363]
Deep neural network methods have played a dominant role in the face alignment field.
We propose a Dynamic Semantic-Aggregation Transformer (DSAT) for more discriminative and representative feature learning.
Our proposed DSAT outperforms state-of-the-art models in the literature.
arXiv Detail & Related papers (2024-12-01T09:20:32Z)
- Spatio-Temporal Branching for Motion Prediction using Motion Increments [55.68088298632865]
Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications.
Traditional methods rely on hand-crafted features and machine learning techniques.
We propose a novel spatio-temporal branching network using incremental information for HMP.
arXiv Detail & Related papers (2023-08-02T12:04:28Z)
- TS-MoCo: Time-Series Momentum Contrast for Self-Supervised Physiological
Representation Learning [8.129782272731397]
We propose a novel encoding framework that relies on self-supervised learning with momentum contrast to learn representations from various physiological domains without needing labels.
We show that our self-supervised learning approach can indeed learn discriminative features which can be exploited in downstream classification tasks.
arXiv Detail & Related papers (2023-06-10T21:17:42Z)
- Learning low-dimensional dynamics from whole-brain data improves task
capture [2.82277518679026]
We introduce a novel approach to learning low-dimensional approximations of neural dynamics by using a sequential variational autoencoder (SVAE).
Our method finds smooth dynamics that can predict cognitive processes with accuracy higher than classical methods.
We evaluate our approach on various task-fMRI datasets, including motor, working memory, and relational processing tasks.
arXiv Detail & Related papers (2023-05-18T18:43:13Z)
- Leaping Into Memories: Space-Time Deep Feature Synthesis [93.10032043225362]
We propose LEAPS, an architecture-independent method for synthesizing videos from internal models.
We quantitatively and qualitatively evaluate the applicability of LEAPS by inverting a range of convolutional and attention-based architectures on Kinetics-400.
arXiv Detail & Related papers (2023-03-17T12:55:22Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Model-Based Deep Learning: On the Intersection of Deep Learning and
Optimization [101.32332941117271]
Decision making algorithms are used in a multitude of different applications.
Deep learning approaches that use highly parametric architectures tuned from data without relying on mathematical models are becoming increasingly popular.
Model-based optimization and data-centric deep learning are often considered to be distinct disciplines.
arXiv Detail & Related papers (2022-05-05T13:40:08Z)
- EEGminer: Discovering Interpretable Features of Brain Activity with
Learnable Filters [72.19032452642728]
We propose a novel differentiable EEG decoding pipeline consisting of learnable filters and a pre-determined feature extraction module.
We demonstrate the utility of our model towards emotion recognition from EEG signals on the SEED dataset and on a new EEG dataset of unprecedented size.
The discovered features align with previous neuroscience studies and offer new insights, such as marked differences in the functional connectivity profile between left and right temporal areas during music listening.
arXiv Detail & Related papers (2021-10-19T14:22:04Z)
- Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that current fixed-sized spatio-temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better distinguish between classes of actions, by enhancing their feature differences over different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
arXiv Detail & Related papers (2021-10-05T15:39:11Z)
- Improving Skeleton-based Action Recognition with Robust Spatial and
Temporal Features [6.548580592686076]
We propose a novel mechanism to learn more robust discriminative features in space and time.
We show that action recognition accuracy can be improved when these robust features are learned and used.
arXiv Detail & Related papers (2020-08-01T19:29:53Z)