Tensor Representations for Action Recognition
- URL: http://arxiv.org/abs/2012.14371v2
- Date: Tue, 29 Dec 2020 21:44:32 GMT
- Title: Tensor Representations for Action Recognition
- Authors: Piotr Koniusz and Lei Wang and Anoop Cherian
- Abstract summary: Human actions in sequences are characterized by the complex interplay between spatial features and their temporal dynamics.
We propose novel tensor representations for capturing higher-order relationships between visual features for the task of action recognition.
We use higher-order tensors and so-called Eigenvalue Power Normalization (EPN), which have long been speculated to perform spectral detection of higher-order occurrences.
- Score: 54.710267354274194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human actions in video sequences are characterized by the complex interplay
between spatial features and their temporal dynamics. In this paper, we propose
novel tensor representations for compactly capturing such higher-order
relationships between visual features for the task of action recognition. We
propose two tensor-based feature representations, viz. (i) sequence
compatibility kernel (SCK) and (ii) dynamics compatibility kernel (DCK); the
former building on the spatio-temporal correlations between features, while the
latter explicitly modeling the action dynamics of a sequence. We also explore
a generalization of SCK, coined SCK(+), that operates on subsequences to capture
the local-global interplay of correlations and can incorporate multi-modal
inputs, e.g., skeleton 3D body-joints and per-frame classifier scores obtained
from deep learning models trained on videos. We introduce linearizations of
these kernels that lead to compact and fast descriptors. We provide experiments
on (i) 3D skeleton action sequences, (ii) fine-grained video sequences, and
(iii) standard non-fine-grained videos. As our final representations are
tensors that capture higher-order relationships of features, they relate to
co-occurrences for robust fine-grained recognition. We use higher-order tensors
and so-called Eigenvalue Power Normalization (EPN), which have long been
speculated to perform spectral detection of higher-order occurrences, thus
detecting fine-grained relationships of features rather than merely counting
features in action sequences. We prove that a tensor of order r, built from
Z*-dimensional features, coupled with EPN indeed detects if at least one
higher-order occurrence is 'projected' into one of its binom(Z*,r) subspaces of
dimension r represented by the tensor, thus forming a Tensor Power Normalization
metric endowed with binom(Z*,r) such 'detectors'.
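The "detect rather than count" effect of EPN can be illustrated in its simplest, second-order case (r = 2). There, the tensor is an autocorrelation matrix of per-frame features, and EPN raises its eigenvalues to a power gamma. The NumPy sketch below works under that assumption only. The function names and the `gamma`/`eps` parameters are hypothetical. It does not reproduce the paper's order-r construction or its SCK/DCK kernels.

```python
import numpy as np


def autocorrelation(X: np.ndarray) -> np.ndarray:
    """Second-order (r = 2) tensor: mean outer product of the rows of X.

    X has shape (N, d), i.e. N per-frame feature vectors of dimension d.
    """
    return X.T @ X / X.shape[0]


def eigenvalue_power_normalize(M: np.ndarray, gamma: float = 0.5,
                               eps: float = 1e-10) -> np.ndarray:
    """Return U diag(lambda**gamma) U^T for a symmetric PSD matrix M."""
    # eigh is appropriate because M is symmetric.
    eigvals, eigvecs = np.linalg.eigh(M)
    # Treat tiny or negative (round-off) eigenvalues as exact zeros; otherwise
    # a small gamma would inflate them towards 1.
    tol = eps * max(float(eigvals.max()), 0.0)
    eigvals = np.where(eigvals > tol, eigvals, 0.0)
    # Scale column j of U by lambda_j**gamma, then multiply by U^T.
    return (eigvecs * eigvals**gamma) @ eigvecs.T


# Toy sequence of N = 10 frames in d = 3 dimensions: direction e0 occurs in
# one frame, e1 in nine frames, e2 never.
X = np.zeros((10, 3))
X[0, 0] = 1.0
X[1:, 1] = 1.0

M = autocorrelation(X)                          # diag(0.1, 0.9, 0.0): counts
P = eigenvalue_power_normalize(M, gamma=0.01)   # small gamma: near-binary
print(np.round(np.diag(P), 3))                  # e0, e1 near 1; e2 at 0
```

With gamma close to 0, every non-zero eigenvalue is pushed towards 1. A direction seen in 1 of 10 frames and one seen in 9 of 10 frames therefore end up with similar weight, while an absent direction stays at 0. With gamma = 1, the matrix is returned unchanged.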
Related papers
- S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
Date: 2024-02-22T11:40:49Z
- Learning Sequence Representations by Non-local Recurrent Neural Memory [61.65105481899744]
We propose a Non-local Recurrent Neural Memory (NRNM) for supervised sequence representation learning.
Our model captures long-range dependencies and can distill latent high-level features.
Our model compares favorably against other state-of-the-art methods specifically designed for each of these sequence applications.
Date: 2022-07-20T07:26:15Z
- 3D Skeleton-based Few-shot Action Recognition with JEANIE is not so Naïve [28.720272938306692]
We propose a Few-shot Learning pipeline for 3D skeleton-based action recognition by Joint tEmporal and cAmera viewpoiNt alIgnmEnt.
Date: 2021-12-23T16:09:23Z
- High-order Tensor Pooling with Attention for Action Recognition [39.22510412349891]
We capture high-order statistics of feature vectors formed by a neural network.
We propose end-to-end second- and higher-order pooling to form a tensor descriptor.
Date: 2021-10-11T12:32:56Z
- Sequential convolutional network for behavioral pattern extraction in gait recognition [0.7874708385247353]
We propose a sequential convolutional network (SCN) to learn the walking pattern of individuals.
In SCN, behavioral information extractors (BIE) are constructed to comprehend intermediate feature maps in time series.
A multi-frame aggregator in SCN performs feature integration on sequences of variable length via a mobile 3D convolutional layer.
Date: 2021-04-23T08:44:10Z
- Out-of-time-order correlations and the fine structure of eigenstate thermalisation [58.720142291102135]
Out-of-time-order correlators (OTOCs) have become established as a tool to characterise quantum information dynamics and thermalisation.
We show explicitly that the OTOC is indeed a precise tool to explore the fine details of the Eigenstate Thermalisation Hypothesis (ETH).
We provide an estimation of the finite-size scaling of $\omega_{\textrm{GOE}}$ for the general class of observables composed of sums of local operators in the infinite-temperature regime.
Date: 2021-03-01T17:51:46Z
- Analysis of Latent-Space Motion for Collaborative Intelligence [26.24508656138528]
We show that the motion present in each channel of a feature tensor is approximately equal to the scaled version of the input motion.
Results will be useful in collaborative intelligence applications.
Date: 2021-02-08T06:22:07Z
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
Date: 2020-02-21T05:00:01Z
- Supervised Learning for Non-Sequential Data: A Canonical Polyadic Decomposition Approach [85.12934750565971]
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks.
To alleviate the cost of explicitly modelling every such interaction, it has been proposed to implicitly represent the model parameters as a tensor.
For enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors.
Date: 2020-01-27T22:38:40Z
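The "High-order Tensor Pooling with Attention for Action Recognition" entry above describes forming a tensor descriptor from high-order statistics of per-frame feature vectors. The NumPy sketch below shows plain third-order average pooling, assuming features are stacked row-wise in a matrix `X`. The attention component is not modelled. The element-wise signed power step is a simple stand-in normalization chosen here for illustration. It is not claimed to be that paper's method or the EPN of the main abstract, and the function names are hypothetical.

```python
import numpy as np


def third_order_pooling(X: np.ndarray) -> np.ndarray:
    """Average x_n (x) x_n (x) x_n over the N rows of X (shape (N, d))."""
    return np.einsum("ni,nj,nk->ijk", X, X, X) / X.shape[0]


def signed_power(T: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    """Element-wise sign(t) * |t|**gamma; an assumed, simple normalization."""
    return np.sign(T) * np.abs(T) ** gamma


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4))        # 20 hypothetical frame features, d = 4
T = signed_power(third_order_pooling(X), gamma=0.5)
descriptor = T.ravel()                  # d**3 = 64 values
print(T.shape, descriptor.shape)        # (4, 4, 4) (64,)
```

The pooled tensor is super-symmetric: it is unchanged by any permutation of its three axes. Only binom(d+2, 3) of its d**3 entries are distinct (20 of 64 for d = 4), so a more compact descriptor could keep just those.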