3D Skeleton-based Few-shot Action Recognition with JEANIE is not so Naïve
- URL: http://arxiv.org/abs/2112.12668v1
- Date: Thu, 23 Dec 2021 16:09:23 GMT
- Title: 3D Skeleton-based Few-shot Action Recognition with JEANIE is not so Naïve
- Authors: Lei Wang, Jun Liu, Piotr Koniusz
- Abstract summary: We propose a Few-shot Learning pipeline for 3D skeleton-based action recognition by Joint tEmporal and cAmera viewpoiNt alIgnmEnt.
- Score: 28.720272938306692
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a Few-shot Learning pipeline for 3D skeleton-based
action recognition by Joint tEmporal and cAmera viewpoiNt alIgnmEnt (JEANIE).
To factor out misalignment between query and support sequences of 3D body
joints, we propose an advanced variant of Dynamic Time Warping which jointly
models each smooth path between the query and support frames to achieve
simultaneously the best alignment in the temporal and simulated camera
viewpoint spaces for end-to-end learning under the limited few-shot training
data. Sequences are encoded with a temporal block encoder based on Simple
Spectral Graph Convolution, a lightweight linear Graph Neural Network backbone
(we also include a setting with a transformer). Finally, we propose a
similarity-based loss which encourages the alignment of sequences of the same
class while preventing the alignment of unrelated sequences. We demonstrate
state-of-the-art results on NTU-60, NTU-120, Kinetics-skeleton and UWA3D
Multiview Activity II.
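The joint temporal-viewpoint alignment described in the abstract can be sketched as a dynamic program over (query frame, support frame, viewpoint) states: classic DTW steps handle time, while transitions to neighbouring viewpoints keep the viewpoint path smooth. The sketch below is a minimal illustration under assumed inputs (per-frame feature vectors and a discrete set of simulated camera viewpoints); it is not the authors' implementation, and the step rules are a simplification.

```python
import numpy as np

def jeanie_style_dtw(query, support_views):
    """Hypothetical joint temporal-viewpoint DTW (simplified sketch).

    query:         (T_q, d) array of per-frame features.
    support_views: (V, T_s, d) array, the support sequence rendered
                   under V simulated camera viewpoints.
    Returns the cost of the best smooth path over time and viewpoint.
    """
    V, T_s, _ = support_views.shape
    T_q = query.shape[0]
    # acc[i, j, v]: best-path cost ending at query frame i,
    # support frame j, viewpoint v.
    acc = np.full((T_q, T_s, V), np.inf)
    for i in range(T_q):
        for j in range(T_s):
            for v in range(V):
                cost = np.linalg.norm(query[i] - support_views[v, j])
                if i == 0 and j == 0:
                    best_prev = 0.0
                else:
                    cands = []
                    # Temporal moves (classic DTW), staying on viewpoint v,
                    # or stepping to a neighbouring viewpoint (smooth path).
                    for pv in (v - 1, v, v + 1):
                        if not (0 <= pv < V):
                            continue
                        for pi, pj in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                            if pi >= 0 and pj >= 0:
                                cands.append(acc[pi, pj, pv])
                    best_prev = min(cands)
                acc[i, j, v] = cost + best_prev
    # The best alignment may terminate under any viewpoint.
    return acc[-1, -1].min()
```

A sequence identical to the support under one of the viewpoints aligns with zero cost along the diagonal, while a query matching no viewpoint accumulates a positive cost.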
Related papers
- Meet JEANIE: a Similarity Measure for 3D Skeleton Sequences via Temporal-Viewpoint Alignment [44.22075586147116]
Video sequences exhibit significant undesired variations in action speed, temporal locations, and subjects' poses.
We propose Joint tEmporal and cAmera viewpoiNt alIgnmEnt (JEANIE) for sequence pairs.
arXiv Detail & Related papers (2024-02-07T05:47:31Z)
- Skeleton2vec: A Self-supervised Learning Framework with Contextualized
Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z)
- You Can Ground Earlier than See: An Effective and Efficient Pipeline for
Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous works have achieved decent success, but they focus only on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z)
- Temporal-Viewpoint Transportation Plan for Skeletal Few-shot Action
Recognition [38.27785891922479]
A few-shot learning pipeline for 3D skeleton-based action recognition by Joint tEmporal and cAmera viewpoiNt alIgnmEnt.
arXiv Detail & Related papers (2022-10-30T11:46:38Z)
- Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for
Temporal Sentence Grounding [61.57847727651068]
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.
Most previous works focus on learning frame-level features of each whole frame in the entire video, and directly match them with the textual information.
We propose a novel Motion- and Appearance-guided 3D Semantic Reasoning Network (MA3SRN), which incorporates optical-flow-guided motion-aware, detection-based appearance-aware, and 3D-aware object-level features.
arXiv Detail & Related papers (2022-03-06T13:57:09Z)
- Leveraging Third-Order Features in Skeleton-Based Action Recognition [26.349722372701482]
Skeleton sequences are light-weight and compact, and thus ideal candidates for action recognition on edge devices.
Recent action recognition methods extract features from 3D joint coordinates as spatial-temporal cues, using these representations in a graph neural network for feature fusion.
We propose fusing third-order features in the form of angles into modern architectures, to robustly capture the relationships between joints and body parts.
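The third-order angle features above can be illustrated as angles subtended at a joint by two body segments, computed directly from 3D coordinates. This is a minimal sketch under assumed conventions (the joint triples and normalisation are illustrative choices, not the paper's exact formulation):

```python
import numpy as np

def joint_angles(coords, triples):
    """Hypothetical third-order (angle) features from 3D joints.

    coords:  (T, J, 3) array of 3D joint positions over T frames.
    triples: list of (a, b, c) joint indices; the feature is the angle
             at joint b between segments b->a and b->c.
    Returns: (T, len(triples)) array of angles in radians.
    """
    feats = []
    for a, b, c in triples:
        u = coords[:, a] - coords[:, b]  # segment b -> a
        v = coords[:, c] - coords[:, b]  # segment b -> c
        cos = np.sum(u * v, axis=-1) / (
            np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1) + 1e-8)
        feats.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.stack(feats, axis=-1)
```

Such angle features are invariant to translation and rotation of the whole skeleton, which is one motivation for fusing them with raw coordinate cues.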
arXiv Detail & Related papers (2021-05-04T15:23:29Z) - Tensor Representations for Action Recognition [54.710267354274194]
Human actions in sequences are characterized by the complex interplay between spatial features and their temporal dynamics.
We propose novel tensor representations for capturing higher-order relationships between visual features for the task of action recognition.
We use higher-order tensors and so-called Eigenvalue Power Normalization (EPN), which has long been speculated to perform spectral detection of higher-order occurrences.
arXiv Detail & Related papers (2020-12-28T17:27:18Z) - MotioNet: 3D Human Motion Reconstruction from Monocular Video with
Skeleton Consistency [72.82534577726334]
We introduce MotioNet, a deep neural network that directly reconstructs the motion of a 3D human skeleton from monocular video.
Our method is the first data-driven approach that directly outputs a kinematic skeleton, which is a complete, commonly used, motion representation.
arXiv Detail & Related papers (2020-06-22T08:50:09Z) - Skeleton Based Action Recognition using a Stacked Denoising Autoencoder
with Constraints of Privileged Information [5.67220249825603]
We propose a new method to study the skeletal representation in a view of skeleton reconstruction.
Based on the concept of learning under privileged information, we integrate action categories and temporal coordinates into a stacked denoising autoencoder.
In order to mitigate the variation resulting from temporal misalignment, a new method of temporal registration is proposed.
arXiv Detail & Related papers (2020-03-12T09:56:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.