Effective Action Recognition with Embedded Key Point Shifts
- URL: http://arxiv.org/abs/2008.11378v1
- Date: Wed, 26 Aug 2020 05:19:04 GMT
- Title: Effective Action Recognition with Embedded Key Point Shifts
- Authors: Haozhi Cao, Yuecong Xu, Jianfei Yang, Kezhi Mao, Jianxiong Yin and
Simon See
- Abstract summary: We propose a novel temporal feature extraction module, named Key Point Shifts Embedding Module ($KPSEM$).
Key points are adaptively extracted as feature points with maximum feature values at split regions, while key point shifts are the spatial displacements of corresponding key points.
Our method achieves competitive performance through embedding key point shifts with trivial computational cost.
- Score: 19.010874017607247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal feature extraction is an essential technique in video-based action
recognition. Key points have been utilized in skeleton-based action recognition
methods but they require costly key point annotation. In this paper, we propose
a novel temporal feature extraction module, named Key Point Shifts Embedding
Module ($KPSEM$), to adaptively extract channel-wise key point shifts across
video frames without key point annotation for temporal feature extraction. Key
points are adaptively extracted as feature points with maximum feature values
at split regions, while key point shifts are the spatial displacements of
corresponding key points. The key point shifts are encoded as the overall
temporal features via linear embedding layers in a multi-set manner. Our method
achieves competitive performance through embedding key point shifts with
trivial computational cost, achieving the state-of-the-art performance of
82.05% on Mini-Kinetics and competitive performance on UCF101,
Something-Something-v1, and HMDB51 datasets.
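The mechanism described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the grid size, embedding dimension, and array shapes below are illustrative assumptions. Each channel's feature map is split into regions, the location of the maximum feature value in each region is taken as a key point, shifts are the frame-to-frame displacements of those points, and a linear layer (here a plain matrix product) embeds the flattened shifts as a temporal feature.

```python
import numpy as np

def extract_key_points(feat, grid=2):
    """For each channel, take the (y, x) location of the maximum
    feature value within each of the grid x grid split regions.
    feat has shape (C, H, W); returns (C, grid*grid, 2)."""
    C, H, W = feat.shape
    rh, rw = H // grid, W // grid
    pts = np.zeros((C, grid * grid, 2))
    for c in range(C):
        for i in range(grid):
            for j in range(grid):
                region = feat[c, i * rh:(i + 1) * rh, j * rw:(j + 1) * rw]
                y, x = np.unravel_index(region.argmax(), region.shape)
                # store the position in full-feature-map coordinates
                pts[c, i * grid + j] = (i * rh + y, j * rw + x)
    return pts

def key_point_shifts(frames, grid=2):
    """Spatial displacements of corresponding key points between
    consecutive frames. frames has shape (T, C, H, W);
    returns (T-1, C, grid*grid, 2)."""
    pts = np.stack([extract_key_points(f, grid) for f in frames])
    return pts[1:] - pts[:-1]

# Illustrative usage: 4 frames, 8 channels, 16x16 feature maps.
rng = np.random.default_rng(0)
frames = rng.standard_normal((4, 8, 16, 16))
shifts = key_point_shifts(frames)                # (3, 8, 4, 2)

# A single linear embedding of the flattened shifts stands in for
# the paper's multi-set linear embedding layers.
emb_dim = 64
W_emb = rng.standard_normal((shifts[0].size, emb_dim)) * 0.01
temporal_feat = shifts.reshape(shifts.shape[0], -1) @ W_emb  # (3, 64)
```

The region split keeps key point extraction cheap (one argmax per region per channel), which is where the "trivial computational cost" of the module comes from.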
Related papers
- GMM-IKRS: Gaussian Mixture Models for Interpretable Keypoint Refinement and Scoring [9.322937309882022]
Keypoints come with a score that permits ranking them according to their quality.
While learned keypoints often exhibit better properties than handcrafted ones, their scores are not easily interpretable.
We propose a framework that can refine, and at the same time characterize with an interpretable score, the keypoints extracted by any method.
arXiv Detail & Related papers (2024-08-30T09:39:59Z) - Meta-Point Learning and Refining for Category-Agnostic Pose Estimation [46.98479393474727]
Category-agnostic pose estimation (CAPE) aims to predict keypoints for arbitrary classes given a few support images annotated with keypoints.
We propose a novel framework for CAPE based on such potential keypoints (named meta-points).
arXiv Detail & Related papers (2024-03-20T14:54:33Z) - Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching [74.75284453828017]
The Open-Vocabulary Keypoint Detection (OVKD) task is designed to use text prompts for identifying arbitrary keypoints across any species.
We have developed a novel framework named Open-Vocabulary Keypoint Detection with Semantic-feature Matching (KDSM).
This framework combines vision and language models, creating an interplay between language features and local keypoint visual features.
arXiv Detail & Related papers (2023-10-08T07:42:41Z) - Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network [52.29330138835208]
Accurately matching local features between a pair of images is a challenging computer vision task.
Previous studies typically use attention-based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images.
We propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide message passing.
arXiv Detail & Related papers (2023-07-04T02:50:44Z) - Action Keypoint Network for Efficient Video Recognition [63.48422805355741]
This paper proposes to integrate temporal and spatial selection into an Action Keypoint Network (AK-Net).
AK-Net selects some informative points scattered in arbitrary-shaped regions as a set of action keypoints and then transforms the video recognition into point cloud classification.
Experimental results show that AK-Net can consistently improve the efficiency and performance of baseline methods on several video recognition benchmarks.
arXiv Detail & Related papers (2022-01-17T09:35:34Z) - Accurate Grid Keypoint Learning for Efficient Video Prediction [87.71109421608232]
Keypoint-based video prediction methods can consume substantial computing resources in training and deployment.
In this paper, we design a new grid keypoint learning framework, aiming at a robust and explainable intermediate keypoint representation for long-term efficient video prediction.
Our method outperforms the state-of-the-art video prediction methods while saving more than 98% of computing resources.
arXiv Detail & Related papers (2021-07-28T05:04:30Z) - Unsupervised Object Keypoint Learning using Local Spatial Predictability [10.862430265350804]
We propose PermaKey, a novel approach to representation learning based on object keypoints.
We demonstrate the efficacy of PermaKey on Atari where it learns keypoints corresponding to the most salient object parts and is robust to certain visual distractors.
arXiv Detail & Related papers (2020-11-25T18:27:05Z) - Keypoint Autoencoders: Learning Interest Points of Semantics [4.551313396927381]
We propose Keypoint Autoencoder, an unsupervised learning method for detecting keypoints.
We encourage selecting sparse semantic keypoints by enforcing the reconstruction from keypoints to the original point cloud.
A downstream task of classifying shape with sparse keypoints is conducted to demonstrate the distinctiveness of our selected keypoints.
arXiv Detail & Related papers (2020-08-11T03:43:18Z) - FAIRS -- Soft Focus Generator and Attention for Robust Object Segmentation from Extreme Points [70.65563691392987]
We present a new approach to generate object segmentation from user inputs in the form of extreme points and corrective clicks.
We demonstrate our method's ability to generate high-quality training data as well as its scalability in incorporating extreme points, guiding clicks, and corrective clicks in a principled manner.
arXiv Detail & Related papers (2020-04-04T22:25:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.