Accurate Grid Keypoint Learning for Efficient Video Prediction
- URL: http://arxiv.org/abs/2107.13170v1
- Date: Wed, 28 Jul 2021 05:04:30 GMT
- Title: Accurate Grid Keypoint Learning for Efficient Video Prediction
- Authors: Xiaojie Gao, Yueming Jin, Qi Dou, Chi-Wing Fu, and Pheng-Ann Heng
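- Abstract summary: Video prediction methods generally consume substantial computing resources in training and deployment; keypoint-based approaches improve efficiency by reducing dense image prediction to light keypoint prediction.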
In this paper, we design a new grid keypoint learning framework, aiming at a robust and explainable intermediate keypoint representation for long-term efficient video prediction.
Our method outperforms state-of-the-art video prediction methods while saving more than 98% of computing resources.
- Score: 87.71109421608232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video prediction methods generally consume substantial computing resources in
training and deployment, among which keypoint-based approaches show promising
improvement in efficiency by simplifying dense image prediction to light
keypoint prediction. However, keypoint locations are often modeled only as
continuous coordinates, so noise from semantically insignificant deviations in
videos easily disrupts learning stability, leading to inaccurate keypoint
modeling. In this paper, we design a new grid keypoint learning framework,
aiming at a robust and explainable intermediate keypoint representation for
long-term efficient video prediction. We have two major technical
contributions. First, we detect keypoints by jumping among candidate locations
in our proposed grid space and formulate a condensation loss to encourage
meaningful keypoints with strong representative capability. Second, we
introduce a 2D binary map to represent the detected grid keypoints and then
suggest propagating keypoint locations with stochasticity by selecting entries
in the discrete grid space, thus preserving the spatial structure of keypoints
over the long-term horizon for better future frame generation. Extensive
experiments verify that our method outperforms state-of-the-art stochastic
video prediction methods while saving more than 98% of computing resources. We
also demonstrate our method on a robotic-assisted surgery dataset with
promising results. Our code is available at
https://github.com/xjgaocs/Grid-Keypoint-Learning.
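
The two ideas in the abstract (a 2D binary map for detected grid keypoints, and stochastic propagation by selecting entries in the discrete grid) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the assumption that coordinates lie in [-1, 1], the grid size, and the softmax-based sampling are all hypothetical choices made only for illustration.

```python
import numpy as np


def keypoints_to_binary_map(coords, grid_size):
    """Snap (x, y) coordinates to grid cells and mark them in a 2D binary map.

    Assumption: coordinates are normalized to [-1, 1] in both axes.
    """
    coords = np.asarray(coords, dtype=np.float64)
    # Map [-1, 1] onto [0, grid_size) and clamp to valid cell indices.
    idx = np.floor((coords + 1.0) / 2.0 * grid_size).astype(int)
    idx = np.clip(idx, 0, grid_size - 1)
    grid = np.zeros((grid_size, grid_size), dtype=np.uint8)
    grid[idx[:, 1], idx[:, 0]] = 1  # row index = y, column index = x
    return grid, idx


def sample_next_cells(logits, rng):
    """Sample one grid cell per keypoint from a categorical distribution.

    logits has shape (num_keypoints, height, width); in a real model these
    scores would come from a learned predictor (hypothetical here).
    """
    k, h, w = logits.shape
    flat = logits.reshape(k, h * w)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    cells = np.array([rng.choice(h * w, p=p) for p in probs])
    rows, cols = np.divmod(cells, w)
    return np.stack([cols, rows], axis=1)  # (x, y) cell indices


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid, idx = keypoints_to_binary_map([[-0.5, 0.2], [0.9, -0.9]], grid_size=8)
    print(grid)
    # Random scores stand in for a predictor's output over the 8x8 grid.
    next_cells = sample_next_cells(rng.normal(size=(2, 8, 8)), rng)
    print(next_cells)
```

Because each future keypoint is chosen as a grid entry rather than regressed as a continuous offset, small coordinate noise cannot accumulate below the cell resolution, which is the intuition the abstract gives for preserving spatial structure over long horizons.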
Related papers
- SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modeling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z) - SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios [2.786599193929693]
We propose a new 6D pose estimation network with symmetric-aware keypoint prediction and self-training domain adaptation (SD-Net).
At the keypoint prediction stage, we design a robust 3D keypoint selection strategy to locate 3D keypoints even in highly occluded scenes.
At the domain adaptation stage, we propose a self-training framework using a student-teacher training scheme.
On the public Siléane dataset, SD-Net achieves state-of-the-art results, obtaining an average precision of 96%.
arXiv Detail & Related papers (2024-03-14T12:08:44Z) - Unsupervised Keypoints from Pretrained Diffusion Models [31.147785019795347]
We leverage the emergent knowledge within text-to-image diffusion models, towards more robust unsupervised keypoints.
Our core idea is to find text embeddings that would cause the generative model to consistently attend to compact regions in images.
We validate our performance on multiple datasets: the CelebA, CUB-200-2011, Tai-Chi-HD, DeepFashion, and Human3.6m datasets.
arXiv Detail & Related papers (2023-11-29T19:43:38Z) - KGNv2: Separating Scale and Pose Prediction for Keypoint-based 6-DoF
Grasp Synthesis on RGB-D input [16.897624250286487]
Keypoint-based grasp detectors operating on image input have demonstrated promising results.
We devise a new grasp generation network that reduces the dependency on precise keypoint estimation.
arXiv Detail & Related papers (2023-03-09T23:11:52Z) - Long-Lived Accurate Keypoints in Event Streams [28.892653505044425]
We present a novel end-to-end approach to keypoint detection and tracking in an event stream.
We show it results in keypoint tracks that are three times longer and nearly twice as accurate as the best previous state-of-the-art methods.
arXiv Detail & Related papers (2022-09-21T14:25:31Z) - Action Keypoint Network for Efficient Video Recognition [63.48422805355741]
This paper proposes to integrate temporal and spatial selection into an Action Keypoint Network (AK-Net).
AK-Net selects some informative points scattered in arbitrary-shaped regions as a set of action keypoints and then transforms the video recognition into point cloud classification.
Experimental results show that AK-Net can consistently improve the efficiency and performance of baseline methods on several video recognition benchmarks.
arXiv Detail & Related papers (2022-01-17T09:35:34Z) - Keypoint Message Passing for Video-based Person Re-Identification [106.41022426556776]
Video-based person re-identification (re-ID) is an important technique in visual surveillance systems which aims to match video snippets of people captured by different cameras.
Existing methods are mostly based on convolutional neural networks (CNNs), whose building blocks either process local neighbor pixels at a time, or, when 3D convolutions are used to model temporal information, suffer from the misalignment problem caused by person movement.
In this paper, we propose to overcome the limitations of normal convolutions with a human-oriented graph method. Specifically, features located at person joint keypoints are extracted and connected as a spatial-temporal graph.
arXiv Detail & Related papers (2021-11-16T08:01:16Z) - Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression [81.05772887221333]
We study the dense keypoint regression framework, which was previously inferior to the keypoint detection and grouping framework.
We present a simple yet effective approach, named disentangled keypoint regression (DEKR).
We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods.
arXiv Detail & Related papers (2021-04-06T05:54:46Z) - Keypoint Autoencoders: Learning Interest Points of Semantics [4.551313396927381]
We propose Keypoint Autoencoder, an unsupervised learning method for detecting keypoints.
We encourage selecting sparse semantic keypoints by enforcing the reconstruction from keypoints to the original point cloud.
A downstream task of classifying shape with sparse keypoints is conducted to demonstrate the distinctiveness of our selected keypoints.
arXiv Detail & Related papers (2020-08-11T03:43:18Z) - Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement [54.29252286561449]
We propose a two-stage graph-based and model-agnostic framework, called Graph-PCNN.
In the first stage, a heatmap regression network is applied to obtain a rough localization result, and a set of proposal keypoints, called guided points, is sampled.
In the second stage, for each guided point, a different visual feature is extracted by the localization.
The relationship between guided points is explored by the graph pose refinement module to get more accurate localization results.
arXiv Detail & Related papers (2020-07-21T04:59:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.