Learning Heatmap-Style Jigsaw Puzzles Provides Good Pretraining for 2D
Human Pose Estimation
- URL: http://arxiv.org/abs/2012.07101v1
- Date: Sun, 13 Dec 2020 17:04:29 GMT
- Title: Learning Heatmap-Style Jigsaw Puzzles Provides Good Pretraining for 2D
Human Pose Estimation
- Authors: Kun Zhang, Rui Wu, Ping Yao, Kai Deng, Ding Li, Renbiao Liu,
Chuanguang Yang, Ge Chen, Min Du, Tianyao Zheng
- Abstract summary: We introduce a self-supervised method for pretraining 2D pose estimation networks.
Specifically, we propose the Heatmap-Style Jigsaw Puzzles (HSJP) problem as our pretext task.
We use only images of person instances in MS-COCO, rather than introducing the extra and much larger ImageNet dataset.
With two popular and strong 2D human pose estimators, HRNet and SimpleBaseline, we evaluate the mAP score on both the MS-COCO validation and test-dev datasets.
- Score: 19.389708889730834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of 2D human pose estimation is to locate the keypoints of body
parts in input 2D images. State-of-the-art methods for pose estimation
usually construct pixel-wise heatmaps from keypoints as labels for training
convolutional neural networks, whose backbones are usually initialized randomly
or from ImageNet classification models. We note that the 2D pose
estimation task depends heavily on the contextual relationship between
image patches, and we therefore introduce a self-supervised method for pretraining 2D
pose estimation networks. Specifically, we propose the Heatmap-Style Jigsaw Puzzles
(HSJP) problem as our pretext task, whose target is to learn the location of
each patch from an image composed of shuffled patches. During pretraining,
we use only images of person instances in MS-COCO, rather than
introducing the extra and much larger ImageNet dataset. We design a heatmap-style
label for patch location, and the learning process is non-contrastive.
The weights learned by the HSJP pretext task are used as the backbone of a 2D
human pose estimator, which is then finetuned on the MS-COCO human keypoints
dataset. With two popular and strong 2D human pose estimators, HRNet and
SimpleBaseline, we evaluate the mAP score on both the MS-COCO validation and test-dev
datasets. Our experiments show that downstream pose estimators with our
self-supervised pretraining perform much better than those trained
from scratch, and are comparable to those using ImageNet classification models
as their initial backbones.
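The pretext task described in the abstract can be illustrated with a small sketch. The abstract does not specify the grid size, the label layout, or the Gaussian width, so everything below is an assumption made for illustration: the function names `shuffle_patches`, `gaussian_heatmap`, and `location_heatmaps` are hypothetical, and the choice of one heatmap channel per original patch, peaked at the centre of the slot where that patch now sits, is only one plausible reading of "heatmap-style label for patch location". It is not the authors' implementation.

```python
import math
import random


def shuffle_patches(image, grid, rng):
    """Split a 2D image (list of rows) into grid x grid patches and shuffle them.

    Returns the shuffled image and perm, where perm[slot] is the row-major
    index of the original patch now drawn in that slot.
    """
    height, width = len(image), len(image[0])
    ph, pw = height // grid, width // grid
    perm = list(range(grid * grid))
    rng.shuffle(perm)
    shuffled = [[0] * (pw * grid) for _ in range(ph * grid)]
    for slot, src in enumerate(perm):
        sr, sc = divmod(src, grid)   # grid cell the patch came from
        dr, dc = divmod(slot, grid)  # grid cell where it is drawn now
        for y in range(ph):
            for x in range(pw):
                shuffled[dr * ph + y][dc * pw + x] = image[sr * ph + y][sc * pw + x]
    return shuffled, perm


def gaussian_heatmap(height, width, cy, cx, sigma):
    """Unnormalised 2D Gaussian centred at (cy, cx)."""
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def location_heatmaps(perm, grid, height, width, sigma=1.0):
    """Assumed label layout: channel k belongs to original patch k and
    peaks at the centre of the slot that patch occupies after shuffling."""
    ph, pw = height // grid, width // grid
    maps = [None] * (grid * grid)
    for slot, src in enumerate(perm):
        dr, dc = divmod(slot, grid)
        cy = dr * ph + (ph - 1) / 2
        cx = dc * pw + (pw - 1) / 2
        maps[src] = gaussian_heatmap(height, width, cy, cx, sigma)
    return maps


# Usage on a toy 8x8 "image" whose pixel values encode their own position.
rng = random.Random(0)
image = [[y * 8 + x for x in range(8)] for y in range(8)]
shuffled, perm = shuffle_patches(image, grid=2, rng=rng)
labels = location_heatmaps(perm, grid=2, height=8, width=8)
```

Under this reading, the labels have the same form as keypoint heatmaps (one channel per target, a peak at its location), which is presumably why such pretraining would transfer to a heatmap-based pose estimator; a network would take `shuffled` as input and regress `labels`.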
Related papers
- Lifting by Image -- Leveraging Image Cues for Accurate 3D Human Pose
Estimation [10.374944534302234]
The "lifting from 2D pose" method has been the dominant approach to 3D Human Pose Estimation (3DHPE).
Rich semantic and texture information in images can contribute to a more accurate "lifting" procedure.
In this paper, we give new insight into the cause of poor generalization problems and the effectiveness of image features.
arXiv Detail & Related papers (2023-12-25T07:50:58Z)
- CheckerPose: Progressive Dense Keypoint Localization for Object Pose
Estimation with Graph Neural Network [66.24726878647543]
Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task.
Recent studies have shown the great potential of dense correspondence-based solutions.
We propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects.
arXiv Detail & Related papers (2023-03-29T17:30:53Z)
- Semi-Supervised 2D Human Pose Estimation Driven by Position
Inconsistency Pseudo Label Correction Module [74.80776648785897]
The previous method ignored two problems: (i) when conducting interactive training between a large model and a lightweight model, the pseudo labels of the lightweight model are used to guide the large model.
We propose a semi-supervised 2D human pose estimation framework driven by a position inconsistency pseudo label correction module (SSPCM).
To further improve the performance of the student model, we use the semi-supervised Cut-Occlude based on pseudo keypoint perception to generate more hard and effective samples.
arXiv Detail & Related papers (2023-03-08T02:57:05Z)
- KTN: Knowledge Transfer Network for Learning Multi-person 2D-3D
Correspondences [77.56222946832237]
We present a novel framework to detect the densepose of multiple people in an image.
The proposed method, which we refer to as the Knowledge Transfer Network (KTN), tackles two main problems.
It simultaneously maintains feature resolution and suppresses background pixels, and this strategy results in a substantial increase in accuracy.
arXiv Detail & Related papers (2022-06-21T03:11:37Z)
- OSOP: A Multi-Stage One Shot Object Pose Estimation Framework [35.89334617258322]
We present a novel one-shot method for object detection and 6 DoF pose estimation, that does not require training on target objects.
At test time, it takes as input a target image and a textured 3D query model.
We evaluate the method on LineMOD, Occlusion, Homebrewed, YCB-V and TLESS datasets.
arXiv Detail & Related papers (2022-03-29T13:12:00Z)
- Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images [79.70127290464514]
We decompose the task into two stages, i.e. person localization and pose estimation.
We propose three task-specific graph neural networks for effective message passing.
Our approach achieves state-of-the-art performance on CMU Panoptic and Shelf datasets.
arXiv Detail & Related papers (2021-09-13T11:44:07Z)
- 6D Object Pose Estimation using Keypoints and Part Affinity Fields [24.126513851779936]
The task of 6D object pose estimation from RGB images is an important requirement for autonomous service robots to be able to interact with the real world.
We present a two-step pipeline for estimating the 6 DoF translation and orientation of known objects.
arXiv Detail & Related papers (2021-07-05T14:41:19Z)
- HDNet: Human Depth Estimation for Multi-Person Camera-Space Localization [83.57863764231655]
We propose the Human Depth Estimation Network (HDNet), an end-to-end framework for absolute root joint localization.
A skeleton-based Graph Neural Network (GNN) is utilized to propagate features among joints.
We evaluate our HDNet on the root joint localization and root-relative 3D pose estimation tasks with two benchmark datasets.
arXiv Detail & Related papers (2020-07-17T12:44:23Z)
- Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive
Keypoint Estimates [76.51095823248104]
We present several schemes, rarely or insufficiently studied before, for improving keypoint detection and grouping (keypoint regression) performance.
First, we exploit the keypoint heatmaps for pixel-wise keypoint regression, instead of separating them, to improve keypoint regression.
Second, we adopt a pixel-wise spatial transformer network to learn adaptive representations for handling the scale and orientation variance.
Third, we present a joint shape and heatvalue scoring scheme to promote the estimated poses that are more likely to be true poses.
arXiv Detail & Related papers (2020-06-28T01:14:59Z)