Towards High Performance Human Keypoint Detection
- URL: http://arxiv.org/abs/2002.00537v2
- Date: Sun, 23 May 2021 02:23:25 GMT
- Title: Towards High Performance Human Keypoint Detection
- Authors: Jing Zhang and Zhe Chen and Dacheng Tao
- Abstract summary: We find that context information plays an important role in reasoning human body configuration and invisible keypoints.
Inspired by this, we propose a cascaded context mixer (CCM), which efficiently integrates spatial and channel context information.
To maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy.
We present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy.
- Score: 87.1034745775229
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human keypoint detection from a single image is very challenging due to
occlusion, blur, illumination and scale variance. In this paper, we address
this problem from three aspects by devising an efficient network structure,
proposing three effective training strategies, and exploiting four useful
postprocessing techniques. First, we find that context information plays an
important role in reasoning human body configuration and invisible keypoints.
Inspired by this, we propose a cascaded context mixer (CCM), which efficiently
integrates spatial and channel context information and progressively refines
them. Second, to maximize CCM's representation capability, we develop a
hard-negative person detection mining strategy and a joint-training strategy by
exploiting abundant unlabeled data. This enables CCM to learn discriminative
features from massive and diverse poses. Third, we present several sub-pixel
refinement techniques for postprocessing keypoint predictions to improve
detection accuracy. Extensive experiments on the MS COCO keypoint detection
benchmark demonstrate the superiority of the proposed method over
representative state-of-the-art (SOTA) methods. Our single model achieves
comparable performance with the winner of the 2018 COCO Keypoint Detection
Challenge. The final ensemble model sets a new SOTA on this benchmark.
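The sub-pixel refinement step mentioned in the abstract can be illustrated with a widely used heuristic from earlier heatmap-based pose estimators: take the integer argmax of a keypoint heatmap, then shift it a quarter pixel toward the higher neighbouring response on each axis. This is a minimal sketch of that generic heuristic, not a reproduction of the four postprocessing techniques used in the paper; the function name `decode_peak_subpixel`, the nested-list heatmap layout `heatmap[y][x]`, and the default `shift=0.25` are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical type alias: one keypoint heatmap stored row-major as heatmap[y][x].
Heatmap = List[List[float]]


def _sign(v: float) -> float:
    """Return +1.0, -1.0 or 0.0 depending on the sign of v."""
    if v > 0:
        return 1.0
    if v < 0:
        return -1.0
    return 0.0


def decode_peak_subpixel(heatmap: Heatmap, shift: float = 0.25) -> Tuple[float, float, float]:
    """Return (x, y, score) of the heatmap maximum with a quarter-offset refinement.

    Assumption: this is the generic "shift toward the higher neighbour" heuristic,
    not necessarily one of the paper's own postprocessing techniques.
    """
    if not heatmap or not heatmap[0]:
        raise ValueError("heatmap must be a non-empty 2D grid")
    h, w = len(heatmap), len(heatmap[0])

    # Integer argmax; strict '>' keeps the first maximum in row-major order.
    best_y, best_x = 0, 0
    for y in range(h):
        for x in range(w):
            if heatmap[y][x] > heatmap[best_y][best_x]:
                best_y, best_x = y, x
    score = heatmap[best_y][best_x]

    px, py = float(best_x), float(best_y)
    # Shift along x toward the larger horizontal neighbour (skipped at the border).
    if 0 < best_x < w - 1:
        px += shift * _sign(heatmap[best_y][best_x + 1] - heatmap[best_y][best_x - 1])
    # Shift along y toward the larger vertical neighbour (skipped at the border).
    if 0 < best_y < h - 1:
        py += shift * _sign(heatmap[best_y + 1][best_x] - heatmap[best_y - 1][best_x])
    return px, py, score


if __name__ == "__main__":
    hm = [
        [0.0, 0.1, 0.0, 0.0],
        [0.1, 0.6, 0.9, 0.3],
        [0.0, 0.2, 0.5, 0.1],
    ]
    print(decode_peak_subpixel(hm))  # (1.75, 1.25, 0.9)
```

Because the offset is a fixed quarter pixel, this heuristic only partially corrects the quantization error of the heatmap grid; refinements that fit a local quadratic around the peak (for example, a second-order Taylor expansion) spend a little more computation to estimate finer offsets. The refined heatmap coordinate would still need to be mapped back to image coordinates through the person-box transform.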
Related papers
- Independently Keypoint Learning for Small Object Semantic Correspondence [7.3866687886529805]
A Keypoint Bounding box-centered Cropping method is proposed for small object semantic correspondence.
KBCNet comprises a Cross-Scale Feature Alignment (CSFA) module and an efficient 4D convolutional decoder.
Our method demonstrates a substantial performance improvement of 7.5% on the SPair-71k dataset.
(2024-04-03T12:21:41Z)
- Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching [74.75284453828017]
The Open-Vocabulary Keypoint Detection (OVKD) task is designed to use text prompts to identify arbitrary keypoints across any species.
We have developed a novel framework named Open-Vocabulary Keypoint Detection with Semantic-feature Matching (KDSM).
This framework combines vision and language models, creating an interplay between language features and local keypoint visual features.
(2023-10-08T07:42:41Z)
- Revisiting Cephalometric Landmark Detection from the view of Human Pose Estimation with Lightweight Super-Resolution Head [11.40242574405714]
We develop a benchmark based on MMPose, a well-established human pose estimation (HPE) codebase.
We introduce an upscaling design within the framework to further enhance performance.
In the MICCAI CLDetection2023 challenge, our method ranks 1st on three metrics and 3rd on the remaining one.
(2023-09-29T11:15:39Z)
- MDPose: Real-Time Multi-Person Pose Estimation via Mixture Density Model [27.849059115252008]
We propose a novel single-stage, instance-aware pose estimation framework that models the joint distribution of human keypoints.
Our MDPose achieves state-of-the-art performance by successfully learning the high-dimensional joint distribution of human keypoints.
(2023-02-17T08:29:33Z)
- Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms [31.2529724533643]
This work presents the first comprehensive benchmarking study from three under-explored perspectives beyond algorithms.
An analysis of 31 datasets reveals the distinct impacts of data samples.
We achieve a PA-MPJPE of 47.3 mm on the 3DPW test set with a relatively simple model.
(2022-09-21T17:39:53Z)
- Active Gaze Control for Foveal Scene Exploration [124.11737060344052]
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
(2022-08-24T14:59:28Z)
- Sample and Computation Redistribution for Efficient Face Detection [137.19388513633484]
Training data sampling and computation distribution strategies are the keys to efficient and accurate face detection.
SCRFD-34GF outperforms the best competitor, TinaFace, by 3.86% (AP on the hard set) while being more than 3× faster on GPUs with VGA-resolution images.
(2021-05-10T23:51:14Z)
- Hierarchical Deep CNN Feature Set-Based Representation Learning for Robust Cross-Resolution Face Recognition [59.29808528182607]
Cross-resolution face recognition (CRFR) is important in intelligent surveillance and biometric forensics.
Existing shallow learning-based and deep learning-based methods focus on mapping the HR-LR face pairs into a joint feature space.
In this study, we aim to fully exploit the multi-level deep convolutional neural network (CNN) feature set for robust CRFR.
(2021-03-25T14:03:42Z)
- Group-Skeleton-Based Human Action Recognition in Complex Events [15.649778891665468]
We propose a novel group-skeleton-based human action recognition method in complex events.
This method first utilizes multi-scale spatial-temporal graph convolutional networks (MS-G3Ds) to extract skeleton features from multiple persons.
Results on the HiEve dataset show that our method outperforms other state-of-the-art methods.
(2020-11-26T13:19:14Z)
- 3DSSD: Point-based 3D Single Stage Object Detector [61.67928229961813]
We present a point-based 3D single stage object detector, named 3DSSD, achieving a good balance between accuracy and efficiency.
Our method outperforms all state-of-the-art voxel-based single stage methods by a large margin, and has comparable performance to two stage point-based methods as well.
(2020-02-24T12:01:58Z)