YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object
Keypoint Similarity Loss
- URL: http://arxiv.org/abs/2204.06806v1
- Date: Thu, 14 Apr 2022 08:02:40 GMT
- Title: YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object
Keypoint Similarity Loss
- Authors: Debapriya Maji, Soyeb Nagori, Manu Mathew, Deepak Poddar
- Abstract summary: YOLO-pose is a novel heatmap-free approach for joint detection and 2D multi-person pose estimation.
Our framework allows us to train the model end-to-end and optimize the Object Keypoint Similarity (OKS) metric itself.
YOLO-pose achieves new state-of-the-art results on COCO validation (90.2% AP50) and test-dev set (90.3% AP50).
- Score: 1.3381749415517017
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce YOLO-pose, a novel heatmap-free approach for joint detection and 2D multi-person pose estimation in an image, based on the popular YOLO object detection framework. Existing heatmap-based two-stage approaches are sub-optimal because they are not end-to-end trainable and their training relies on a surrogate L1 loss that is not equivalent to maximizing the evaluation metric, i.e., Object Keypoint Similarity (OKS). Our framework allows us to train the model end-to-end and optimize the OKS metric itself. The proposed model learns to jointly detect bounding boxes for multiple persons and their corresponding 2D poses in a single forward pass, thus bringing in the best of both top-down and bottom-up approaches. Unlike bottom-up approaches, it requires no post-processing to group detected keypoints into a skeleton, since each bounding box has an associated pose, resulting in an inherent grouping of the keypoints. Unlike top-down approaches, it avoids multiple forward passes, since all persons are localized along with their poses in a single inference. YOLO-pose achieves new state-of-the-art results on the COCO validation set (90.2% AP50) and test-dev set (90.3% AP50), surpassing all existing bottom-up approaches in a single forward pass without flip test, multi-scale testing, or any other test-time augmentation. All experiments and results reported in this paper are without any test-time augmentation, unlike traditional approaches that use flip test and multi-scale testing to boost performance. Our training code will be made publicly available at
https://github.com/TexasInstruments/edgeai-yolov5 and
https://github.com/TexasInstruments/edgeai-yolox
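The central technical claim above is that the OKS evaluation metric is optimized directly as a training loss rather than through a surrogate heatmap L1 loss. Below is a minimal sketch of what such an OKS-based pose loss (1 - OKS) can look like for COCO's 17 keypoints; the function name `oks_loss`, the tensor shapes, and the use of the ground-truth box area as the object scale are illustrative assumptions, not the authors' implementation (see the repositories linked above for that).

```python
import torch

# COCO per-keypoint falloff constants (sigma_i) from the COCO keypoint evaluation API.
COCO_SIGMAS = torch.tensor([
    0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79, 0.72, 0.72,
    0.62, 0.62, 1.07, 1.07, 0.87, 0.87, 0.89, 0.89
]) / 10.0


def oks_loss(pred_kpts, gt_kpts, gt_vis, gt_area, eps=1e-9):
    """Hypothetical OKS-based pose loss: mean over instances of (1 - OKS).

    pred_kpts: (N, 17, 2) predicted keypoints (x, y) in pixels
    gt_kpts:   (N, 17, 2) ground-truth keypoints (x, y) in pixels
    gt_vis:    (N, 17)    visibility flags (> 0 means the keypoint is labeled)
    gt_area:   (N,)       object scale s^2, e.g. the ground-truth box area
    """
    d2 = ((pred_kpts - gt_kpts) ** 2).sum(dim=-1)        # squared distances d_i^2, (N, 17)
    k2 = (2.0 * COCO_SIGMAS.to(pred_kpts.device)) ** 2   # per-keypoint constants k_i^2, (17,)
    s2 = gt_area.clamp(min=eps).unsqueeze(-1)            # object scale s^2, (N, 1)
    sim = torch.exp(-d2 / (2.0 * s2 * k2 + eps))         # exp(-d_i^2 / (2 s^2 k_i^2)), (N, 17)
    mask = (gt_vis > 0).float()                          # only labeled keypoints contribute
    oks = (sim * mask).sum(dim=-1) / mask.sum(dim=-1).clamp(min=1.0)
    return (1.0 - oks).mean()
```

In practice such a pose term would be combined with the usual YOLO detection losses (box, objectness, class); the sketch covers only the OKS part, which is differentiable end-to-end and therefore avoids the surrogate heatmap loss criticized in the abstract.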
Related papers
- Rethinking pose estimation in crowds: overcoming the detection information-bottleneck and ambiguity [46.10812760258666]
Frequent interactions between individuals are a fundamental challenge for pose estimation algorithms.
We propose a novel pipeline called bottom-up conditioned top-down pose estimation.
We demonstrate the performance and efficiency of our approach on animal and human pose estimation benchmarks.
arXiv Detail & Related papers (2023-06-13T16:14:40Z)
- SOOD: Towards Semi-Supervised Oriented Object Detection [57.05141794402972]
This paper proposes a novel Semi-supervised Oriented Object Detection model, termed SOOD, built upon the mainstream pseudo-labeling framework.
Our experiments show that when trained with the two proposed losses, SOOD surpasses the state-of-the-art SSOD methods under various settings on the DOTA-v1.5 benchmark.
arXiv Detail & Related papers (2023-04-10T11:10:42Z)
- PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate model-free one-shot object pose estimator.
We create a new training pipeline for object to image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
- Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation [24.973118696495977]
This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose.
It unifies the contextual learning between human-level (global) and keypoint-level (local) information.
For the first time, as a fully end-to-end framework with an L1 regression loss, ED-Pose surpasses heatmap-based top-down methods under the same backbone.
arXiv Detail & Related papers (2023-02-03T08:18:34Z)
- Bottom-Up 2D Pose Estimation via Dual Anatomical Centers for Small-Scale Persons [75.86463396561744]
In multi-person 2D pose estimation, the bottom-up methods simultaneously predict poses for all persons.
Our method achieves 38.4% improvement on bounding box precision and 39.1% improvement on bounding box recall over the state of the art (SOTA).
For the human pose AP evaluation, we achieve a new SOTA (71.0 AP) on the COCO test-dev set with the single-scale testing.
arXiv Detail & Related papers (2022-08-25T10:09:10Z)
- YOLOV: Making Still Image Object Detectors Great at Video Object Detection [23.039968987772543]
Video object detection (VID) is challenging because of the high variation of object appearance and the diverse deterioration in some frames.
This work proposes a simple yet effective strategy to address the concerns, which spends marginal overheads with significant gains in accuracy.
Our YOLOX-based model can achieve promising performance (e.g., 87.5% AP50 at over 30 FPS on the ImageNet VID dataset on a single 2080Ti GPU).
arXiv Detail & Related papers (2022-08-20T14:12:06Z)
- Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation [79.78017059539526]
We propose a new heatmap-free keypoint estimation method in which individual keypoints and sets of spatially related keypoints (i.e., poses) are modeled as objects within a dense single-stage anchor-based detection framework.
In experiments, we observe that KAPAO is significantly faster and more accurate than previous methods, which suffer greatly from heatmap post-processing.
Our large model, KAPAO-L, achieves an AP of 70.6 on the Microsoft COCO Keypoints validation set without test-time augmentation.
arXiv Detail & Related papers (2021-11-16T15:36:44Z)
- Diverse Knowledge Distillation for End-to-End Person Search [81.4926655119318]
Person search aims to localize and identify a specific person from a gallery of images.
Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches.
We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
arXiv Detail & Related papers (2020-12-21T09:04:27Z)
- SMPR: Single-Stage Multi-Person Pose Regression [41.096103136666834]
A novel single-stage multi-person pose regression, termed SMPR, is presented.
It follows the paradigm of dense prediction and predicts instance-aware keypoints from every location.
We show that our method not only outperforms existing single-stage methods but is also competitive with the latest bottom-up methods.
arXiv Detail & Related papers (2020-06-28T11:26:38Z)
- Frustratingly Simple Few-Shot Object Detection [98.42824677627581]
We find that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task.
Such a simple approach outperforms the meta-learning methods by roughly 2-20 points on current benchmarks.
arXiv Detail & Related papers (2020-03-16T00:29:14Z)