YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object
Keypoint Similarity Loss
- URL: http://arxiv.org/abs/2204.06806v1
- Date: Thu, 14 Apr 2022 08:02:40 GMT
- Title: YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object
Keypoint Similarity Loss
- Authors: Debapriya Maji, Soyeb Nagori, Manu Mathew, Deepak Poddar
- Abstract summary: YOLO-pose is a novel heatmap-free approach for joint detection and 2D multi-person pose estimation.
Our framework allows us to train the model end-to-end and optimize the Object Keypoint Similarity (OKS) metric itself.
YOLO-pose achieves new state-of-the-art results on COCO validation (90.2% AP50) and test-dev set (90.3% AP50).
- Score: 1.3381749415517017
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce YOLO-pose, a novel heatmap-free approach for joint detection and 2D multi-person pose estimation in an image, based on the popular YOLO object detection framework. Existing heatmap-based two-stage approaches are sub-optimal because they are not end-to-end trainable and their training relies on a surrogate L1 loss that is not equivalent to maximizing the evaluation metric, i.e., Object Keypoint Similarity (OKS). Our framework allows us to train the model end-to-end and optimize the OKS metric itself. The proposed model learns to jointly detect bounding boxes for multiple persons and their corresponding 2D poses in a single forward pass, thus bringing in the best of both top-down and bottom-up approaches. Unlike bottom-up approaches, it requires no post-processing to group detected keypoints into a skeleton, since each bounding box has an associated pose, resulting in an inherent grouping of the keypoints. Unlike top-down approaches, it avoids multiple forward passes, since all persons are localized along with their poses in a single inference. YOLO-pose achieves new state-of-the-art results on the COCO validation set (90.2% AP50) and test-dev set (90.3% AP50), surpassing all existing bottom-up approaches in a single forward pass without flip test, multi-scale testing, or any other test-time augmentation. All experiments and results reported in this paper are without any test-time augmentation, unlike traditional approaches that use flip test and multi-scale testing to boost performance. Our training code will be made publicly available at
https://github.com/TexasInstruments/edgeai-yolov5 and
https://github.com/TexasInstruments/edgeai-yolox
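The central technical claim above is that the OKS evaluation metric is optimized directly as a training loss rather than through a surrogate heatmap L1 loss. Below is a minimal sketch of what such an OKS-based pose loss (1 - OKS) can look like for COCO's 17 keypoints; the function name `oks_loss`, the tensor shapes, and the use of the ground-truth box area as the object scale are illustrative assumptions, not the authors' implementation (see the repositories linked above for that).

```python
import torch

# COCO per-keypoint falloff constants (sigma_i) from the COCO keypoint evaluation API.
COCO_SIGMAS = torch.tensor([
    0.26, 0.25, 0.25, 0.35, 0.35, 0.79, 0.79, 0.72, 0.72,
    0.62, 0.62, 1.07, 1.07, 0.87, 0.87, 0.89, 0.89
]) / 10.0


def oks_loss(pred_kpts, gt_kpts, gt_vis, gt_area, eps=1e-9):
    """Hypothetical OKS-based pose loss: mean over instances of (1 - OKS).

    pred_kpts: (N, 17, 2) predicted keypoints (x, y) in pixels
    gt_kpts:   (N, 17, 2) ground-truth keypoints (x, y) in pixels
    gt_vis:    (N, 17)    visibility flags (> 0 means the keypoint is labeled)
    gt_area:   (N,)       object scale s^2, e.g. the ground-truth box area
    """
    d2 = ((pred_kpts - gt_kpts) ** 2).sum(dim=-1)        # squared distances d_i^2, (N, 17)
    k2 = (2.0 * COCO_SIGMAS.to(pred_kpts.device)) ** 2   # per-keypoint constants k_i^2, (17,)
    s2 = gt_area.clamp(min=eps).unsqueeze(-1)            # object scale s^2, (N, 1)
    sim = torch.exp(-d2 / (2.0 * s2 * k2 + eps))         # exp(-d_i^2 / (2 s^2 k_i^2)), (N, 17)
    mask = (gt_vis > 0).float()                          # only labeled keypoints contribute
    oks = (sim * mask).sum(dim=-1) / mask.sum(dim=-1).clamp(min=1.0)
    return (1.0 - oks).mean()
```

In practice such a pose term would be combined with the usual YOLO detection losses (box, objectness, class); the sketch covers only the OKS part, which is differentiable end-to-end and therefore avoids the surrogate heatmap loss criticized in the abstract.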
Related papers
- Rethinking pose estimation in crowds: overcoming the detection information-bottleneck and ambiguity [46.10812760258666]
Frequent interactions between individuals are a fundamental challenge for pose estimation algorithms.
We propose a novel pipeline called bottom-up conditioned top-down pose estimation.
We demonstrate the performance and efficiency of our approach on animal and human pose estimation benchmarks.
arXiv Detail & Related papers (2023-06-13T16:14:40Z)
- SOOD: Towards Semi-Supervised Oriented Object Detection [57.05141794402972]
This paper proposes a novel Semi-supervised Oriented Object Detection model, termed SOOD, built upon the mainstream pseudo-labeling framework.
Our experiments show that when trained with the two proposed losses, SOOD surpasses the state-of-the-art SSOD methods under various settings on the DOTA-v1.5 benchmark.
arXiv Detail & Related papers (2023-04-10T11:10:42Z)
- PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate model-free one-shot object pose estimator.
We create a new training pipeline for object to image matching based on a three-view system.
To enable PoseMatcher to attend to distinct input modalities, an image and a pointcloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
- Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation [24.973118696495977]
This paper presents a novel end-to-end framework with Explicit box Detection for multi-person Pose estimation, called ED-Pose.
It unifies the contextual learning between human-level (global) and keypoint-level (local) information.
For the first time, as a fully end-to-end framework with an L1 regression loss, ED-Pose surpasses heatmap-based top-down methods under the same backbone.
arXiv Detail & Related papers (2023-02-03T08:18:34Z)
- Bottom-Up 2D Pose Estimation via Dual Anatomical Centers for Small-Scale Persons [75.86463396561744]
In multi-person 2D pose estimation, the bottom-up methods simultaneously predict poses for all persons.
Our method achieves 38.4% improvement on bounding box precision and 39.1% improvement on bounding box recall over the state of the art (SOTA).
For the human pose AP evaluation, we achieve a new SOTA (71.0 AP) on the COCO test-dev set with the single-scale testing.
arXiv Detail & Related papers (2022-08-25T10:09:10Z)
- YOLOV: Making Still Image Object Detectors Great at Video Object Detection [23.039968987772543]
Video object detection (VID) is challenging because of the high variation of object appearance and the diverse deterioration in some frames.
This work proposes a simple yet effective strategy to address the concerns, which spends marginal overheads with significant gains in accuracy.
Our YOLOX-based model can achieve promising performance (e.g., 87.5% AP50 at over 30 FPS on the ImageNet VID dataset on a single 2080Ti GPU).
arXiv Detail & Related papers (2022-08-20T14:12:06Z)
- Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation [79.78017059539526]
We propose a new heatmap-free keypoint estimation method in which individual keypoints and sets of spatially related keypoints (i.e., poses) are modeled as objects within a dense single-stage anchor-based detection framework.
In experiments, we observe that KAPAO is significantly faster and more accurate than previous methods, which suffer greatly from heatmap post-processing.
Our large model, KAPAO-L, achieves an AP of 70.6 on the Microsoft COCO Keypoints validation set without test-time augmentation.
arXiv Detail & Related papers (2021-11-16T15:36:44Z)
- Diverse Knowledge Distillation for End-to-End Person Search [81.4926655119318]
Person search aims to localize and identify a specific person from a gallery of images.
Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches.
We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
arXiv Detail & Related papers (2020-12-21T09:04:27Z)
- SMPR: Single-Stage Multi-Person Pose Regression [41.096103136666834]
A novel single-stage multi-person pose regression, termed SMPR, is presented.
It follows the paradigm of dense prediction and predicts instance-aware keypoints from every location.
We show that our method not only outperforms existing single-stage methods but is also competitive with the latest bottom-up methods.
arXiv Detail & Related papers (2020-06-28T11:26:38Z)
- Frustratingly Simple Few-Shot Object Detection [98.42824677627581]
We find that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task.
Such a simple approach outperforms the meta-learning methods by roughly 2-20 points on current benchmarks.
arXiv Detail & Related papers (2020-03-16T00:29:14Z)