Learning Delicate Local Representations for Multi-Person Pose Estimation
- URL: http://arxiv.org/abs/2003.04030v3
- Date: Wed, 15 Jul 2020 13:09:57 GMT
- Title: Learning Delicate Local Representations for Multi-Person Pose Estimation
- Authors: Yuanhao Cai, Zhicheng Wang, Zhengxiong Luo, Binyi Yin, Angang Du,
Haoqian Wang, Xiangyu Zhang, Xinyu Zhou, Erjin Zhou, Jian Sun
- Abstract summary: We propose a novel method called Residual Steps Network (RSN).
RSN aggregates features with the same spatial size (Intra-level features) efficiently to obtain delicate local representations.
Our approach won 1st place in the COCO Keypoint Challenge 2019.
- Score: 77.53144055780423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel method called Residual Steps Network (RSN).
RSN aggregates features with the same spatial size (Intra-level features)
efficiently to obtain delicate local representations, which retain rich
low-level spatial information and result in precise keypoint localization.
Additionally, we observe that the output features contribute differently to final
performance. To tackle this problem, we propose an efficient attention
mechanism, the Pose Refine Machine (PRM), to make a trade-off between local and
global representations in output features and further refine the keypoint
locations. Our approach won 1st place in the COCO Keypoint Challenge 2019 and
achieves state-of-the-art results on both the COCO and MPII benchmarks, without
using extra training data or a pretrained model. Our single model achieves 78.6
on COCO test-dev and 93.0 on the MPII test dataset. Ensembled models achieve 79.2 on
COCO test-dev and 77.1 on the COCO test-challenge dataset. The source code is publicly
available for further research at https://github.com/caiyuanhao1998/RSN/
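The two ideas in the abstract can be illustrated with a small sketch. This is not the authors' RSN or PRM code. It is a simplified illustration under stated assumptions: `fuse_intra_level` and `channel_gate` are hypothetical helper names, feature maps are plain nested lists shaped (channels, height, width), and the gate is a generic squeeze-and-excitation-style channel re-weighting standing in for an attention mechanism. The first function shows why features with the same spatial size are cheap to aggregate (an element-wise sum, no resampling). The second shows how per-channel weights can make output features contribute differently.

```python
import math

def fuse_intra_level(features):
    """Element-wise sum of feature maps that share one (C, H, W) shape.

    Hypothetical helper: same-size maps can be added directly, with no
    up- or down-sampling step in between.
    """
    first = features[0]
    shape = (len(first), len(first[0]), len(first[0][0]))
    for f in features:
        if (len(f), len(f[0]), len(f[0][0])) != shape:
            raise ValueError("intra-level fusion needs identical shapes")
    c, h, w = shape
    return [[[sum(f[k][i][j] for f in features) for j in range(w)]
             for i in range(h)]
            for k in range(c)]

def channel_gate(feature):
    """Generic channel gate (assumption, not the paper's PRM).

    Each channel is scaled by sigmoid(mean of that channel), so channels
    end up contributing with different strengths.
    """
    gated = []
    for channel in feature:
        count = sum(len(row) for row in channel)
        mean = sum(sum(row) for row in channel) / count
        weight = 1.0 / (1.0 + math.exp(-mean))
        gated.append([[value * weight for value in row] for row in channel])
    return gated

# Two single-channel 2x2 maps with the same spatial size.
a = [[[1.0, 2.0], [3.0, 4.0]]]
b = [[[1.0, 1.0], [1.0, 1.0]]]
fused = fuse_intra_level([a, b])   # [[[2.0, 3.0], [4.0, 5.0]]]

# A zero-mean channel gets weight sigmoid(0) = 0.5.
gated = channel_gate([[[1.0, -1.0], [-1.0, 1.0]]])
```

In a real network the gate weights would be learned rather than computed directly from the channel mean; the fixed sigmoid here only keeps the sketch self-contained.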
Related papers
- The object detection model uses combined extraction with KNN and RF classification [0.0]
This study contributes to the field of object detection with a new approach combining GLCM and LBP as feature vectors as well as VE for classification.
System testing used a dataset of 4,437 2D images; KNN reached 92.7% accuracy and a 92.5% F1-score, while RF performance was lower.
arXiv Detail & Related papers (2024-05-09T05:21:42Z)
- PoCo: Point Context Cluster for RGBD Indoor Place Recognition [47.12179061883084]
We present a novel end-to-end algorithm (PoCo) for the indoor RGB-D place recognition task, aimed at identifying the most likely match for a given query frame within a reference database.
We propose a new network architecture, which generalizes the recent Context of Clusters (CoCs) to extract global descriptors directly from the noisy point clouds through end-to-end learning.
arXiv Detail & Related papers (2024-04-03T17:38:15Z)
- VoxelKP: A Voxel-based Network Architecture for Human Keypoint Estimation in LiDAR Data [53.638818890966036]
VoxelKP is a novel fully sparse network architecture tailored for human keypoint estimation in LiDAR data.
We introduce sparse box-attention to focus on learning spatial correlations between keypoints within each human instance.
We incorporate a spatial encoding to leverage absolute 3D coordinates when projecting 3D voxels to a 2D grid encoding a bird's eye view.
arXiv Detail & Related papers (2023-12-11T23:50:14Z)
- AANet: Aggregation and Alignment Network with Semi-hard Positive Sample Mining for Hierarchical Place Recognition [48.043749855085025]
Visual place recognition (VPR) is one of the research hotspots in robotics, which uses visual information to locate robots.
We present a unified network capable of extracting global features for retrieving candidates via an aggregation module.
We also propose a Semi-hard Positive Sample Mining (ShPSM) strategy to select appropriate hard positive images for training more robust VPR networks.
arXiv Detail & Related papers (2023-10-08T14:46:11Z)
- Revisiting Color-Event based Tracking: A Unified Network, Dataset, and Metric [53.88188265943762]
We propose a single-stage backbone network for Color-Event Unified Tracking (CEUTrack), which achieves the above functions simultaneously.
Our proposed CEUTrack is simple, effective, and efficient, which achieves over 75 FPS and new SOTA performance.
arXiv Detail & Related papers (2022-11-20T16:01:31Z)
- Joint COCO and Mapillary Workshop at ICCV 2019: COCO Instance Segmentation Challenge Track [87.90450014797287]
MegDetV2 works in a two-pass fashion, first to detect instances then to obtain segmentation.
On the COCO-2019 detection/instance-segmentation test-dev dataset, our system achieves 61.0/53.1 mAP, which surpassed our 2018 winning results by 5.0/4.2 respectively.
arXiv Detail & Related papers (2020-10-06T04:49:37Z)
- 2nd Place Scheme on Action Recognition Track of ECCV 2020 VIPriors Challenges: An Efficient Optical Flow Stream Guided Framework [57.847010327319964]
We propose a data-efficient framework that can train the model from scratch on small datasets.
Specifically, by introducing a 3D central difference convolution operation, we proposed a novel C3D neural network-based two-stream framework.
It is proved that our method can achieve a promising result even without a pre-trained model on large scale datasets.
arXiv Detail & Related papers (2020-08-10T09:50:28Z)
- Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement [54.29252286561449]
We propose a two-stage graph-based and model-agnostic framework, called Graph-PCNN.
In the first stage, a heatmap regression network is applied to obtain a rough localization result, and a set of proposal keypoints, called guided points, is sampled.
In the second stage, for each guided point, different visual feature is extracted by the localization.
The relationship between guided points is explored by the graph pose refinement module to get more accurate localization results.
arXiv Detail & Related papers (2020-07-21T04:59:15Z)
- Joint COCO and Mapillary Workshop at ICCV 2019 Keypoint Detection Challenge Track Technical Report: Distribution-Aware Coordinate Representation for Human Pose Estimation [36.73217430761146]
We focus on the coordinate representation in human pose estimation.
We propose a principled distribution-aware decoding method.
Taking them together, we formulate a novel Distribution-Aware coordinate Representation for Keypoint (DARK) method.
arXiv Detail & Related papers (2020-03-13T10:22:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.