Related papers: Learning Quality-aware Representation for Multi-person Pose Regression

URL: http://arxiv.org/abs/2201.01087v1
Date: Tue, 4 Jan 2022 11:10:28 GMT
Title: Learning Quality-aware Representation for Multi-person Pose Regression
Authors: Yabo Xiao, Dongdong Yu, Xiaojuan Wang, Lei Jin, Guoli Wang, Qian Zhang
Abstract summary: We learn the pose regression quality-aware representation. Our method achieves the state-of-the-art result of 71.7 AP on MS COCO test-dev set.
Score: 8.83185608408674
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Off-the-shelf single-stage multi-person pose regression methods generally leverage the instance score (i.e., confidence of the instance localization) to indicate the pose quality for selecting the pose candidates. We consider that there are two gaps in the existing paradigm: 1) The instance score is not well correlated with the pose regression quality. 2) The instance feature representation, which is used for predicting the instance score, does not explicitly encode the structural pose information needed to predict a reasonable score that represents pose regression quality. To address these issues, we propose to learn a pose regression quality-aware representation. Concretely, for the first gap, instead of using the previous instance confidence label (e.g., discrete {1,0} or Gaussian representation) to denote the position and confidence of a person instance, we first introduce the Consistent Instance Representation (CIR), which unifies the pose regression quality score of the instance and the confidence of the background into a pixel-wise score map to calibrate the inconsistency between instance score and pose regression quality. To fill the second gap, we further present the Query Encoding Module (QEM), which includes the Keypoint Query Encoding (KQE) to encode the positional and semantic information for each keypoint and the Pose Query Encoding (PQE), which explicitly encodes the predicted structural pose information to better fit the Consistent Instance Representation (CIR). By using the proposed components, we significantly alleviate the above gaps. Our method outperforms previous single-stage regression-based and even bottom-up methods and achieves the state-of-the-art result of 71.7 AP on the MS COCO test-dev set.
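The idea of a pixel-wise score map that holds a pose regression quality score at instance locations and a background confidence elsewhere can be illustrated with a rough sketch. The abstract does not say how the quality score is computed, so the code below assumes it is the COCO-style Object Keypoint Similarity (OKS) between the regressed pose and the ground-truth pose, ignores keypoint visibility, and uses 0 for background pixels. The function names `oks` and `build_score_map`, the `positives` layout, and the sigma values are hypothetical and are not taken from the paper.

```python
import math


def oks(pred, gt, area, sigmas):
    """COCO-style Object Keypoint Similarity between two poses.

    pred, gt: lists of (x, y) keypoints in the same order.
    area: ground-truth instance area in pixels squared.
    sigmas: per-keypoint falloff constants (assumed values here).
    Visibility flags are ignored in this simplified sketch.
    """
    total = 0.0
    for (px, py), (gx, gy), sigma in zip(pred, gt, sigmas):
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma
        total += math.exp(-d2 / (2.0 * area * k * k))
    return total / len(gt)


def build_score_map(height, width, positives, sigmas):
    """Build a target score map as a nested list of floats.

    positives maps (row, col) -> (pred_pose, gt_pose, area).
    Positive pixels get the assumed quality score (OKS);
    every other pixel is treated as background with score 0.
    """
    score_map = [[0.0] * width for _ in range(height)]
    for (row, col), (pred_pose, gt_pose, area) in positives.items():
        score_map[row][col] = oks(pred_pose, gt_pose, area, sigmas)
    return score_map


# Toy example with two keypoints and made-up sigmas.
toy_sigmas = [0.1, 0.1]
toy_gt = [(10.0, 10.0), (20.0, 20.0)]
toy_pred = [(11.0, 10.0), (20.0, 22.0)]
toy_map = build_score_map(
    4, 4, {(1, 2): (toy_pred, toy_gt, 400.0)}, toy_sigmas
)
```

Under these assumptions, a location whose regressed pose matches the ground truth exactly receives a target of 1, a poorly regressed pose receives a target near 0, and the instance score is trained to follow pose quality rather than a fixed {1,0} or Gaussian label.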

Related papers

UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image [86.7128543480229]
We present a novel approach and benchmark, termed UNOPose, for unseen one-reference-based object pose estimation. Building upon a coarse-to-fine paradigm, UNOPose constructs an SE(3)-invariant reference frame to standardize object representation. We recalibrate the weight of each correspondence based on its predicted likelihood of being within the overlapping region.
arXiv Detail & Related papers (2024-11-25T05:36:00Z)
Regression-free Blind Image Quality Assessment with Content-Distortion Consistency [42.683300312253884]
We propose a regression-free framework for image quality evaluation. It is based upon retrieving locally similar instances by incorporating semantic and distortion feature spaces. The proposed method achieves competitive, even superior performance compared to state-of-the-art regression-based methods.
arXiv Detail & Related papers (2023-07-18T14:19:28Z)
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding [34.078590816368056]
We study the problem of visual grounding by considering both phrase extraction and grounding (PEG). PEG requires a model to extract phrases from text and locate objects from images simultaneously. We propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text.
arXiv Detail & Related papers (2022-11-28T16:30:46Z)
Action Quality Assessment with Temporal Parsing Transformer [84.1272079121699]
Action Quality Assessment (AQA) is important for action understanding, and resolving the task poses unique challenges due to subtle visual differences. We propose a temporal parsing transformer to decompose the holistic feature into temporal part-level representations. Our proposed method outperforms prior work on three public AQA benchmarks by a considerable margin.
arXiv Detail & Related papers (2022-07-19T13:29:05Z)
Poseur: Direct Human Pose Regression with Transformers [119.79232258661995]
We propose a direct, regression-based approach to 2D human pose estimation from single images. Our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints. Ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.
arXiv Detail & Related papers (2022-01-19T04:31:57Z)
Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression [81.05772887221333]
We study the dense keypoint regression framework, which was previously inferior to the keypoint detection and grouping framework. We present a simple yet effective approach, named disentangled keypoint regression (DEKR). We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods.
arXiv Detail & Related papers (2021-04-06T05:54:46Z)
Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation [85.96410825961966]
We argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries. To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions. We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation.
arXiv Detail & Related papers (2020-07-06T15:59:56Z)
Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive Keypoint Estimates [76.51095823248104]
We present several schemes that have rarely or not thoroughly been studied before for improving keypoint detection and grouping (keypoint regression) performance. First, we exploit the keypoint heatmaps for pixel-wise keypoint regression instead of separating them for improving keypoint regression. Second, we adopt a pixel-wise spatial transformer network to learn adaptive representations for handling the scale and orientation variance. Third, we present a joint shape and heatvalue scoring scheme to promote the estimated poses that are more likely to be true poses.
arXiv Detail & Related papers (2020-06-28T01:14:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.