QueryPose: Sparse Multi-Person Pose Regression via Spatial-Aware
Part-Level Query
- URL: http://arxiv.org/abs/2212.07855v1
- Date: Thu, 15 Dec 2022 14:22:49 GMT
- Title: QueryPose: Sparse Multi-Person Pose Regression via Spatial-Aware
Part-Level Query
- Authors: Yabo Xiao, Kai Su, Xiaojuan Wang, Dongdong Yu, Lei Jin, Mingshu He,
Zehuan Yuan
- Abstract summary: We propose a sparse end-to-end multi-person pose regression framework, termed QueryPose, which can directly predict multi-person keypoint sequences from the input image.
In our framework, each human instance is encoded by several learnable spatial-aware part-level queries.
With bipartite matching, QueryPose avoids hand-designed post-processing and surpasses existing dense end-to-end methods with 73.6 AP on the MS COCO mini-val set and 72.7 AP on the CrowdPose test set.
- Score: 15.934593709289931
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a sparse end-to-end multi-person pose regression framework, termed
QueryPose, which can directly predict multi-person keypoint sequences from the
input image. The existing end-to-end methods rely on dense representations to
preserve the spatial detail and structure for precise keypoint localization.
However, the dense paradigm introduces complex and redundant post-processes
during inference. In our framework, each human instance is encoded by several
learnable spatial-aware part-level queries associated with an instance-level
query. First, we propose the Spatial Part Embedding Generation Module (SPEGM)
that considers the local spatial attention mechanism to generate several
spatial-sensitive part embeddings, which contain spatial details and structural
information for enhancing the part-level queries. Second, we introduce the
Selective Iteration Module (SIM) to adaptively update the sparse part-level
queries via the generated spatial-sensitive part embeddings stage-by-stage.
Based on the two proposed modules, the part-level queries are able to fully
encode the spatial details and structural information for precise keypoint
regression. With the bipartite matching, QueryPose avoids the hand-designed
post-processes and surpasses the existing dense end-to-end methods with 73.6 AP
on MS COCO mini-val set and 72.7 AP on CrowdPose test set. Code is available at
https://github.com/buptxyb666/QueryPose.
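The bipartite matching mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation (their code is in the repository linked above). It only shows the general idea of assigning each ground-truth pose to exactly one predicted pose so that the total cost is minimal. The function names `pose_cost` and `match_poses`, the mean L1 keypoint cost, and the toy coordinates are all assumptions made for illustration. The search is brute force over permutations, which is only practical for a handful of poses; real pipelines typically use the Hungarian algorithm instead.

```python
from itertools import permutations


def pose_cost(pred, gt):
    """Mean L1 distance between two poses, each a list of (x, y) keypoints.

    Hypothetical cost chosen for illustration only.
    """
    total = sum(abs(px - gx) + abs(py - gy)
                for (px, py), (gx, gy) in zip(pred, gt))
    return total / len(gt)


def match_poses(preds, gts):
    """Find the lowest-cost one-to-one assignment of ground truths to predictions.

    Returns (pairs, total_cost), where pairs is a list of
    (prediction_index, ground_truth_index) ordered by ground-truth index.
    Predictions left unmatched would be treated as "no person".
    """
    if len(gts) > len(preds):
        raise ValueError("need at least as many predictions as ground truths")
    # cost[p][g] is the cost of explaining ground truth g with prediction p.
    cost = [[pose_cost(p, g) for g in gts] for p in preds]
    best_pairs, best_total = [], float("inf")
    # Try every way of picking one distinct prediction per ground truth.
    for pred_ids in permutations(range(len(preds)), len(gts)):
        total = sum(cost[p][g] for g, p in enumerate(pred_ids))
        if total < best_total:
            best_total = total
            best_pairs = [(p, g) for g, p in enumerate(pred_ids)]
    return best_pairs, best_total


# Toy data: three predicted poses (two keypoints each), two ground-truth poses.
preds = [
    [(10, 10), (12, 20)],  # close to ground truth 1
    [(50, 50), (52, 60)],  # far from both ground truths
    [(0, 0), (2, 10)],     # close to ground truth 0
]
gts = [
    [(1, 1), (2, 11)],
    [(10, 11), (12, 21)],
]

pairs, total = match_poses(preds, gts)
print(pairs, total)
```

Because every ground truth is tied to a single prediction, duplicate predictions of the same person are penalized during training rather than filtered afterwards, which is why set-prediction methods of this kind can skip hand-designed post-processing.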
Related papers
- Instance-free Text to Point Cloud Localization with Relative Position Awareness [37.22900045434484]
Text-to-point-cloud cross-modal localization is an emerging vision-language task critical for future robot-human collaboration.
We address two key limitations of existing approaches: 1) their reliance on ground-truth instances as input; and 2) their neglect of the relative positions among potential instances.
Our proposed model follows a two-stage pipeline, including a coarse stage for text-cell retrieval and a fine stage for position estimation.
arXiv Detail & Related papers (2024-04-27T09:46:49Z)
- Temporal-aware Hierarchical Mask Classification for Video Semantic Segmentation [62.275143240798236]
Video semantic segmentation (VSS) datasets have limited categories per video.
Fewer than 10% of queries can be matched to receive meaningful gradient updates during VSS training.
Our method achieves state-of-the-art performance on the latest challenging VSS benchmark VSPW without bells and whistles.
arXiv Detail & Related papers (2023-09-14T20:31:06Z)
- PSGformer: Enhancing 3D Point Cloud Instance Segmentation via Precise Semantic Guidance [11.097083846498581]
PSGformer is a novel 3D instance segmentation network.
It incorporates two key advancements to enhance the performance of 3D instance segmentation.
It exceeds the compared state-of-the-art methods by 2.2% mAP on the ScanNetv2 hidden test set.
arXiv Detail & Related papers (2023-07-15T04:45:37Z)
- Hierarchical Matching and Reasoning for Multi-Query Image Retrieval [113.44470784756308]
We propose a novel Hierarchical Matching and Reasoning Network (HMRN) for Multi-Query Image Retrieval (MQIR).
It disentangles MQIR into three hierarchical semantic representations, which are responsible for capturing fine-grained local details, contextual global scopes, and high-level inherent correlations.
Our HMRN substantially surpasses the current state-of-the-art methods.
arXiv Detail & Related papers (2023-06-26T07:03:56Z)
- Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation [103.90033029330527]
Few-Shot Instance Segmentation (FSIS) requires detecting and segmenting novel classes with limited support examples.
We introduce a unified framework, Reference Twice (RefT), to exploit the relationship between support and query features for FSIS.
arXiv Detail & Related papers (2023-01-03T15:33:48Z)
- Dynamic Focus-aware Positional Queries for Semantic Segmentation [94.6834904076914]
We propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries.
Our framework achieves state-of-the-art performance and outperforms Mask2Former by clear margins of 1.1%, 1.9%, and 1.1% single-scale mIoU with ResNet-50, Swin-T, and Swin-B backbones, respectively.
arXiv Detail & Related papers (2022-04-04T05:16:41Z)
- Improving Video Instance Segmentation via Temporal Pyramid Routing [61.10753640148878]
Video Instance Segmentation (VIS) is a new and inherently multi-task problem, which aims to detect, segment, and track each instance in a video sequence.
We propose a Temporal Pyramid Routing (TPR) strategy to conditionally align and conduct pixel-level aggregation from a feature pyramid pair of two adjacent frames.
Our approach is a plug-and-play module and can be easily applied to existing instance segmentation methods.
arXiv Detail & Related papers (2021-07-28T03:57:12Z)
- Spatial Object Recommendation with Hints: When Spatial Granularity Matters [42.51352610054967]
We study how to support top-k spatial object recommendations at varying levels of spatial granularity.
We propose the use of a POI tree, which captures spatial containment relationships between Points of Interest (POIs).
We design a novel multi-task learning model called MPR (short for Multi-level POI Recommendation), where each task aims to return the top-k POIs at a certain spatial granularity level.
arXiv Detail & Related papers (2021-01-08T11:39:51Z)
- AutoPose: Searching Multi-Scale Branch Aggregation for Pose Estimation [96.29533512606078]
We present AutoPose, a novel neural architecture search (NAS) framework.
It is capable of automatically discovering multiple parallel branches of cross-scale connections towards accurate and high-resolution 2D human pose estimation.
arXiv Detail & Related papers (2020-08-16T22:27:43Z)
- Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection [33.15192824888279]
We propose a novel Enhanced Feature Aggregation and Selection network (EFASNet) for multi-person 2D human pose estimation.
Our method handles crowded, cluttered, and occluded scenes well.
Comprehensive experiments demonstrate that the proposed approach outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2020-03-20T08:33:25Z)