Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation
- URL: http://arxiv.org/abs/2003.07516v1
- Date: Tue, 17 Mar 2020 03:52:17 GMT
- Title: Augmented Parallel-Pyramid Net for Attention Guided Pose-Estimation
- Authors: Luanxuan Hou, Jie Cao, Yuan Zhao, Haifeng Shen, Yiping Meng, Ran He and Jieping Ye
- Abstract summary: This paper proposes an augmented parallel-pyramid net with attention partial module and differentiable auto-data augmentation.
We define a new pose search space where the sequences of data augmentations are formulated as a trainable and operational CNN component.
Notably, our method achieves the top-1 accuracy on the challenging COCO keypoint benchmark and state-of-the-art results on the MPII dataset.
- Score: 90.28365183660438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of human pose estimation is to determine the body part or joint
locations of each person in an image. This is a challenging problem with
wide applications. To address it, this paper proposes an augmented
parallel-pyramid net with attention partial module and differentiable auto-data
augmentation. Technically, a parallel pyramid structure is proposed to
compensate for the loss of information. The parallel structure is designed for
reverse compensation, and the overall computational complexity does not
increase. We further define an Attention Partial Module (APM) operator to
extract weighted features from different scale feature maps generated by the
parallel pyramid structure. Compared with refinement through an upsampling
operator, APM better captures the relationships between channels. Finally, we
propose a differentiable auto data augmentation method to further improve estimation
accuracy. We define a new pose search space where the sequences of data
augmentations are formulated as a trainable and operational CNN component.
Experiments corroborate the effectiveness of our proposed method. Notably, our
method achieves the top-1 accuracy on the challenging COCO keypoint benchmark
and state-of-the-art results on the MPII dataset.
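The abstract describes APM only at a high level ("extract weighted features from different scale feature maps" and "capture the relationship between channels"). As a rough illustration, the sketch below assumes an APM-like operator behaves like squeeze-and-excitation channel attention over the multi-scale maps: coarser maps are upsampled (nearest neighbour) to the finest resolution, all maps are concatenated along the channel axis, each channel is summarized by global average pooling, and a small FC-ReLU-FC-sigmoid network produces one weight per channel. The function names, the nearest-neighbour upsampling, and the weight matrices `w1`/`w2` are hypothetical choices for this sketch, not details taken from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]      # one channel: H x W
FeatureMap = List[Matrix]       # C x H x W


def upsample_nearest(fmap: FeatureMap, factor: int) -> FeatureMap:
    """Nearest-neighbour upsampling of every channel by an integer factor."""
    return [
        [[row[x // factor] for x in range(len(row) * factor)]
         for row in channel
         for _ in range(factor)]
        for channel in fmap
    ]


def global_avg_pool(fmap: FeatureMap) -> List[float]:
    """Squeeze step: one scalar per channel (mean over H x W)."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]


def linear(w: Matrix, v: List[float]) -> List[float]:
    """Dense layer without bias: returns w @ v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def attention_fuse(maps: List[FeatureMap], factors: List[int],
                   w1: Matrix, w2: Matrix) -> FeatureMap:
    """Hypothetical APM-like fusion of multi-scale feature maps.

    maps[i] is upsampled by factors[i] so all maps share one resolution;
    w1 has shape (hidden, C_total) and w2 has shape (C_total, hidden).
    """
    aligned = [upsample_nearest(m, f) for m, f in zip(maps, factors)]
    shapes = {(len(m[0]), len(m[0][0])) for m in aligned}
    if len(shapes) != 1:
        raise ValueError(f"maps do not align after upsampling: {shapes}")
    stacked = [ch for m in aligned for ch in m]           # concat on channels
    squeezed = global_avg_pool(stacked)
    hidden = [max(0.0, h) for h in linear(w1, squeezed)]  # FC + ReLU
    weights = [sigmoid(z) for z in linear(w2, hidden)]    # FC + sigmoid
    # Excitation: rescale every channel by its own attention weight.
    return [[[weights[c] * v for v in row] for row in ch]
            for c, ch in enumerate(stacked)]


# Usage: a 1-channel 2x2 "fine" map and a 1-channel 1x1 "coarse" map.
fine = [[[1.0, 2.0], [3.0, 4.0]]]
coarse = [[[5.0]]]
fused = attention_fuse([fine, coarse], [1, 2],
                       w1=[[1.0, 0.0]], w2=[[0.0], [0.0]])
print(fused)
```

With `w2` set to zeros every gate equals sigmoid(0) = 0.5, so the example simply halves both channels; trained weights would instead scale channels unequally, which is one plausible reading of how such a module weights features across scales.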
Related papers
- Multi-Level Aggregation and Recursive Alignment Architecture for Efficient Parallel Inference Segmentation Network [18.47001817385548]
We propose a parallel inference network customized for semantic segmentation tasks.
We employ a shallow backbone to ensure real-time speed, and propose three core components to compensate for the reduced model capacity to improve accuracy.
Our framework shows a better balance between speed and accuracy than state-of-the-art real-time methods on Cityscapes and CamVid datasets.
arXiv, 2024-02-03T22:51:17Z
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv, 2023-12-26T12:16:03Z
- IMP: Iterative Matching and Pose Estimation with Adaptive Pooling [34.36397639248686]
We propose an efficient IMP, called EIMP, to dynamically discard keypoints without potential matches.
Experiments on YFCC100m, Scannet, and Aachen Day-Night datasets demonstrate that the proposed method outperforms previous approaches in terms of accuracy and efficiency.
arXiv, 2023-04-28T13:25:50Z
- Integrative Feature and Cost Aggregation with Transformers for Dense Correspondence [63.868905184847954]
The current state-of-the-art methods are Transformer-based approaches that focus on either feature descriptors or cost volume aggregation.
We propose a novel Transformer-based network that interleaves both forms of aggregations in a way that exploits their complementary information.
We evaluate the effectiveness of the proposed method on dense matching tasks and achieve state-of-the-art performance on all the major benchmarks.
arXiv, 2022-09-19T03:33:35Z
- DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation.
We propose to leverage the Transformer to model this global context with an effective attention mechanism.
Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv, 2022-03-27T05:03:56Z
- Dynamic Spatial Propagation Network for Depth Completion [6.3447233767041356]
This paper introduces an efficient model that learns the affinity among neighboring pixels with an attention-based approach.
In practice, our method requires fewer iterations to match the performance of other SPNs and yields better results overall.
arXiv, 2022-02-20T09:43:17Z
- Cost Aggregation Is All You Need for Few-Shot Segmentation [28.23753949369226]
We introduce Volumetric Aggregation with Transformers (VAT) to tackle the few-shot segmentation task.
VAT uses both convolutions and transformers to efficiently handle high-dimensional correlation maps between query and support.
We find that the proposed method attains state-of-the-art performance even on the standard benchmarks for the semantic correspondence task.
arXiv, 2021-12-22T06:18:51Z
- $P^2$ Net: Augmented Parallel-Pyramid Net for Attention Guided Pose Estimation [69.25492391672064]
We propose an augmented Parallel-Pyramid Net with feature refinement by a dilated bottleneck and an attention module.
A parallel-pyramid structure is adopted to compensate for the information loss introduced by the network.
Our method achieves the best performance on the challenging MSCOCO and MPII datasets.
arXiv, 2020-10-26T02:10:12Z
- FPCR-Net: Feature Pyramidal Correlation and Residual Reconstruction for Optical Flow Estimation [72.41370576242116]
We propose a semi-supervised Feature Pyramidal Correlation and Residual Reconstruction Network (FPCR-Net) for optical flow estimation from frame pairs.
It consists of two main modules: pyramid correlation mapping and residual reconstruction.
Experimental results show that the proposed scheme achieves state-of-the-art performance, improving the average end-point error (AEE) by 0.80, 1.15 and 0.10 against competing baseline methods.
arXiv, 2020-01-17T07:13:51Z