Simple Multi-Resolution Representation Learning for Human Pose
Estimation
- URL: http://arxiv.org/abs/2004.06366v2
- Date: Fri, 22 Jan 2021 06:01:11 GMT
- Title: Simple Multi-Resolution Representation Learning for Human Pose
Estimation
- Authors: Trung Q. Tran, Giang V. Nguyen, Daeyoung Kim
- Abstract summary: The accuracy of human keypoint prediction has steadily improved thanks to the development of deep learning.
We introduce novel network structures referred to as multi-resolution representation learning for human keypoint prediction.
Our architectures are simple yet effective, achieving good performance.
- Score: 2.1904965822605433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human pose estimation - the process of recognizing human keypoints in a given
image - is one of the most important tasks in computer vision and has a wide
range of applications including movement diagnostics, surveillance, and
self-driving vehicles. The accuracy of human keypoint prediction has steadily
improved thanks to the burgeoning development of deep learning. Most existing
methods solve human pose estimation by generating heatmaps in which the i-th
heatmap indicates the location confidence of the i-th keypoint. In this paper,
we introduce novel network structures referred to as multi-resolution
representation learning for human keypoint prediction. At different resolutions
in the learning process, our networks branch off and use extra layers to learn
heatmap generation. We first consider architectures for generating the
multi-resolution heatmaps after obtaining the lowest-resolution feature maps.
Our second approach allows learning during the process of feature extraction in
which the heatmaps are generated at each resolution of the feature extractor.
The first and second approaches are referred to as multi-resolution heatmap
learning and multi-resolution feature map learning respectively. Our
architectures are simple yet effective, achieving good performance. We
conducted experiments on two common benchmarks for human pose estimation:
MSCOCO and MPII dataset. The code is made publicly available at
https://github.com/tqtrunghnvn/SimMRPose.
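The abstract describes heatmap-based keypoint prediction in which the i-th heatmap encodes the location confidence of the i-th keypoint, with heatmaps produced at several resolutions. The sketch below illustrates that target-construction idea with NumPy. It is not taken from the authors' repository: the Gaussian targets, sigma value, input size of 256x192, and the set of strides are all illustrative assumptions, chosen only to show how one keypoint yields one confidence map per resolution.

```python
import numpy as np

def gaussian_heatmap(size_hw, center_xy, sigma=2.0):
    """Render one keypoint as a 2D Gaussian confidence map (peak = 1 at the keypoint)."""
    h, w = size_hw
    xs = np.arange(w)            # column coordinates
    ys = np.arange(h)[:, None]   # row coordinates, broadcast against xs
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def multi_resolution_targets(keypoints_xy, img_hw=(256, 192), strides=(32, 16, 8, 4)):
    """Build one stack of target heatmaps per resolution.

    keypoints_xy: (K, 2) array of (x, y) keypoint coordinates in image space.
    Returns a dict mapping stride -> (K, H/stride, W/stride) heatmap stack.
    Image size and strides are illustrative assumptions, not the paper's settings.
    """
    h, w = img_hw
    targets = {}
    for s in strides:
        hs, ws = h // s, w // s
        maps = np.stack([
            gaussian_heatmap((hs, ws), (x / s, y / s))  # scale keypoint into this resolution
            for x, y in keypoints_xy
        ])
        targets[s] = maps
    return targets

# Two hypothetical keypoints in a 256x192 image.
kpts = np.array([[96.0, 64.0], [48.0, 200.0]])
for stride, maps in multi_resolution_targets(kpts).items():
    print(stride, maps.shape)
```

A training loss would then compare the network's predicted heatmaps at each branch against the target stack of the matching resolution.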
Related papers
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
- Two Approaches to Supervised Image Segmentation [55.616364225463066]
The present work develops comparison experiments between deep learning and multiset neurons approaches.
The deep learning approach confirmed its potential for performing image segmentation.
The alternative multiset methodology allowed for enhanced accuracy while requiring little computational resources.
arXiv Detail & Related papers (2023-07-19T16:42:52Z)
- Learning Structure-Guided Diffusion Model for 2D Human Pose Estimation [71.24808323646167]
We propose DiffusionPose, a new scheme for learning keypoint heatmaps with a neural network.
During training, the keypoints are diffused to a random distribution by adding noise, and the diffusion model learns to recover the ground-truth heatmaps from the noised heatmaps.
Experiments show the prowess of our scheme with improvements of 1.6, 1.2, and 1.2 mAP on widely-used COCO, CrowdPose, and AI Challenge datasets.
arXiv Detail & Related papers (2023-06-29T16:24:32Z)
- 2D Human Pose Estimation with Explicit Anatomical Keypoints Structure Constraints [15.124606575017621]
We present a novel 2D human pose estimation method with explicit anatomical keypoints structure constraints.
Our proposed model can be plugged into most existing bottom-up or top-down human pose estimation methods.
Our method performs favorably against most existing bottom-up and top-down human pose estimation methods.
arXiv Detail & Related papers (2022-12-05T11:01:43Z)
- Virtual Multi-Modality Self-Supervised Foreground Matting for Human-Object Interaction [18.14237514372724]
We propose a Virtual Multi-modality Foreground Matting (VMFM) method to learn human-object interactive foreground.
The VMFM method requires no additional inputs, e.g., a trimap or known background.
We reformulate foreground matting as a self-supervised multi-modality problem.
arXiv Detail & Related papers (2021-10-07T09:03:01Z)
- Accurate Grid Keypoint Learning for Efficient Video Prediction [87.71109421608232]
Keypoint-based video prediction methods can consume substantial computing resources in training and deployment.
In this paper, we design a new grid keypoint learning framework, aiming at a robust and explainable intermediate keypoint representation for long-term efficient video prediction.
Our method outperforms the state-of-the-art video prediction methods while saving more than 98% of computing resources.
arXiv Detail & Related papers (2021-07-28T05:04:30Z)
- DenserNet: Weakly Supervised Visual Localization Using Multi-scale Feature Aggregation [7.2531609092488445]
First, we develop a convolutional neural network architecture that aggregates feature maps at different semantic levels for image representations.
Second, our model is trained end-to-end without pixel-level annotation other than positive and negative GPS-tagged image pairs.
Third, our method is computationally efficient as our architecture has shared features and parameters during computation.
arXiv Detail & Related papers (2020-12-04T02:16:47Z)
- Towards Keypoint Guided Self-Supervised Depth Estimation [0.0]
We use keypoints as a self-supervision clue for learning depth map estimation from a collection of input images.
By learning a deep model with and without the keypoint extraction technique, we show that using keypoints improves depth estimation learning.
arXiv Detail & Related papers (2020-11-05T20:45:03Z)
- Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement [54.29252286561449]
We propose a two-stage graph-based and model-agnostic framework, called Graph-PCNN.
In the first stage, a heatmap regression network is applied to obtain a rough localization result, and a set of proposal keypoints, called guided points, is sampled.
In the second stage, a distinct visual feature is extracted for each guided point.
The relationship between guided points is explored by the graph pose refinement module to get more accurate localization results.
arXiv Detail & Related papers (2020-07-21T04:59:15Z)
- Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive Keypoint Estimates [76.51095823248104]
We present several schemes, rarely or only superficially studied before, for improving keypoint detection and grouping (keypoint regression) performance.
First, we exploit the keypoint heatmaps for pixel-wise keypoint regression instead of separating them for improving keypoint regression.
Second, we adopt a pixel-wise spatial transformer network to learn adaptive representations for handling the scale and orientation variance.
Third, we present a joint shape and heatvalue scoring scheme to promote the estimated poses that are more likely to be true poses.
arXiv Detail & Related papers (2020-06-28T01:14:59Z)
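Several of the papers above, like the main paper, predict keypoints via heatmaps, which leaves a common final step: decoding image-space coordinates from the predicted maps. The sketch below shows one widely used heuristic, argmax plus a quarter-pixel shift toward the stronger neighbor; it is a generic illustration under assumed settings (stride 4, one map per keypoint), not the decoding procedure of any specific paper listed here.

```python
import numpy as np

def decode_heatmaps(heatmaps, stride=4):
    """Recover (x, y) image coordinates from a (K, H, W) heatmap stack.

    Takes the argmax of each map, then shifts the result a quarter pixel
    toward the larger of the two neighbors on each axis (a common
    sub-pixel refinement heuristic), and scales by the feature stride.
    """
    coords = []
    for hm in heatmaps:
        y, x = np.unravel_index(hm.argmax(), hm.shape)
        fx, fy = float(x), float(y)
        if 0 < x < hm.shape[1] - 1:  # skip refinement on the border
            fx += 0.25 * np.sign(hm[y, x + 1] - hm[y, x - 1])
        if 0 < y < hm.shape[0] - 1:
            fy += 0.25 * np.sign(hm[y + 1, x] - hm[y - 1, x])
        coords.append((fx * stride, fy * stride))
    return np.array(coords)
```

The quarter-pixel shift recovers some of the precision lost by predicting at a reduced resolution, which is exactly why the stride of the final heatmap matters in the multi-resolution designs discussed above.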
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.