Lighter Stacked Hourglass Human Pose Estimation
- URL: http://arxiv.org/abs/2107.13643v1
- Date: Wed, 28 Jul 2021 21:05:34 GMT
- Title: Lighter Stacked Hourglass Human Pose Estimation
- Authors: Ahmed Elhagry, Mohamed Saeed, Musie Araia
- Abstract summary: We focus on one of the deep learning-based approaches to human pose estimation proposed by Newell et al.
Their approach is widely used in many applications and is regarded as one of the best works in this area.
In this study, we examine the effect of architectural modifications on the computational speed and accuracy of the network.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Human pose estimation (HPE) is one of the most challenging tasks in computer
vision as humans are deformable by nature and thus their poses exhibit great
variance. HPE aims to correctly identify the main joint locations of a single
person or multiple people in a given image or video. Locating joints of a
person in images or videos is an important task that can be applied in action
recognition and object tracking. Like many computer vision tasks, HPE has
advanced massively with the introduction of deep learning to the field. In this
paper, we focus on one of the deep learning-based approaches of HPE proposed by
Newell et al., which they named the stacked hourglass network. Their approach
is widely used in many applications and is regarded as one of the best works in
this area. The main focus of their approach is to capture as much information
as possible at all scales so that a coherent understanding of the local
features and full-body location is achieved. Their findings demonstrate that
important cues such as orientation of a person, arrangement of limbs, and
adjacent joints' relative location can be identified from multiple scales at
different resolutions. To do so, they make use of a single pipeline that
processes images at multiple resolutions, with skip layers at each resolution
so that spatial information is not lost. The resolution of the images is
reduced to as low as 4x4 to make sure that smaller spatial features are
included. In this study, we examine the effect of architectural modifications
on the computational speed and accuracy of the network.
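To make the multi-resolution pipeline concrete, below is a minimal sketch of a single hourglass module in PyTorch (the framework choice is ours): features are repeatedly pooled down to a 4x4 bottleneck and then upsampled, with a skip branch kept at every resolution so that spatial information is not lost. The channel width, the simplified residual unit, and the nearest-neighbor upsampling are illustrative assumptions rather than Newell et al.'s exact configuration; their full network stacks several such modules end to end.

```python
# Minimal illustrative sketch of an hourglass module (not the authors' exact code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Residual(nn.Module):
    """A small residual block applied at every scale (simplified assumption)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)


class Hourglass(nn.Module):
    """Recursive hourglass: pool down towards the bottleneck, then upsample,
    merging a skip branch at every resolution to keep spatial detail."""
    def __init__(self, depth: int, channels: int):
        super().__init__()
        self.skip = Residual(channels)   # processed at the current resolution
        self.down = Residual(channels)   # processed after pooling
        self.inner = (Hourglass(depth - 1, channels)
                      if depth > 1 else Residual(channels))
        self.up = Residual(channels)

    def forward(self, x):
        skip = self.skip(x)                      # skip layer keeps spatial info
        low = self.down(F.max_pool2d(x, 2))      # halve the resolution
        low = self.inner(low)                    # recurse towards the bottleneck
        low = self.up(low)
        up = F.interpolate(low, scale_factor=2, mode="nearest")
        return up + skip                         # merge coarse and fine features


if __name__ == "__main__":
    # With depth 4, a 64x64 feature map reaches a 4x4 bottleneck (64 / 2^4 = 4).
    net = Hourglass(depth=4, channels=64)
    feats = torch.randn(1, 64, 64, 64)
    heatmaps = net(feats)
    print(heatmaps.shape)  # torch.Size([1, 64, 64, 64])
```

With depth 4, a 64x64 input is pooled down to the 4x4 bottleneck described in the abstract and restored to 64x64 on the way up, with each resolution contributing its own skip branch to the final features.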
Related papers
- Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection [24.00828999360765]
This paper addresses the challenge of robotic grasping of general objects.
The proposed model first proposes a number of the most likely grasp points in the scene.
Around each grasp point, a module is designed to infer whether each voxel in its neighborhood is void or occupied by some object.
The model further estimates 6-DoF grasp poses utilizing the local occupancy-enhanced object shape information.
arXiv Detail & Related papers (2024-07-22T16:22:28Z) - Parameter-Inverted Image Pyramid Networks [49.35689698870247]
We propose a novel network architecture known as Parameter-Inverted Image Pyramid Networks (PIIP).
Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid.
PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification.
arXiv Detail & Related papers (2024-06-06T17:59:10Z) - AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation poses some unique challenges, the most critical of which is foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while remaining as fast as mainstream methods.
arXiv Detail & Related papers (2022-02-18T10:14:45Z) - DECA: Deep viewpoint-Equivariant human pose estimation using Capsule
Autoencoders [3.2826250607043796]
We show that current 3D Human Pose Estimation methods tend to fail when dealing with viewpoints unseen at training time.
We propose a novel capsule autoencoder network with fast Variational Bayes capsule routing, named DECA.
In the experimental validation, we outperform other methods on depth images from both seen and unseen viewpoints, in both top-view and front-view settings.
arXiv Detail & Related papers (2021-08-19T08:46:15Z) - Scale Normalized Image Pyramids with AutoFocus for Object Detection [75.71320993452372]
A scale normalized image pyramid (SNIP) is generated that, like human vision, only attends to objects within a fixed size range at different scales.
We propose an efficient spatial sub-sampling scheme which only operates on fixed-size sub-regions likely to contain objects.
The resulting algorithm is referred to as AutoFocus and results in a 2.5-5 times speed-up during inference when used with SNIP.
arXiv Detail & Related papers (2021-02-10T18:57:53Z) - Gravitational Models Explain Shifts on Human Visual Attention [80.76475913429357]
Visual attention refers to the human brain's ability to select relevant sensory information for preferential processing.
Various methods to estimate saliency have been proposed in the last three decades.
We propose a gravitational model (GRAV) to describe the attentional shifts.
arXiv Detail & Related papers (2020-09-15T10:12:41Z) - Rethinking of the Image Salient Object Detection: Object-level Semantic
Saliency Re-ranking First, Pixel-wise Saliency Refinement Latter [62.26677215668959]
We propose a lightweight, weakly supervised deep network to coarsely locate semantically salient regions.
We then fuse multiple off-the-shelf deep models on these semantically salient regions for pixel-wise saliency refinement.
Our method is simple yet effective, and it is the first attempt to treat salient object detection mainly as an object-level semantic re-ranking problem.
arXiv Detail & Related papers (2020-08-10T07:12:43Z) - Human Pose Estimation on Privacy-Preserving Low-Resolution Depth Images [2.8802646903517957]
Human pose estimation (HPE) is a key building block for developing AI-based context-aware systems inside the operating room (OR).
Being able to use only low-resolution, privacy-preserving images would address the associated privacy concerns.
We propose an end-to-end solution that integrates a multi-scale super-resolution network with a 2D human pose estimation network.
arXiv Detail & Related papers (2020-07-16T14:03:52Z) - Towards Dense People Detection with Deep Learning and Depth images [9.376814409561726]
This paper proposes a DNN-based system that detects multiple people from a single depth image.
Our neural network processes a depth image and outputs a likelihood map in image coordinates.
We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training.
arXiv Detail & Related papers (2020-07-14T16:43:02Z) - Simple Multi-Resolution Representation Learning for Human Pose
Estimation [2.1904965822605433]
The accuracy of human keypoint prediction has steadily improved thanks to the development of deep learning.
We introduce novel network structures referred to as multi-resolution representation learning for human keypoint prediction.
Our architectures are simple yet effective, achieving good performance.
arXiv Detail & Related papers (2020-04-14T09:03:16Z) - Learning Depth With Very Sparse Supervision [57.911425589947314]
This paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment.
We train a specialized global-local network architecture with what would be available to a robot interacting with the environment.
Experiments on several datasets show that, when ground truth is available even for just one of the image pixels, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-02T10:44:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.