Hand Image Understanding via Deep Multi-Task Learning
- URL: http://arxiv.org/abs/2107.11646v2
- Date: Wed, 28 Jul 2021 07:16:15 GMT
- Title: Hand Image Understanding via Deep Multi-Task Learning
- Authors: Xiong Zhang, Hongsheng Huang, Jianchao Tan, Hongmin Xu, Cheng Yang,
Guozhu Peng, Lei Wang, Ji Liu
- Abstract summary: We propose a novel Hand Image Understanding (HIU) framework to extract comprehensive information of the hand object from a single RGB image.
Our method significantly outperforms the state-of-the-art approaches on various widely-used datasets.
- Score: 34.515382305252814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analyzing and understanding hand information from multimedia materials like
images or videos is important for many real world applications and remains
active in research community. There are various works focusing on recovering
hand information from single image, however, they usually solve a single task,
for example, hand mask segmentation, 2D/3D hand pose estimation, or hand mesh
reconstruction and perform not well in challenging scenarios. To further
improve the performance of these tasks, we propose a novel Hand Image
Understanding (HIU) framework to extract comprehensive information of the hand
object from a single RGB image, by jointly considering the relationships
between these tasks. To achieve this goal, a cascaded multi-task learning (MTL)
backbone is designed to estimate the 2D heat maps, to learn the segmentation
mask, and to generate the intermediate 3D information encoding, followed by a
coarse-to-fine learning paradigm and a self-supervised learning strategy.
Qualitative experiments demonstrate that our approach is capable of recovering
reasonable mesh representations even in challenging situations. Quantitatively,
our method significantly outperforms the state-of-the-art approaches on various
widely-used datasets, in terms of diverse evaluation metrics.
Related papers
- Learning-based Multi-View Stereo: A Survey [55.3096230732874]
Multi-View Stereo (MVS) algorithms synthesize a comprehensive 3D representation, enabling precise reconstruction in complex environments.
With the success of deep learning, many learning-based MVS methods have been proposed, achieving impressive performance against traditional methods.
arXiv Detail & Related papers (2024-08-27T17:53:18Z) - CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding [38.53988682814626]
We propose a context-enhanced masked image modeling method (CtxMIM) for remote sensing image understanding.
CtxMIM formulates original image patches as a reconstructive template and employs a Siamese framework to operate on two sets of image patches.
With the simple and elegant design, CtxMIM encourages the pre-training model to learn object-level or pixel-level features on a large-scale dataset.
arXiv Detail & Related papers (2023-09-28T18:04:43Z) - HandMIM: Pose-Aware Self-Supervised Learning for 3D Hand Mesh Estimation [5.888156950854715]
We propose a novel self-supervised pre-training strategy for regressing 3D hand mesh parameters.
Our proposed approach, named HandMIM, achieves strong performance on various hand mesh estimation tasks.
arXiv Detail & Related papers (2023-07-29T19:46:06Z) - Two Approaches to Supervised Image Segmentation [55.616364225463066]
The present work develops comparison experiments between deep learning and multiset neurons approaches.
The deep learning approach confirmed its potential for performing image segmentation.
The alternative multiset methodology allowed for enhanced accuracy while requiring little computational resources.
arXiv Detail & Related papers (2023-07-19T16:42:52Z) - Multi-Task Self-Supervised Learning for Image Segmentation Task [0.0]
The paper presents 1. Self-supervised techniques to boost semantic segmentation performance using multi-task learning with Depth prediction and Surface Normalization.
2. Performance evaluation of the different types of weighing techniques (UW, Nash-MTL) used for Multi-task learning.
arXiv Detail & Related papers (2023-02-05T21:25:59Z) - Multi-task learning from fixed-wing UAV images for 2D/3D city modeling [0.0]
Multi-task learning is an approach to scene understanding which involves multiple related tasks each with potentially limited training data.
In urban management applications such as infrastructure development, traffic monitoring, smart 3D cities, and change detection, automated multi-task data analysis is required.
In this study, a common framework for the performance assessment of multi-task learning methods from fixed-wing UAV images for 2D/3D city modeling is presented.
arXiv Detail & Related papers (2021-08-25T14:45:42Z) - Towards unconstrained joint hand-object reconstruction from RGB videos [81.97694449736414]
Reconstructing hand-object manipulations holds a great potential for robotics and learning from human demonstrations.
We first propose a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions.
arXiv Detail & Related papers (2021-08-16T12:26:34Z) - Joint Hand-object 3D Reconstruction from a Single Image with
Cross-branch Feature Fusion [78.98074380040838]
We propose to consider hand and object jointly in feature space and explore the reciprocity of the two branches.
We employ an auxiliary depth estimation module to augment the input RGB image with the estimated depth map.
Our approach significantly outperforms existing approaches in terms of the reconstruction accuracy of objects.
arXiv Detail & Related papers (2020-06-28T09:50:25Z) - Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the
Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data.
We evaluate our proposed approach on two large scale datasets.
arXiv Detail & Related papers (2020-03-17T08:47:16Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration task.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.