HumanGPS: Geodesic PreServing Feature for Dense Human Correspondences
- URL: http://arxiv.org/abs/2103.15573v1
- Date: Mon, 29 Mar 2021 12:43:44 GMT
- Title: HumanGPS: Geodesic PreServing Feature for Dense Human Correspondences
- Authors: Feitong Tan, Danhang Tang, Mingsong Dou, Kaiwen Guo, Rohit Pandey, Cem
Keskin, Ruofei Du, Deqing Sun, Sofien Bouaziz, Sean Fanello, Ping Tan, Yinda
Zhang
- Abstract summary: Prior art either assumes small motion between frames or relies on local descriptors, which cannot handle large motion or visually ambiguous body parts.
We propose a deep learning framework that maps each pixel to a feature space, where the feature distances reflect the geodesic distances among pixels.
Without any semantic annotation, the proposed embeddings automatically learn to differentiate visually similar parts and align different subjects into a unified feature space.
- Score: 60.89437526374286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address the problem of building dense correspondences
between human images under arbitrary camera viewpoints and body poses. Prior
art either assumes small motion between frames or relies on local descriptors,
which cannot handle large motion or visually ambiguous body parts, e.g., left
vs. right hand. In contrast, we propose a deep learning framework that maps
each pixel to a feature space, where the feature distances reflect the geodesic
distances among pixels as if they were projected onto the surface of a 3D human
scan. To this end, we introduce novel loss functions to push features apart
according to their geodesic distances on the surface. Without any semantic
annotation, the proposed embeddings automatically learn to differentiate
visually similar parts and align different subjects into a unified feature
space. Extensive experiments show that the learned embeddings can produce
accurate correspondences between images with remarkable generalization
capabilities in both intra- and inter-subject settings.
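To make the geodesic-preserving idea concrete, the sketch below shows one way such a loss could look in PyTorch: feature-space distances between random pixel pairs are pulled toward zero for pixels that project to the same surface point, and pushed at least as far apart as their surface geodesic distance otherwise. The function name, the pair-sampling scheme, and the squared hinge are illustrative assumptions, not the paper's exact loss formulation.

```python
import torch
import torch.nn.functional as F

def geodesic_preserving_loss(features, vertex_ids, geodesic, num_pairs=4096):
    """Hypothetical sketch of a geodesic-guided feature loss (not the paper's exact loss).

    features  : (N, C) per-pixel feature vectors sampled from the network output
    vertex_ids: (N,)   index of the 3D scan vertex each pixel projects to
    geodesic  : (V, V) precomputed geodesic distances on the scan, normalized to [0, 1]
    """
    n = features.shape[0]
    # Sample random pixel pairs.
    i = torch.randint(0, n, (num_pairs,), device=features.device)
    j = torch.randint(0, n, (num_pairs,), device=features.device)

    feat_dist = F.pairwise_distance(features[i], features[j])  # distance in feature space
    geo_dist = geodesic[vertex_ids[i], vertex_ids[j]]          # distance on the 3D surface

    # Pixels that project to the same surface point are pulled together;
    # all other pairs are pushed at least `geo_dist` apart, so the ordering of
    # feature distances mirrors the geodesic ordering on the body surface.
    pull = geo_dist.eq(0).float() * feat_dist.pow(2)
    push = geo_dist.gt(0).float() * F.relu(geo_dist - feat_dist).pow(2)
    return (pull + push).mean()
```

In practice, `features` would be bilinearly sampled from the network's dense feature map at pixels with known scan correspondences, and the geodesic distance matrix would be precomputed once per scan.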
Related papers
- 3D Reconstruction of Interacting Multi-Person in Clothing from a Single Image [8.900009931200955]
This paper introduces a novel pipeline to reconstruct the geometry of multiple interacting people in clothing within a globally coherent scene space from a single image.
We overcome the challenges of this setting by utilizing two human priors for complete 3D geometry and surface contacts.
The results demonstrate that our method is complete, globally coherent, and physically plausible compared to existing methods.
arXiv Detail & Related papers (2024-01-12T07:23:02Z)
- CHORUS: Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images [10.4286198282079]
We present a method for teaching machines to understand and model the underlying spatial common sense of diverse human-object interactions in 3D.
This spatial common sense can be learned from multiple 2D images, captured from different viewpoints, of humans interacting with the same type of object.
Despite their lower image quality compared to real images, we demonstrate that the synthesized images are sufficient to learn 3D human-object spatial relations.
arXiv Detail & Related papers (2023-08-23T17:59:11Z)
- Grounding 3D Object Affordance from 2D Interactions in Images [128.6316708679246]
Grounding 3D object affordance seeks to locate the "action possibility" regions of objects in 3D space.
Humans possess the ability to perceive object affordances in the physical world through demonstration images or videos.
We devise an Interaction-driven 3D Affordance Grounding Network (IAG), which aligns the region feature of objects from different sources.
arXiv Detail & Related papers (2023-03-18T15:37:35Z)
- BodyMap: Learning Full-Body Dense Correspondence Map [19.13654133912062]
BodyMap is a new framework for obtaining high-definition full-body and continuous dense correspondence between in-the-wild images of humans and the surface of a 3D template model.
Dense correspondence between humans carries powerful semantic information that can be utilized to solve fundamental problems for full-body understanding.
arXiv Detail & Related papers (2022-05-18T17:58:11Z)
- Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-pixel Part Segmentation [84.28064034301445]
Self-similarity, and the resulting ambiguities in assigning pixel observations to the respective hands, is a major cause of the final 3D pose error.
We propose DIGIT, a novel method for estimating the 3D poses of two interacting hands from a single monocular image.
We experimentally show that the proposed approach achieves new state-of-the-art performance on the InterHand2.6M dataset.
arXiv Detail & Related papers (2021-07-01T13:28:02Z)
- Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild [96.08358373137438]
We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene.
Our method runs on datasets without any scene- or object-level 3D supervision.
arXiv Detail & Related papers (2020-07-30T17:59:50Z)
- Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision.
We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv Detail & Related papers (2020-03-21T15:36:38Z)
- Learning Depth With Very Sparse Supervision [57.911425589947314]
This paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment.
We train a specialized global-local network architecture with what would be available to a robot interacting with the environment.
Experiments on several datasets show that, when ground truth is available even for just one of the image pixels, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-02T10:44:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.