The One Where They Reconstructed 3D Humans and Environments in TV Shows
- URL: http://arxiv.org/abs/2207.14279v1
- Date: Thu, 28 Jul 2022 17:57:30 GMT
- Title: The One Where They Reconstructed 3D Humans and Environments in TV Shows
- Authors: Georgios Pavlakos, Ethan Weber, Matthew Tancik, Angjoo Kanazawa
- Abstract summary: TV shows depict a wide variety of human behaviors and have been studied extensively for their potential to be a rich source of data.
We propose an automatic approach that operates on an entire season of a TV show and aggregates information in 3D.
We show that reasoning about humans and their environment in 3D enables a broad range of downstream applications.
- Score: 33.533207518342465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: TV shows depict a wide variety of human behaviors and have been studied
extensively for their potential to be a rich source of data for many
applications. However, the majority of the existing work focuses on 2D
recognition tasks. In this paper, we make the observation that there is a
certain persistence in TV shows, i.e., repetition of the environments and the
humans, which makes possible the 3D reconstruction of this content. Building on
this insight, we propose an automatic approach that operates on an entire
season of a TV show and aggregates information in 3D; we build a 3D model of
the environment, compute camera information, static 3D scene structure and body
scale information. Then, we demonstrate how this information acts as rich 3D
context that can guide and improve the recovery of 3D human pose and position
in these environments. Moreover, we show that reasoning about humans and their
environment in 3D enables a broad range of downstream applications:
re-identification, gaze estimation, cinematography and image editing. We apply
our approach on environments from seven iconic TV shows and perform an
extensive evaluation of the proposed system.
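The core idea in the abstract, recovering camera information and static scene structure first, then using that 3D context to pin down where a person stands, can be illustrated with a toy calculation. The sketch below is not the authors' implementation; the intrinsics, the y-down camera convention, and the planar-floor assumption are all ours. It shows how a known camera height above a reconstructed floor plane resolves the depth ambiguity of a monocular human detection: back-project the person's foot pixel to a viewing ray and intersect it with the floor.

```python
# Hypothetical sketch: resolving a person's depth by intersecting the
# foot-pixel ray with a reconstructed floor plane. All parameter values
# below are illustrative, not from the paper.

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Back-project pixel (u, v) to a unit-norm viewing ray in camera
    coordinates (pinhole model, y axis pointing down, z forward)."""
    x = (u - cx) / fx
    y = (v - cy) / fy
    z = 1.0
    n = (x * x + y * y + z * z) ** 0.5
    return (x / n, y / n, z / n)

def intersect_floor(ray, cam_height):
    """Intersect a camera-frame ray with the floor plane y = cam_height
    (camera at the origin, floor below it). Returns the 3D point, or
    None if the ray never reaches the floor."""
    dy = ray[1]
    if dy <= 1e-9:
        return None
    t = cam_height / dy  # distance along the ray to the plane
    return tuple(t * c for c in ray)

# Place a person whose foot projects to pixel (320, 470) in a 640x480
# frame, with the camera 1.6 m above the floor.
ray = pixel_to_ray(320, 470, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
foot = intersect_floor(ray, cam_height=1.6)
```

With scene structure aggregated over a whole season, the same intersection constraint (plus the recovered body-scale information) disambiguates person placement in every shot of a recurring environment, which is what enables the downstream applications listed above.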
Related papers
- Diffusion Models in 3D Vision: A Survey [11.116658321394755]
We review the state-of-the-art approaches that leverage diffusion models for 3D visual tasks.
These approaches include 3D object generation, shape completion, point cloud reconstruction, and scene understanding.
We discuss potential solutions, including improving computational efficiency, enhancing multimodal fusion, and exploring the use of large-scale pretraining.
arXiv Detail & Related papers (2024-10-07T04:12:23Z)
- Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses [9.529416246409355]
We present a method to reconstruct the world and multiple dynamic humans in 3D from a monocular video input.
As a key idea, we represent both the world and multiple humans via the recently emerging 3D Gaussian Splatting (3D-GS) representation.
arXiv Detail & Related papers (2024-04-22T17:59:50Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language [31.691159120136064]
We introduce the task of 3D visual grounding in large-scale dynamic scenes based on natural linguistic descriptions and online captured multi-modal visual data.
We present a novel method, dubbed WildRefer, for this task that fully utilizes the rich appearance information in images and the positional and geometric cues in point clouds.
Our datasets are significant for research on 3D visual grounding in the wild and have strong potential to advance autonomous driving and service robots.
arXiv Detail & Related papers (2023-04-12T06:48:26Z)
- Gait Recognition in the Wild with Dense 3D Representations and A Benchmark [86.68648536257588]
Existing studies for gait recognition are dominated by 2D representations like the silhouette or skeleton of the human body in constrained scenes.
This paper aims to explore dense 3D representations for gait recognition in the wild.
We build the first large-scale 3D representation-based gait recognition dataset, named Gait3D.
arXiv Detail & Related papers (2022-04-06T03:54:06Z)
- Human-Aware Object Placement for Visual Environment Reconstruction [63.14733166375534]
We show that human-scene interactions can be leveraged to improve the 3D reconstruction of a scene from a monocular RGB video.
Our key idea is that, as a person moves through a scene and interacts with it, we accumulate human-scene interactions (HSIs) across multiple input images.
We show that our scene reconstruction can be used to refine the initial 3D human pose and shape estimation.
arXiv Detail & Related papers (2022-03-07T18:59:02Z)
- Human Performance Capture from Monocular Video in the Wild [50.34917313325813]
We propose a method capable of capturing the dynamic 3D human shape from a monocular video featuring challenging body poses.
Our method outperforms state-of-the-art methods on an in-the-wild human video dataset 3DPW.
arXiv Detail & Related papers (2021-11-29T16:32:41Z)
- Egocentric Activity Recognition and Localization on a 3D Map [94.30708825896727]
We address the problem of jointly recognizing and localizing actions of a mobile user on a known 3D map from egocentric videos.
Our model takes the inputs of a Hierarchical Volumetric Representation (HVR) of the environment and an egocentric video, infers the 3D action location as a latent variable, and recognizes the action based on the video and contextual cues surrounding its potential locations.
arXiv Detail & Related papers (2021-05-20T06:58:15Z)
- 3DCrowdNet: 2D Human Pose-Guided 3D Crowd Human Pose and Shape Estimation in the Wild [61.92656990496212]
3DCrowdNet is a 2D human pose-guided 3D crowd pose and shape estimation system for in-the-wild scenes.
We show that our 3DCrowdNet outperforms previous methods on in-the-wild crowd scenes.
arXiv Detail & Related papers (2021-04-15T08:21:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.