Thing2Reality: Transforming 2D Content into Conditioned Multiviews and 3D Gaussian Objects for XR Communication
- URL: http://arxiv.org/abs/2410.07119v1
- Date: Wed, 9 Oct 2024 17:49:06 GMT
- Title: Thing2Reality: Transforming 2D Content into Conditioned Multiviews and 3D Gaussian Objects for XR Communication
- Authors: Erzhen Hu, Mingyi Li, Jungtaek Hong, Xun Qian, Alex Olwal, David Kim, Seongkook Heo, Ruofei Du
- Abstract summary: During remote communication, participants often share both digital and physical content, such as product designs, digital assets, and environments.
We propose Thing2Reality, an Extended Reality (XR) communication platform that enhances spontaneous discussions of both digital and physical items.
Our user study revealed that the ability to interact with and manipulate 3D representations of objects significantly enhances the efficiency of discussions.
- Score: 29.051341502575198
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: During remote communication, participants often share both digital and physical content, such as product designs, digital assets, and environments, to enhance mutual understanding. Recent advances in augmented communication have enabled users to swiftly create and share digital 2D copies of physical objects from video feeds into a shared space. However, conventional 2D representations of digital objects restrict users' ability to spatially reference items in a shared immersive environment. To address this, we propose Thing2Reality, an Extended Reality (XR) communication platform that enhances spontaneous discussions of both digital and physical items during remote sessions. With Thing2Reality, users can quickly materialize ideas or physical objects in immersive environments and share them as conditioned multiview renderings or 3D Gaussians. Thing2Reality enables users to interact with remote objects or discuss concepts in a collaborative manner. Our user study revealed that the ability to interact with and manipulate 3D representations of objects significantly enhances the efficiency of discussions, with the potential to augment discussion of 2D artifacts.
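The abstract describes sharing objects as 3D Gaussians but gives no implementation details. As a rough, hypothetical sketch of what such a shareable object payload could look like, the snippet below models a set of 3D Gaussian Splatting-style primitives (centers, per-axis scales, rotations, colors, and opacities) and projects their centers into a camera view; the `GaussianObject` container and the `project_means` helper are illustrative assumptions, not Thing2Reality's API.

```python
# Hypothetical sketch of a 3D Gaussian object payload, in the style of
# 3D Gaussian Splatting representations. Field names and the pinhole
# projection helper are illustrative assumptions, not the paper's code.
from dataclasses import dataclass
import numpy as np


@dataclass
class GaussianObject:
    means: np.ndarray      # (N, 3) Gaussian centers in object space
    scales: np.ndarray     # (N, 3) per-axis standard deviations
    rotations: np.ndarray  # (N, 4) unit quaternions (w, x, y, z)
    colors: np.ndarray     # (N, 3) RGB (or SH DC term) per Gaussian
    opacities: np.ndarray  # (N,)  alpha in [0, 1]

    def project_means(self, K: np.ndarray, cam_from_object: np.ndarray) -> np.ndarray:
        """Project Gaussian centers through a 4x4 pose and 3x3 intrinsics K."""
        homo = np.concatenate([self.means, np.ones((len(self.means), 1))], axis=1)
        cam = (cam_from_object @ homo.T).T[:, :3]   # object space -> camera space
        px = (K @ cam.T).T                          # pinhole projection
        return px[:, :2] / px[:, 2:3]               # normalize by depth


# Minimal usage: one Gaussian, identity pose, simple intrinsics.
obj = GaussianObject(
    means=np.array([[0.0, 0.0, 2.0]]),
    scales=np.full((1, 3), 0.05),
    rotations=np.array([[1.0, 0.0, 0.0, 0.0]]),
    colors=np.array([[0.8, 0.2, 0.2]]),
    opacities=np.array([0.9]),
)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
print(obj.project_means(K, np.eye(4)))  # pixel coordinates of the center
```

In a full renderer the payload would also carry higher-order spherical-harmonics coefficients and anisotropic covariances for rasterization; the sketch keeps only enough fields to show the data shape being shared.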
Related papers
- HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation [50.206100327643284]
HiScene is a novel hierarchical framework that bridges the gap between 2D image generation and 3D object generation.
We generate 3D content that aligns with 2D representations while maintaining compositional structure.
arXiv Detail & Related papers (2025-04-17T16:33:39Z) - InteractVLM: 3D Interaction Reasoning from 2D Foundational Models [85.76211596755151]
We introduce InteractVLM, a novel method to estimate 3D contact points on human bodies and objects from single in-the-wild images.
Existing methods rely on 3D contact annotations collected via expensive motion-capture systems or tedious manual labeling.
We propose a new task called Semantic Human Contact estimation, where human contact predictions are conditioned explicitly on object semantics.
arXiv Detail & Related papers (2025-04-07T17:59:33Z) - Generative AI Framework for 3D Object Generation in Augmented Reality [0.0]
This thesis integrates state-of-the-art generative AI models for real-time creation of 3D objects in augmented reality (AR) environments.
The framework demonstrates applications across industries such as gaming, education, retail, and interior design.
A significant contribution is democratizing 3D model creation, making advanced AI tools accessible to a broader audience.
arXiv Detail & Related papers (2025-02-21T17:01:48Z) - OOD-HOI: Text-Driven 3D Whole-Body Human-Object Interactions Generation Beyond Training Domains [66.62502882481373]
Current methods tend to focus either on the body or the hands, which limits their ability to produce cohesive and realistic interactions.
We propose OOD-HOI, a text-driven framework for generating whole-body human-object interactions that generalize well to new objects and actions.
Our approach integrates a dual-branch reciprocal diffusion model to synthesize initial interaction poses, a contact-guided interaction refiner to improve physical accuracy based on predicted contact areas, and a dynamic adaptation mechanism which includes semantic adjustment and geometry deformation to improve robustness.
arXiv Detail & Related papers (2024-11-27T10:13:35Z) - Multimodal 3D Fusion and In-Situ Learning for Spatially Aware AI [10.335943413484815]
Seamless integration of virtual and physical worlds in augmented reality benefits from a system that semantically "understands" the physical environment.
We introduce a multimodal 3D object representation that unifies both semantic and linguistic knowledge with the geometric representation.
We demonstrate the usefulness of the proposed system through two real-world AR applications on Magic Leap 2: a) spatial search in physical environments with natural language and b) an intelligent inventory system that tracks object changes over time.
arXiv Detail & Related papers (2024-10-06T23:25:21Z) - EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z) - Augmented Object Intelligence with XR-Objects [18.574032913387573]
This paper explores Augmented Object Intelligence (AOI) in the context of XR, an interaction paradigm that aims to blur the lines between digital and physical.
We implement the AOI concept in the form of XR-Objects, an open-source prototype system.
This system enables analog objects to not only convey information but also to initiate digital actions, such as querying for details or executing tasks.
arXiv Detail & Related papers (2024-04-20T05:14:52Z) - ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions [11.32229757116179]
We introduce the ParaHome system, designed to capture dynamic 3D movements of humans and objects within a common home environment.
By leveraging the ParaHome system, we collect a novel large-scale dataset of human-object interaction.
arXiv Detail & Related papers (2024-01-18T18:59:58Z) - MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World [55.878173953175356]
We propose MultiPLY, a multisensory embodied large language model.
We first collect Multisensory Universe, a large-scale multisensory interaction dataset comprising 500k data points.
We demonstrate that MultiPLY outperforms baselines by a large margin through a diverse set of embodied tasks.
arXiv Detail & Related papers (2024-01-16T18:59:45Z) - Synthesizing Physically Plausible Human Motions in 3D Scenes [41.1310197485928]
We present a framework that enables physically simulated characters to perform long-term interaction tasks in diverse, cluttered, and unseen scenes.
Specifically, the proposed InterCon contains two complementary policies that enable characters to enter and leave the interacting state.
To generate interaction with objects at different places, we further design NavCon, a trajectory following policy, to keep characters' motions in the free space of 3D scenes.
arXiv Detail & Related papers (2023-08-17T15:17:49Z) - AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z) - OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation [107.71752592196138]
We propose OmniObject3D, a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects.
It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets.
Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
arXiv Detail & Related papers (2023-01-18T18:14:18Z) - Szloca: towards a framework for full 3D tracking through a single camera in context of interactive arts [1.0878040851638]
This research presents a novel approach and framework for obtaining data and virtual representations of objects and people.
The model does not rely on complex training of computer vision systems; instead, it builds on prior computer vision research and adds the capacity to represent z-depth.
arXiv Detail & Related papers (2022-06-26T20:09:47Z) - ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations [52.226947570070784]
We present ObjectFolder, a dataset of 100 objects that addresses both challenges with two key innovations.
First, ObjectFolder encodes the visual, auditory, and tactile sensory data for all objects, enabling a number of multisensory object recognition tasks.
Second, ObjectFolder employs a uniform, object-centric, and implicit representation for each object's visual textures, acoustic simulations, and tactile readings, making the dataset flexible to use and easy to share.
arXiv Detail & Related papers (2021-09-16T14:00:59Z) - Pixel Codec Avatars [99.36561532588831]
Pixel Codec Avatars (PiCA) is a deep generative model of 3D human faces.
On a single Oculus Quest 2 mobile VR headset, 5 avatars are rendered in realtime in the same scene.
arXiv Detail & Related papers (2021-04-09T23:17:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.