MOGRAS: Human Motion with Grasping in 3D Scenes
- URL: http://arxiv.org/abs/2510.22199v1
- Date: Sat, 25 Oct 2025 07:39:02 GMT
- Title: MOGRAS: Human Motion with Grasping in 3D Scenes
- Authors: Kunal Bhosikar, Siddharth Katageri, Vivek Madhavaram, Kai Han, Charu Sharma
- Abstract summary: We introduce MOGRAS (Human MOtion with GRAsping in 3D Scenes), a large-scale dataset that bridges the gap between scene-aware full-body motion generation and precise object grasping. MOGRAS provides pre-grasping full-body walking motions and final grasping poses within richly annotated 3D indoor scenes. We propose a simple yet effective method to adapt existing approaches to work seamlessly within 3D scenes.
- Score: 11.854589891738584
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Generating realistic full-body motion interacting with objects is critical for applications in robotics, virtual reality, and human-computer interaction. While existing methods can generate full-body motion within 3D scenes, they often lack the fidelity for fine-grained tasks like object grasping. Conversely, methods that generate precise grasping motions typically ignore the surrounding 3D scene. This gap, generating full-body grasping motions that are physically plausible within a 3D scene, remains a significant challenge. To address this, we introduce MOGRAS (Human MOtion with GRAsping in 3D Scenes), a large-scale dataset that bridges this gap. MOGRAS provides pre-grasping full-body walking motions and final grasping poses within richly annotated 3D indoor scenes. We leverage MOGRAS to benchmark existing full-body grasping methods and demonstrate their limitations in scene-aware generation. Furthermore, we propose a simple yet effective method to adapt existing approaches to work seamlessly within 3D scenes. Through extensive quantitative and qualitative experiments, we validate the effectiveness of our dataset and highlight the significant improvements our proposed method achieves, paving the way for more realistic human-scene interactions.
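The abstract describes the dataset only at a high level. As a rough illustration, a single MOGRAS-style sample could pair a scene and target object with a pre-grasping walking sequence and a final grasping pose. The field names, shapes, and SMPL-X-style conventions below are assumptions for illustration, not the released format:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class SceneGraspSample:
    """Hypothetical layout of one pre-grasp motion sample.

    Shapes follow common SMPL-X conventions (55 joints, axis-angle);
    the actual MOGRAS release may differ.
    """
    scene_id: str            # richly annotated 3D indoor scene
    object_id: str           # target object to be grasped
    walk_poses: np.ndarray   # (T, 55, 3) per-frame body pose
    walk_transl: np.ndarray  # (T, 3) global root translation in scene frame
    grasp_pose: np.ndarray   # (55, 3) final full-body grasping pose
    grasp_transl: np.ndarray # (3,) root translation at the grasp frame

def grasp_approach_length(sample: SceneGraspSample) -> float:
    """Total root path length of the pre-grasping walk, in meters."""
    steps = np.diff(sample.walk_transl, axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())
```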
Related papers
- Object-centric 3D Motion Field for Robot Learning from Human Videos [56.9436352861611]
We propose to use an object-centric 3D motion field to represent actions for robot learning from human videos. We present a novel framework for extracting this representation from videos for zero-shot control. Experiments show that our method reduces 3D motion estimation error by over 50% compared to the latest method.
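The representation is only named in this summary. One schematic reading of an object-centric 3D motion field is a map from points on the object to per-point displacement vectors; the rigid-motion field below is an illustrative assumption standing in for the paper's learned model:

```python
import numpy as np

def rigid_motion_field(points: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Per-point 3D displacements induced by a rigid motion (R, t).

    points: (N, 3) object point cloud; returns (N, 3) motion vectors.
    A learned field would replace this closed form with a network.
    """
    return points @ R.T + t - points

# Example: advance the object one step along the field.
pts = np.random.rand(128, 3)
theta = np.deg2rad(5.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.0, 0.0, 0.01])
pts_next = pts + rigid_motion_field(pts, R, t)
```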
arXiv Detail & Related papers (2025-06-04T17:59:06Z)
- Motion-2-to-3: Leveraging 2D Motion Data to Boost 3D Motion Generation [43.915871360698546]
2D human videos offer a vast and accessible source of motion data, covering a wider range of styles and activities. We introduce a novel framework that disentangles local joint motion from global movement, enabling efficient learning of local motion priors from 2D data. Our method makes efficient use of 2D data, supporting realistic 3D human motion generation and broadening the range of motion types covered.
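The disentanglement is stated without detail. A common way to realize it, assumed here purely for illustration, is to split each pose sequence into a global root trajectory and root-relative local joint positions:

```python
import numpy as np

def split_global_local(joints: np.ndarray, root_idx: int = 0):
    """Split (T, J, 3) joint positions into a global root trajectory
    (T, 3) and root-relative local joint motion (T, J, 3)."""
    root = joints[:, root_idx, :]       # global movement
    local = joints - root[:, None, :]   # local joint motion
    return root, local

def merge_global_local(root: np.ndarray, local: np.ndarray) -> np.ndarray:
    """Inverse of split_global_local."""
    return local + root[:, None, :]
```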
arXiv Detail & Related papers (2024-12-17T17:34:52Z)
- Human-Aware 3D Scene Generation with Spatially-constrained Diffusion Models [16.259040755335885]
Previous autoregressive 3D scene generation methods have struggled to accurately capture the joint distribution of multiple objects and input humans.
We introduce two spatial collision guidance mechanisms: human-object collision avoidance and object-room boundary constraints.
Our framework can generate more natural and plausible 3D scenes with precise human-scene interactions.
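The two guidance mechanisms are only named above. As a rough sketch, both can be written as differentiable penalties that steer a sampler away from human-object interpenetration and out-of-room placements; the sphere and box geometry below is a simplifying assumption, not the paper's formulation:

```python
import numpy as np

def human_object_collision(obj_centers, obj_radii, human_pts):
    """Penalty > 0 when any human point enters an object's bounding sphere.

    obj_centers: (K, 3), obj_radii: (K,), human_pts: (M, 3).
    """
    d = np.linalg.norm(human_pts[:, None, :] - obj_centers[None, :, :], axis=-1)
    return np.clip(obj_radii[None, :] - d, 0.0, None).sum()

def room_boundary(obj_centers, room_min, room_max):
    """Penalty > 0 when object centers leave the axis-aligned room box."""
    below = np.clip(room_min - obj_centers, 0.0, None)
    above = np.clip(obj_centers - room_max, 0.0, None)
    return (below + above).sum()
```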
arXiv Detail & Related papers (2024-06-26T08:18:39Z)
- Synthesizing Diverse Human Motions in 3D Indoor Scenes [16.948649870341782]
We present a novel method for populating 3D indoor scenes with virtual humans that can navigate in the environment and interact with objects in a realistic manner.
Existing approaches rely on training sequences that contain captured human motions and the 3D scenes they interact with.
We propose a reinforcement learning-based approach that enables virtual humans to navigate in 3D scenes and interact with objects realistically and autonomously.
arXiv Detail & Related papers (2023-05-21T09:22:24Z)
- Generating Continual Human Motion in Diverse 3D Scenes [51.90506920301473]
We introduce a method to synthesize animator-guided human motion across 3D scenes. We decompose the continual motion synthesis problem into walking along paths and transitioning in and out of the actions specified by the keypoints. Our model can generate long sequences of diverse actions such as grabbing, sitting, and leaning chained together.
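As an illustration of that decomposition, the fragment below chains walk segments with keypoint actions; the segment structure and names are assumptions, not the paper's interface:

```python
from typing import List, Tuple

# Each keypoint: (location in the scene, action to perform there).
Keypoint = Tuple[Tuple[float, float, float], str]

def plan_continual_motion(keypoints: List[Keypoint]) -> List[str]:
    """Decompose continual synthesis into alternating walk and action
    segments, with explicit transitions in and out of each action."""
    segments = []
    for loc, action in keypoints:
        segments.append(f"walk_to:{loc}")
        segments.append(f"transition_into:{action}")
        segments.append(f"perform:{action}")
        segments.append(f"transition_out:{action}")
    return segments

print(plan_continual_motion([((1.0, 0.0, 2.0), "sit"),
                             ((3.5, 0.0, 1.0), "grab")]))
```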
arXiv Detail & Related papers (2023-04-04T18:24:22Z)
- CIRCLE: Capture In Rich Contextual Environments [69.97976304918149]
We propose a novel motion acquisition system in which the actor perceives and operates in a highly contextual virtual world.
We present CIRCLE, a dataset containing 10 hours of full-body reaching motion from 5 subjects across nine scenes.
We use this dataset to train a model that generates human motion conditioned on scene information.
arXiv Detail & Related papers (2023-03-31T09:18:12Z)
- MIME: Human-Aware 3D Scene Generation [55.30202416702207]
We generate 3D indoor scenes given 3D human motion.
Human movement indicates the free space in a room.
Human contact indicates surfaces or objects that support activities such as sitting, lying or touching.
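Those two cues can be made concrete: the sketch below marks free-space voxels swept by the body trajectory and contact voxels near low-velocity joints. The voxelization scheme and thresholds are illustrative assumptions:

```python
import numpy as np

def motion_to_occupancy(joints: np.ndarray, grid: float = 0.1,
                        contact_vel: float = 0.01):
    """joints: (T, J, 3) world-space joint positions.

    Returns two sets of voxel indices: cells the body passed through
    (free space) and cells where joints were nearly static (contact).
    """
    cells = np.floor(joints / grid).astype(int)              # (T, J, 3)
    free = {tuple(c) for c in cells.reshape(-1, 3)}
    vel = np.linalg.norm(np.diff(joints, axis=0), axis=-1)   # (T-1, J)
    contact_mask = vel < contact_vel
    contact = {tuple(c) for c in cells[1:][contact_mask]}
    return free, contact
```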
arXiv Detail & Related papers (2022-12-08T15:56:17Z)
- HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes [54.61610144668777]
We present a novel scene-and-language conditioned generative model that can produce 3D human motions in 3D scenes.
Our experiments demonstrate that our model generates diverse and semantically consistent human motions in 3D scenes.
arXiv Detail & Related papers (2022-10-18T10:14:11Z)
- Learning Motion Priors for 4D Human Body Capture in 3D Scenes [81.54377747405812]
We propose LEMO: LEarning human MOtion priors for 4D human body capture.
We introduce a novel motion prior, which reduces the jitter exhibited by poses recovered over a sequence.
We also design a contact friction term and a contact-aware motion infiller obtained via per-instance self-supervised training.
With our pipeline, we demonstrate high-quality 4D human body capture, reconstructing smooth motions and physically plausible body-scene interactions.
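The smoothness prior itself is not specified in this summary. One standard quantity such a prior penalizes, shown below as an assumption rather than LEMO's actual formulation, is the second-order finite difference (acceleration) of recovered joint positions:

```python
import numpy as np

def jitter_energy(joints: np.ndarray) -> float:
    """Mean squared acceleration of a (T, J, 3) joint sequence.

    Second-order finite differences spike when recovered poses jitter;
    a smoothness prior drives this energy down.
    """
    accel = joints[2:] - 2.0 * joints[1:-1] + joints[:-2]  # (T-2, J, 3)
    return float(np.mean(np.sum(accel ** 2, axis=-1)))
```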
arXiv Detail & Related papers (2021-08-23T20:47:09Z)