Moving Out: Physically-grounded Human-AI Collaboration
- URL: http://arxiv.org/abs/2507.18623v2
- Date: Sat, 26 Jul 2025 03:07:12 GMT
- Title: Moving Out: Physically-grounded Human-AI Collaboration
- Authors: Xuhui Kang, Sung-Wook Lee, Haolin Liu, Yuyan Wang, Yen-Ling Kuo
- Abstract summary: We introduce Moving Out, a new human-AI collaboration benchmark. We evaluate models' abilities to adapt to diverse human behaviors and unseen physical attributes. Our experiments show that BASS outperforms state-of-the-art models in AI-AI and human-AI collaboration.
- Score: 10.515976351631666
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to adapt to physical actions and constraints in an environment is crucial for embodied agents (e.g., robots) to effectively collaborate with humans. Such physically grounded human-AI collaboration must account for the increased complexity of the continuous state-action space and constrained dynamics caused by physical constraints. In this paper, we introduce Moving Out, a new human-AI collaboration benchmark that resembles a wide range of collaboration modes affected by physical attributes and constraints, such as moving heavy items together and maintaining consistent actions to move a big item around a corner. Using Moving Out, we designed two tasks and collected human-human interaction data to evaluate models' abilities to adapt to diverse human behaviors and unseen physical attributes. To address the challenges in physical environments, we propose a novel method, BASS (Behavior Augmentation, Simulation, and Selection), to enhance the diversity of agents and their understanding of the outcome of actions. Our experiments show that BASS outperforms state-of-the-art models in AI-AI and human-AI collaboration. The project page is available at https://live-robotics-uva.github.io/movingout_ai/.
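The abstract describes BASS only at a high level. As a rough illustration of its simulate-and-select idea, the toy sketch below samples candidate actions, rolls each out in a simple forward model, and executes the one with the best simulated outcome. Everything here (the 1-D world, `step`, `task_progress`) is a hypothetical stand-in, not the paper's implementation.

```python
# A minimal sketch of the selection step described for BASS: candidate actions
# are rolled out in an internal simulator and scored, and the best one is
# executed. All names (World, step, task_progress) are hypothetical.
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class World:
    agent_x: float          # agent position along a corridor (toy 1-D world)
    item_x: float           # position of the item being moved
    carrying: bool          # whether the agent currently holds the item

def step(state: World, action: float) -> World:
    """Toy forward model: the agent moves; the item follows if carried."""
    new_agent = state.agent_x + action
    new_item = state.item_x + (action if state.carrying else 0.0)
    return replace(state, agent_x=new_agent, item_x=new_item)

def task_progress(state: World, goal_x: float) -> float:
    """Score a state by how close the item is to the goal."""
    return -abs(state.item_x - goal_x)

def select_action(state: World, goal_x: float, n_candidates: int = 16) -> float:
    """Simulate candidate actions and pick the one with the best outcome."""
    candidates = [random.uniform(-1.0, 1.0) for _ in range(n_candidates)]
    return max(candidates, key=lambda a: task_progress(step(state, a), goal_x))

state = World(agent_x=0.0, item_x=0.0, carrying=True)
for _ in range(5):
    action = select_action(state, goal_x=3.0)
    state = step(state, action)
print(f"item ended at x = {state.item_x:.2f}")
```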
Related papers
- Towards Immersive Human-X Interaction: A Real-Time Framework for Physically Plausible Motion Synthesis [51.95817740348585]
Human-X is a novel framework designed to enable immersive and physically plausible human interactions across diverse entities. Our method jointly predicts actions and reactions in real time using an auto-regressive reaction diffusion planner. Our framework is validated in real-world applications, including a virtual reality interface for human-robot interaction.
arXiv Detail & Related papers (2025-08-04T06:35:48Z)
- 3HANDS Dataset: Learning from Humans for Generating Naturalistic Handovers with Supernumerary Robotic Limbs [64.99122701615151]
Supernumerary robotic limbs (SRLs) are robotic structures integrated closely with the user's body. We present 3HANDS, a novel dataset of object handover interactions between a participant performing a daily activity and another participant enacting a hip-mounted SRL in a naturalistic manner. We present three models: one that generates naturalistic handover trajectories, another that determines the appropriate handover endpoints, and a third that predicts the moment to initiate a handover.
arXiv Detail & Related papers (2025-03-06T17:23:55Z)
- ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills [46.16771391136412]
ASAP is a two-stage framework designed to tackle the dynamics mismatch and enable agile humanoid whole-body skills. In the first stage, we pre-train motion tracking policies in simulation using retargeted human motion data. In the second stage, we deploy the policies in the real world and collect real-world data to train a delta (residual) action model.
arXiv Detail & Related papers (2025-02-03T08:22:46Z)
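A minimal sketch of the delta (residual) action idea summarized for ASAP above: a correction term, fit from (toy) real-world data, is added to a pre-trained policy's action to compensate for the sim-to-real dynamics mismatch. The linear models and synthetic data are illustrative assumptions, not the paper's implementation.

```python
# Residual action correction, sketched: fit a delta model to the observed
# gap between commanded and realized behavior, then add it to the policy
# output. Toy data and a linear fit stand in for the real training setup.
import numpy as np

rng = np.random.default_rng(0)

def pretrained_policy(obs: np.ndarray) -> np.ndarray:
    """Stand-in for a motion-tracking policy trained in simulation."""
    return np.tanh(obs[:2])

# Suppose real rollouts reveal a systematic gap between commanded and
# realized actions; fit a linear delta model to that gap.
obs_data = rng.normal(size=(256, 4))
true_gap = 0.3 * obs_data[:, :2] + 0.05          # unknown real-world bias
W, *_ = np.linalg.lstsq(obs_data, true_gap, rcond=None)

def delta_action(obs: np.ndarray) -> np.ndarray:
    """Learned residual correction applied on top of the policy action."""
    return obs @ W

obs = rng.normal(size=4)
print("policy action:   ", pretrained_policy(obs))
print("corrected action:", pretrained_policy(obs) + delta_action(obs))
```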
- CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics [44.30880626337739]
CooHOI is a framework designed to tackle the multi-humanoid object transportation problem.
A single humanoid character learns to interact with objects through imitation learning from human motion priors.
Then, the humanoid learns to collaborate with others by considering the shared dynamics of the manipulated object.
arXiv Detail & Related papers (2024-06-20T17:59:22Z)
- in2IN: Leveraging individual Information to Generate Human INteractions [29.495166514135295]
We introduce in2IN, a novel diffusion model for human-human motion generation conditioned on individual descriptions.
We also propose DualMDM, a model composition technique that combines the motions generated with in2IN and the motions generated by a single-person motion prior pre-trained on HumanML3D.
arXiv Detail & Related papers (2024-04-15T17:59:04Z)
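A rough sketch of the model-composition idea behind DualMDM as summarized above: predictions from an interaction-conditioned prior and a single-person motion prior are blended with a weight that trades interaction fidelity against individual style. The toy functions below stand in for the actual diffusion models.

```python
# Model composition, sketched: blend two priors' predictions with a weight w.
# Both "priors" are toy linear maps, not the paper's diffusion models.
import numpy as np

def interaction_prior(pose: np.ndarray) -> np.ndarray:
    """Toy stand-in for in2IN's interaction-conditioned update."""
    return pose * 0.9 + 0.1            # pulls the pose toward the interaction

def individual_prior(pose: np.ndarray) -> np.ndarray:
    """Toy stand-in for a single-person prior (e.g. trained on HumanML3D)."""
    return pose * 0.95                 # preserves individual motion style

def composed_step(pose: np.ndarray, w: float = 0.7) -> np.ndarray:
    """Blend the two priors; w trades interaction fidelity vs. individuality."""
    return w * interaction_prior(pose) + (1.0 - w) * individual_prior(pose)

pose = np.ones(6)                      # a toy 6-DoF pose vector
for _ in range(10):                    # iterate the composed update
    pose = composed_step(pose)
print(pose.round(3))
```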
- Online Learning of Human Constraints from Feedback in Shared Autonomy [25.173950581816086]
Real-time collaboration with humans is challenging because different operators exhibit distinct behavior patterns arising from diverse physical constraints.
We learn a human constraints model that considers the diverse behaviors of different human operators.
We propose an augmentative assistant agent capable of learning and adapting to human physical constraints.
arXiv Detail & Related papers (2024-03-05T13:53:48Z)
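A hedged sketch of the online constraint-learning idea summarized above: the assistant maintains an estimate of a human operator's physical limit (here, a maximum comfortable speed) and updates it from binary feedback after each action. The update rule and all names are illustrative assumptions, not the paper's method.

```python
# Online constraint estimation, sketched: nudge the estimated limit toward
# speeds the human accepts and away from speeds the human rejects.
def update_limit(estimate: float, speed: float, ok: bool, lr: float = 0.5) -> float:
    """Move the estimated limit based on one piece of binary feedback."""
    if ok and speed > estimate:        # accepted a faster move: raise estimate
        return estimate + lr * (speed - estimate)
    if not ok and speed < estimate:    # rejected a slower move: lower estimate
        return estimate - lr * (estimate - speed)
    return estimate

limit = 1.0                            # initial guess at the comfort limit
feedback = [(1.4, True), (2.0, False), (1.6, True), (1.9, False)]
for speed, ok in feedback:
    limit = update_limit(limit, speed, ok)
print(f"estimated comfort limit: {limit:.2f}")
```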
- Robot Interaction Behavior Generation based on Social Motion Forecasting for Human-Robot Interaction [9.806227900768926]
We propose to model social motion forecasting in a shared human-robot representation space.
ECHO operates in the aforementioned shared space to predict the future motions of the agents encountered in social scenarios.
We evaluate our model in multi-person and human-robot motion forecasting tasks and outperform state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2024-02-07T11:37:14Z)
- On the Emergence of Symmetrical Reality [51.21203247240322]
We introduce the symmetrical reality framework, which offers a unified representation encompassing various forms of physical-virtual amalgamations.
We propose an instance of an AI-driven active assistance service that illustrates the potential applications of symmetrical reality.
arXiv Detail & Related papers (2024-01-26T16:09:39Z)
- InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint [67.6297384588837]
We introduce a novel controllable motion generation method, InterControl, to encourage the synthesized motions to maintain the desired distance between joint pairs.
We demonstrate that the desired distance between joint pairs for human interactions can be generated by an off-the-shelf Large Language Model.
arXiv Detail & Related papers (2023-11-27T14:32:33Z)
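A minimal sketch of the joint-pair distance control that the InterControl summary describes: a squared penalty on the deviation between a joint pair's current and desired distance, minimized here by closed-form gradient steps. In the paper this signal guides motion generation; the direct pose nudging below is only an illustration.

```python
# Joint-pair distance constraint, sketched: penalize deviation from a target
# distance and descend its gradient. Pose layout and targets are illustrative.
import numpy as np

def pair_distance_loss(joints: np.ndarray, i: int, j: int, target: float) -> float:
    """Squared error between the i-j joint distance and its target."""
    d = np.linalg.norm(joints[i] - joints[j])
    return (d - target) ** 2

def nudge(joints: np.ndarray, i: int, j: int, target: float, step: float = 0.1):
    """One gradient-descent step on the distance penalty (closed form)."""
    diff = joints[i] - joints[j]
    d = np.linalg.norm(diff)
    grad = 2.0 * (d - target) * diff / max(d, 1e-8)
    joints = joints.copy()
    joints[i] -= step * grad           # move the pair toward the target gap
    joints[j] += step * grad
    return joints

joints = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])   # two joints in 3-D
for _ in range(20):                    # e.g. two hands should be 0.3 m apart
    joints = nudge(joints, 0, 1, target=0.3)
print(f"final distance: {np.linalg.norm(joints[0] - joints[1]):.3f}")
```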
- Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots [119.55240471433302]
Habitat 3.0 is a simulation platform for studying collaborative human-robot tasks in home environments.
It addresses challenges in modeling complex deformable bodies and diversity in appearance and motion.
Human-in-the-loop infrastructure enables real human interaction with simulated robots via mouse/keyboard or a VR interface.
arXiv Detail & Related papers (2023-10-19T17:29:17Z)
- Learning Human-to-Robot Handovers from Point Clouds [63.18127198174958]
We propose the first framework to learn control policies for vision-based human-to-robot handovers.
We show significant performance gains over baselines on a simulation benchmark, sim-to-sim transfer, and sim-to-real transfer.
arXiv Detail & Related papers (2023-03-30T17:58:36Z)
- Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
TOHO: Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z)
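A toy illustration of the implicit-representation idea in the TOHO summary above: a motion is a continuous function of the temporal coordinate alone, so poses can be queried at arbitrary times, including between keyframes. Random Fourier features plus a linear least-squares fit stand in for the paper's neural network.

```python
# Motion as a continuous function of time, sketched: fit keyframe poses with
# sinusoidal features of t, then query any t. The features, keyframes, and
# 2-DoF pose are illustrative stand-ins for the paper's implicit network.
import numpy as np

rng = np.random.default_rng(1)
freqs = rng.normal(scale=3.0, size=16)          # fixed random frequencies

def features(t: np.ndarray) -> np.ndarray:
    """Map scalar times to a sinusoidal feature vector."""
    return np.concatenate([np.sin(np.outer(t, freqs)),
                           np.cos(np.outer(t, freqs))], axis=1)

# Keyframe poses (here 2-DoF) observed at a few time stamps.
t_key = np.linspace(0.0, 1.0, 8)
pose_key = np.stack([np.sin(2 * np.pi * t_key), t_key ** 2], axis=1)

W, *_ = np.linalg.lstsq(features(t_key), pose_key, rcond=None)

def motion(t: float) -> np.ndarray:
    """Query the continuous motion at an arbitrary time."""
    return features(np.array([t])) @ W

print(motion(0.5))      # pose at a keyframe time
print(motion(0.37))     # pose at an in-between time
```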
- Model Predictive Control for Fluid Human-to-Robot Handovers [50.72520769938633]
Planning motions that take human comfort into account is typically not part of the human-robot handover process.
We propose to generate smooth motions via an efficient model-predictive control framework.
We conduct human-to-robot handover experiments on a diverse set of objects with several users.
arXiv Detail & Related papers (2022-03-31T23:08:20Z)
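A schematic model-predictive control loop in the spirit of the handover approach summarized above: candidate action sequences are rolled out over a short horizon, scored for goal progress and smoothness (a crude proxy for fluid, comfortable motion), and only the first action of the best sequence is executed. Dynamics, costs, and the sampling scheme are all illustrative assumptions.

```python
# Sampling-based MPC, sketched: roll out candidate action sequences under a
# toy integrator model, score them, execute the best sequence's first action.
import numpy as np

rng = np.random.default_rng(2)

def rollout_cost(x: np.ndarray, actions: np.ndarray, goal: np.ndarray) -> float:
    """Cost = final distance to the handover point + action smoothness."""
    pos = x.copy()
    for a in actions:
        pos = pos + a                          # toy integrator dynamics
    smoothness = np.sum(np.diff(actions, axis=0) ** 2)
    return float(np.linalg.norm(pos - goal) + 0.5 * smoothness)

def mpc_step(x: np.ndarray, goal: np.ndarray, horizon: int = 5,
             n_samples: int = 64) -> np.ndarray:
    """Sample action sequences, keep the best, return its first action."""
    seqs = rng.normal(scale=0.2, size=(n_samples, horizon, 2))
    best = min(seqs, key=lambda s: rollout_cost(x, s, goal))
    return best[0]

x, goal = np.zeros(2), np.array([1.0, 0.5])   # end-effector and handover point
for _ in range(15):
    x = x + mpc_step(x, goal)
print(f"final distance to goal: {np.linalg.norm(x - goal):.3f}")
```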
- PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception [50.551003004553806]
We create a dataset of physically-grounded abstract social events, PHASE, that resemble a wide range of real-life social interactions.
PHASE is validated with human experiments demonstrating that humans perceive rich interactions in the social events.
As a baseline model, we introduce a Bayesian inverse planning approach, SIMPLE, which outperforms state-of-the-art feed-forward neural networks.
arXiv Detail & Related papers (2021-03-02T18:44:57Z)
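A compact sketch of Bayesian inverse planning in the spirit of SIMPLE: assuming an approximately rational agent, each observed step is scored by how much it advances toward each candidate goal, and accumulating these scores yields a posterior over goals. The soft-rationality likelihood and toy trajectory are illustrative assumptions, not the paper's model.

```python
# Inverse planning, sketched: Bayes-update a belief over goals from how much
# each observed step makes progress toward each goal.
import numpy as np

goals = np.array([[2.0, 0.0], [0.0, 2.0]])      # candidate goal locations
traj = np.array([[0.0, 0.0], [0.3, 0.1], [0.7, 0.1], [1.1, 0.2]])

log_post = np.log(np.full(len(goals), 0.5))     # uniform prior over goals
beta = 5.0                                      # rationality temperature
for prev, curr in zip(traj[:-1], traj[1:]):
    for g, goal in enumerate(goals):
        progress = (np.linalg.norm(prev - goal)
                    - np.linalg.norm(curr - goal))
        log_post[g] += beta * progress          # reward goal-ward movement
post = np.exp(log_post - log_post.max())
post /= post.sum()
print(dict(zip(["right goal", "top goal"], post.round(3))))
```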