PIP: Physical Interaction Prediction via Mental Imagery with Span
Selection
- URL: http://arxiv.org/abs/2109.04683v1
- Date: Fri, 10 Sep 2021 06:11:29 GMT
- Title: PIP: Physical Interaction Prediction via Mental Imagery with Span
Selection
- Authors: Jiafei Duan, Samson Yu, Soujanya Poria, Bihan Wen, Cheston Tan
- Abstract summary: We propose a novel PIP scheme: Physical Interaction Prediction via Mental Imagery with Span Selection.
PIP utilizes a deep generative model to output future frames of physical interactions among objects before extracting crucial information.
Our experiments show that PIP outperforms baselines and human performance in physical interaction prediction for both seen and unseen objects.
- Score: 24.22281131863951
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To align advanced artificial intelligence (AI) with human values and promote
safe AI, it is important for AI to predict the outcome of physical
interactions. Although how humans predict the outcomes of physical
interactions among objects in the real world is still debated, several works
have attempted to tackle this task via cognitively inspired AI approaches. However,
there is still a lack of AI approaches that mimic the mental imagery humans use
to predict physical interactions in the real world. In this work, we propose a
novel PIP scheme: Physical Interaction Prediction via Mental Imagery with Span
Selection. PIP utilizes a deep generative model to output future frames of
physical interactions among objects before extracting crucial information for
predicting physical interactions by focusing on salient frames using span
selection. To evaluate our model, we propose a large-scale SPACE+ dataset of
synthetic video frames, including three physical interaction events in a 3D
environment. Our experiments show that PIP outperforms baselines and human
performance in physical interaction prediction for both seen and unseen
objects. Furthermore, PIP's span selection scheme can effectively identify the
frames where physical interactions among objects occur within the generated
frames, allowing for added interpretability.
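The abstract describes selecting the salient frames among generated future frames via span selection, but gives no implementation details. As a rough illustration only, the following is a minimal, hypothetical sketch of QA-style span selection over per-frame features; the function names, the linear boundary scorers, and the exhaustive search are assumptions for illustration, not the authors' actual method:

```python
import numpy as np

def select_span(frame_feats, w_start, w_end, max_len=10):
    """Pick the highest-scoring contiguous span of generated frames.

    frame_feats: (T, D) array of per-frame features (hypothetical).
    w_start, w_end: (D,) weight vectors scoring span start/end boundaries.
    Returns (start, end) frame indices, end inclusive.
    """
    start_scores = frame_feats @ w_start   # (T,) score for each candidate start
    end_scores = frame_feats @ w_end       # (T,) score for each candidate end
    T = len(frame_feats)
    best, best_span = -np.inf, (0, 0)
    # Exhaustively score every valid (start, end) pair up to max_len frames.
    for s in range(T):
        for e in range(s, min(s + max_len, T)):
            score = start_scores[s] + end_scores[e]
            if score > best:
                best, best_span = score, (s, e)
    return best_span

# Toy usage: with 1-D features peaking at frame 3, the span collapses there.
feats = np.array([[0.], [1.], [4.], [5.], [0.]])
span = select_span(feats, np.array([1.]), np.array([1.]))
print(span)  # (3, 3)
```

Scoring starts and ends independently and summing them is a common span-selection formulation (as in extractive QA); the interpretability claim in the abstract would then follow from the selected span pointing at the frames where the interaction occurs.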
Related papers
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting Interaction (MPI).
Experimental results demonstrate that MPI yields remarkable improvements of 10% to 64% over the previous state-of-the-art on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion [29.25063155767897]
This paper addresses a novel task of anticipating 3D human-object interactions (HOIs).
Our task is significantly more challenging, as it requires modeling dynamic objects with various shapes, capturing whole-body motion, and ensuring physically valid interactions.
Experiments on multiple human-object interaction datasets demonstrate the effectiveness of our method for this task, capable of producing realistic, vivid, and remarkably long-term 3D HOI predictions.
arXiv Detail & Related papers (2023-08-31T17:59:08Z)
- Qualitative Prediction of Multi-Agent Spatial Interactions [5.742409080817885]
We present and benchmark three new approaches to model and predict multi-agent interactions in dense scenes.
The proposed solutions take into account static and dynamic context to predict individual interactions.
They exploit an input- and a temporal-attention mechanism, and are tested on medium and long-term time horizons.
arXiv Detail & Related papers (2023-06-30T18:08:25Z)
- Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, which consumes dynamic descriptors, achieves state-of-the-art prediction results and generalizes better to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
- INVIGORATE: Interactive Visual Grounding and Grasping in Clutter [56.00554240240515]
INVIGORATE is a robot system that interacts with humans through natural language and grasps a specified object in clutter.
We train separate neural networks for object detection, for visual grounding, for question generation, and for OBR detection and grasping.
We build a partially observable Markov decision process (POMDP) that integrates the learned neural network modules.
arXiv Detail & Related papers (2021-08-25T07:35:21Z)
- Physion: Evaluating Physical Prediction from Vision in Humans and Machines [46.19008633309041]
We present a visual and physical prediction benchmark that precisely measures this capability.
We compare an array of algorithms on their ability to make diverse physical predictions.
We find that graph neural networks with access to the physical state best capture human behavior.
arXiv Detail & Related papers (2021-06-15T16:13:39Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.