Visual Perspective Taking for Opponent Behavior Modeling
- URL: http://arxiv.org/abs/2105.05145v1
- Date: Tue, 11 May 2021 16:02:32 GMT
- Title: Visual Perspective Taking for Opponent Behavior Modeling
- Authors: Boyuan Chen, Yuhang Hu, Robert Kwiatkowski, Shuran Song, Hod Lipson
- Abstract summary: We propose an end-to-end long-term visual prediction framework for robots.
We demonstrate our approach in the context of visual hide-and-seek.
We suggest that visual behavior modeling and perspective taking skills will play a critical role in the ability of physical robots to fully integrate into real-world multi-agent activities.
- Score: 22.69165968663182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In order to engage in complex social interaction, humans learn at a young age
to infer what others see and cannot see from a different point-of-view, and
learn to predict others' plans and behaviors. These abilities have been mostly
lacking in robots, sometimes making them appear awkward and socially inept.
Here we propose an end-to-end long-term visual prediction framework for robots
to begin to acquire both these critical cognitive skills, known as Visual
Perspective Taking (VPT) and Theory of Behavior (TOB). We demonstrate our
approach in the context of visual hide-and-seek - a game that represents a
cognitive milestone in human development. Unlike traditional visual predictive
models that generate new frames from immediate past frames, our agent can
directly predict to multiple future timestamps (25s), extrapolating by 175%
beyond the training horizon. We suggest that visual behavior modeling and
perspective taking skills will play a critical role in the ability of physical
robots to fully integrate into real-world multi-agent activities. Our website
is at http://www.cs.columbia.edu/~bchen/vpttob/.
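To make the contrast with frame-by-frame rollout concrete, below is a minimal, hypothetical sketch of a time-conditioned predictor: it takes encoded current observations plus a target time offset and produces the prediction for that horizon in a single forward pass. All module names, dimensions, and the output parameterization are illustrative assumptions, not the authors' architecture.
```python
import torch
import torch.nn as nn

class TimeConditionedPredictor(nn.Module):
    """Sketch of a predictor that maps current observations plus a target
    time offset directly to a future prediction, rather than rolling out
    frame by frame (illustrative only; not the paper's architecture)."""

    def __init__(self, obs_dim=512, out_dim=64, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # The target time offset (e.g. +25 s) is an extra input, so a single
        # forward pass yields the prediction for that horizon.
        self.head = nn.Sequential(
            nn.Linear(hidden + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, obs_features, dt_seconds):
        z = self.encoder(obs_features)                # (B, hidden)
        dt = dt_seconds.unsqueeze(-1).float()         # (B, 1)
        return self.head(torch.cat([z, dt], dim=-1))  # (B, out_dim)

# Usage: one forward pass per horizon, no autoregressive rollout.
model = TimeConditionedPredictor()
obs = torch.randn(8, 512)                  # encoded current observation
pred_25s = model(obs, torch.full((8,), 25.0))
```
Because the horizon is an input rather than a fixed rollout length, such a model can be queried at timestamps beyond those seen during training, which is the kind of extrapolation the abstract describes.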
Related papers
- Improving Visual Perception of a Social Robot for Controlled and In-the-wild Human-robot Interaction [10.260966795508569]
It is unclear how the objective interaction performance and subjective user experience will be influenced when a social robot adopts a deep-learning-based visual perception model.
We employ state-of-the-art human perception and tracking models to improve the visual perception function of the Pepper robot.
arXiv Detail & Related papers (2024-03-04T06:47:06Z)
- Social-Transmotion: Promptable Human Trajectory Prediction [65.80068316170613]
Social-Transmotion is a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior.
Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY.
arXiv Detail & Related papers (2023-12-26T18:56:49Z)
- What Matters to You? Towards Visual Representation Alignment for Robot Learning [81.30964736676103]
When operating in service of people, robots need to optimize rewards aligned with end-user preferences.
We propose Representation-Aligned Preference-based Learning (RAPL), a method for solving the visual representation alignment problem.
arXiv Detail & Related papers (2023-10-11T23:04:07Z)
- Affordances from Human Videos as a Versatile Representation for Robotics [31.248842798600606]
We train a visual affordance model that estimates where and how in the scene a human is likely to interact.
The structure of these behavioral affordances directly enables the robot to perform many complex tasks.
We show the efficacy of our approach, which we call VRB, across 4 real world environments, over 10 different tasks, and 2 robotic platforms operating in the wild.
arXiv Detail & Related papers (2023-04-17T17:59:34Z)
- A-ACT: Action Anticipation through Cycle Transformations [89.83027919085289]
We take a step back to analyze how the human capability to anticipate the future can be transferred to machine learning algorithms.
A recent study on human psychology explains that, in anticipating an occurrence, the human brain relies on two systems.
In this work, we study the impact of each system for the task of action anticipation and introduce a paradigm to integrate them in a learning framework.
arXiv Detail & Related papers (2022-04-02T21:50:45Z)
- Visual Intelligence through Human Interaction [43.82765410550207]
We demonstrate that novel interaction strategies can enable new forms of data collection and evaluation for Computer Vision.
We present a crowdsourcing interface for speeding up paid data collection by an order of magnitude, feeding the data-hungry nature of modern vision models.
We also develop a system to ensure that human evaluation of generative vision models is reliable, affordable, and grounded in psychophysics theory.
arXiv Detail & Related papers (2021-11-12T19:37:17Z)
- What Can I Do Here? Learning New Skills by Imagining Visual Affordances [128.65223577406587]
We show how generative models of possible outcomes can allow a robot to learn visual representations of affordances.
In effect, prior data is used to learn what kinds of outcomes may be possible, such that when the robot encounters an unfamiliar setting, it can sample potential outcomes from its model.
We show that visuomotor affordance learning (VAL) can be used to train goal-conditioned policies that operate on raw image inputs; an illustrative sketch of this idea appears after this list.
arXiv Detail & Related papers (2021-06-01T17:58:02Z)
- Smile Like You Mean It: Driving Animatronic Robotic Face with Learned Models [11.925808365657936]
The ability to generate intelligent and generalizable facial expressions is essential for building human-like social robots.
We develop a vision-based self-supervised learning framework for facial mimicry.
Our method enables accurate and diverse face mimicry across diverse human subjects.
arXiv Detail & Related papers (2021-05-26T17:57:19Z)
- Future Frame Prediction for Robot-assisted Surgery [57.18185972461453]
We propose a ternary prior guided variational autoencoder (TPG-VAE) model for future frame prediction in robotic surgical video sequences.
Besides the content distribution, our model learns a motion distribution, a novel way to handle the small movements of surgical tools.
arXiv Detail & Related papers (2021-03-18T15:12:06Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms MoCo, a visual-only state-of-the-art method.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
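As a rough illustration of the imagined-affordance idea in the VAL entry above, the sketch below samples candidate outcome images from a stand-in generative model and feeds one to a goal-conditioned policy. The module classes, dimensions, and goal-selection step are hypothetical placeholders, not the VAL implementation.
```python
import torch
import torch.nn as nn

class OutcomeDecoder(nn.Module):
    """Stand-in generative model: decodes latent samples into candidate
    outcome (goal) images. Purely illustrative, not the VAL codebase."""
    def __init__(self, latent_dim=32, img_pixels=48 * 48 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, img_pixels), nn.Sigmoid(),
        )
    def forward(self, z):
        return self.net(z)

class GoalConditionedPolicy(nn.Module):
    """Maps (current image, goal image) to an action."""
    def __init__(self, img_pixels=48 * 48 * 3, action_dim=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * img_pixels, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )
    def forward(self, obs, goal):
        return self.net(torch.cat([obs, goal], dim=-1))

# In an unfamiliar scene: sample candidate outcomes from the generative
# model, pick one as the goal, and act toward it with the policy.
decoder, policy = OutcomeDecoder(), GoalConditionedPolicy()
obs = torch.rand(1, 48 * 48 * 3)                # current raw image (flattened)
candidate_goals = decoder(torch.randn(16, 32))  # 16 imagined outcomes
goal = candidate_goals[0:1]                     # selection heuristic omitted
action = policy(obs, goal)
```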