Learning About Objects by Learning to Interact with Them
- URL: http://arxiv.org/abs/2006.09306v2
- Date: Fri, 23 Oct 2020 23:46:17 GMT
- Title: Learning About Objects by Learning to Interact with Them
- Authors: Martin Lohmann, Jordi Salvador, Aniruddha Kembhavi, Roozbeh Mottaghi
- Abstract summary: Humans often learn about their world with little to no external supervision.
We present a computational framework to discover objects and learn their physical properties.
Our agent, when placed within the near photo-realistic and physics-enabled AI2-THOR environment, interacts with its world and learns about objects.
- Score: 29.51363040054068
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Much of the remarkable progress in computer vision has been focused around
fully supervised learning mechanisms relying on highly curated datasets for a
variety of tasks. In contrast, humans often learn about their world with little
to no external supervision. Taking inspiration from infants learning from their
environment through play and interaction, we present a computational framework
to discover objects and learn their physical properties along this paradigm of
Learning from Interaction. Our agent, when placed within the near
photo-realistic and physics-enabled AI2-THOR environment, interacts with its
world and learns about objects, their geometric extents and relative masses,
without any external guidance. Our experiments reveal that this agent learns
efficiently and effectively; not just for objects it has interacted with
before, but also for novel instances from seen categories as well as novel
object categories.
Related papers
- Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z)
- The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects [51.22194706674366]
We introduce the ObjectFolder Benchmark, a benchmark suite of 10 tasks for multisensory object-centric learning.
We also introduce the ObjectFolder Real dataset, including the multisensory measurements for 100 real-world household objects.
arXiv Detail & Related papers (2023-06-01T17:51:22Z)
- Object-agnostic Affordance Categorization via Unsupervised Learning of Graph Embeddings [6.371828910727037]
Acquiring knowledge about object interactions and affordances can facilitate scene understanding and human-robot collaboration tasks.
We address the problem of affordance categorization for class-agnostic objects with an open set of interactions.
A novel depth-informed qualitative spatial representation is proposed for the construction of Activity Graphs.
arXiv Detail & Related papers (2023-03-30T15:04:04Z)
- Synthesizing Physical Character-Scene Interactions [64.26035523518846]
It is necessary to synthesize such interactions between virtual characters and their surroundings.
We present a system that uses adversarial imitation learning and reinforcement learning to train physically-simulated characters.
Our approach takes physics-based character motion generation a step closer to broad applicability.
arXiv Detail & Related papers (2023-02-02T05:21:32Z)
- Discovering a Variety of Objects in Spatio-Temporal Human-Object Interactions [45.92485321148352]
In daily HOIs, humans often interact with a variety of objects, e.g., holding and touching dozens of household items while cleaning.
Here, we introduce a new benchmark based on AVA: Discovering Interacted Objects (DIO), including 51 interactions and 1,000+ objects.
An ST-HOI learning task is proposed, expecting vision systems to track human actors, detect interactions, and simultaneously discover objects.
arXiv Detail & Related papers (2022-11-14T16:33:54Z)
- Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, which consumes dynamic descriptors, can achieve state-of-the-art prediction results and generalize better to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
- Embodied vision for learning object representations [4.211128681972148]
We show that visual statistics mimicking those of a toddler improve object recognition accuracy in both familiar and novel environments.
We argue that this effect is caused by the reduction of features extracted in the background, a neural network bias for large features in the image and a greater similarity between novel and familiar background regions.
arXiv Detail & Related papers (2022-05-12T16:36:27Z)
- Capturing the objects of vision with neural networks [0.0]
Human visual perception carves a scene at its physical joints, decomposing the world into objects.
Deep neural network (DNN) models of visual object recognition, by contrast, remain largely tethered to the sensory input.
We review related work in both fields and examine how these fields can help each other.
arXiv Detail & Related papers (2021-09-07T21:49:53Z)
- O2O-Afford: Annotation-Free Large-Scale Object-Object Affordance Learning [24.9242853417825]
We propose a unified affordance learning framework to learn object-object interaction for various tasks.
We are able to conduct large-scale object-object affordance learning without the need for human annotations or demonstrations.
Experiments on large-scale synthetic data and real-world data prove the effectiveness of the proposed approach.
arXiv Detail & Related papers (2021-06-29T04:38:12Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms MoCo, a visual-only state-of-the-art method.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
- A Review on Intelligent Object Perception Methods Combining Knowledge-based Reasoning and Machine Learning [60.335974351919816]
Object perception is a fundamental sub-field of Computer Vision.
Recent works seek ways to integrate knowledge engineering in order to expand the level of intelligence of the visual interpretation of objects.
arXiv Detail & Related papers (2019-12-26T13:26:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.