Learning Geometric Representations of Objects via Interaction
- URL: http://arxiv.org/abs/2309.05346v1
- Date: Mon, 11 Sep 2023 09:45:22 GMT
- Title: Learning Geometric Representations of Objects via Interaction
- Authors: Alfredo Reichlin, Giovanni Luca Marchetti, Hang Yin, Anastasiia
Varava, Danica Kragic
- Abstract summary: We address the problem of learning representations from observations of a scene involving an agent and an external object the agent interacts with.
We propose a representation learning framework extracting the location in physical space of both the agent and the object from unstructured observations of arbitrary nature.
- Score: 25.383613570119266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We address the problem of learning representations from observations of a
scene involving an agent and an external object the agent interacts with. To
this end, we propose a representation learning framework extracting the
location in physical space of both the agent and the object from unstructured
observations of arbitrary nature. Our framework relies on the actions performed
by the agent as the only source of supervision, while assuming that the object
is displaced by the agent via unknown dynamics. We provide a theoretical
foundation and formally prove that an ideal learner is guaranteed to infer an
isometric representation, disentangling the agent from the object and correctly
extracting their locations. We evaluate empirically our framework on a variety
of scenarios, showing that it outperforms vision-based approaches such as a
state-of-the-art keypoint extractor. We moreover demonstrate how the extracted
representations enable the agent to solve downstream tasks via reinforcement
learning in an efficient manner.
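The core idea of the abstract, that the agent's own actions are the only supervision and that an ideal learner recovers an isometric representation, can be illustrated with a deliberately simplified sketch. This is not the authors' implementation: it assumes a noiseless linear observation map and models only the agent (no object dynamics), and it fits a linear encoder W so that differences of encoded observations match the actions, phi(o') - phi(o) ≈ a.

```python
import numpy as np

# Hypothetical linear sketch of action-supervised representation learning.
# Assumptions (not from the paper): observations are an unknown linear
# projection M of the true 2-D agent position; no object, no noise.
rng = np.random.default_rng(0)
obs_dim, latent_dim, n = 32, 2, 200

M = rng.normal(size=(obs_dim, latent_dim))          # unknown observation map
positions = np.cumsum(rng.normal(size=(n + 1, latent_dim)), axis=0)
actions = np.diff(positions, axis=0)                # actions = displacements
obs = positions @ M.T                               # unstructured observations

# Supervision comes from actions alone: fit encoder W so that
# W @ (o_next - o) ≈ a for every observed transition (least squares).
D = (obs[1:] - obs[:-1]).T                          # observation differences
A = actions.T
W = A @ np.linalg.pinv(D)                           # W D ≈ A

# The learned encoder recovers the physical positions up to the origin,
# i.e. an isometric representation of where the agent is.
recovered = obs @ W.T
```

Since the observation map here is linear and noiseless, the least-squares encoder recovers positions exactly; the paper's contribution is proving an analogous guarantee for an ideal learner in the general nonlinear setting, with the object disentangled from the agent.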
Related papers
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Object-Centric Scene Representations using Active Inference [4.298360054690217]
Representing a scene and its constituent objects from raw sensory data is a core ability for enabling robots to interact with their environment.
We propose a novel approach for scene understanding, leveraging a hierarchical object-centric generative model that enables an agent to infer object category.
For evaluating the behavior of an active vision agent, we also propose a new benchmark where, given a target viewpoint of a particular object, the agent needs to find the best matching viewpoint.
arXiv Detail & Related papers (2023-02-07T06:45:19Z)
- Homomorphism Autoencoder -- Learning Group Structured Representations from Observed Transitions [51.71245032890532]
We propose methods enabling an agent acting upon the world to learn internal representations of sensory information consistent with actions that modify it.
In contrast to existing work, our approach does not require prior knowledge of the group and does not restrict the set of actions the agent can perform.
arXiv Detail & Related papers (2022-07-25T11:22:48Z)
- Object Pursuit: Building a Space of Objects via Discriminative Weight Generation [23.85039747700698]
We propose a framework to continuously learn object-centric representations for visual learning and understanding.
We leverage interactions to sample diverse variations of an object and the corresponding training signals while learning the object-centric representations.
We perform an extensive study of the key features of the proposed framework and analyze the characteristics of the learned representations.
arXiv Detail & Related papers (2021-12-15T08:25:30Z)
- Learning to Improve Representations by Communicating About Perspectives [0.0]
We present a minimal architecture composed of a population of autoencoders.
We show that our proposed architecture allows the emergence of aligned representations.
Results demonstrate how communication from subjective perspectives can lead to the acquisition of more abstract representations in multi-agent systems.
arXiv Detail & Related papers (2021-09-20T09:30:13Z)
- Self-Supervision by Prediction for Object Discovery in Videos [62.87145010885044]
In this paper, we use the prediction task as self-supervision and build a novel object-centric model for image sequence representation.
Our framework can be trained without the help of any manual annotation or pretrained network.
Initial experiments confirm that the proposed pipeline is a promising step towards object-centric video prediction.
arXiv Detail & Related papers (2021-03-09T19:14:33Z)
- Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-person Simulated 3D Environment [73.9469267445146]
First-person object-interaction tasks in high-fidelity, 3D, simulated environments such as the AI2Thor pose significant sample-efficiency challenges for reinforcement learning agents.
We show that one can learn object-interaction tasks from scratch without supervision by learning an attentive object-model as an auxiliary task.
arXiv Detail & Related papers (2020-10-28T19:27:26Z)
- Agent Modelling under Partial Observability for Deep Reinforcement Learning [12.903487594031276]
Existing methods for agent modelling assume knowledge of the local observations and chosen actions of the modelled agents during execution.
We learn to extract representations about the modelled agents conditioned only on the local observations of the controlled agent.
The representations are used to augment the controlled agent's decision policy which is trained via deep reinforcement learning.
arXiv Detail & Related papers (2020-06-16T18:43:42Z)
- Self-supervised Learning from a Multi-view Perspective [121.63655399591681]
We show that self-supervised representations can extract task-relevant information and discard task-irrelevant information.
Our theoretical framework paves the way to a larger space of self-supervised learning objective design.
arXiv Detail & Related papers (2020-06-10T00:21:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.