Learning Dynamic Attribute-factored World Models for Efficient
Multi-object Reinforcement Learning
- URL: http://arxiv.org/abs/2307.09205v1
- Date: Tue, 18 Jul 2023 12:41:28 GMT
- Title: Learning Dynamic Attribute-factored World Models for Efficient
Multi-object Reinforcement Learning
- Authors: Fan Feng and Sara Magliacane
- Abstract summary: In many reinforcement learning tasks, the agent has to learn to interact with many objects of different types and generalize to unseen combinations and numbers of objects.
Recent works have shown the benefits of object-factored representations and hierarchical abstractions for improving sample efficiency.
We introduce the Dynamic Attribute FacTored RL (DAFT-RL) framework to exploit the benefits of factorization in terms of object attributes.
- Score: 6.447052211404121
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In many reinforcement learning tasks, the agent has to learn to interact with
many objects of different types and generalize to unseen combinations and
numbers of objects. Often a task is a composition of previously learned tasks
(e.g. block stacking). These are examples of compositional generalization, in
which we compose object-centric representations to solve complex tasks. Recent
works have shown the benefits of object-factored representations and
hierarchical abstractions for improving sample efficiency in these settings. On
the other hand, these methods do not fully exploit the benefits of
factorization in terms of object attributes. In this paper, we address this
opportunity and introduce the Dynamic Attribute FacTored RL (DAFT-RL)
framework. In DAFT-RL, we leverage object-centric representation learning to
extract objects from visual inputs. We learn to classify them into classes and
infer their latent parameters. For each class of object, we learn a class
template graph that describes how the dynamics and reward of an object of this
class factorize according to its attributes. We also learn an interaction
pattern graph that describes how objects of different classes interact with
each other at the attribute level. Through these graphs and a dynamic
interaction graph that models the interactions between objects, we can learn a
policy that can then be directly applied in a new environment by just
estimating the interactions and latent parameters. We evaluate DAFT-RL on
three benchmark datasets and show that our framework outperforms the
state-of-the-art in
generalizing across unseen objects with varying attributes and latent
parameters, as well as in the composition of previously learned tasks.
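
As a rough illustration of the factorization the abstract describes, the sketch below composes per-object dynamics from a class template graph (attribute-to-attribute edges within one object) and an interaction pattern graph (attribute-level edges between interacting objects of different classes). Everything here is hypothetical scaffolding: the class names, attribute dimensions, and linear transitions are stand-ins for what DAFT-RL actually learns from data.

```python
# Minimal sketch of attribute-factored dynamics, loosely following the
# structure in the abstract. The graphs below are hand-coded for
# illustration; DAFT-RL learns them from data.
import numpy as np

rng = np.random.default_rng(0)
N_ATTRS = 3  # e.g. position, velocity, color (illustrative only)

# Class template graph: template[c][i, j] = 1 iff attribute j of an object
# of class c influences attribute i of the same object at the next step.
template = {
    "ball":  np.array([[1, 1, 0],
                       [0, 1, 0],
                       [0, 0, 1]]),
    "block": np.eye(N_ATTRS, dtype=int),
}

# Interaction pattern graph: pattern[(c1, c2)][i, j] = 1 iff attribute j of
# a class-c2 object influences attribute i of an interacting class-c1 object.
pattern = {("ball", "block"): np.array([[0, 0, 0],
                                        [1, 0, 0],
                                        [0, 0, 0]])}

# Per-class transition weights (stand-ins for learned dynamics models).
weights = {c: rng.normal(size=(N_ATTRS, N_ATTRS)) for c in template}

def step(states, classes, interactions):
    """One factored step. `interactions` is the dynamic interaction graph:
    a set of (i, j) pairs meaning object j currently influences object i."""
    nxt = []
    for i, (s, c) in enumerate(zip(states, classes)):
        out = (weights[c] * template[c]) @ s          # within-object edges
        for tgt, src in interactions:
            mask = pattern.get((c, classes[src]))
            if tgt == i and mask is not None:         # between-object edges
                out = out + (weights[c] * mask) @ states[src]
        nxt.append(out)
    return nxt

states = [rng.normal(size=N_ATTRS), rng.normal(size=N_ATTRS)]
print(step(states, ["ball", "block"], interactions={(0, 1)}))
```

Because the template and pattern graphs are shared by all objects of a class, a policy built on this factorization can, as the abstract notes, transfer to a new environment by re-estimating only the dynamic interaction graph and the latent parameters.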
Related papers
- ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding [42.10086029931937]
Visual grounding aims to localize the object referred to in an image based on a natural language query.
Existing methods demonstrate a significant performance drop when there are multiple distractions in an image.
We propose a novel approach, the Relation and Semantic-sensitive Visual Grounding (ResVG) model, to address this issue.
arXiv Detail & Related papers (2024-08-29T07:32:01Z)
- Towards reporting bias in visual-language datasets: bimodal augmentation by decoupling object-attribute association [23.06058982328083]
We focus on the pervasive reporting bias in visual-language datasets.
We propose a bimodal augmentation (BiAug) approach to mitigate this bias.
BiAug synthesizes visual-language examples with a rich array of object-attribute pairings and constructs cross-modal hard negatives.
arXiv Detail & Related papers (2023-10-02T16:48:50Z)
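
The summary above does not spell out the augmentation pipeline, but the basic move of decoupling object-attribute associations can be caricatured as re-pairing attributes across objects to build counterfactual, cross-modal hard negatives. The sketch below is a guess at that flavor, not the actual BiAug method:

```python
# Hypothetical sketch of attribute re-pairing to build cross-modal hard
# negatives; the real BiAug pipeline is more involved than this.
import itertools

def hard_negatives(pairs):
    """Re-pair attributes across (attribute, object) pairs to form
    counterfactual captions, e.g. 'a wooden car' from ('red', 'car')
    and ('wooden', 'table')."""
    negatives = []
    for (a1, o1), (a2, o2) in itertools.combinations(pairs, 2):
        if a1 != a2:
            negatives += [f"a {a2} {o1}", f"a {a1} {o2}"]
    return negatives

print(hard_negatives([("red", "car"), ("wooden", "table")]))
# -> ['a wooden car', 'a red table']
```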
- Universal Instance Perception as Object Discovery and Retrieval [90.96031157557806]
UNI reformulates diverse instance perception tasks into a unified object discovery and retrieval paradigm.
It can flexibly perceive different types of objects by simply changing the input prompts.
UNI shows superior performance on 20 challenging benchmarks from 10 instance-level tasks.
arXiv Detail & Related papers (2023-03-12T14:28:24Z)
- SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition [35.4163266882568]
We introduce Self-Supervised Learning Over Sets (SOS) to pre-train a generic Objects In Contact (OIC) representation model.
Our OIC significantly boosts the performance of multiple state-of-the-art video classification models.
arXiv Detail & Related papers (2022-04-10T23:27:19Z)
- Complex-Valued Autoencoders for Object Discovery [62.26260974933819]
We propose a distributed approach to object-centric representations: the Complex AutoEncoder.
We show that this simple and efficient approach achieves better reconstruction performance than an equivalent real-valued autoencoder on simple multi-object datasets.
We also show that it achieves competitive unsupervised object discovery performance to a SlotAttention model on two datasets, and manages to disentangle objects in a third dataset where SlotAttention fails - all while being 7-70 times faster to train.
arXiv Detail & Related papers (2022-04-05T09:25:28Z)
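
A plausible intuition for why a complex-valued autoencoder can discover objects, consistent with the summary above though not detailed in it: the magnitude of a complex activation can carry feature content while its phase tags object identity, so pixels whose phases cluster together belong to one object. The toy below demonstrates only that grouping-by-phase idea on synthetic activations, not the Complex AutoEncoder itself:

```python
# Illustration of grouping-by-phase: magnitudes carry features, phases tag
# object identity. Synthetic data only; not the actual architecture.
import numpy as np

rng = np.random.default_rng(1)

# Pretend per-pixel complex activations for a 2-object scene: each object's
# pixels share (roughly) one phase.
phases = np.concatenate([
    rng.normal(loc=0.5, scale=0.05, size=50),   # object A
    rng.normal(loc=2.5, scale=0.05, size=50),   # object B
])
acts = rng.uniform(0.5, 1.0, size=100) * np.exp(1j * phases)

# Recover object assignment by clustering phases (here: a simple threshold).
labels = (np.angle(acts) > 1.5).astype(int)
print("pixels assigned to object B:", labels.sum())  # ~50
```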
- Can I see an Example? Active Learning the Long Tail of Attributes and Relations [64.50739983632006]
We introduce a novel incremental active learning framework that asks for attributes and relations in visual scenes.
While conventional active learning methods ask for labels of specific examples, we flip this framing to allow agents to ask for examples from specific categories.
Using this framing, we introduce an active sampling method that asks for examples from the tail of the data distribution and show that it outperforms classical active learning methods on Visual Genome.
arXiv Detail & Related papers (2022-03-11T19:28:19Z)
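
The flipped framing described above, querying for examples from rare categories rather than labels for chosen instances, reduces at its simplest to ranking categories by label count and asking for the rarest first. A minimal sketch (the category names and counts are invented):

```python
# Caricature of tail-first example querying: ask for examples from the
# rarest categories instead of labels for specific instances.
from collections import Counter

def tail_categories(labeled, k=2):
    """Return the k categories with the fewest labeled examples."""
    counts = Counter(labeled)
    return [cat for cat, _ in sorted(counts.items(), key=lambda kv: kv[1])[:k]]

labeled = ["person"] * 500 + ["car"] * 200 + ["unicycle"] * 3 + ["gnu"] * 1
print(tail_categories(labeled))  # ['gnu', 'unicycle'] -> query these next
```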
- Object Pursuit: Building a Space of Objects via Discriminative Weight Generation [23.85039747700698]
We propose a framework to continuously learn object-centric representations for visual learning and understanding.
We leverage interactions to sample diverse variations of an object and the corresponding training signals while learning the object-centric representations.
We perform an extensive study of the key features of the proposed framework and analyze the characteristics of the learned representations.
arXiv Detail & Related papers (2021-12-15T08:25:30Z)
- Plug and Play, Model-Based Reinforcement Learning [60.813074750879615]
We introduce an object-based representation that allows zero-shot integration of new objects from known object classes.
This is achieved by representing the global transition dynamics as a union of local transition functions.
Experiments show that our representation achieves sample-efficient learning in a variety of setups.
arXiv Detail & Related papers (2021-08-20T01:20:15Z)
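
The "union of local transition functions" idea above is concrete enough to sketch: keep one transition model per object class and advance each object under its class's model, so an extra instance of a known class integrates zero-shot. The class names and linear models below are placeholders for whatever the paper actually learns:

```python
# Sketch of global dynamics as a union of per-class local transition
# functions; adding an object of a known class needs no retraining.
import numpy as np

rng = np.random.default_rng(2)
STATE_DIM = 4

# One local transition function per known object class (placeholder linear
# models standing in for learned networks).
local_T = {c: rng.normal(size=(STATE_DIM, STATE_DIM)) / STATE_DIM
           for c in ("paddle", "ball", "brick")}

def global_step(objects):
    """objects: list of (class_name, state); each advances under its class model."""
    return [(c, local_T[c] @ s) for c, s in objects]

# Zero-shot integration: a scene with a new *instance* of a known class works
# immediately, because dynamics are tied to classes, not instances.
scene = [("paddle", rng.normal(size=STATE_DIM)),
         ("ball", rng.normal(size=STATE_DIM)),
         ("ball", rng.normal(size=STATE_DIM))]  # extra ball: no retraining
print([c for c, _ in global_step(scene)])
```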
- A Few-Shot Sequential Approach for Object Counting [63.82757025821265]
We introduce a class attention mechanism that sequentially attends to objects in the image and extracts their relevant features.
The proposed technique is trained on point-level annotations and uses a novel loss function that disentangles class-dependent and class-agnostic aspects of the model.
We present our results on a variety of object-counting/detection datasets, including FSOD and MS COCO.
arXiv Detail & Related papers (2020-07-03T18:23:39Z)
- Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically modeling object structure) by incorporating self-supervision.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z)
- Relevance-Guided Modeling of Object Dynamics for Reinforcement Learning [0.0951828574518325]
Current deep reinforcement learning (RL) approaches incorporate minimal prior knowledge about the environment.
We propose a framework for reasoning about object dynamics and behavior to rapidly determine minimal and task-specific object representations.
We also highlight the potential of this framework on several Atari games, using our object representation and standard RL and planning algorithms to learn dramatically faster than existing deep RL algorithms.
arXiv Detail & Related papers (2020-03-03T08:18:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.