Learning Robust Dynamics through Variational Sparse Gating
- URL: http://arxiv.org/abs/2210.11698v1
- Date: Fri, 21 Oct 2022 02:56:51 GMT
- Title: Learning Robust Dynamics through Variational Sparse Gating
- Authors: Arnav Kumar Jain, Shivakanth Sujit, Shruti Joshi, Vincent Michalski,
Danijar Hafner, Samira Ebrahimi-Kahou
- Abstract summary: In environments with many objects, often only a small number of them are moving or interacting at the same time.
In this paper, we investigate integrating this inductive bias of sparse interactions into the latent dynamics of world models trained from pixels.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning world models from their sensory inputs enables agents to plan for
actions by imagining their future outcomes. World models have previously been
shown to improve sample-efficiency in simulated environments with few objects,
but have not yet been applied successfully to environments with many objects.
In environments with many objects, often only a small number of them are moving
or interacting at the same time. In this paper, we investigate integrating this
inductive bias of sparse interactions into the latent dynamics of world models
trained from pixels. First, we introduce Variational Sparse Gating (VSG), a
latent dynamics model that updates its feature dimensions sparsely through
stochastic binary gates. Moreover, we propose a simplified architecture, Simple
Variational Sparse Gating (SVSG), that removes the deterministic pathway of
previous models, resulting in a fully stochastic transition function that
leverages the VSG mechanism. We evaluate the two model architectures in the
BringBackShapes (BBS) environment that features a large number of moving
objects and partial observability, demonstrating clear improvements over prior
models.
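The sparse gating mechanism described in the abstract can be illustrated with a minimal sketch. Assuming a simple recurrent latent update with hard Bernoulli sampling (the function name, weight shapes, and sampling scheme here are illustrative assumptions, not the authors' implementation), per-dimension binary gates decide which latent features are updated from a candidate state and which are copied forward unchanged:

```python
import numpy as np

def vsg_step(h, x, W_cand, W_gate, rng):
    """One sparsely gated latent update (illustrative sketch).

    h: current latent state, shape (latent_dim,)
    x: current input features, shape (input_dim,)
    W_cand, W_gate: weight matrices, shape (latent_dim, latent_dim + input_dim)
    """
    inp = np.concatenate([h, x])
    cand = np.tanh(W_cand @ inp)                    # candidate new state
    probs = 1.0 / (1.0 + np.exp(-(W_gate @ inp)))   # gate-open probabilities
    gate = (rng.random(h.shape) < probs).astype(float)  # sample binary gates
    # Open dims take the candidate value; closed dims copy the old state.
    return gate * cand + (1.0 - gate) * h
```

When a gate stays closed, its latent dimension is carried over exactly, which is the sparse-update inductive bias the paper targets. In training, the variational formulation would additionally place a prior on the gates, and a straight-through or relaxed estimator would be needed to pass gradients through the Bernoulli samples; those pieces are omitted here.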
Related papers
- M$^3$-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation [51.82272563578793]
We introduce the concept of phase in segmentation, which categorizes real-world objects based on their visual characteristics and potential morphological and appearance changes.
We present a new benchmark, Multi-Phase, Multi-Transition and Multi-Scenery Video Object (M$3$-VOS), to verify the ability of models to understand object phases.
We propose ReVOS, a new plug-and-play model that improves performance through reversal refinement.
arXiv Detail & Related papers (2024-12-18T12:50:11Z)
- GMS-VINS: Multi-category Dynamic Objects Semantic Segmentation for Enhanced Visual-Inertial Odometry Using a Promptable Foundation Model [7.07379964916809]
We introduce GMS-VINS, which integrates an enhanced SORT algorithm along with a robust multi-category segmentation framework into visual-inertial odometry (VIO).
The enhanced SORT algorithm significantly improves the reliability of tracking multiple dynamic objects.
Our proposed method performs impressively in multiple scenarios, outperforming other state-of-the-art methods.
arXiv Detail & Related papers (2024-11-28T17:41:33Z)
- SOLD: Slot Object-Centric Latent Dynamics Models for Relational Manipulation Learning from Pixels [16.020835290802548]
Slot-Attention for Object-centric Latent Dynamics is a novel model-based reinforcement learning algorithm.
It learns object-centric dynamics models in an unsupervised manner from pixel inputs.
We demonstrate that the structured latent space not only improves model interpretability but also provides a valuable input space for behavior models to reason over.
arXiv Detail & Related papers (2024-10-11T14:03:31Z)
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- Relax, it doesn't matter how you get there: A new self-supervised approach for multi-timescale behavior analysis [8.543808476554695]
We develop a multi-task representation learning model for behavior that combines two novel components.
Our model ranks 1st overall and on all global tasks, and 1st or 2nd on 7 out of 9 frame-level tasks.
arXiv Detail & Related papers (2023-03-15T17:58:48Z)
- Hidden Parameter Recurrent State Space Models For Changing Dynamics Scenarios [18.08665164701404]
Recurrent State-space models assume that the dynamics are fixed and unchanging, which is rarely the case in real-world scenarios.
We introduce Hidden Parameter Recurrent State Space Models (HiP-RSSMs), a framework that parametrizes a family of related dynamical systems with a low-dimensional set of latent factors.
We show that HiP-RSSMs outperform RSSMs and competing multi-task models on several challenging robotic benchmarks, both on real-world systems and in simulation.
arXiv Detail & Related papers (2022-06-29T14:54:49Z)
- Learning Interacting Dynamical Systems with Latent Gaussian Process ODEs [13.436770170612295]
We study for the first time uncertainty-aware modeling of continuous-time dynamics of interacting objects.
Our model infers both independent dynamics and their interactions with reliable uncertainty estimates.
arXiv Detail & Related papers (2022-05-24T08:36:25Z)
- Transformer Inertial Poser: Attention-based Real-time Human Motion Reconstruction from Sparse IMUs [79.72586714047199]
We propose an attention-based deep learning method to reconstruct full-body motion from six IMU sensors in real-time.
Our method achieves new state-of-the-art results both quantitatively and qualitatively, while being simple to implement and smaller in size.
arXiv Detail & Related papers (2022-03-29T16:24:52Z)
- MoCo-Flow: Neural Motion Consensus Flow for Dynamic Humans in Stationary Monocular Cameras [98.40768911788854]
We introduce MoCo-Flow, a representation that models the dynamic scene using a 4D continuous time-variant function.
At the heart of our work lies a novel optimization formulation, which is constrained by a motion consensus regularization on the motion flow.
We extensively evaluate MoCo-Flow on several datasets that contain human motions of varying complexity.
arXiv Detail & Related papers (2021-06-08T16:03:50Z)
- GEM: Group Enhanced Model for Learning Dynamical Control Systems [78.56159072162103]
We build effective dynamical models that are amenable to sample-based learning.
We show that learning the dynamics on a Lie algebra vector space is more effective than learning a direct state transition model.
This work sheds light on a connection between learning of dynamics and Lie group properties, which opens doors for new research directions.
arXiv Detail & Related papers (2021-04-07T01:08:18Z)
- Deep Imitation Learning for Bimanual Robotic Manipulation [70.56142804957187]
We present a deep imitation learning framework for robotic bimanual manipulation.
A core challenge is to generalize the manipulation skills to objects in different locations.
We propose to (i) decompose the multi-modal dynamics into elemental movement primitives, (ii) parameterize each primitive using a recurrent graph neural network to capture interactions, and (iii) integrate a high-level planner that composes primitives sequentially and a low-level controller to combine primitive dynamics and inverse kinematics control.
arXiv Detail & Related papers (2020-10-11T01:40:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.