A Biologically-Inspired Dual Stream World Model
- URL: http://arxiv.org/abs/2209.08035v1
- Date: Fri, 16 Sep 2022 16:27:48 GMT
- Title: A Biologically-Inspired Dual Stream World Model
- Authors: Arthur Juliani, Margaret Sereno
- Abstract summary: The medial temporal lobe (MTL) is hypothesized to be an experience-construction system in mammals.
We propose a novel variant, the Dual Stream World Model (DSWM), which learns from high-dimensional observations and dissociates them into context and content streams.
We show that this representation is useful as a reinforcement learning basis function, and that the generative model can be used to aid the policy learning process using Dyna-like updates.
- Score: 0.456877715768796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The medial temporal lobe (MTL), a brain region containing the hippocampus and
nearby areas, is hypothesized to be an experience-construction system in
mammals, supporting both recall and imagination of temporally-extended
sequences of events. Such capabilities are also core to many recently proposed
"world models" in the field of AI research. Taking inspiration from this
connection, we propose a novel variant, the Dual Stream World Model (DSWM),
which learns from high-dimensional observations and dissociates them into
context and content streams. DSWM can reliably generate imagined trajectories
in novel 2D environments after only a single exposure, outperforming a standard
world model. DSWM also learns latent representations which bear a strong
resemblance to place cells found in the hippocampus. We show that this
representation is useful as a reinforcement learning basis function, and that
the generative model can be used to aid the policy learning process using
Dyna-like updates.
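As a rough illustration of the dissociation described above, the following sketch splits each observation into a context stream, carried forward by a recurrent transition, and a content stream, recombined at decoding time to reconstruct or imagine observations. All module names, sizes, and the choice of a GRU transition are illustrative assumptions, not the authors' architecture.

```python
# Illustrative dual-stream sketch: a "context" code evolves over time while a
# "content" code is re-attached at decoding. Shapes and modules are assumed.
import torch
import torch.nn as nn

class DualStreamWorldModel(nn.Module):
    def __init__(self, obs_dim=64, context_dim=16, content_dim=16, action_dim=4):
        super().__init__()
        self.context_enc = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, context_dim))
        self.content_enc = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, content_dim))
        # The recurrent transition operates on the context stream only.
        self.transition = nn.GRUCell(context_dim + action_dim, context_dim)
        # The decoder recombines both streams into an (imagined) observation.
        self.decoder = nn.Sequential(
            nn.Linear(context_dim + content_dim, 64), nn.ReLU(),
            nn.Linear(64, obs_dim))

    def forward(self, obs, action):
        ctx = self.context_enc(obs)              # where-am-I stream
        cnt = self.content_enc(obs)              # what-do-I-see stream
        next_ctx = self.transition(torch.cat([ctx, action], dim=-1), ctx)
        imagined = self.decoder(torch.cat([next_ctx, cnt], dim=-1))
        return next_ctx, cnt, imagined

model = DualStreamWorldModel()
obs, act = torch.randn(1, 64), torch.randn(1, 4)
_, _, imagined = model(obs, act)                 # one imagined step
```

Rolling the transition forward while holding the content code fixed would produce the kind of imagined trajectory the abstract describes; in practice both streams would be trained with a reconstruction objective.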
Related papers
- A Practitioner's Guide to Continual Multimodal Pretraining [83.63894495064855]
Multimodal foundation models serve numerous applications at the intersection of vision and language.
To keep models updated, research into continual pretraining mainly explores scenarios with either infrequent, indiscriminate updates on large-scale new data, or frequent, sample-level updates.
We introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements.
arXiv Detail & Related papers (2024-08-26T17:59:01Z)
- PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model [7.286873011001679]
We propose a purely SSM-based approach with linear complexity for 3D human pose estimation in monocular video.
Specifically, we propose a bidirectional global-local spatio-temporal block that comprehensively models human joint relations within individual frames as well as across frames.
This strategy provides a more logical geometric scanning order, resulting in a combined global-local spatial scan.
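To make the bidirectional scanning idea concrete, here is a minimal sketch that scans a sequence of per-frame joint features left-to-right and right-to-left and fuses the two passes. A GRU stands in for the SSM core purely for brevity; all names and shapes are assumptions, not the PoseMamba implementation.

```python
# Bidirectional scan over frames; an SSM (e.g. Mamba) core would replace the
# GRUs in the real model. Shapes: (batch, frames, feature_dim).
import torch
import torch.nn as nn

class BidirectionalScanBlock(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.fwd = nn.GRU(dim, dim, batch_first=True)
        self.bwd = nn.GRU(dim, dim, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, x):
        h_fwd, _ = self.fwd(x)                      # left-to-right pass
        h_bwd, _ = self.bwd(torch.flip(x, dims=[1]))
        h_bwd = torch.flip(h_bwd, dims=[1])         # realign to original order
        return self.fuse(torch.cat([h_fwd, h_bwd], dim=-1))

out = BidirectionalScanBlock()(torch.randn(2, 16, 64))  # 2 clips, 16 frames
```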
arXiv Detail & Related papers (2024-08-07T04:38:03Z)
- LidarDM: Generative LiDAR Simulation in a Generated World [21.343346521878864]
LidarDM is a novel LiDAR generative model capable of producing realistic, layout-aware, physically plausible, and temporally coherent LiDAR videos.
We employ latent diffusion models to generate the 3D scene, combine it with dynamic actors to form the underlying 4D world, and subsequently produce realistic sensory observations within this virtual environment.
Our experiments indicate that our approach outperforms competing algorithms in realism, temporal coherency, and layout consistency.
arXiv Detail & Related papers (2024-04-03T17:59:28Z)
- Memory in Plain Sight: Surveying the Uncanny Resemblances of Associative Memories and Diffusion Models [65.08133391009838]
The generative process of Diffusion Models (DMs) has recently set the state of the art on many AI generation benchmarks.
We introduce a novel perspective that describes DMs using the mathematical language of memory retrieval from the field of energy-based Associative Memories (AMs).
We present a growing body of evidence documenting DMs exhibiting empirical behavior we would expect from AMs, and conclude by discussing research opportunities that are revealed by understanding DMs as a form of energy-based memory.
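The retrieval-as-energy-descent view can be made concrete with a toy example: a corrupted query is restored by gradient descent on an energy that is low near stored patterns, loosely analogous to iterative denoising in a DM. The quadratic energy and all constants below are illustrative assumptions, not the paper's formulation.

```python
# Toy associative-memory retrieval: descend an energy landscape whose minima
# sit at the stored patterns. Loosely analogous to iterative denoising.
import torch

patterns = torch.randn(8, 32)            # eight stored "memories"

def energy(x):
    # Low energy near stored patterns (soft-min over squared distances).
    d2 = ((x.unsqueeze(0) - patterns) ** 2).sum(dim=-1)
    return -torch.logsumexp(-d2, dim=0)

x = torch.randn(32, requires_grad=True)  # corrupted query
opt = torch.optim.SGD([x], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    energy(x).backward()
    opt.step()                           # x drifts toward a stored pattern
```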
arXiv Detail & Related papers (2023-09-28T17:57:09Z)
- Biologically-Motivated Learning Model for Instructed Visual Processing [3.105144691395886]
Current models of biologically plausible learning often use a cortical-like combination of bottom-up (BU) and top-down (TD) processing.
In the visual cortex, the TD pathway plays a second major role, that of visual attention, guiding visual processing to locations and tasks of interest.
We introduce a model that uses a cortical-like combination of BU and TD processing that naturally integrates the two major functions of the TD stream.
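One minimal way to picture the integration being described is a top-down instruction vector that multiplicatively gates bottom-up features, so the same TD signal both selects the task and directs processing. The gating scheme and dimensions below are assumptions for illustration, not the paper's model.

```python
# Hypothetical top-down gating: an instruction vector modulates bottom-up
# features, steering processing toward task-relevant channels.
import torch
import torch.nn as nn

class TopDownGatedLayer(nn.Module):
    def __init__(self, feat_dim=64, task_dim=16):
        super().__init__()
        self.bottom_up = nn.Linear(feat_dim, feat_dim)
        self.top_down = nn.Linear(task_dim, feat_dim)   # instruction -> gates

    def forward(self, x, task):
        gate = torch.sigmoid(self.top_down(task))       # attention-like gating
        return torch.relu(self.bottom_up(x)) * gate

layer = TopDownGatedLayer()
out = layer(torch.randn(2, 64), torch.randn(2, 16))
```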
arXiv Detail & Related papers (2023-06-04T17:38:06Z)
- Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
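A minimal sketch of the context/dynamics separation might extract a per-video context code from the first frame and inject it into every step of a recurrent dynamics model. Names and sizes here are illustrative, not the ContextWM implementation.

```python
# Context is encoded once per video; dynamics are modeled recurrently with
# the context code concatenated at every step. All shapes are assumptions.
import torch
import torch.nn as nn

class ContextualizedDynamics(nn.Module):
    def __init__(self, obs_dim=64, ctx_dim=32, hid_dim=64):
        super().__init__()
        self.context_enc = nn.Linear(obs_dim, ctx_dim)
        self.dynamics = nn.GRU(obs_dim + ctx_dim, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, obs_dim)          # next-frame prediction

    def forward(self, video):                   # video: (batch, time, obs_dim)
        ctx = self.context_enc(video[:, 0])     # context from the first frame
        ctx = ctx.unsqueeze(1).expand(-1, video.size(1), -1)
        h, _ = self.dynamics(torch.cat([video, ctx], dim=-1))
        return self.head(h)

preds = ContextualizedDynamics()(torch.randn(2, 10, 64))
```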
arXiv Detail & Related papers (2023-05-29T14:29:12Z)
- Conditional Image-to-Video Generation with Latent Flow Diffusion Models [18.13991670747915]
Conditional image-to-video (cI2V) generation aims to synthesize a new plausible video starting from an image and a condition.
We propose an approach for cI2V using novel latent flow diffusion models (LFDM).
LFDM synthesizes an optical flow sequence in the latent space based on the given condition to warp the given image.
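The warping step can be sketched with a generic grid sample: a flow field offsets a normalized sampling grid over the conditioning image to produce the next frame. The random flow below merely stands in for the output of the latent diffusion model.

```python
# Warp an image by a flow field via grid sampling; in LFDM the flow would be
# produced by the latent diffusion model rather than sampled at random.
import torch
import torch.nn.functional as F

B, C, H, W = 1, 3, 32, 32
image = torch.rand(B, C, H, W)
flow = torch.randn(B, H, W, 2) * 0.05    # small offsets in [-1, 1] coordinates

ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                        torch.linspace(-1, 1, W), indexing="ij")
grid = torch.stack([xs, ys], dim=-1).unsqueeze(0) + flow
warped = F.grid_sample(image, grid, align_corners=True)  # next-frame estimate
```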
arXiv Detail & Related papers (2023-03-24T01:54:26Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
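A toy sketch of a mixture-of-Gaussians dynamics prior, assuming one Gaussian component per task and a soft gate over components; the gating scheme and dimensions are illustrative guesses, not the paper's model.

```python
# Each task owns one Gaussian component over the next latent state; a soft
# gate mixes the components. Purely illustrative shapes and modules.
import torch
import torch.nn as nn

class MixtureDynamicsPrior(nn.Module):
    def __init__(self, latent_dim=16, n_tasks=4):
        super().__init__()
        self.means = nn.ModuleList(
            [nn.Linear(latent_dim, latent_dim) for _ in range(n_tasks)])
        self.log_std = nn.Parameter(torch.zeros(n_tasks, latent_dim))
        self.gate = nn.Linear(latent_dim, n_tasks)       # soft task assignment

    def forward(self, z):                                # z: (batch, latent_dim)
        w = torch.softmax(self.gate(z), dim=-1)          # (batch, n_tasks)
        mus = torch.stack([m(z) for m in self.means], dim=1)
        mean = (w.unsqueeze(-1) * mus).sum(dim=1)        # mixture mean
        return mean, self.log_std.exp()

mean, std = MixtureDynamicsPrior()(torch.randn(2, 16))
```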
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- Cycle-Consistent World Models for Domain Independent Latent Imagination [0.0]
High costs and risks make it hard to train autonomous cars in the real world.
We propose a novel model-based reinforcement learning approach called Cycle-Consistent World Models.
arXiv Detail & Related papers (2021-10-02T13:55:50Z)
- S2RMs: Spatially Structured Recurrent Modules [105.0377129434636]
We take a step towards models that are capable of simultaneously exploiting both modular and spatiotemporal structures.
We find our models to be robust to the number of available views and better capable of generalization to novel tasks without additional training.
arXiv Detail & Related papers (2020-07-13T17:44:30Z)
- A Comprehensive Study on Temporal Modeling for Online Action Detection [50.558313106389335]
Online action detection (OAD) is a practical yet challenging task, which has attracted increasing attention in recent years.
This paper aims to provide a comprehensive study on temporal modeling for OAD including four meta types of temporal modeling methods.
We present several hybrid temporal modeling methods, which outperform the recent state-of-the-art methods with sizable margins on THUMOS-14 and TVSeries.
arXiv Detail & Related papers (2020-01-21T13:12:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.