Visual Attention in Imaginative Agents
- URL: http://arxiv.org/abs/2104.00177v1
- Date: Thu, 1 Apr 2021 00:44:23 GMT
- Title: Visual Attention in Imaginative Agents
- Authors: Samrudhdhi B. Rangrej, James J. Clark
- Abstract summary: We present a recurrent agent that perceives surroundings through a series of discrete fixations.
At each timestep, the agent imagines a variety of plausible scenes consistent with the fixation history.
The agent is tested on various 2D and 3D datasets.
- Score: 5.203329540700176
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a recurrent agent that perceives surroundings through a series of
discrete fixations. At each timestep, the agent imagines a variety of plausible
scenes consistent with the fixation history. The next fixation is planned using
uncertainty in the content of the imagined scenes. As time progresses, the
agent becomes more certain about the content of the surroundings, and the
variety in the imagined scenes decreases. The agent is built using a variational
autoencoder and normalizing flows, and trained in an unsupervised manner on a
proxy task of scene-reconstruction. The latent representations of the imagined
scenes are found to be useful for performing pixel-level and scene-level tasks
by higher-order modules. The agent is tested on various 2D and 3D datasets.
Related papers
- 3D scene generation from scene graphs and self-attention [51.49886604454926]
We present a variant of the conditional variational autoencoder (cVAE) model to synthesize 3D scenes from scene graphs and floor plans.
We exploit the properties of self-attention layers to capture high-level relationships between objects in a scene.
arXiv Detail & Related papers (2024-04-02T12:26:17Z) - Neural Scene Chronology [79.51094408119148]
We aim to reconstruct a time-varying 3D model capable of producing photo-realistic renderings with independent control of viewpoint, illumination, and time.
In this work, we represent the scene as a space-time radiance field with a per-image illumination embedding, where temporally-varying scene changes are encoded using a set of learned step functions.
arXiv Detail & Related papers (2023-06-13T17:59:58Z) - CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination [87.4797527628459]
We introduce a new task/dataset called Commonsense Reasoning for Counterfactual Scene Imagination (CoSIm).
CoSIm is designed to evaluate the ability of AI systems to reason about scene change imagination.
arXiv Detail & Related papers (2022-07-08T15:28:23Z) - A Dynamic Data Driven Approach for Explainable Scene Understanding [0.0]
Scene understanding is an important topic in computer vision.
We consider the active explanation-driven understanding and classification of scenes.
Our framework is entitled ACUMEN: Active Classification and Understanding Method by Explanation-driven Networks.
arXiv Detail & Related papers (2022-06-18T02:41:51Z) - BlobGAN: Spatially Disentangled Scene Representations [67.60387150586375]
We propose an unsupervised, mid-level representation for a generative model of scenes.
The representation is mid-level in that it is neither per-pixel nor per-image; rather, scenes are modeled as a collection of spatial, depth-ordered "blobs" of features.
arXiv Detail & Related papers (2022-05-05T17:59:55Z) - Continuous Scene Representations for Embodied AI [33.00565252990522]
A Continuous Scene Representation (CSR) is a scene representation constructed by an embodied agent navigating within a space.
Our key insight is to embed pair-wise relationships between objects in a latent space.
CSR can track objects as the agent moves in a scene, update the representation accordingly, and detect changes in room configurations.
arXiv Detail & Related papers (2022-03-31T17:55:33Z) - Scene-Intuitive Agent for Remote Embodied Visual Grounding [89.73786309180139]
Humans learn from life events to form intuitions for understanding visual environments and language.
We present an agent that mimics such human behaviors.
arXiv Detail & Related papers (2021-03-24T02:37:48Z) - Environment Predictive Coding for Embodied Agents [92.31905063609082]
We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents.
Our experiments on the photorealistic 3D environments of Gibson and Matterport3D show that our method outperforms the state-of-the-art on challenging tasks with only a limited budget of experience.
arXiv Detail & Related papers (2021-02-03T23:43:16Z) - Trajectory Prediction for Autonomous Driving based on Multi-Head Attention with Joint Agent-Map Representation [8.203012391711932]
Future trajectories of agents can be inferred using two important cues: the locations and past motion of agents, and the static scene structure.
We propose a novel approach that applies multi-head attention over a joint representation of the static scene and surrounding agents, as sketched after this list.
Our model is evaluated on the nuScenes prediction benchmark and generates diverse future trajectories compliant with scene structure and agent configuration.
arXiv Detail & Related papers (2020-05-06T00:39:45Z)
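The joint agent-map attention idea from the last entry can be sketched as follows. The feature dimensions, encoders, and variable names are illustrative assumptions, not the paper's exact architecture; only the idea of attending over concatenated scene and agent features is taken from the abstract.

```python
# Hedged sketch: a target agent's query attends over a joint set of map (scene)
# features and surrounding-agent features via multi-head attention.
import torch

d_model, num_heads = 64, 4
attn = torch.nn.MultiheadAttention(d_model, num_heads, batch_first=True)

batch = 2
target_agent = torch.randn(batch, 1, d_model)    # query: the agent being predicted
map_feats = torch.randn(batch, 10, d_model)      # encoded static scene (e.g. lane segments)
neighbor_feats = torch.randn(batch, 5, d_model)  # encoded surrounding agents

# Joint agent-map representation: keys/values mix scene and social context,
# so each attention head can weight both cues when forming the trajectory context.
context = torch.cat([map_feats, neighbor_feats], dim=1)
trajectory_context, attn_weights = attn(target_agent, context, context)
print(trajectory_context.shape)  # (batch, 1, d_model)
```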
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.