Visual Attention in Imaginative Agents
- URL: http://arxiv.org/abs/2104.00177v1
- Date: Thu, 1 Apr 2021 00:44:23 GMT
- Title: Visual Attention in Imaginative Agents
- Authors: Samrudhdhi B. Rangrej, James J. Clark
- Abstract summary: We present a recurrent agent that perceives surroundings through a series of discrete fixations.
At each timestep, the agent imagines a variety of plausible scenes consistent with the fixation history.
The agent is tested on various 2D and 3D datasets.
- Score: 5.203329540700176
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a recurrent agent that perceives surroundings through a series of
discrete fixations. At each timestep, the agent imagines a variety of plausible
scenes consistent with the fixation history. The next fixation is planned using
uncertainty in the content of the imagined scenes. As time progresses, the
agent becomes more certain about the content of the surroundings, and the
variety in the imagined scenes decreases. The agent is built using a variational
autoencoder and normalizing flows, and trained in an unsupervised manner on a
proxy task of scene-reconstruction. The latent representations of the imagined
scenes are found to be useful for performing pixel-level and scene-level tasks
by higher-order modules. The agent is tested on various 2D and 3D datasets.
Related papers
- 3D scene generation from scene graphs and self-attention [51.49886604454926]
We present a variant of the conditional variational autoencoder (cVAE) model to synthesize 3D scenes from scene graphs and floor plans.
We exploit the properties of self-attention layers to capture high-level relationships between objects in a scene.
arXiv Detail & Related papers (2024-04-02T12:26:17Z) - Neural Scene Chronology [79.51094408119148]
We aim to reconstruct a time-varying 3D model capable of producing photo-realistic renderings with independent control of viewpoint, illumination, and time.
In this work, we represent the scene as a space-time radiance field with a per-image illumination embedding, where temporally-varying scene changes are encoded using a set of learned step functions.
arXiv Detail & Related papers (2023-06-13T17:59:58Z) - CoSIm: Commonsense Reasoning for Counterfactual Scene Imagination [87.4797527628459]
We introduce a new task/dataset called Commonsense Reasoning for Counterfactual Scene Imagination (CoSIm).
CoSIm is designed to evaluate the ability of AI systems to reason about scene change imagination.
arXiv Detail & Related papers (2022-07-08T15:28:23Z) - A Dynamic Data Driven Approach for Explainable Scene Understanding [0.0]
Scene understanding is an important topic in computer vision.
We consider the active explanation-driven understanding and classification of scenes.
Our framework is entitled ACUMEN: Active Classification and Understanding Method by Explanation-driven Networks.
arXiv Detail & Related papers (2022-06-18T02:41:51Z) - BlobGAN: Spatially Disentangled Scene Representations [67.60387150586375]
We propose an unsupervised, mid-level representation for a generative model of scenes.
The representation is mid-level in that it is neither per-pixel nor per-image; rather, scenes are modeled as a collection of spatial, depth-ordered "blobs" of features.
arXiv Detail & Related papers (2022-05-05T17:59:55Z) - Continuous Scene Representations for Embodied AI [33.00565252990522]
A Continuous Scene Representation (CSR) is a scene representation constructed by an embodied agent navigating within a space.
Our key insight is to embed pair-wise relationships between objects in a latent space.
CSR can track objects as the agent moves in a scene, update the representation accordingly, and detect changes in room configurations.
arXiv Detail & Related papers (2022-03-31T17:55:33Z) - Scene-Intuitive Agent for Remote Embodied Visual Grounding [89.73786309180139]
Humans learn from life events to form intuitions for understanding visual environments and language.
We present an agent that mimics such human behaviors.
arXiv Detail & Related papers (2021-03-24T02:37:48Z) - Environment Predictive Coding for Embodied Agents [92.31905063609082]
We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents.
Our experiments on the photorealistic 3D environments of Gibson and Matterport3D show that our method outperforms the state-of-the-art on challenging tasks with only a limited budget of experience.
arXiv Detail & Related papers (2021-02-03T23:43:16Z) - Trajectory Prediction for Autonomous Driving based on Multi-Head Attention with Joint Agent-Map Representation [8.203012391711932]
Future trajectories of agents can be inferred using two important cues: the locations and past motion of agents, and the static scene structure.
We propose a novel approach that applies multi-head attention over a joint representation of the static scene and surrounding agents, as sketched after this list.
Our model is evaluated on the nuScenes prediction benchmark and generates diverse future trajectories compliant with scene structure and agent configuration.
arXiv Detail & Related papers (2020-05-06T00:39:45Z)
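The joint agent-map attention idea from the last entry can be sketched as follows. The feature dimensions, encoders, and variable names are illustrative assumptions, not the paper's exact architecture; only the idea of attending over concatenated scene and agent features is taken from the abstract.

```python
# Hedged sketch: a target agent's query attends over a joint set of map (scene)
# features and surrounding-agent features via multi-head attention.
import torch

d_model, num_heads = 64, 4
attn = torch.nn.MultiheadAttention(d_model, num_heads, batch_first=True)

batch = 2
target_agent = torch.randn(batch, 1, d_model)    # query: the agent being predicted
map_feats = torch.randn(batch, 10, d_model)      # encoded static scene (e.g. lane segments)
neighbor_feats = torch.randn(batch, 5, d_model)  # encoded surrounding agents

# Joint agent-map representation: keys/values mix scene and social context,
# so each attention head can weight both cues when forming the trajectory context.
context = torch.cat([map_feats, neighbor_feats], dim=1)
trajectory_context, attn_weights = attn(target_agent, context, context)
print(trajectory_context.shape)  # (batch, 1, d_model)
```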
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.