A Probabilistic Hard Attention Model For Sequentially Observed Scenes
- URL: http://arxiv.org/abs/2111.07534v1
- Date: Mon, 15 Nov 2021 04:47:47 GMT
- Title: A Probabilistic Hard Attention Model For Sequentially Observed Scenes
- Authors: Samrudhdhi B. Rangrej, James J. Clark
- Abstract summary: A visual hard attention model actively selects and observes a sequence of subregions in an image to make a prediction.
In this paper, we design an efficient hard attention model for classifying such sequentially observed scenes.
Our model gains 2-10% higher accuracy than the baseline models when both have seen only a couple of glimpses.
- Score: 5.203329540700176
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A visual hard attention model actively selects and observes a sequence of
subregions in an image to make a prediction. The majority of hard attention
models determine the attention-worthy regions by first analyzing a complete
image. However, it may be the case that the entire image is not available
initially but instead sensed gradually through a series of partial
observations. In this paper, we design an efficient hard attention model for
classifying such sequentially observed scenes. The presented model never
observes an image completely. To select informative regions under partial
observability, the model uses Bayesian Optimal Experiment Design. First, it
synthesizes the features of the unobserved regions based on the already
observed regions. Then, it uses the predicted features to estimate the expected
information gain (EIG) attained, should various regions be attended. Finally,
the model attends to the actual content at the location where the estimated EIG
is maximal. The model uses a) a recurrent feature aggregator to maintain a
recurrent state, b) a linear classifier to predict the class label, and c) a
partial variational autoencoder (Partial VAE) to predict the features of unobserved regions.
We use normalizing flows in Partial VAE to handle multi-modality in the
feature-synthesis problem. We train our model using a differentiable objective
and test it on five datasets. Our model gains 2-10% higher accuracy than the
baseline models when both have seen only a couple of glimpses.
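The selection loop described in the abstract (synthesize features for unobserved regions, score each candidate by expected information gain, attend where EIG is largest) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `classifier`, `sample_unobserved`, and the averaging aggregator are hypothetical stand-ins for the paper's linear classifier, Partial VAE with normalizing flows, and recurrent feature aggregator.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy setup: 4 candidate glimpse locations, 8-dim features, 3 classes
n_locations, feat_dim, n_classes, n_samples = 4, 8, 3, 16
W = rng.normal(size=(feat_dim, n_classes))
observed_feat = rng.normal(size=feat_dim)  # aggregate of glimpses seen so far

def entropy(p):
    # Shannon entropy of a categorical distribution
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def classifier(features):
    # stand-in linear classifier: softmax over class logits
    logits = features @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

def sample_unobserved(loc, n):
    # stand-in for the Partial VAE + normalizing flow: draws n plausible
    # feature vectors for unobserved location `loc`, conditioned on what
    # has been seen (here: just a Gaussian around the observed aggregate)
    return observed_feat + rng.normal(scale=0.5, size=(n, feat_dim))

# EIG(loc) = H(y | observed) - E_{f ~ p(f | observed)}[ H(y | observed, f) ]
h_prior = entropy(classifier(observed_feat))
eig = np.empty(n_locations)
for loc in range(n_locations):
    samples = sample_unobserved(loc, n_samples)
    # crude aggregator: average the old state with the synthesized features
    h_post = np.mean([entropy(classifier(0.5 * (observed_feat + f)))
                      for f in samples])
    eig[loc] = h_prior - h_post

# attend the actual content where the expected information gain is largest
next_glimpse = int(np.argmax(eig))
```

The key point is that EIG is estimated entirely from *synthesized* features, so the model can choose its next glimpse without ever seeing the full image.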
Related papers
- ASTRA: A Scene-aware TRAnsformer-based model for trajectory prediction [15.624698974735654]
ASTRA (A Scene-aware TRAnsformer-based model for trajectory prediction) is a light-weight pedestrian trajectory forecasting model.
We utilise a U-Net-based feature extractor, via its latent vector representation, to capture scene representations and a graph-aware transformer encoder for capturing social interactions.
arXiv Detail & Related papers (2025-01-16T23:28:30Z)
- Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector [30.23453108681447]
Inherently explainable attribution method aims to enhance the understanding of model behavior.
It is achieved by cooperatively training a selector (generating an attribution map to identify important features) and a predictor.
We introduce a new objective that discourages the presence of discriminative features in the masked-out regions.
Our model achieves higher accuracy than the regular black-box model.
arXiv Detail & Related papers (2024-07-27T17:45:20Z)
- TempSAL -- Uncovering Temporal Information for Deep Saliency Prediction [64.63645677568384]
We introduce a novel saliency prediction model that learns to output saliency maps in sequential time intervals.
Our approach locally modulates the saliency predictions by combining the learned temporal maps.
Our code will be publicly available on GitHub.
arXiv Detail & Related papers (2023-01-05T22:10:16Z)
- PRISM: Probabilistic Real-Time Inference in Spatial World Models [52.878769723544615]
PRISM is a method for real-time filtering in a probabilistic generative model of agent motion and visual perception.
The proposed solution runs at 10Hz real-time and is similarly accurate to state-of-the-art SLAM in small to medium-sized indoor environments.
arXiv Detail & Related papers (2022-12-06T13:59:06Z)
- Joint Forecasting of Panoptic Segmentations with Difference Attention [72.03470153917189]
We study a new panoptic segmentation forecasting model that jointly forecasts all object instances in a scene.
We evaluate the proposed model on the Cityscapes and AIODrive datasets.
arXiv Detail & Related papers (2022-04-14T17:59:32Z)
- Unsupervised Deep Learning Meets Chan-Vese Model [77.24463525356566]
We propose an unsupervised image segmentation approach that integrates the Chan-Vese (CV) model with deep neural networks.
Our basic idea is to apply a deep neural network that maps the image into a latent space to alleviate the violation of the piecewise constant assumption in image space.
arXiv Detail & Related papers (2022-04-14T13:23:57Z)
- Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes [3.652509571098291]
We develop a Sequential Transformers Attention Model (STAM) that only partially observes a complete image.
Our agent outperforms the previous state of the art while observing nearly 27% and 42% fewer pixels in glimpses on ImageNet and fMoW, respectively.
arXiv Detail & Related papers (2022-04-01T18:51:55Z)
- A Gating Model for Bias Calibration in Generalized Zero-shot Learning [18.32369721322249]
Generalized zero-shot learning (GZSL) aims to train a model that generalizes to unseen-class data using only auxiliary information.
One of the main challenges in GZSL is model prediction biased toward seen classes, caused by overfitting to the seen-class data, which is all that is available during training.
We propose a two-stream autoencoder-based gating model for GZSL.
arXiv Detail & Related papers (2022-03-08T16:41:06Z)
- Learning Multi-Object Dynamics with Compositional Neural Radiance Fields [63.424469458529906]
We present a method to learn compositional predictive models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks.
NeRFs have become a popular choice for representing scenes due to their strong 3D prior.
For planning, we utilize RRTs in the learned latent space, where we can exploit our model and the implicit object encoder to make sampling the latent space informative and more efficient.
arXiv Detail & Related papers (2022-02-24T01:31:29Z)
- Probabilistic Tracking with Deep Factors [8.030212474745879]
We show how to use a deep feature encoding in conjunction with generative densities over the features in a factor-graph based, probabilistic tracking framework.
We present a likelihood model that combines a learned feature encoder with generative densities over them, both trained in a supervised manner.
arXiv Detail & Related papers (2021-12-02T21:31:51Z)
- On Model Calibration for Long-Tailed Object Detection and Instance Segmentation [56.82077636126353]
We propose NorCal, Normalized Calibration for long-tailed object detection and instance segmentation.
We show that separately handling the background class and normalizing the scores over classes for each proposal are keys to achieving superior performance.
arXiv Detail & Related papers (2021-07-05T17:57:20Z)
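The NorCal summary above (handle the background class separately, normalize the per-proposal scores over classes) can be illustrated with a small sketch. This is an assumed formulation, not the paper's code: `norcal_calibrate`, the frequency-based scaling `s_c / n_c**gamma`, and the `gamma` parameter are illustrative choices for how frequency-aware calibration of detector scores might look.

```python
import numpy as np

def norcal_calibrate(logits, class_counts, gamma=0.5):
    # Hedged sketch of NorCal-style calibration:
    #   - down-weight each foreground class score by its training
    #     frequency raised to an exponent gamma,
    #   - leave the background score untouched,
    #   - renormalize over classes for each proposal.
    # logits: (n_proposals, n_classes + 1), last column = background
    scores = np.exp(logits)
    fg = scores[:, :-1] / np.power(class_counts, gamma)  # frequency scaling
    bg = scores[:, -1:]                                  # background as-is
    allc = np.concatenate([fg, bg], axis=1)
    return allc / allc.sum(axis=1, keepdims=True)        # per-proposal softmax

# one proposal, three foreground classes (frequent -> rare) plus background
logits = np.array([[2.0, 1.8, 1.8, 0.5]])
counts = np.array([1000.0, 100.0, 10.0])  # training instances per class
probs = norcal_calibrate(logits, counts)
```

With frequency scaling, the rare third class overtakes the equally scored second class, which is the qualitative effect calibration is after.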
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.