Predicting Goal-directed Attention Control Using Inverse-Reinforcement
Learning
- URL: http://arxiv.org/abs/2001.11921v1
- Date: Fri, 31 Jan 2020 15:53:52 GMT
- Title: Predicting Goal-directed Attention Control Using Inverse-Reinforcement
Learning
- Authors: Gregory J. Zelinsky, Yupei Chen, Seoyoung Ahn, Hossein Adeli, Zhibo
Yang, Lihan Huang, Dimitrios Samaras, Minh Hoai
- Abstract summary: Using machine learning and the psychologically-meaningful principle of reward, it is possible to learn the visual features used in goal-directed attention control.
We collected 16,184 fixations from people searching for either microwaves or clocks in a dataset of 4,366 images (MS-COCO).
We used this behaviorally-annotated dataset and the machine learning method of Inverse-Reinforcement Learning (IRL) to learn target-specific reward functions and policies for these two target goals.
- Score: 25.721096184051724
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding how goal states control behavior is a question ripe for
interrogation by new methods from machine learning. These methods require large
and labeled datasets to train models. To annotate a large-scale image dataset
with observed search fixations, we collected 16,184 fixations from people
searching for either microwaves or clocks in a dataset of 4,366 images
(MS-COCO). We then used this behaviorally-annotated dataset and the machine
learning method of Inverse-Reinforcement Learning (IRL) to learn
target-specific reward functions and policies for these two target goals.
Finally, we used these learned policies to predict the fixations of 60 new
behavioral searchers (clock = 30, microwave = 30) in a disjoint test dataset of
kitchen scenes depicting both a microwave and a clock (thus controlling for
differences in low-level image contrast). We found that the IRL model predicted
behavioral search efficiency and fixation-density maps using multiple metrics.
Moreover, reward maps from the IRL model revealed target-specific patterns that
suggest, not just attention guidance by target features, but also guidance by
scene context (e.g., fixations along walls in the search of clocks). Using
machine learning and the psychologically-meaningful principle of reward, it is
possible to learn the visual features used in goal-directed attention control.
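
To make the approach concrete, below is a minimal sketch of learning a target-specific reward map from fixation sequences with maximum-entropy IRL over a coarse grid of image patches. The paper's full model learns from richer deep image features; the grid size, feature dimensionality, and "fixate anywhere next" dynamics here are simplifying assumptions for illustration.

```python
# Hedged MaxEnt-IRL sketch: linear reward over patch features, fit so the
# model's fixation distribution matches the observed one. Placeholders only.
import numpy as np

H, W = 10, 16                       # coarse grid of image patches (assumed)
n_states = H * W
n_features = 64                     # per-patch visual features (assumed)

rng = np.random.default_rng(0)
phi = rng.standard_normal((n_states, n_features))        # stand-in patch features
demos = [rng.integers(0, n_states, size=6) for _ in range(100)]  # fixation sequences

theta = np.zeros(n_features)        # linear reward: r(s) = theta . phi(s)

# Empirical state-visitation frequencies from the observed fixations.
mu_demo = np.zeros(n_states)
for traj in demos:
    for s in traj:
        mu_demo[s] += 1.0
mu_demo /= mu_demo.sum()

for step in range(200):
    r = phi @ theta
    # Soft (max-ent) distribution over the next fixation location; "teleport
    # anywhere" transitions are a simplification of real saccade dynamics.
    p = np.exp(r - r.max())
    p /= p.sum()
    # MaxEnt gradient: demonstrated feature counts minus expected feature counts.
    grad = phi.T @ (mu_demo - p)
    theta += 0.1 * grad

reward_map = (phi @ theta).reshape(H, W)   # target-specific "reward map"
```

The learned `reward_map` plays the role of the target-specific reward maps discussed above, e.g. high values along walls when the target is a clock.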
Related papers
- DMC-VB: A Benchmark for Representation Learning for Control with Visual Distractors [13.700885996266457]
Learning from previously collected data via behavioral cloning or offline reinforcement learning (RL) is a powerful recipe for scaling generalist agents.
We present the DeepMind Control Visual Benchmark (DMC-VB), a dataset collected in the DeepMind Control Suite to evaluate the robustness of offline RL agents.
Accompanying our dataset, we propose three benchmarks to evaluate representation learning methods for pretraining, and carry out experiments on several recently proposed methods.
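
For orientation, the behavioral-cloning half of that recipe, in its simplest form, might look like the sketch below; the observation and action sizes and the logged tensors are placeholders, not DMC-VB's actual data format.

```python
# Minimal behavioral-cloning sketch: regress a policy onto logged expert actions.
import torch
import torch.nn as nn

obs_dim, act_dim = 24, 6                                  # assumed sizes
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

obs = torch.randn(256, obs_dim)        # logged observations (placeholder)
acts = torch.randn(256, act_dim)       # logged expert actions (placeholder)

for _ in range(100):
    loss = nn.functional.mse_loss(policy(obs), acts)      # imitate logged actions
    opt.zero_grad()
    loss.backward()
    opt.step()
```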
arXiv Detail & Related papers (2024-09-26T23:07:01Z)
- Mixture of Self-Supervised Learning [2.191505742658975]
Self-supervised learning works by using a pretext task which will be trained on the model before being applied to a specific task.
Previous studies have only used one type of transformation as a pretext task.
This raises the question of how performance is affected when more than one pretext task is used and a gating network is employed to combine all pretext tasks.
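
A hedged sketch of one way such a gating network could weight multiple pretext-task losses (the task heads, dimensions, and softmax gate are assumptions, not the paper's exact architecture):

```python
# Gating network over pretext tasks: a shared encoder, one head per task, and a
# learned softmax gate that weights each task's loss per example.
import torch
import torch.nn as nn

class GatedPretextModel(nn.Module):
    def __init__(self, feat_dim=128, n_tasks=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(),
                                     nn.Linear(32 * 32 * 3, feat_dim), nn.ReLU())
        # One head per pretext task (e.g., rotation, jigsaw, colorization logits).
        self.heads = nn.ModuleList([nn.Linear(feat_dim, 4) for _ in range(n_tasks)])
        self.gate = nn.Linear(feat_dim, n_tasks)   # weights each task's loss

    def forward(self, x):
        z = self.encoder(x)
        gate_w = torch.softmax(self.gate(z), dim=-1)        # (B, n_tasks)
        outs = [head(z) for head in self.heads]             # per-task logits
        return outs, gate_w

model = GatedPretextModel()
x = torch.randn(8, 3, 32, 32)
outs, gate_w = model(x)
# Combined loss: gate-weighted sum of per-task pretext losses (labels assumed).
losses = torch.stack(
    [nn.functional.cross_entropy(o, torch.randint(0, 4, (8,)), reduction='none')
     for o in outs], dim=-1)                                # (B, n_tasks)
loss = (gate_w * losses).sum(dim=-1).mean()
```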
arXiv Detail & Related papers (2023-07-27T14:38:32Z)
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions [81.67963770528753]
We propose a hierarchical algorithm for goal-conditioned RL from offline data.
We show how this hierarchical decomposition makes our method robust to noise in the estimated value function.
Our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
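
A minimal sketch of the hierarchical decomposition described above: a high-level policy proposes a latent subgoal and a low-level policy conditions on it. Network sizes and the latent dimension are assumptions; HIQL's offline, value-based training objectives are not shown.

```python
# Hierarchical goal-conditioned control, sketched: high level picks a latent
# subgoal toward the final goal; low level acts toward that subgoal.
import torch
import torch.nn as nn

obs_dim, goal_dim, act_dim, z_dim = 17, 17, 6, 8          # assumed sizes

high_policy = nn.Sequential(nn.Linear(obs_dim + goal_dim, 64), nn.ReLU(),
                            nn.Linear(64, z_dim))          # (state, goal) -> subgoal latent
low_policy = nn.Sequential(nn.Linear(obs_dim + z_dim, 64), nn.ReLU(),
                           nn.Linear(64, act_dim))         # (state, subgoal) -> action

s = torch.randn(1, obs_dim)
g = torch.randn(1, goal_dim)
z = high_policy(torch.cat([s, g], dim=-1))                 # propose subgoal toward g
a = low_policy(torch.cat([s, z], dim=-1))                  # act toward the subgoal
```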
arXiv Detail & Related papers (2023-07-22T00:17:36Z)
- SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency [122.18108118190334]
We present a framework called Self-supervised Embodied Active Learning (SEAL).
It utilizes perception models trained on internet images to learn an active exploration policy.
We build and utilize 3D semantic maps to learn both action and perception in a completely self-supervised manner.
arXiv Detail & Related papers (2021-12-02T06:26:38Z)
- Glimpse-Attend-and-Explore: Self-Attention for Active Visual Exploration [47.01485765231528]
Active visual exploration aims to assist an agent with a limited field of view to understand its environment based on partial observations.
We propose the Glimpse-Attend-and-Explore model which employs self-attention to guide the visual exploration instead of task-specific uncertainty maps.
Our model provides encouraging results while being less dependent on dataset bias in driving the exploration.
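
A rough sketch of using self-attention over the glimpses seen so far to score candidate regions for the next glimpse (embedding sizes and the scoring readout are illustrative assumptions, not the paper's architecture):

```python
# Candidate regions attend over the glimpse history; the readout favors regions
# whose content is least explained by what has already been seen.
import torch
import torch.nn as nn

d_model, n_glimpses, n_candidates = 64, 5, 16
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)

glimpses = torch.randn(1, n_glimpses, d_model)     # features of regions seen so far
candidates = torch.randn(1, n_candidates, d_model) # features of unexplored regions

ctx, _ = attn(query=candidates, key=glimpses, value=glimpses)
scores = (ctx * candidates).sum(dim=-1)            # (1, n_candidates) relevance scores
next_region = scores.argmax(dim=-1)                # where to glimpse next
```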
arXiv Detail & Related papers (2021-08-26T11:41:03Z)
- Spot What Matters: Learning Context Using Graph Convolutional Networks for Weakly-Supervised Action Detection [0.0]
We introduce an architecture based on self-attention and Convolutional Networks to improve human action detection in video.
Our model aids explainability by visualizing the learned context as an attention map, even for actions and objects unseen during training.
Experimental results show that our contextualized approach outperforms a baseline action detection approach by more than 2 points in Video-mAP.
arXiv Detail & Related papers (2021-07-28T21:37:18Z)
- Rapid Exploration for Open-World Navigation with Latent Goal Models [78.45339342966196]
We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments.
At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory of images.
We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration.
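
A hedged sketch of placing an information bottleneck on a learned goal representation: goal features are encoded to a stochastic latent with a KL penalty toward a unit Gaussian, which keeps the goal code compact. Dimensions and the penalty weight are assumptions.

```python
# Information bottleneck on a goal embedding: encode to (mu, log_var), sample,
# and penalize KL(q(z|g) || N(0, I)) so the policy uses only a compact goal code.
import torch
import torch.nn as nn

enc = nn.Linear(512, 2 * 32)                      # goal features -> (mu, log_var)
goal_feats = torch.randn(8, 512)
mu, log_var = enc(goal_feats).chunk(2, dim=-1)
z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()    # sampled goal latent

kl = 0.5 * (mu.pow(2) + log_var.exp() - 1 - log_var).sum(dim=-1).mean()
beta = 1e-3                                       # assumed bottleneck strength
reg = beta * kl                                   # added to the policy/planning loss
```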
arXiv Detail & Related papers (2021-04-12T23:14:41Z)
- Model-Based Visual Planning with Self-Supervised Functional Distances [104.83979811803466]
We present a self-supervised method for model-based visual goal reaching.
Our approach learns entirely using offline, unlabeled data.
We find that this approach substantially outperforms both model-free and model-based prior methods.
arXiv Detail & Related papers (2020-12-30T23:59:09Z)
- Geography-Aware Self-Supervised Learning [79.4009241781968]
We show that due to their different characteristics, a non-trivial gap persists between contrastive and supervised learning on standard benchmarks.
We propose novel training methods that exploit the spatially aligned structure of remote sensing data.
Our experiments show that our proposed method closes the gap between contrastive and supervised learning on image classification, object detection and semantic segmentation for remote sensing.
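
One plausible instantiation of "exploiting spatially aligned structure" is to treat two images of the same location taken at different times as a contrastive positive pair; the sketch below uses a generic InfoNCE objective with an assumed encoder and temperature, not the paper's exact method.

```python
# Contrastive learning with geo-aligned positives: same location, different time.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))  # stand-in encoder
img_t0 = torch.randn(32, 3, 64, 64)    # location i at time t0
img_t1 = torch.randn(32, 3, 64, 64)    # same location i at time t1 (positive pair)

z0 = F.normalize(encoder(img_t0), dim=-1)
z1 = F.normalize(encoder(img_t1), dim=-1)
logits = z0 @ z1.t() / 0.07            # cosine similarities / temperature (assumed)
labels = torch.arange(32)              # matching rows are positives (InfoNCE)
loss = F.cross_entropy(logits, labels)
```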
arXiv Detail & Related papers (2020-11-19T17:29:13Z)
- Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning [44.774961463015245]
We propose the first inverse reinforcement learning model to learn the internal reward function and policy used by humans during visual search.
To train and evaluate our IRL model we created COCO-Search18, which is now the largest dataset of high-quality search fixations in existence.
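
For context on how such fixation models are typically scored, one common metric is Normalized Scanpath Saliency (NSS): the mean of the z-scored predicted density map at the locations humans actually fixated. This is a generic metric sketch, not code from either paper.

```python
# NSS: z-score the predicted fixation-density map, then average its values at
# the observed human fixation coordinates. Higher is better.
import numpy as np

def nss(pred_map: np.ndarray, fix_xy: list) -> float:
    z = (pred_map - pred_map.mean()) / (pred_map.std() + 1e-8)
    return float(np.mean([z[y, x] for x, y in fix_xy]))

pred = np.random.rand(240, 320)         # predicted fixation-density map (placeholder)
fixations = [(100, 120), (200, 60)]     # (x, y) human fixations (placeholder)
print(nss(pred, fixations))
```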
arXiv Detail & Related papers (2020-05-28T21:46:27Z)