Predicting Visual Attention and Distraction During Visual Search Using Convolutional Neural Networks
- URL: http://arxiv.org/abs/2210.15093v1
- Date: Thu, 27 Oct 2022 00:39:43 GMT
- Title: Predicting Visual Attention and Distraction During Visual Search Using Convolutional Neural Networks
- Authors: Manoosh Samiei, James J. Clark
- Abstract summary: We present two approaches to model visual attention and distraction of observers during visual search.
Our first approach adapts a light-weight free-viewing saliency model to predict eye fixation density maps of human observers over pixels of search images.
Our second approach is object-based and predicts the distractor and target objects during visual search.
- Score: 2.7920304852537527
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most studies in computational modeling of visual attention address
task-free observation of images, yet free-viewing saliency covers only a
limited range of everyday scenarios: most visual activities are goal-oriented
and demand a great amount of top-down attention control, and visual search in
particular requires more top-down control of attention than free-viewing does.
In this paper, we
present two approaches to model visual attention and distraction of observers
during visual search. Our first approach adapts a light-weight free-viewing
saliency model to predict eye fixation density maps of human observers over
pixels of search images, using a two-stream convolutional encoder-decoder
network, trained and evaluated on the COCO-Search18 dataset. This method predicts
which locations are more distracting when searching for a particular target.
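A minimal sketch of what such a two-stream encoder-decoder could look like is given below in TensorFlow/Keras (the released code is in TensorFlow); the layer widths, the input resolution, the one-hot encoding of the 18 target categories, and the KL-divergence loss are illustrative assumptions, not the authors' exact architecture.

```python
# Hypothetical sketch of a two-stream encoder-decoder for fixation density
# prediction; layer sizes and the target-cue encoding are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_TARGET_CATEGORIES = 18  # COCO-Search18 defines 18 target categories

def build_two_stream_model(img_shape=(320, 512, 3)):
    # Stream 1: encode the search image
    image_in = layers.Input(shape=img_shape, name="search_image")
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(image_in)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(128, 3, activation="relu", padding="same")(x)

    # Stream 2: target-category cue, tiled to the encoder's spatial grid
    target_in = layers.Input(shape=(NUM_TARGET_CATEGORIES,), name="target_onehot")
    t = layers.Dense(128, activation="relu")(target_in)
    t = layers.Reshape((1, 1, 128))(t)
    t = layers.UpSampling2D(size=(img_shape[0] // 4, img_shape[1] // 4))(t)

    # Fuse the two streams and decode back to a per-pixel density map
    f = layers.Concatenate()([x, t])
    f = layers.Conv2D(64, 3, activation="relu", padding="same")(f)
    f = layers.UpSampling2D(2)(f)
    f = layers.Conv2D(32, 3, activation="relu", padding="same")(f)
    f = layers.UpSampling2D(2)(f)
    density = layers.Conv2D(1, 1, activation="sigmoid", name="fixation_density")(f)

    return Model(inputs=[image_in, target_in], outputs=density)

model = build_two_stream_model()
model.compile(optimizer="adam", loss="kld")  # KL divergence is a common saliency loss
```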
Our network achieves good results on standard saliency metrics (AUC-Judd=0.95,
AUC-Borji=0.85, sAUC=0.84, NSS=4.64, KLD=0.93, CC=0.72, SIM=0.54, and IG=2.59).
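For reference, two of these metrics, NSS (the z-scored predicted saliency averaged at fixated pixels) and CC (Pearson correlation between predicted and ground-truth density maps), can be computed roughly as follows; this is a generic NumPy sketch, not the paper's evaluation code.

```python
import numpy as np

def nss(saliency_map, fixation_map):
    """Normalized Scanpath Saliency: mean of the z-scored saliency
    values at ground-truth fixation locations."""
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    return s[fixation_map.astype(bool)].mean()

def cc(saliency_map, density_map):
    """Pearson correlation between predicted and ground-truth density maps."""
    p = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    g = (density_map - density_map.mean()) / (density_map.std() + 1e-8)
    return (p * g).mean()

# Toy usage with random data, just to show the call signatures
pred = np.random.rand(320, 512)
fix = np.zeros((320, 512))
fix[100, 200] = 1
print(nss(pred, fix), cc(pred, np.random.rand(320, 512)))
```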
Our second approach is object-based and predicts the distractor and target
objects during visual search. Distractors are all objects except the target
that observers fixate on during search. This method uses a Mask R-CNN
segmentation network pre-trained on MS-COCO and fine-tuned on the COCO-Search18
dataset. We release our segmentation annotations of targets and distractors in
COCO-Search18 for three target categories: bottle, bowl, and car. The average
scores over the three categories are: F1-score=0.64, mAP(IoU=0.5)=0.57,
mAR(IoU=0.5)=0.73. Our implementation code in TensorFlow is publicly available
at https://github.com/ManooshSamiei/Distraction-Visual-Search .
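For the second, object-based approach, the reported F1/mAP/mAR at IoU 0.5 come down to matching predicted instance masks against ground-truth masks; a simplified, greedy matching sketch (a hypothetical stand-in for the standard COCO-style evaluation, not the authors' code) could look like this:

```python
import numpy as np

def mask_iou(a, b):
    """IoU between two boolean instance masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def precision_recall_f1(pred_masks, gt_masks, iou_thresh=0.5):
    """Greedy one-to-one matching of predicted to ground-truth masks at a
    fixed IoU threshold; returns precision, recall, and F1."""
    matched_gt = set()
    tp = 0
    for p in pred_masks:
        best_iou, best_j = 0.0, None
        for j, g in enumerate(gt_masks):
            if j in matched_gt:
                continue
            iou = mask_iou(p, g)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou >= iou_thresh:
            tp += 1
            matched_gt.add(best_j)
    precision = tp / len(pred_masks) if pred_masks else 0.0
    recall = tp / len(gt_masks) if gt_masks else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy usage: a prediction identical to the ground truth scores perfectly
gt = [np.random.rand(64, 64) > 0.5]
pred = [gt[0].copy()]
print(precision_recall_f1(pred, gt))  # -> (1.0, 1.0, 1.0)
```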
Related papers
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net).
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
- OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction [0.2796197251957245]
This paper introduces the Object-level Attention Transformer (OAT).
OAT predicts the scanpaths of human observers as they search for a target object within a cluttered scene of distractors.
We evaluate OAT on the Amazon book cover dataset and a new dataset for visual search that we collected.
arXiv Detail & Related papers (2024-07-18T09:33:17Z)
- Target Features Affect Visual Search, A Study of Eye Fixations [2.7920304852537527]
We investigate how the performance of human participants during visual search is affected by different parameters.
Our studies show that a bigger and more eccentric target is found faster with fewer fixations.
arXiv Detail & Related papers (2022-09-28T01:53:16Z)
- Target-absent Human Attention [44.10971508325032]
We propose the first data-driven computational model that addresses the search-termination problem.
We represent the internal knowledge that the viewer acquires through fixations using a novel state representation.
We improve the state of the art in predicting human target-absent search behavior on the COCO-Search18 dataset.
arXiv Detail & Related papers (2022-07-04T02:32:04Z)
- DetCo: Unsupervised Contrastive Learning for Object Detection [64.22416613061888]
Unsupervised contrastive learning achieves great success in learning image representations with CNNs.
We present a novel contrastive learning approach, named DetCo, which fully explores the contrasts between global image and local image patches.
DetCo consistently outperforms its supervised counterpart by 1.6/1.2/1.0 AP on Mask R-CNN C4/FPN/RetinaNet with the 1x schedule.
arXiv Detail & Related papers (2021-02-09T12:47:20Z)
- Graph Attention Tracking [76.19829750144564]
We propose a simple target-aware Siamese graph attention network for general object tracking.
Experiments on challenging benchmarks including GOT-10k, UAV123, OTB-100 and LaSOT demonstrate that the proposed SiamGAT outperforms many state-of-the-art trackers.
arXiv Detail & Related papers (2020-11-23T04:26:45Z)
- Utilising Visual Attention Cues for Vehicle Detection and Tracking [13.2351348789193]
We explore possible ways to use visual attention (saliency) for object detection and tracking.
We propose a neural network that simultaneously detects objects and generates objectness and subjectness maps to save computational power.
The experiments are conducted on KITTI and DETRAC datasets.
arXiv Detail & Related papers (2020-07-31T23:00:13Z)
- A Self-Training Approach for Point-Supervised Object Detection and Counting in Crowds [54.73161039445703]
We propose a novel self-training approach that enables a typical object detector to be trained using only point-level annotations.
During training, we utilize the available point annotations to supervise the estimation of the center points of objects.
Experimental results show that our approach significantly outperforms state-of-the-art point-supervised methods under both detection and counting tasks.
arXiv Detail & Related papers (2020-07-25T02:14:42Z)
- AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification [86.64702967379709]
We propose a novel search space for spatiotemporal attention cells, which allows the search algorithm to flexibly explore various design choices in the cell.
The discovered attention cells can be seamlessly inserted into existing backbone networks, e.g., I3D or S3D, and improve video classification accuracy by more than 2% on both Kinetics-600 and MiT datasets.
arXiv Detail & Related papers (2020-07-23T14:30:05Z)
- Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning [44.774961463015245]
We propose the first inverse reinforcement learning model to learn the internal reward function and policy used by humans during visual search.
To train and evaluate our IRL model, we created COCO-Search18, which is now the largest dataset of high-quality search fixations in existence.
arXiv Detail & Related papers (2020-05-28T21:46:27Z)
- Self-Supervised Viewpoint Learning From Image Collections [116.56304441362994]
We propose a novel learning framework which incorporates an analysis-by-synthesis paradigm to reconstruct images in a viewpoint-aware manner.
We show that our approach performs competitively with fully-supervised approaches for several object categories such as human faces, cars, buses, and trains.
arXiv Detail & Related papers (2020-04-03T22:01:41Z)