SALYPATH: A Deep-Based Architecture for visual attention prediction
- URL: http://arxiv.org/abs/2107.00559v1
- Date: Tue, 29 Jun 2021 08:53:51 GMT
- Title: SALYPATH: A Deep-Based Architecture for visual attention prediction
- Authors: Mohamed Amine Kerkouri, Marouane Tliba, Aladine Chetouani, Rachid
Harba
- Abstract summary: Visual attention is useful for many computer vision applications such as image compression, recognition, and captioning.
We propose an end-to-end deep-based method, called SALYPATH, that efficiently predicts the scanpath of an image through the features of a saliency model.
The idea is to predict the scanpath by exploiting the capacity of a deep model to predict saliency.
- Score: 5.068678962285629
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Human vision is naturally more attracted to some regions within the
field of view than others. This intrinsic selectivity mechanism, known as
visual attention, is influenced by both high- and low-level factors, such as
the global environment (illumination, background texture, etc.), stimulus
characteristics (color, intensity, orientation, etc.), and prior visual
information. Visual attention is useful for many computer vision applications
such as image compression, recognition, and captioning. In this paper, we
propose an end-to-end deep-based method, called SALYPATH (SALiencY and
scanPATH), that efficiently predicts the scanpath of an image through the
features of a saliency model. The idea is to predict the scanpath by
exploiting the capacity of a deep model to predict saliency. The proposed
method was evaluated on two well-known datasets. The results obtained show the
relevance of the proposed framework compared to state-of-the-art models.
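The abstract describes deriving a scanpath from the features of a saliency model but does not detail the mechanism. As a rough, non-authoritative illustration of the general idea — turning a saliency map into an ordered fixation sequence — the following sketch uses the classic winner-take-all readout with inhibition of return; the function name and parameters are illustrative and are not taken from the paper:

```python
import numpy as np

def scanpath_from_saliency(saliency, n_fixations=5, ior_radius=2):
    """Greedy winner-take-all readout with inhibition of return (IoR).

    `saliency` is a 2-D array of non-negative saliency values. The function
    repeatedly selects the most salient location as the next fixation, then
    suppresses a neighborhood around it so later fixations land elsewhere.
    """
    sal = saliency.astype(float).copy()
    h, w = sal.shape
    fixations = []
    for _ in range(n_fixations):
        # Winner-take-all: pick the current global maximum.
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        fixations.append((int(y), int(x)))
        # Inhibition of return: zero out a square window around the pick.
        y0, y1 = max(0, y - ior_radius), min(h, y + ior_radius + 1)
        x0, x1 = max(0, x - ior_radius), min(w, x + ior_radius + 1)
        sal[y0:y1, x0:x1] = 0.0
    return fixations
```

An end-to-end model such as the one proposed here would instead learn this readout jointly with the saliency features; the sketch only makes the saliency-to-scanpath relationship concrete.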
Related papers
- Foveation in the Era of Deep Learning [6.602118206533142]
We introduce an end-to-end differentiable foveated active vision architecture that leverages a graph convolutional network to process foveated images.
Our model learns to iteratively attend to regions of the image relevant for classification.
We find that our model outperforms a state-of-the-art CNN and foveated vision architectures of comparable parameter count under a given pixel budget.
arXiv Detail & Related papers (2023-12-03T16:48:09Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, a process that underpins several human cognitive functions.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- Peripheral Vision Transformer [52.55309200601883]
We take a biologically inspired approach and explore modeling peripheral vision in deep neural networks for visual recognition.
We propose to incorporate peripheral position encoding into the multi-head self-attention layers to let the network learn to partition the visual field into diverse peripheral regions given training data.
We evaluate the proposed network, dubbed PerViT, on the large-scale ImageNet dataset and systematically investigate the inner workings of the model for machine perception.
arXiv Detail & Related papers (2022-06-14T12:47:47Z)
- Behind the Machine's Gaze: Biologically Constrained Neural Networks Exhibit Human-like Visual Attention [40.878963450471026]
We propose the Neural Visual Attention (NeVA) algorithm to generate visual scanpaths in a top-down manner.
We show that the proposed method outperforms state-of-the-art unsupervised human attention models in terms of similarity to human scanpaths.
arXiv Detail & Related papers (2022-04-19T18:57:47Z)
- PANet: Perspective-Aware Network with Dynamic Receptive Fields and Self-Distilling Supervision for Crowd Counting [63.84828478688975]
We propose a novel perspective-aware approach called PANet to address the perspective problem.
Based on the observation that the size of the objects varies greatly in one image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework adjusts the receptive field via the dilated convolution parameters according to the input image, which helps the model extract more discriminative features for each local region.
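The mechanism the DRF framework relies on is standard receptive-field arithmetic: a dilated convolution enlarges its receptive field without adding parameters. A minimal sketch of that arithmetic for a stack of stride-1 dilated convolutions (this is textbook math, not code from the paper):

```python
def dilated_conv_receptive_field(kernel_sizes, dilations):
    """Effective receptive field (per axis) of stacked stride-1 dilated convs.

    A single k x k convolution with dilation d covers (k - 1) * d + 1 pixels
    per axis; stacking stride-1 layers grows the field additively.
    """
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# Three 3x3 layers with dilations 1, 2, 4 yield a 15-pixel receptive field,
# versus 7 pixels for the same stack without dilation.
wide = dilated_conv_receptive_field([3, 3, 3], [1, 2, 4])   # 15
plain = dilated_conv_receptive_field([3, 3, 3], [1, 1, 1])  # 7
```

Varying the dilation rates per input image, as DRF does, therefore rescales the receptive field at essentially no parameter cost.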
arXiv Detail & Related papers (2021-10-31T04:43:05Z)
- Understanding Character Recognition using Visual Explanations Derived from the Human Visual System and Deep Networks [6.734853055176694]
We examine the congruence, or lack thereof, between the information-gathering strategies of deep neural networks and the human visual system.
For correctly classified characters, the deep learning model attended to character regions similar to those fixated by humans.
We propose to use the visual fixation maps obtained from the eye-tracking experiment as a supervisory input to align the model's focus on relevant character regions.
arXiv Detail & Related papers (2021-08-10T10:09:37Z)
- What Image Features Boost Housing Market Predictions? [81.32205133298254]
We propose a set of techniques for the extraction of visual features for efficient numerical inclusion in predictive algorithms.
We discuss techniques such as Shannon's entropy, calculating the center of gravity, employing image segmentation, and using Convolutional Neural Networks.
The set of 40 image features selected here carries a significant amount of predictive power and outperforms some of the strongest metadata predictors.
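Two of the techniques named above — Shannon's entropy and the center of gravity of an image — can be sketched in a few lines; this is a generic illustration of those feature computations, not the paper's implementation, and the bin count is an assumed default:

```python
import numpy as np

def shannon_entropy(image, bins=256):
    """Shannon entropy (bits) of a grayscale image's intensity histogram."""
    hist, _ = np.histogram(image, bins=bins, range=(0, bins))
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so log2 is defined
    return float(-(p * np.log2(p)).sum())

def center_of_gravity(image):
    """Intensity-weighted centroid (row, col) of a grayscale image."""
    img = image.astype(float)
    total = img.sum()
    rows = np.arange(img.shape[0])
    cols = np.arange(img.shape[1])
    cy = (img.sum(axis=1) * rows).sum() / total
    cx = (img.sum(axis=0) * cols).sum() / total
    return cy, cx
```

Both produce single scalars (or a coordinate pair) per image, which is what makes them easy to feed into a tabular predictive model alongside metadata.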
arXiv Detail & Related papers (2021-07-15T06:32:10Z)
- Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z)
- A Psychophysically Oriented Saliency Map Prediction Model [4.884688557957589]
We propose a new psychophysical saliency prediction architecture, WECSF, inspired by a multi-channel model of visual cortex function in humans.
The proposed model is evaluated using several datasets, including the MIT1003, MIT300, Toronto, SID4VAM, and UCF Sports datasets.
Our model achieved stable and superior performance across different metrics on natural images, psychophysical synthetic images, and dynamic videos.
arXiv Detail & Related papers (2020-11-08T20:58:05Z)
- Contextual Encoder-Decoder Network for Visual Saliency Prediction [42.047816176307066]
We propose an approach based on a convolutional neural network pre-trained on a large-scale image classification task.
We combine the resulting representations with global scene information for accurately predicting visual saliency.
Compared to state-of-the-art approaches, the network is based on a lightweight image classification backbone.
arXiv Detail & Related papers (2019-02-18T16:15:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.