Learning Saliency From Fixations
- URL: http://arxiv.org/abs/2311.14073v1
- Date: Thu, 23 Nov 2023 16:04:41 GMT
- Title: Learning Saliency From Fixations
- Authors: Yasser Abdelaziz Dahou Djilali, Kevin McGuinness, Noel O'Connor
- Abstract summary: We present a novel approach for saliency prediction in images, leveraging parallel decoding in transformers to learn saliency solely from fixation maps.
Our approach, named Saliency TRansformer (SalTR), achieves metric scores on par with state-of-the-art approaches on the SALICON and MIT300 benchmarks.
- Score: 0.9208007322096533
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel approach for saliency prediction in images, leveraging parallel decoding in transformers to learn saliency solely from fixation maps. Models typically rely on continuous saliency maps to overcome the difficulty of optimizing for the discrete fixation map. We instead attempt to replicate the experimental setup that generates saliency datasets. Our approach treats saliency prediction as a direct set prediction problem, via a global loss that enforces unique fixation predictions through bipartite matching and a transformer encoder-decoder architecture. Using a fixed set of learned fixation queries, cross-attention reasons over the image features to output the fixation points directly, distinguishing our model from other modern saliency predictors. Our approach, named Saliency TRansformer (SalTR), achieves metric scores on par with state-of-the-art approaches on the SALICON and MIT300 benchmarks.
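As a rough, hedged illustration of the set-prediction formulation described above (a sketch under assumed details, not the authors' implementation), the snippet below matches a fixed set of predicted fixation queries to ground-truth fixation points with the Hungarian algorithm, in the spirit of DETR-style bipartite matching; the L2 cost, the confidence term, and names such as `match_fixations` are assumptions.

```python
# Minimal sketch of bipartite matching for fixation set prediction.
# Assumptions: N query outputs, each an (x, y) point plus a confidence
# that the query corresponds to a real fixation; M <= N ground truths.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_fixations(pred_points, pred_conf, gt_points):
    # Pairwise cost: L2 distance minus a confidence reward (a common
    # DETR-style choice; the paper's exact cost may differ).
    dist = np.linalg.norm(
        pred_points[:, None, :] - gt_points[None, :, :], axis=-1)
    cost = dist - pred_conf[:, None]
    rows, cols = linear_sum_assignment(cost)  # unique 1-to-1 assignment
    loss = dist[rows, cols].mean()            # localization term only
    return list(zip(rows, cols)), loss

# Toy usage: 4 learned queries, 2 ground-truth fixations.
pred = np.array([[0.1, 0.2], [0.8, 0.7], [0.5, 0.5], [0.3, 0.9]])
conf = np.array([0.9, 0.1, 0.7, 0.2])
gt = np.array([[0.12, 0.18], [0.52, 0.48]])
pairs, loss = match_fixations(pred, conf, gt)
```

The one-to-one assignment is what makes the fixation predictions unique: each ground-truth point is claimed by exactly one query, and unmatched queries can be trained toward a "no fixation" label.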
Related papers
- Learning Gaussian Representation for Eye Fixation Prediction [54.88001757991433]
Existing eye fixation prediction methods map input images to dense fixation maps generated from raw fixation points.
We introduce a Gaussian representation for eye fixation modeling, as sketched below.
We build our framework on lightweight backbones to achieve real-time fixation prediction.
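As a hedged sketch of the Gaussian idea (not the paper's code), the snippet below renders raw fixation points into a dense map as a sum of isotropic Gaussians; the sigma value, the normalization, and the name `gaussian_fixation_map` are assumptions.

```python
# Hypothetical sketch: each raw fixation point contributes an
# isotropic Gaussian blob to a dense, saliency-style map.
import numpy as np

def gaussian_fixation_map(fixations, height, width, sigma=15.0):
    ys, xs = np.mgrid[0:height, 0:width]
    fmap = np.zeros((height, width), dtype=np.float64)
    for x, y in fixations:  # fixations are (x, y) pixel coordinates
        fmap += np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    if fmap.max() > 0:
        fmap /= fmap.max()  # normalize to [0, 1]
    return fmap

fmap = gaussian_fixation_map([(40, 30), (120, 90)], height=128, width=160)
```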
arXiv Detail & Related papers (2024-03-21T20:28:22Z)
- DiffusionMat: Alpha Matting as Sequential Refinement Learning [87.76572845943929]
DiffusionMat is an image matting framework that employs a diffusion model for the transition from coarse to refined alpha mattes.
A correction module adjusts the output at each denoising step, ensuring that the final result is consistent with the input image's structures.
We evaluate our model across several image matting benchmarks, and the results indicate that DiffusionMat consistently outperforms existing methods.
arXiv Detail & Related papers (2023-11-22T17:16:44Z)
- Neural Jacobian Fields: Learning Intrinsic Mappings of Arbitrary Meshes [38.157373733083894]
This paper introduces a framework designed to accurately predict piecewise linear mappings of arbitrary meshes via a neural network.
The framework is based on reducing the neural aspect to a prediction of a matrix for a single point, conditioned on a global shape descriptor.
Operating in the intrinsic gradient domain of each individual mesh allows the framework to predict highly accurate mappings.
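To make the per-point formulation concrete, here is a minimal sketch (assumptions throughout, not the authors' architecture) of an MLP that predicts one 3x3 matrix per point, conditioned on a global shape descriptor; the dimensions and the name `PerPointJacobian` are hypothetical.

```python
# Hypothetical sketch: predict a 3x3 matrix for each point from the
# point's coordinates concatenated with a global shape descriptor.
import torch
import torch.nn as nn

class PerPointJacobian(nn.Module):
    def __init__(self, point_dim=3, global_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(point_dim + global_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 9),  # flattened 3x3 matrix per point
        )

    def forward(self, points, global_desc):
        # points: (B, N, 3); global_desc: (B, global_dim)
        B, N, _ = points.shape
        g = global_desc[:, None, :].expand(B, N, global_desc.shape[-1])
        out = self.mlp(torch.cat([points, g], dim=-1))
        return out.view(B, N, 3, 3)  # one matrix per point

model = PerPointJacobian()
jac = model(torch.randn(2, 100, 3), torch.randn(2, 128))
```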
arXiv Detail & Related papers (2022-05-05T19:51:13Z)
- An End-to-End Transformer Model for Crowd Localization [64.15335535775883]
Crowd localization, i.e., predicting head positions, is a more practical and higher-level task than mere counting.
Existing methods employ pseudo-bounding boxes or pre-designed localization maps, relying on complex post-processing to obtain the head positions.
We propose an elegant, end-to-end Crowd Localization TRansformer that solves the task in the regression-based paradigm.
arXiv Detail & Related papers (2022-02-26T05:21:30Z)
- Conditional Variational Autoencoder for Learned Image Reconstruction [5.487951901731039]
We develop a novel framework that approximates the posterior distribution of the unknown image at each query observation.
It handles implicit noise models and priors, incorporates the data formation process (i.e., the forward operator), and learns reconstructive properties that transfer between different datasets.
arXiv Detail & Related papers (2021-10-22T10:02:48Z)
- Parameter Decoupling Strategy for Semi-supervised 3D Left Atrium Segmentation [0.0]
We present a novel semi-supervised segmentation model based on parameter decoupling strategy to encourage consistent predictions from diverse views.
Our method achieves competitive results against state-of-the-art semi-supervised methods on the Atrial Challenge dataset.
arXiv Detail & Related papers (2021-09-20T14:51:42Z)
- A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image.
The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
arXiv Detail & Related papers (2021-04-29T17:49:48Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour for the learned representations, and the consequences of fixing it by introducing a notion of self-consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
- Calibrated Adversarial Refinement for Stochastic Semantic Segmentation [5.849736173068868]
We present a strategy for learning a calibrated predictive distribution over semantic maps, where the probability associated with each prediction reflects its ground truth correctness likelihood.
We demonstrate the versatility and robustness of the approach by achieving state-of-the-art results on the multigrader LIDC dataset and on a modified Cityscapes dataset with injected ambiguities.
We show that the core design can be adapted to other tasks requiring learning a calibrated predictive distribution by experimenting on a toy regression dataset.
arXiv Detail & Related papers (2020-06-23T16:39:59Z)
- Self-Supervised Tuning for Few-Shot Segmentation [82.32143982269892]
Few-shot segmentation aims at assigning a category label to each image pixel with few annotated samples.
Existing meta-learning methods tend to fail to generate category-specific discriminative descriptors when the visual features extracted from support images are marginalized in the embedding space.
This paper presents an adaptive tuning framework in which the distribution of latent features across different episodes is dynamically adjusted based on a self-segmentation scheme.
arXiv Detail & Related papers (2020-04-12T03:53:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.