Learning Gaussian Representation for Eye Fixation Prediction
- URL: http://arxiv.org/abs/2403.14821v1
- Date: Thu, 21 Mar 2024 20:28:22 GMT
- Title: Learning Gaussian Representation for Eye Fixation Prediction
- Authors: Peipei Song, Jing Zhang, Piotr Koniusz, Nick Barnes,
- Abstract summary: Existing eye fixation prediction methods perform the mapping from input images to the corresponding dense fixation maps generated from raw fixation points.
We introduce Gaussian Representation for eye fixation modeling.
We design our framework upon some lightweight backbones to achieve real-time fixation prediction.
- Score: 54.88001757991433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing eye fixation prediction methods perform the mapping from input images to the corresponding dense fixation maps generated from raw fixation points. However, due to the stochastic nature of human fixation, the generated dense fixation maps may be a less-than-ideal representation of human fixation. To provide a robust fixation model, we introduce Gaussian Representation for eye fixation modeling. Specifically, we propose to model the eye fixation map as a mixture of probability distributions, namely a Gaussian Mixture Model. In this new representation, we use several Gaussian distribution components as an alternative to the provided fixation map, which makes the model more robust to the randomness of fixation. Meanwhile, we design our framework upon some lightweight backbones to achieve real-time fixation prediction. Experimental results on three public fixation prediction datasets (SALICON, MIT1003, TORONTO) demonstrate that our method is fast and effective.
Related papers
- CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting [68.94594215660473]
We propose an efficient 3D scene representation, named Compressed Gaussian Splatting (CompGS)
We exploit a small set of anchor primitives for prediction, allowing the majority of primitives to be encapsulated into highly compact residual forms.
Experimental results show that the proposed CompGS significantly outperforms existing methods, achieving superior compactness in 3D scene representation without compromising model accuracy and rendering quality.
arXiv Detail & Related papers (2024-04-15T04:50:39Z) - pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction [26.72289913260324]
pixelSplat is a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images.
Our model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time.
arXiv Detail & Related papers (2023-12-19T17:03:50Z) - Learning Saliency From Fixations [0.9208007322096533]
We present a novel approach for saliency prediction in images, leveraging parallel decoding in transformers to learn saliency solely from fixation maps.
Our approach, named Saliency TRansformer (SalTR), achieves metric scores on par with state-of-the-art approaches on the Salicon and MIT300 benchmarks.
arXiv Detail & Related papers (2023-11-23T16:04:41Z) - Diffusion with Forward Models: Solving Stochastic Inverse Problems
Without Direct Supervision [76.32860119056964]
We propose a novel class of denoising diffusion probabilistic models that learn to sample from distributions of signals that are never directly observed.
We demonstrate the effectiveness of our method on three challenging computer vision tasks.
arXiv Detail & Related papers (2023-06-20T17:53:00Z) - DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion models [5.908471365011943]
We propose emphDiffPose, a conditional diffusion model that predicts multiple hypotheses for a given input image.
We show that DiffPose slightly improves upon the state of the art for multi-hypothesis pose estimation for simple poses and outperforms it by a large margin for highly ambiguous poses.
arXiv Detail & Related papers (2022-11-29T18:55:13Z) - The Best of Both Worlds: Combining Model-based and Nonparametric
Approaches for 3D Human Body Estimation [20.797162096899154]
We propose a framework for estimating model parameters from global image features.
A dense map prediction module explicitly establishes the dense UV correspondence between the image evidence and each part of the body model.
In inverse kinematics module refines the key point prediction and generates a posed template mesh.
A UV inpainting module relies on the corresponding feature, prediction and the posed template, and completes the predictions of occluded body shape.
arXiv Detail & Related papers (2022-05-01T16:39:09Z) - A Model for Multi-View Residual Covariances based on Perspective
Deformation [88.21738020902411]
We derive a model for the covariance of the visual residuals in multi-view SfM, odometry and SLAM setups.
We validate our model with synthetic and real data and integrate it into photometric and feature-based Bundle Adjustment.
arXiv Detail & Related papers (2022-02-01T21:21:56Z) - PDC-Net+: Enhanced Probabilistic Dense Correspondence Network [161.76275845530964]
Enhanced Probabilistic Dense Correspondence Network, PDC-Net+, capable of estimating accurate dense correspondences.
We develop an architecture and an enhanced training strategy tailored for robust and generalizable uncertainty prediction.
Our approach obtains state-of-the-art results on multiple challenging geometric matching and optical flow datasets.
arXiv Detail & Related papers (2021-09-28T17:56:41Z) - Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z) - AE-OT-GAN: Training GANs from data specific latent distribution [21.48007565143911]
generative adversarial networks (GANs) areprominent models to generate realistic and crisp images.
GANs often encounter the mode collapse problems and arehard to train, which comes from approximating the intrinsicdiscontinuous distribution transform map with continuousDNNs.
The recently proposed AE-OT model addresses thisproblem by explicitly computing the discontinuous distribu-tion transform map.
In this paper, wepropose the AE-OT-GAN model to utilize the advantages ofthe both models: generate high quality images and at the same time overcome the mode collapse/mixture problems.
arXiv Detail & Related papers (2020-01-11T01:18:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.