Patch-level Gaze Distribution Prediction for Gaze Following
- URL: http://arxiv.org/abs/2211.11062v1
- Date: Sun, 20 Nov 2022 19:25:15 GMT
- Title: Patch-level Gaze Distribution Prediction for Gaze Following
- Authors: Qiaomu Miao, Minh Hoai, Dimitris Samaras
- Abstract summary: We introduce the patch distribution prediction (PDP) method for gaze following training.
We show that our model regularizes the MSE loss by predicting better heatmap distributions on images with larger annotation variances.
Experiments show that our model bridges the gap between the target prediction and in/out prediction subtasks, yielding a significant improvement on both subtasks on public gaze following datasets.
- Score: 49.93340533068501
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gaze following aims to predict where a person is looking in a scene, by
predicting the target location, or indicating that the target is located
outside the image. Recent works detect the gaze target by training a heatmap
regression task with a pixel-wise mean-square error (MSE) loss, while
formulating the in/out prediction task as a binary classification task. This
training formulation puts a strict, pixel-level constraint in higher resolution
on the single annotation available in training, and does not consider
annotation variance and the correlation between the two subtasks. To address
these issues, we introduce the patch distribution prediction (PDP) method. We
replace the in/out prediction branch in previous models with the PDP branch, by
predicting a patch-level gaze distribution that also considers the outside
cases. Experiments show that our model regularizes the MSE loss by predicting
better heatmap distributions on images with larger annotation variances,
meanwhile bridging the gap between the target prediction and in/out prediction
subtasks, showing a significant improvement in performance on both subtasks on
public gaze following datasets.
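The abstract above describes replacing strict pixel-level heatmap supervision with a coarser patch-level gaze distribution that also reserves a bin for the "outside the image" case. A minimal NumPy sketch of that idea follows; the patch size, the sum-pooling of heatmap mass, and the KL objective are illustrative assumptions, not the authors' exact formulation:

```python
import numpy as np

def patch_distribution(heatmap, patch=8, inside=True):
    """Convert a pixel-level gaze heatmap (H, W) into a patch-level
    probability distribution with one extra bin for the out-of-frame case.
    H and W are assumed divisible by `patch`."""
    H, W = heatmap.shape
    ph, pw = H // patch, W // patch
    # Sum the heatmap mass inside each patch (coarse pooling).
    patches = heatmap.reshape(ph, patch, pw, patch).sum(axis=(1, 3)).ravel()
    if not inside:
        # Target is outside the image: all mass goes to the extra bin.
        patches = np.zeros_like(patches)
    dist = np.append(patches, 0.0 if inside else 1.0)
    total = dist.sum()
    return dist / total if total > 0 else np.full(dist.size, 1.0 / dist.size)

def kl_loss(pred_logits, target_dist, eps=1e-8):
    """KL(target || softmax(pred_logits)) over the patch bins plus
    the outside bin; one candidate distribution-matching objective."""
    z = pred_logits - pred_logits.max()          # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    return float((target_dist * np.log((target_dist + eps) / (p + eps))).sum())
```

Because the target is a distribution over patches rather than a single pixel, annotation variance across annotators naturally spreads mass over neighboring bins, and the in/out decision becomes just one more bin of the same prediction head.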
Related papers
- Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following [74.30960564603917]
Training gaze following models requires a large number of images with gaze target coordinates annotated by human annotators.
We propose the first semi-supervised method for gaze following by introducing two novel priors to the task.
Our method outperforms simple pseudo-annotation generation baselines on the GazeFollow image dataset.
arXiv Detail & Related papers (2024-06-04T20:43:26Z) - Exploiting Diffusion Prior for Generalizable Dense Prediction [85.4563592053464]
Recent advanced Text-to-Image (T2I) diffusion models are sometimes too imaginative for existing off-the-shelf dense predictors to estimate.
We introduce DMP, a pipeline utilizing pre-trained T2I models as a prior for dense prediction tasks.
Despite limited-domain training data, the approach yields faithful estimations for arbitrary images, surpassing existing state-of-the-art algorithms.
arXiv Detail & Related papers (2023-11-30T18:59:44Z) - Prediction under Latent Subgroup Shifts with High-Dimensional
Observations [30.433078066683848]
We introduce a new approach to prediction in graphical models with latent-shift adaptation.
Our novel form of RPM identifies causal latent structure in the source environment, and adapts properly to predict in the target.
arXiv Detail & Related papers (2023-06-23T12:26:24Z) - Self-Supervised Pre-training of Vision Transformers for Dense Prediction
Tasks [2.160196691362033]
We present a new self-supervised pre-training of Vision Transformers for dense prediction tasks.
Our strategy produces better local features suitable for dense prediction tasks as opposed to contrastive pre-training based on global image representation only.
arXiv Detail & Related papers (2022-05-30T15:25:37Z) - Joint Forecasting of Panoptic Segmentations with Difference Attention [72.03470153917189]
We study a new panoptic segmentation forecasting model that jointly forecasts all object instances in a scene.
We evaluate the proposed model on the Cityscapes and AIODrive datasets.
arXiv Detail & Related papers (2022-04-14T17:59:32Z) - Self-Supervision and Spatial-Sequential Attention Based Loss for
Multi-Person Pose Estimation [6.92027612631023]
Bottom-up based pose estimation approaches use heatmaps with auxiliary predictions to estimate joint positions and belonging at one time.
The lack of more explicit supervision results in low features utilization and contradictions between predictions in one model.
This paper proposes a new loss organization method which uses self-supervised heatmaps to reduce prediction contradictions and spatial-sequential attention to enhance networks' features extraction.
arXiv Detail & Related papers (2021-10-20T19:13:17Z) - CovarianceNet: Conditional Generative Model for Correct Covariance
Prediction in Human Motion Prediction [71.31516599226606]
We present a new method to correctly predict the uncertainty associated with the predicted distribution of future trajectories.
Our approach, CovarianceNet, is based on a Conditional Generative Model with Gaussian latent variables.
arXiv Detail & Related papers (2021-09-07T09:38:24Z) - Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference.
arXiv Detail & Related papers (2021-07-07T15:50:18Z) - Calibrated Adversarial Refinement for Stochastic Semantic Segmentation [5.849736173068868]
We present a strategy for learning a calibrated predictive distribution over semantic maps, where the probability associated with each prediction reflects its ground truth correctness likelihood.
We demonstrate the versatility and robustness of the approach by achieving state-of-the-art results on the multigrader LIDC dataset and on a modified Cityscapes dataset with injected ambiguities.
We show that the core design can be adapted to other tasks requiring learning a calibrated predictive distribution by experimenting on a toy regression dataset.
arXiv Detail & Related papers (2020-06-23T16:39:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.