DeeSCo: Deep heterogeneous ensemble with Stochastic Combinatory loss for
gaze estimation
- URL: http://arxiv.org/abs/2004.07098v1
- Date: Wed, 15 Apr 2020 14:06:31 GMT
- Title: DeeSCo: Deep heterogeneous ensemble with Stochastic Combinatory loss for
gaze estimation
- Authors: Edouard Yvinec, Arnaud Dapogny, and K\'evin Bailly
- Abstract summary: We introduce a deep, end-to-end trainable ensemble of heatmap-based weak predictors for 2D/3D gaze estimation.
We show that our ensemble outperforms state-of-the-art approaches for 2D/3D gaze estimation on multiple datasets.
- Score: 7.09232719022402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: From medical research to gaming applications, gaze estimation is becoming a
valuable tool. While a number of hardware-based solutions exist, recent
deep learning-based approaches, coupled with the availability of large-scale
databases, have made it possible to produce precise gaze estimates using only
consumer sensors. However, a number of questions remain regarding the problem
formulation, architectural choices and learning paradigms for designing gaze
estimation systems, in order to bridge the gap between geometry-based systems
involving specific hardware and approaches using consumer sensors only. In this
paper, we introduce a deep, end-to-end trainable ensemble of heatmap-based weak
predictors for 2D/3D gaze estimation. We show that, through heterogeneous
architectural design of these weak predictors, we can reduce the correlation
between them and thereby build more robust deep ensemble models.
Furthermore, we propose a stochastic combinatory loss that consists of randomly
sampling combinations of weak predictors at train time. This yields better
individual weak predictors with lower correlation between them, which in turn
significantly enhances the performance of the deep ensemble.
We show that our Deep heterogeneous ensemble with Stochastic Combinatory loss
(DeeSCo) outperforms state-of-the-art approaches for 2D/3D gaze estimation on
multiple datasets.
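The stochastic combinatory loss described above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the weak predictors are reduced to scalar gaze estimates (standing in for the paper's heatmap outputs), the function name is hypothetical, and a squared error is assumed as the per-combination penalty. The core idea shown is sampling a random non-empty combination of weak predictors at each training step and scoring the sub-ensemble's averaged prediction.

```python
import random


def stochastic_combinatory_loss(predictions, target, rng=random):
    """Loss for one randomly sampled combination of weak predictors.

    predictions: list of scalar gaze estimates, one per weak predictor
                 (stand-ins for heatmap-based outputs).
    target: ground-truth gaze value.
    """
    # Sample a non-empty combination of weak predictors uniformly at random.
    k = rng.randint(1, len(predictions))
    subset = rng.sample(predictions, k)
    # The sampled sub-ensemble predicts the mean of its members' outputs.
    ensemble_pred = sum(subset) / len(subset)
    # Squared error of the sub-ensemble; in a deep learning framework,
    # gradients would flow only to the sampled predictors, which is what
    # decorrelates them over training.
    return (ensemble_pred - target) ** 2
```

Because a different combination is drawn at every step, each weak predictor is repeatedly trained both alone and inside varying sub-ensembles, which is the mechanism the abstract credits for lower inter-predictor correlation.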
Related papers
- PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.
Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z) - Robust Two-View Geometry Estimation with Implicit Differentiation [2.048226951354646]
We present a novel two-view geometry estimation framework.
It is based on a differentiable robust loss function fitting.
We evaluate our approach on the camera pose estimation task in both outdoor and indoor scenarios.
arXiv Detail & Related papers (2024-10-23T15:51:33Z) - Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding [15.834419910916933]
We propose a novel positional encoding technique, PerturbPE, that extracts consistent and regular components from the eigenbasis.
Our results support our theoretical findings; e.g., we observed a performance enhancement of up to 12% on the Human3.6M dataset.
Our novel approach significantly enhances performance in scenarios where two edges are missing, setting a new state of the art.
arXiv Detail & Related papers (2024-05-27T17:48:54Z) - Depth-agnostic Single Image Dehazing [12.51359372069387]
We propose a simple yet novel synthetic method to decouple the relationship between haze density and scene depth, by which a depth-agnostic dataset (DA-HAZE) is generated.
Experiments indicate that models trained on DA-HAZE achieve significant improvements on real-world benchmarks, with less discrepancy between SOTS and DA-SOTS.
We revisit the U-Net-based architectures for dehazing, in which dedicatedly designed blocks are incorporated.
arXiv Detail & Related papers (2024-01-14T06:33:11Z) - Match and Locate: low-frequency monocular odometry based on deep feature
matching [0.65268245109828]
We introduce a novel approach for the robotic odometry which only requires a single camera.
The approach is based on matching image features between the consecutive frames of the video stream using deep feature matching models.
We evaluate the performance of the approach in the AISG-SLA Visual Localisation Challenge and find that, while being computationally efficient and easy to implement, our method shows competitive results.
arXiv Detail & Related papers (2023-11-16T17:32:58Z) - RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z) - DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium [11.78276690882616]
Self-supervised multi-frame depth estimation achieves high accuracy by computing matching costs of pixel correspondences between adjacent frames.
We propose the Dual model, which tightly couples depth and pose estimation through a feedback loop.
Our novel update pipeline uses a deep equilibrium model framework to iteratively refine depth estimates and a hidden state of feature maps.
arXiv Detail & Related papers (2023-04-07T09:46:29Z) - The KFIoU Loss for Rotated Object Detection [115.334070064346]
In this paper, we argue that one effective alternative is to devise an approximate loss that can achieve trend-level alignment with the SkewIoU loss.
Specifically, we model the objects as Gaussian distributions and adopt a Kalman filter to inherently mimic the mechanism of SkewIoU.
The resulting new loss called KFIoU is easier to implement and works better compared with exact SkewIoU.
arXiv Detail & Related papers (2022-01-29T10:54:57Z) - PDC-Net+: Enhanced Probabilistic Dense Correspondence Network [161.76275845530964]
We present the Enhanced Probabilistic Dense Correspondence Network, PDC-Net+, capable of estimating accurate dense correspondences.
We develop an architecture and an enhanced training strategy tailored for robust and generalizable uncertainty prediction.
Our approach obtains state-of-the-art results on multiple challenging geometric matching and optical flow datasets.
arXiv Detail & Related papers (2021-09-28T17:56:41Z) - Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training.
Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results in KITTI, and it generalizes well to the KAIST dataset without additional training.
arXiv Detail & Related papers (2021-05-25T02:17:56Z) - Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.