Modeling State Shifting via Local-Global Distillation for Event-Frame Gaze Tracking
- URL: http://arxiv.org/abs/2404.00548v2
- Date: Fri, 28 Jun 2024 14:13:18 GMT
- Title: Modeling State Shifting via Local-Global Distillation for Event-Frame Gaze Tracking
- Authors: Jiading Li, Zhiyu Zhu, Jinhui Hou, Junhui Hou, Jinjian Wu,
- Abstract summary: This paper tackles the problem of passive gaze estimation using both event and frame data.
We reformulate gaze estimation as the quantification of the state shifting from the current state to several prior registered anchor states.
To improve the generalization ability, instead of learning a large gaze estimation network directly, we align a group of local experts with a student network.
- Score: 61.44701715285463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper tackles the problem of passive gaze estimation using both event and frame data. Considering the inherently different physiological structures, it is intractable to accurately estimate gaze purely based on a given state. Thus, we reformulate gaze estimation as the quantification of the state shifting from the current state to several prior registered anchor states. Specifically, we propose a two-stage learning-based gaze estimation framework that divides the whole gaze estimation process into a coarse-to-fine approach involving anchor state selection and final gaze location. Moreover, to improve the generalization ability, instead of learning a large gaze estimation network directly, we align a group of local experts with a student network, where a novel denoising distillation algorithm is introduced to utilize denoising diffusion techniques to iteratively remove inherent noise in event data. Extensive experiments demonstrate the effectiveness of the proposed method, which surpasses state-of-the-art methods by a large margin of 15%. The code will be publicly available at https://github.com/jdjdli/Denoise_distill_EF_gazetracker.
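The coarse-to-fine idea in the abstract (first select an anchor state, then locate the gaze relative to it) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Anchor` structure, nearest-neighbour anchor selection by Euclidean feature distance, and the linear `toy_shift_model` are all hypothetical placeholders for the learned components, and the local-expert distillation and diffusion-based denoising are not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Anchor:
    """A prior registered anchor state (hypothetical structure for illustration)."""
    feature: list  # embedding of the anchor state, e.g. from an event/frame encoder
    gaze: tuple    # known gaze location for this anchor, e.g. (x, y) on the screen


def select_anchor(feature, anchors):
    """Coarse stage: return the index of the anchor whose feature is closest."""
    return min(range(len(anchors)),
               key=lambda i: math.dist(feature, anchors[i].feature))


def toy_shift_model(feature, anchor_feature, scale=10.0):
    """Hypothetical stand-in for a learned regressor that maps the state shift
    (feature difference) to a gaze offset; here a fixed linear map on 2 dims."""
    return (scale * (feature[0] - anchor_feature[0]),
            scale * (feature[1] - anchor_feature[1]))


def estimate_gaze(feature, anchors, shift_model=toy_shift_model):
    """Fine stage: gaze = selected anchor's gaze + predicted shift from that anchor."""
    i = select_anchor(feature, anchors)
    dx, dy = shift_model(feature, anchors[i].feature)
    ax, ay = anchors[i].gaze
    return i, (ax + dx, ay + dy)


anchors = [Anchor([0.0, 0.0], (100.0, 100.0)),
           Anchor([1.0, 1.0], (300.0, 300.0))]
idx, gaze = estimate_gaze([0.9, 1.2], anchors)
print(idx, gaze)  # anchor 1 is nearest; gaze is approximately (299.0, 302.0)
```

Estimating a shift relative to a nearby registered state, rather than an absolute gaze from scratch, is what lets the second stage work with small, local corrections.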
Related papers
- Multistep Inverse Is Not All You Need [87.62730694973696]
In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise.
It is therefore desirable to learn an encoder to map the observation space to a simpler space of control-relevant variables.
We propose a new algorithm, ACDF, which combines multistep-inverse prediction with a latent forward model.
(2024-03-18T16:36:01Z)
- DiffSF: Diffusion Models for Scene Flow Estimation [17.512660491303684]
We propose DiffSF that combines transformer-based scene flow estimation with denoising diffusion models.
We show that the diffusion process greatly increases the robustness of predictions compared to prior approaches.
By sampling multiple times with different initial states, the denoising process predicts multiple hypotheses, which enables measuring the output uncertainty.
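The multi-hypothesis uncertainty idea can be illustrated with a toy one-dimensional example. Everything here is an assumption for illustration: `toy_denoiser` is a hand-written stand-in for a learned reverse-diffusion step (not DiffSF's transformer), and an ambiguous input is modeled as having two plausible answers. Running the same denoising loop from different random initial states and measuring the spread of the results gives a simple uncertainty estimate.

```python
import random
import statistics


def toy_denoiser(x, modes, rate=0.3):
    """Hypothetical denoising step: move x a fraction `rate` toward the nearest
    plausible value. More than one mode models an ambiguous input."""
    nearest = min(modes, key=lambda m: abs(m - x))
    return x + rate * (nearest - x)


def sample_hypotheses(modes, num_samples=64, steps=40, seed=0):
    """Run the denoising loop from several random initial states."""
    rng = random.Random(seed)
    hypotheses = []
    for _ in range(num_samples):
        x = rng.gauss(0.0, 2.0)  # a different random initial state per sample
        for _ in range(steps):
            x = toy_denoiser(x, modes)
        hypotheses.append(x)
    return hypotheses


def mean_and_uncertainty(hypotheses):
    """Point estimate = mean of hypotheses; uncertainty = their standard deviation."""
    return statistics.mean(hypotheses), statistics.pstdev(hypotheses)


# Unambiguous input: every run converges to the same answer, so the spread is tiny.
print(mean_and_uncertainty(sample_hypotheses([1.0])))
# Ambiguous input: runs split between two answers, so the spread is large.
print(mean_and_uncertainty(sample_hypotheses([-1.0, 1.0])))
```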
(2024-03-08T14:06:15Z)
- Observation-Guided Diffusion Probabilistic Models [41.749374023639156]
We propose a novel diffusion-based image generation method called the observation-guided diffusion probabilistic model (OGDM).
Our approach reestablishes the training objective by integrating the guidance of the observation process with the Markov chain.
We demonstrate the effectiveness of our training algorithm using diverse inference techniques on strong diffusion model baselines.
(2023-10-06T06:29:06Z)
- Score-based Data Assimilation [7.215767098253208]
We introduce score-based data assimilation for trajectory inference.
We learn a score-based generative model of state trajectories based on the key insight that the score of an arbitrarily long trajectory can be decomposed into a series of scores over short segments.
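The decomposition can be made concrete under one assumption that the summary leaves implicit: the trajectory $x_{1:L}$ is an order-$k$ Markov chain. Each state $x_i$ then interacts only with its $k$ neighbours on either side, so its score needs only a short window (indices clipped to $1..L$). This holds exactly for noise-free trajectories; applying it to the noise-perturbed trajectories that score-based models operate on is an approximation.

```latex
p(x_{1:L}) = p(x_{1:k}) \prod_{i=k+1}^{L} p(x_i \mid x_{i-k:i-1})
\quad\Longrightarrow\quad
\nabla_{x_i} \log p(x_{1:L}) = \nabla_{x_i} \log p(x_{i-k:i+k})
```

A score model trained on windows of length $2k+1$ can therefore be slid along a trajectory of any length $L$.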
(2023-06-18T14:22:03Z)
- LatentGaze: Cross-Domain Gaze Estimation through Gaze-Aware Analytic Latent Code Manipulation [0.0]
We propose a gaze-aware analytic manipulation method based on a data-driven approach that exploits the disentanglement characteristics of generative adversarial network (GAN) inversion.
Using a GAN-based encoder-generator process, we shift the input image from the target domain to the source domain, with which the gaze estimator is already familiar.
(2022-09-21T08:05:53Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art performance.
(2022-09-21T02:33:07Z)
- Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning an imagined (model-predicted) state with the real state returned by the environment, VCR applies a $Q$-value head to both states and obtains two distributions of action values.
Experiments show that VCR achieves new state-of-the-art performance among search-free RL algorithms.
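The summary does not say which distance VCR uses between the two action-value distributions. As one plausible choice, assumed here rather than taken from the paper, the sketch converts each Q-value vector into a softmax distribution over actions and compares them with KL divergence; the function names are hypothetical.

```python
import math


def softmax(values, temperature=1.0):
    """Turn a vector of Q-values into a probability distribution over actions."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def value_consistency_loss(q_imagined, q_real, temperature=1.0):
    """Hypothetical value-consistency term: distance between the action-value
    distributions of the imagined and the real state (softmax + KL assumed)."""
    p_real = softmax(q_real, temperature)
    p_imagined = softmax(q_imagined, temperature)
    return kl_divergence(p_real, p_imagined)


print(value_consistency_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0: identical values
print(value_consistency_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))  # about 1.15: preferences reversed
```

Comparing distributions over actions, rather than raw states, penalizes the imagined state only where it would change the agent's decisions.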
(2022-06-25T03:02:25Z)
- Appearance-based Gaze Estimation With Deep Learning: A Review and Benchmark [14.306488668615883]
We present a systematic review of the appearance-based gaze estimation methods using deep learning.
We summarize the data pre-processing and post-processing methods, including face/eye detection, data rectification, 2D/3D gaze conversion and gaze origin conversion.
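One common form of 2D/3D gaze conversion maps a 3D gaze ray to a point on the screen. A minimal sketch, assuming the gaze origin, gaze direction, and screen plane are all expressed in the same camera coordinate system (which in practice requires screen-camera calibration); the function name and the millimetre units are illustrative assumptions. Converting the resulting 3D point to 2D pixel coordinates additionally needs the screen's pose and pixel pitch, which is not shown.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def intersect_gaze_with_screen(origin, direction, plane_point, plane_normal, eps=1e-9):
    """Return the 3D point where the ray origin + t * direction (t >= 0) meets the
    screen plane, or None if the ray is parallel to the plane or points away from it."""
    denom = _dot(plane_normal, direction)
    if abs(denom) < eps:
        return None  # gaze ray parallel to the screen
    offset = [p - o for p, o in zip(plane_point, origin)]
    t = _dot(plane_normal, offset) / denom
    if t < 0:
        return None  # screen is behind the eye
    return tuple(o + t * d for o, d in zip(origin, direction))


# Eye 600 mm in front of a screen lying in the plane z = 0, looking slightly toward +x.
point = intersect_gaze_with_screen(origin=(0.0, 0.0, 600.0),
                                   direction=(0.1, 0.0, -1.0),
                                   plane_point=(0.0, 0.0, 0.0),
                                   plane_normal=(0.0, 0.0, 1.0))
print(point)  # approximately (60.0, 0.0, 0.0)
```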
(2021-04-26T15:53:03Z)
- Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method significantly improves performance over a supervised baseline trained on labeled data only.
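The Mean Teacher framework that this method builds on keeps a teacher model whose weights are an exponential moving average (EMA) of the student's, and trains the student to agree with the teacher on unlabeled data. The sketch below shows only that standard mechanism on plain lists of weights; the paper's mask guidance and perturbation-sensitive sample mining are not modeled, and the function names and the decay value are illustrative.

```python
def ema_update(teacher, student, decay=0.99):
    """Mean Teacher step: teacher <- decay * teacher + (1 - decay) * student."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def consistency_loss(student_pred, teacher_pred):
    """Mean squared difference between student and teacher predictions on the
    same (differently perturbed) unlabeled input."""
    return sum((s - t) ** 2 for s, t in zip(student_pred, teacher_pred)) / len(student_pred)


teacher = [0.0, 0.0]
student = [1.0, -1.0]
for _ in range(100):  # the student is held fixed here only to show the averaging
    teacher = ema_update(teacher, student, decay=0.9)
print(teacher)  # close to the student's weights: each entry is within 1e-4 of it
print(consistency_loss([0.2, 0.8], [0.2, 0.8]))  # 0.0
```

Because the teacher averages many past students, its predictions are smoother targets for the consistency loss than the current student's own outputs.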
(2020-07-21T13:27:09Z)
- Simultaneous Denoising and Dereverberation Using Deep Embedding Features [64.58693911070228]
We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features.
At the denoising stage, the deep clustering (DC) network is leveraged to extract noise-free deep embedding features.
At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
(2020-04-06T06:34:01Z)