Speak2Label: Using Domain Knowledge for Creating a Large Scale Driver Gaze Zone Estimation Dataset
- URL: http://arxiv.org/abs/2004.05973v4
- Date: Mon, 18 Oct 2021 04:37:58 GMT
- Title: Speak2Label: Using Domain Knowledge for Creating a Large Scale Driver Gaze Zone Estimation Dataset
- Authors: Shreya Ghosh, Abhinav Dhall, Garima Sharma, Sarthak Gupta, Nicu Sebe
- Abstract summary: The Driver Gaze in the Wild (DGW) dataset contains 586 recordings, captured at different times of the day including evenings, from 338 subjects with an age range of 18-63 years.
- Score: 55.391532084304494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Labelling data for human behavior analysis is a complex and time-consuming task. In this paper, a fully automatic technique is proposed for labelling an image-based gaze-behavior dataset for driver gaze zone estimation. Domain knowledge is added to the data recording paradigm, and labels are later generated automatically using Speech-To-Text (STT) conversion. To remove the noise introduced into the STT process by the differing illumination and ethnicity of subjects in our data, the speech frequency and energy are analysed. The resultant Driver Gaze in the Wild (DGW) dataset contains 586 recordings, captured at different times of the day including evenings. The large-scale dataset contains 338 subjects with an age range of 18-63 years. As the data is recorded under different lighting conditions, an illumination-robust layer is proposed for the Convolutional Neural Network (CNN). Extensive experiments show the variance in the dataset, which resembles real-world conditions, and the effectiveness of the proposed CNN pipeline. The proposed network is also fine-tuned for the eye-gaze prediction task, which shows the discriminativeness of the representation learnt by our network on the proposed DGW dataset. Project Page: https://sites.google.com/view/drivergazeprediction/home
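As a concrete illustration of the Speak2Label recording paradigm, the sketch below converts a recording's audio track into gaze-zone labels with off-the-shelf Speech-To-Text. It is a minimal sketch only: the zone vocabulary, function names, and the choice of the `speech_recognition` backend are illustrative assumptions, not the authors' implementation (which additionally analyses speech frequency and energy to denoise the STT output).

```python
# Minimal sketch of STT-based auto-labelling: subjects speak the number
# of the gaze zone they are looking at, and labels are recovered from
# the audio track. Zone vocabulary and function names are illustrative.
import speech_recognition as sr

ZONE_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
              "six": 6, "seven": 7, "eight": 8, "nine": 9}

def transcribe(wav_path: str) -> str:
    """Run an off-the-shelf STT backend on the recording's audio."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)  # any STT service works here

def labels_from_speech(wav_path: str) -> list[int]:
    """Map recognized zone words to integer gaze-zone labels."""
    tokens = transcribe(wav_path).lower().split()
    return [ZONE_WORDS[t] for t in tokens if t in ZONE_WORDS]
```

The abstract also mentions an illumination-robust layer in the CNN; its exact form is not given here, but one plausible reading is a per-image normalization that removes global brightness and contrast shifts before the convolutional stack:

```python
import torch
import torch.nn as nn

class IlluminationRobustLayer(nn.Module):
    """Hypothetical stand-in for the paper's illumination-robust layer:
    per-image, per-channel normalization with a learnable rescaling.
    The authors' actual layer may differ; this is one plausible form."""
    def __init__(self, channels: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Statistics over spatial dims cancel global lighting shifts.
        mean = x.mean(dim=(2, 3), keepdim=True)
        std = x.std(dim=(2, 3), keepdim=True) + 1e-5
        return self.gamma * (x - mean) / std + self.beta
```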
Related papers
- Forest Inspection Dataset for Aerial Semantic Segmentation and Depth Estimation [6.635604919499181]
We introduce a new large aerial dataset for forest inspection.
It contains both real-world and virtual recordings of natural environments.
We develop a framework to assess the deforestation degree of an area.
arXiv Detail & Related papers (2024-03-11T11:26:44Z)
- VALERIE22 -- A photorealistic, richly metadata annotated dataset of urban environments [5.439020425819001]
The VALERIE tool pipeline is a synthetic data generator developed to contribute to the understanding of domain-specific factors.
The VALERIE22 dataset was generated with the VALERIE procedural tools pipeline providing a photorealistic sensor simulation.
The dataset provides a uniquely rich set of metadata, allowing extraction of specific scene and semantic features.
arXiv Detail & Related papers (2023-08-18T15:44:45Z)
- STREAMLINE: Streaming Active Learning for Realistic Multi-Distributional Settings [2.580765958706854]
STREAMLINE is a novel streaming active learning framework that mitigates scenario-driven slice imbalance in the working labeled data.
We evaluate STREAMLINE on real-world streaming scenarios for image classification and object detection tasks. (A toy slice-aware selection sketch follows this entry.)
arXiv Detail & Related papers (2023-05-18T02:01:45Z)
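The forward-referenced sketch: a toy slice-aware selection rule that prefers uncertain samples from slices underrepresented in the current labeled set. It only illustrates the stated goal of mitigating slice imbalance; STREAMLINE's actual procedure differs, and all names here are hypothetical.

```python
from collections import Counter

def select_for_labeling(batch, uncertainty, slice_of, budget, labeled_counts):
    """Toy slice-aware active selection: rank streamed samples by
    uncertainty weighted toward underrepresented slices.
    Illustrative only; not STREAMLINE's actual algorithm."""
    scored = []
    for x in batch:
        rarity = 1.0 / (1 + labeled_counts[slice_of(x)])
        scored.append((uncertainty(x) * rarity, x))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    picked = [x for _, x in scored[:budget]]
    for x in picked:
        labeled_counts[slice_of(x)] += 1  # keep slice counts current
    return picked

# Usage (hypothetical callables): labeled_counts = Counter();
# picked = select_for_labeling(stream_batch, model_uncertainty,
#                              scenario_of, budget=16,
#                              labeled_counts=labeled_counts)
```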
- Change Detection from Synthetic Aperture Radar Images via Graph-Based Knowledge Supplement Network [36.41983596642354]
We propose a Graph-based Knowledge Supplement Network (GKSNet) for image change detection.
To be more specific, we extract discriminative information from the existing labeled dataset as additional knowledge.
To validate the proposed method, we conducted extensive experiments on four SAR datasets.
arXiv Detail & Related papers (2022-01-22T02:50:50Z)
- Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans [103.92680099373567]
This paper introduces a pipeline to parametrically sample and render multi-task vision datasets from comprehensive 3D scans from the real world.
Changing the sampling parameters allows one to "steer" the generated datasets to emphasize specific information.
Common architectures trained on a generated starter dataset reached state-of-the-art performance on multiple common vision tasks and benchmarks.
arXiv Detail & Related papers (2021-10-11T04:21:46Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks; the visible and thermal filters are used to conduct dynamic convolution on their corresponding input feature maps.
To address heavy occlusion, fast motion, and out-of-view targets, we propose a joint local and global search exploiting a new direction-aware target-driven attention mechanism. (A minimal dynamic-filtering sketch follows this entry.)
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
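The dynamic-filtering sketch referenced above: one modality's features are pooled into a context vector and mapped to per-sample depthwise kernels, which are convolved with the other modality's feature map. Sizes, names, and the depthwise choice are assumptions; MFGNet's actual module is more elaborate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicModalityFilter(nn.Module):
    """Illustrative dynamic filter generation in the spirit of MFGNet:
    pooled features from a guide modality generate per-sample depthwise
    conv kernels applied to the target modality's feature map."""
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.channels, self.k = channels, k
        self.gen = nn.Linear(channels, channels * k * k)  # filter generator

    def forward(self, guide: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        b, c, h, w = target.shape
        ctx = guide.mean(dim=(2, 3))                      # (b, c) global context
        kernels = self.gen(ctx).view(b * c, 1, self.k, self.k)
        # Grouped-conv trick: fold batch into channels so each sample
        # is convolved with its own dynamically generated filters.
        out = F.conv2d(target.view(1, b * c, h, w), kernels,
                       padding=self.k // 2, groups=b * c)
        return out.view(b, c, h, w)
```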
- ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation [52.5465548207648]
ETH-XGaze is a new gaze estimation dataset consisting of over one million high-resolution images of varying gaze under extreme head poses.
We show that our dataset can significantly improve the robustness of gaze estimation methods across different head poses and gaze angles.
arXiv Detail & Related papers (2020-07-31T04:15:53Z)
- Towards End-to-end Video-based Eye-Tracking [50.0630362419371]
Estimating eye-gaze from images alone is a challenging task due to unobservable person-specific factors.
We propose a novel dataset and accompanying method which aims to explicitly learn these semantic and temporal relationships.
We demonstrate that fusing information from visual stimuli as well as eye images can achieve performance similar to literature-reported figures.
arXiv Detail & Related papers (2020-07-26T12:39:15Z)
- JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method [92.15895515035795]
We introduce a new large-scale unconstrained crowd counting dataset (JHU-CROWD++) that contains 4,372 images with 1.51 million annotations.
We propose a novel crowd counting network that progressively generates crowd density maps via residual error estimation. (A generic sketch of this idea follows the entry.)
arXiv Detail & Related papers (2020-04-07T14:59:35Z)
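The residual-refinement idea sketched below: predict a coarse density map, then let a second branch estimate its residual error from the features and the coarse map, and add the correction. A generic sketch under assumed channel sizes, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ResidualDensityRefiner(nn.Module):
    """Illustrative progressive density-map refinement via residual
    error estimation: coarse prediction plus an estimated correction.
    Channel sizes here are assumptions for the sketch."""
    def __init__(self, in_ch: int = 64):
        super().__init__()
        self.coarse = nn.Conv2d(in_ch, 1, kernel_size=1)
        self.residual = nn.Sequential(
            nn.Conv2d(in_ch + 1, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        coarse = self.coarse(feats)                        # initial density map
        err = self.residual(torch.cat([feats, coarse], dim=1))
        return coarse + err                                # refined estimate
```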
- Learning-Based Human Segmentation and Velocity Estimation Using Automatic Labeled LiDAR Sequence for Training [15.19884183320726]
We propose an automatic labeled sequential data generation pipeline for human recognition with point clouds.
Our approach uses a precise human model and reproduces precise motion to generate realistic artificial data.
arXiv Detail & Related papers (2020-03-11T03:14:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.