GOO: A Dataset for Gaze Object Prediction in Retail Environments
- URL: http://arxiv.org/abs/2105.10793v1
- Date: Sat, 22 May 2021 18:55:35 GMT
- Title: GOO: A Dataset for Gaze Object Prediction in Retail Environments
- Authors: Henri Tomas, Marcus Reyes, Raimarc Dionido, Mark Ty, Jonric Mirando,
Joel Casimiro, Rowel Atienza, Richard Guinto
- Abstract summary: We present a new task called gaze object prediction.
The goal is to predict a bounding box for a person's gazed-at object.
To train and evaluate gaze networks on this task, we present the Gaze On Objects dataset.
- Score: 11.280648029091537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the most fundamental and information-laden actions humans do is to
look at objects. However, a survey of current works reveals that existing
gaze-related datasets annotate only the pixel being looked at, and not the
boundaries of a specific object of interest. This lack of object annotation
presents an opportunity for further advancing gaze estimation research. To this
end, we present a challenging new task called gaze object prediction, where the
goal is to predict a bounding box for a person's gazed-at object. To train and
evaluate gaze networks on this task, we present the Gaze On Objects (GOO)
dataset. GOO is composed of a large set of synthetic images (GOO Synth)
supplemented by a smaller subset of real images (GOO-Real) of people looking at
objects in a retail environment. Our work establishes extensive baselines on
GOO by re-implementing and evaluating selected state-of-the-art models on the
tasks of gaze following and domain adaptation. Code is available on GitHub.
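As a rough illustration of the task described above (a hedged sketch, not the paper's actual evaluation code; all function and variable names here are hypothetical), one simple way to turn a gaze-following prediction into a gaze object prediction is to pick, among the annotated retail-item boxes, the one closest to the predicted gaze point and compare it with the ground-truth gazed-at box via IoU:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def predict_gaze_object(gaze_point, candidate_boxes):
    """Pick the candidate box whose center is closest to the predicted gaze point.

    gaze_point: (x, y), e.g. the argmax of a gaze-following network's heatmap.
    candidate_boxes: iterable of (x1, y1, x2, y2) annotated object boxes.
    """
    boxes = np.asarray(candidate_boxes, dtype=float)
    centers = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                        (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)
    dists = np.linalg.norm(centers - np.asarray(gaze_point, dtype=float), axis=1)
    return boxes[int(np.argmin(dists))]

# Toy usage with made-up numbers: three shelf items, gaze lands near the middle one.
shelf_items = [(10, 10, 50, 60), (60, 10, 100, 60), (110, 10, 150, 60)]
gaze_xy = (82, 35)                 # predicted gaze point in image coordinates
pred_box = predict_gaze_object(gaze_xy, shelf_items)
gt_box = (60, 10, 100, 60)         # annotated gazed-at object
print("IoU with ground truth:", iou(pred_box, gt_box))
```

A real evaluation would aggregate such per-image scores over the full GOO test split; this snippet only conveys the flavour of the task.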
Related papers
- In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation [50.79940712523551]
We present lazy visual grounding, a two-stage approach of unsupervised object mask discovery followed by object grounding.
Our model requires no additional training yet shows great performance on five public datasets.
arXiv Detail & Related papers (2024-08-09T09:28:35Z)
- Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model [19.800353299691277]
This paper presents a more challenging gaze object segmentation (GOS) task, which involves inferring the pixel-level mask corresponding to the object captured by human gaze behavior.
We propose to automatically obtain head features from scene features to ensure the model's inference efficiency and flexibility in the real world.
arXiv Detail & Related papers (2024-08-02T06:32:45Z)
- OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction [0.2796197251957245]
This paper introduces the Object-level Attention Transformer (OAT).
OAT predicts human scanpaths as they search for a target object within a cluttered scene of distractors.
We evaluate OAT on the Amazon book cover dataset and a new dataset for visual search that we collected.
arXiv Detail & Related papers (2024-07-18T09:33:17Z)
- TransGOP: Transformer-Based Gaze Object Prediction [27.178785186892203]
This paper introduces the Transformer into the field of gaze object prediction.
It proposes an end-to-end Transformer-based gaze object prediction method named TransGOP.
arXiv Detail & Related papers (2024-02-21T07:17:10Z)
- Active Gaze Control for Foveal Scene Exploration [124.11737060344052]
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
arXiv Detail & Related papers (2022-08-24T14:59:28Z)
- Object Detection in Aerial Images with Uncertainty-Aware Graph Network [61.02591506040606]
We propose a novel uncertainty-aware object detection framework with a structured graph, where nodes and edges are denoted by objects.
We refer to our model as Uncertainty-Aware Graph network for object DETection (UAGDet).
arXiv Detail & Related papers (2022-08-23T07:29:03Z)
- Automatic dataset generation for specific object detection [6.346581421948067]
We present a method to synthesize object-in-scene images, which can preserve the objects' detailed features without bringing irrelevant information.
Our result shows that in the synthesized image, the boundaries of objects blend very well with the background.
arXiv Detail & Related papers (2022-07-16T07:44:33Z)
- GaTector: A Unified Framework for Gaze Object Prediction [11.456242421204298]
We build a novel framework named GaTector to tackle the gaze object prediction problem in a unified way.
To better consider the specificity of inputs and tasks, GaTector introduces two input-specific blocks before the shared backbone and three task-specific blocks after the shared backbone.
Finally, we propose a novel wUoC metric that can reveal the difference between boxes even when they share no overlapping area; a rough illustration of this property follows the entry.
arXiv Detail & Related papers (2021-12-07T07:50:03Z)
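The summary above names the wUoC metric but does not define it, so the following is only a generalized-IoU-style stand-in (not GaTector's actual wUoC formula) that demonstrates the stated property: two box pairs with zero overlap still receive different scores depending on how far apart they are.

```python
def generalized_iou(box_a, box_b):
    """GIoU-style score in [-1, 1]: IoU minus the fraction of the smallest
    enclosing box not covered by the union. Unlike plain IoU, it keeps
    discriminating between box pairs that do not overlap at all."""
    # Intersection area
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box
    cx1, cy1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    cx2, cy2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    enclose = (cx2 - cx1) * (cy2 - cy1)
    return iou - (enclose - union) / enclose

# Plain IoU is 0 for both pairs below, but the GIoU-style score still
# separates a near miss from a far miss.
print(generalized_iou((0, 0, 10, 10), (11, 0, 21, 10)))  # near miss, close to 0
print(generalized_iou((0, 0, 10, 10), (50, 0, 60, 10)))  # far miss, strongly negative
```

The exact wUoC definition should be taken from the GaTector paper itself; the sketch only illustrates why a metric beyond plain IoU is useful for non-overlapping predictions.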
- Weakly-Supervised Physically Unconstrained Gaze Estimation [80.66438763587904]
We tackle the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions.
We propose a training algorithm along with several novel loss functions especially designed for the task.
We show significant improvements in (a) the accuracy of semi-supervised gaze estimation and (b) cross-domain generalization on the state-of-the-art physically unconstrained in-the-wild Gaze360 gaze estimation benchmark.
arXiv Detail & Related papers (2021-05-20T14:58:52Z)
- SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots.
Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of step-by-step instructions.
This setting deviates from real-world problems in which a human only describes what the object and its surroundings look like and asks the robot to start navigating from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z)
- Personal Fixations-Based Object Segmentation with Object Localization and Boundary Preservation [60.41628937597989]
We focus on Personal Fixations-based Object Segmentation (PFOS) to address issues in previous studies.
We propose a novel network based on Object Localization and Boundary Preservation (OLBP) to segment the gazed objects.
OLBP is organized in the mixed bottom-up and top-down manner with multiple types of deep supervision.
arXiv Detail & Related papers (2021-01-22T09:20:47Z)