Active Gaze Control for Foveal Scene Exploration
- URL: http://arxiv.org/abs/2208.11594v1
- Date: Wed, 24 Aug 2022 14:59:28 GMT
- Title: Active Gaze Control for Foveal Scene Exploration
- Authors: Alexandre M.F. Dias, Luís Simões, Plinio Moreno, Alexandre Bernardino
- Abstract summary: We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
- Score: 124.11737060344052
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Active perception and foveal vision are the foundations of the human visual
system. While foveal vision reduces the amount of information to process during
a gaze fixation, active perception will change the gaze direction to the most
promising parts of the visual field. We propose a methodology to emulate how
humans and robots with foveal cameras would explore a scene, identifying the
objects present in their surroundings with the least number of gaze shifts. Our
approach is based on three key methods. First, we take an off-the-shelf deep
object detector, pre-trained on a large dataset of regular images, and
calibrate the classification outputs to the case of foveated images. Second, a
body-centered semantic map, encoding the object classifications and
corresponding uncertainties, is sequentially updated with the calibrated
detections, considering several data fusion techniques. Third, the next best
gaze fixation point is determined based on information-theoretic metrics that
aim at minimizing the overall expected uncertainty of the semantic map. When
compared to the random selection of next gaze shifts, the proposed method
achieves an increase in detection F1-score of 2-3 percentage points for the
same number of gaze shifts and reduces to one third the number of required gaze
shifts to attain similar performance.
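The second and third steps above (fusing calibrated detections into a semantic map, then choosing the fixation that reduces uncertainty) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the grid-shaped map, the Bayesian product fusion rule, and the pick-the-highest-entropy-cell policy are simplifying assumptions standing in for the paper's fusion techniques and expected-uncertainty minimization.

```python
import numpy as np

def fuse(prior, likelihood):
    """Bayesian fusion of a per-cell class distribution with a new
    calibrated detection (one of several possible fusion rules)."""
    post = prior * likelihood
    return post / post.sum(axis=-1, keepdims=True)

def cell_entropy(p):
    # Shannon entropy (nats) of each cell's categorical distribution
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def next_fixation(semantic_map):
    """Pick the map cell with the highest classification entropy.

    semantic_map: (H, W, C) array of per-cell class probabilities.
    Returns (row, col) of the next gaze fixation.
    """
    H = cell_entropy(semantic_map)
    r, c = np.unravel_index(np.argmax(H), H.shape)
    return int(r), int(c)

# Toy 2x2 map over 3 classes: the uniform cell is the most uncertain.
m = np.array([[[0.9, 0.05, 0.05], [1/3, 1/3, 1/3]],
              [[0.6, 0.3, 0.1],   [0.8, 0.1, 0.1]]])
print(next_fixation(m))  # -> (0, 1)
```

Fixating the highest-entropy cell is a greedy proxy for minimizing the map's overall expected uncertainty; the paper evaluates information-theoretic metrics over the whole foveal field rather than a single cell.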
Related papers
- View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics.
The proposed method addresses limitations in existing cross-view localization methods.
It is the first sparse visual-only method that enhances perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z)
- Learning to search for and detect objects in foveal images using deep learning [3.655021726150368]
This study employs a fixation prediction model that emulates human objective-guided attention when searching for a given class in an image.
The foveated pictures at each fixation point are then classified to determine whether the target is present or absent in the scene.
We present a novel dual task model capable of performing fixation prediction and detection simultaneously, allowing knowledge transfer between the two tasks.
arXiv Detail & Related papers (2023-04-12T09:50:25Z)
- Self-Calibrating Anomaly and Change Detection for Autonomous Inspection Robots [0.07366405857677225]
A visual anomaly or change detection algorithm identifies regions of an image that differ from a reference image or dataset.
We propose a comprehensive deep learning framework for detecting anomalies and changes in a priori unknown environments.
arXiv Detail & Related papers (2022-08-26T09:52:12Z)
- Improving saliency models' predictions of the next fixation with humans' intrinsic cost of gaze shifts [6.315366433343492]
We develop a principled framework for predicting the next gaze target and the empirical measurement of the human cost for gaze.
We provide an implementation of human gaze preferences, which can be used to improve arbitrary saliency models' predictions of humans' next gaze targets.
arXiv Detail & Related papers (2022-07-09T11:21:13Z)
- PANet: Perspective-Aware Network with Dynamic Receptive Fields and Self-Distilling Supervision for Crowd Counting [63.84828478688975]
We propose a novel perspective-aware approach called PANet to address the perspective problem.
Based on the observation that the size of the objects varies greatly in one image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework is able to adjust the receptive field by the dilated convolution parameters according to the input image, which helps the model to extract more discriminative features for each local region.
arXiv Detail & Related papers (2021-10-31T04:43:05Z)
- City-scale Scene Change Detection using Point Clouds [71.73273007900717]
We propose a method for detecting structural changes in a city using images captured from mounted cameras over two different times.
A direct comparison of the two point clouds for change detection is not ideal due to inaccurate geo-location information.
To circumvent this problem, we propose a deep learning-based non-rigid registration on the point clouds.
Experiments show that our method is able to detect scene changes effectively, even in the presence of viewpoint and illumination differences.
arXiv Detail & Related papers (2021-03-26T08:04:13Z)
- LNSMM: Eye Gaze Estimation With Local Network Share Multiview Multitask [7.065909514483728]
We propose a novel methodology to estimate eye gaze points and eye gaze directions simultaneously.
Experiments show our method outperforms current mainstream methods on both gaze point and gaze direction metrics.
arXiv Detail & Related papers (2021-01-18T15:14:24Z)
- Boosting Image-based Mutual Gaze Detection using Pseudo 3D Gaze [19.10872208787867]
Mutual gaze detection plays an important role in understanding human interactions.
We propose a simple and effective approach to boost the performance by using an auxiliary 3D gaze estimation task during the training phase.
We achieve the performance boost without additional labeling cost by training the 3D gaze estimation branch using pseudo 3D gaze labels deduced from mutual gaze labels.
arXiv Detail & Related papers (2020-10-15T15:01:41Z)
- A Self-Training Approach for Point-Supervised Object Detection and Counting in Crowds [54.73161039445703]
We propose a novel self-training approach that enables a typical object detector to be trained with only point-level annotations.
During training, we utilize the available point annotations to supervise the estimation of the center points of objects.
Experimental results show that our approach significantly outperforms state-of-the-art point-supervised methods under both detection and counting tasks.
arXiv Detail & Related papers (2020-07-25T02:14:42Z)
- Towards High Performance Human Keypoint Detection [87.1034745775229]
We find that context information plays an important role in reasoning human body configuration and invisible keypoints.
Inspired by this, we propose a cascaded context mixer (CCM) which efficiently integrates spatial and channel context information.
To maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy.
We present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy.
arXiv Detail & Related papers (2020-02-03T02:24:51Z)
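The sub-pixel refinement mentioned in the last entry can be illustrated with a common post-processing heuristic: shift the integer argmax of a keypoint heatmap a quarter pixel toward the stronger neighbor along each axis. This is a standard technique used for illustration only; the paper's own refinement techniques may differ.

```python
import numpy as np

def refine_keypoint(heatmap):
    """Refine an integer heatmap argmax to sub-pixel precision by
    shifting 0.25 px toward the stronger neighbor on each axis."""
    r, c = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    y, x = float(r), float(c)
    if 0 < r < heatmap.shape[0] - 1:
        y += 0.25 * np.sign(heatmap[r + 1, c] - heatmap[r - 1, c])
    if 0 < c < heatmap.shape[1] - 1:
        x += 0.25 * np.sign(heatmap[r, c + 1] - heatmap[r, c - 1])
    return float(y), float(x)

hm = np.zeros((5, 5))
hm[2, 2] = 1.0
hm[2, 3] = 0.5  # response leaks to the right, so x shifts by +0.25
print(refine_keypoint(hm))  # -> (2.0, 2.25)
```

The quarter-pixel offset compensates for the quantization introduced by predicting keypoints on a downsampled heatmap grid.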
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.