A modular framework for object-based saccadic decisions in dynamic scenes
- URL: http://arxiv.org/abs/2106.06073v1
- Date: Thu, 10 Jun 2021 22:28:45 GMT
- Title: A modular framework for object-based saccadic decisions in dynamic scenes
- Authors: Nicolas Roth, Pia Bideau, Olaf Hellwich, Martin Rolfs, Klaus Obermayer
- Abstract summary: We present a new model for simulating human eye-movement behavior in dynamic real-world scenes.
We model this active scene exploration as a sequential decision making process.
For each possible choice, the model integrates evidence over time and a decision (saccadic eye movement) is triggered as soon as evidence crosses a decision threshold.
- Score: 5.7047887413125276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visually exploring the world around us is not a passive process. Instead, we
actively explore the world and acquire visual information over time. Here, we
present a new model for simulating human eye-movement behavior in dynamic
real-world scenes. We model this active scene exploration as a sequential
decision making process. We adapt the popular drift-diffusion model (DDM) for
perceptual decision making and extend it towards multiple options, defined by
objects present in the scene. For each possible choice, the model integrates
evidence over time and a decision (saccadic eye movement) is triggered as soon
as evidence crosses a decision threshold. Drawing this explicit connection
between decision making and object-based scene perception is highly relevant in
the context of active viewing, where decisions are made continuously while
interacting with an external environment. We validate our model with a
carefully designed ablation study and explore influences of our model
parameters. A comparison on the VidCom dataset supports the plausibility of the
proposed approach.
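To make the decision mechanism described in the abstract concrete, the sketch below simulates a race between per-object drift-diffusion accumulators, where the first accumulator to cross the decision threshold triggers a saccade to its object. This is a minimal illustration only: the drift rates, threshold, and noise level are placeholder values, not the paper's fitted parameters or implementation.

```python
import numpy as np

def simulate_saccade_decision(drift_rates, threshold=1.0, noise_sd=0.1,
                              dt=0.001, max_time=2.0, rng=None):
    """Race between independent drift-diffusion accumulators, one per object.

    Returns (winning_object_index, decision_time), or (None, max_time)
    if no accumulator crosses the threshold within max_time seconds.
    """
    rng = np.random.default_rng() if rng is None else rng
    drift_rates = np.asarray(drift_rates, dtype=float)
    evidence = np.zeros_like(drift_rates)
    n_steps = int(max_time / dt)
    for step in range(1, n_steps + 1):
        # Euler-Maruyama step: deterministic drift plus Gaussian diffusion noise.
        evidence += (drift_rates * dt
                     + noise_sd * np.sqrt(dt) * rng.standard_normal(evidence.shape))
        crossed = np.flatnonzero(evidence >= threshold)
        if crossed.size > 0:
            # The first accumulator to cross triggers the saccade to its object.
            winner = int(crossed[np.argmax(evidence[crossed])])
            return winner, step * dt
    return None, max_time

# Example: three candidate objects; the third accrues evidence fastest.
obj, t = simulate_saccade_decision([0.3, 0.5, 1.2])
print(f"saccade to object {obj} after {t:.3f} s")
```

In the full model, the per-object drift rates would be driven by scene content and attention rather than fixed constants as in this toy example.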
Related papers
- Uncertainty-Guided Appearance-Motion Association Network for Out-of-Distribution Action Detection [4.938957922033169]
Out-of-distribution (OOD) detection aims to detect and reject test samples with semantic shifts.
We propose a novel Uncertainty-Guided Appearance-Motion Association Network (UAAN).
We show that UAAN outperforms state-of-the-art methods by a significant margin, illustrating its effectiveness.
arXiv Detail & Related papers (2024-09-16T02:53:49Z)
- A Robotics-Inspired Scanpath Model Reveals the Importance of Uncertainty and Semantic Object Cues for Gaze Guidance in Dynamic Scenes [8.64158103104882]
We present a mechanistic model that simulates object segmentation and gaze behavior for dynamic real-world scenes.
Our model uses the current scene segmentation for object-based saccadic decision-making while using the foveated object to refine its scene segmentation.
We show that our model's modular design allows for extensions, such as incorporating saccadic momentum or pre-saccadic attention.
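As a rough illustration of the loop this summary describes, here is a toy sketch in which per-object segmentation confidence guides an uncertainty-weighted gaze decision, and foveating an object in turn raises its segmentation confidence. Every quantity and update rule here is a hypothetical stand-in, assuming nothing about the paper's actual modules.

```python
import numpy as np

def scanpath_loop(n_frames=6, n_objects=4, seed=0):
    """Toy segmentation/gaze loop: uncertainty guides gaze, gaze refines segmentation."""
    rng = np.random.default_rng(seed)
    seg_confidence = np.full(n_objects, 0.5)  # per-object segmentation confidence
    scanpath = []
    for _ in range(n_frames):
        # 1) Saccadic decision: noisy preference for objects with uncertain segmentation.
        uncertainty = 1.0 - seg_confidence
        gaze = int(np.argmax(uncertainty + 0.05 * rng.standard_normal(n_objects)))
        scanpath.append(gaze)
        # 2) Foveating the object refines (here: raises the confidence of) its segmentation.
        seg_confidence[gaze] = min(1.0, seg_confidence[gaze] + 0.2)
    return scanpath

print(scanpath_loop())  # gaze cycles through objects as their uncertainty dominates
```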
arXiv Detail & Related papers (2024-08-02T15:20:34Z)
- Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling [70.34875558830241]
We present an approach for learning a spatio-temporal (4D) semantic embedding, based on the concept of semantic gears, to allow for stratified modeling of the dynamic regions of the scene.
At the same time, almost for free, our approach enables free-viewpoint tracking of objects of interest - a functionality not yet achieved by existing NeRF-based methods.
arXiv Detail & Related papers (2024-06-06T03:37:39Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- D2SLAM: Semantic visual SLAM based on the influence of Depth for Dynamic environments [0.483420384410068]
We propose a novel approach to determining dynamic elements, addressing the lack of generalization and scene awareness in existing methods.
We use scene depth information to refine the accuracy of estimates from the geometric and semantic modules.
The obtained results demonstrate the efficacy of the proposed method in providing accurate localization and mapping in dynamic environments.
arXiv Detail & Related papers (2022-10-16T22:13:59Z)
- Spatio-Temporal Relation Learning for Video Anomaly Detection [35.59510027883497]
Anomaly identification is highly dependent on the relationship between the object and the scene.
In this paper, we propose a Spatial-Temporal Relation Learning framework to tackle the video anomaly detection task.
Experiments are conducted on three public datasets, and the superior performance over the state-of-the-art methods demonstrates the effectiveness of our method.
arXiv Detail & Related papers (2022-09-27T02:19:31Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Revisiting spatio-temporal layouts for compositional action recognition [63.04778884595353]
We take an object-centric approach to action recognition.
The main focus of this paper is compositional/few-shot action recognition.
We demonstrate how to improve the performance of appearance-based models by fusion with layout-based models.
arXiv Detail & Related papers (2021-11-02T23:04:39Z)
- Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model.
This work takes a step toward dynamics modeling of hand-object interactions based on dense tactile sensing.
arXiv Detail & Related papers (2021-09-09T16:04:14Z)
- Where and When: Space-Time Attention for Audio-Visual Explanations [42.093794819606444]
We propose a novel space-time attention network that uncovers the synergistic dynamics of audio and visual data over both space and time.
Our model is capable of predicting the audio-visual video events, while justifying its decision by localizing where the relevant visual cues appear.
arXiv Detail & Related papers (2021-05-04T14:16:55Z)
- Unified Graph Structured Models for Video Understanding [93.72081456202672]
We propose a message passing graph neural network that explicitly models spatio-temporal relations.
We show how our method is able to more effectively model relationships between relevant entities in the scene.
arXiv Detail & Related papers (2021-03-29T14:37:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.