Binding Dancers Into Attractors
- URL: http://arxiv.org/abs/2206.02558v1
- Date: Wed, 1 Jun 2022 22:01:29 GMT
- Title: Binding Dancers Into Attractors
- Authors: Franziska Kaltenberger, Sebastian Otte, Martin V. Butz
- Abstract summary: Feature binding and perspective taking are crucial cognitive abilities.
We propose a recurrent neural network model that solves both challenges.
We first train an LSTM to predict 3D motion dynamics from a canonical perspective.
We then present similar motion dynamics with novel viewpoints and feature arrangements.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To effectively perceive and process observations in our environment, feature
binding and perspective taking are crucial cognitive abilities. Feature binding
combines observed features into one entity, called a Gestalt. Perspective
taking transfers the percept into a canonical, observer-centered frame of
reference. Here we propose a recurrent neural network model that solves both
challenges. We first train an LSTM to predict 3D motion dynamics from a
canonical perspective. We then present similar motion dynamics with novel
viewpoints and feature arrangements. Retrospective inference enables the
deduction of the canonical perspective. Combined with a robust mutual-exclusive
softmax selection scheme, random feature arrangements are reordered and
precisely bound into known Gestalt percepts. To corroborate evidence for the
architecture's cognitive validity, we examine its behavior on the silhouette
illusion, which elicits two competitive Gestalt interpretations of a rotating
dancer. Our system flexibly binds the information of the rotating figure into
the alternative attractors resolving the illusion's ambiguity and imagining the
respective depth interpretation and the corresponding direction of rotation. We
finally discuss the potential universality of the proposed mechanisms.
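The abstract does not spell out the mutual-exclusive softmax selection scheme. As a rough illustration only, the following minimal NumPy sketch uses a hypothetical Sinkhorn-style stand-in (alternating row- and column-wise normalization, not necessarily the paper's exact mechanism) to reorder a shuffled feature arrangement back into a known Gestalt:

```python
import numpy as np

def mutual_exclusive_softmax(scores, n_iters=50, temp=0.1):
    """Approximate a one-to-one (mutually exclusive) assignment by
    alternating row- and column-wise normalization of a softmax matrix
    (a Sinkhorn-style stand-in for the paper's selection scheme)."""
    m = np.exp(scores / temp)
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # observed features compete
        m = m / m.sum(axis=0, keepdims=True)  # canonical slots compete
    return m

# Toy example: bind 3 observed features to 3 canonical Gestalt slots.
canonical = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # known Gestalt
observed = canonical[[2, 0, 1]]                              # shuffled arrangement
scores = -np.linalg.norm(observed[:, None] - canonical[None, :], axis=-1)
binding = mutual_exclusive_softmax(scores)
reordered = binding.T @ observed  # soft reordering back into canonical slots
```

With a low temperature the binding matrix approaches a permutation, so `reordered` recovers the canonical arrangement.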
Related papers
- Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction [60.964512894143475]
We present the Generative Spatial Transformer (GST), a novel auto-regressive framework that jointly addresses spatial localization and view prediction.
Our model simultaneously estimates the camera pose from a single image and predicts the view from a new camera pose, effectively bridging the gap between spatial awareness and visual prediction.
arXiv Detail & Related papers (2024-10-24T17:58:05Z)
- Artificial Kuramoto Oscillatory Neurons [65.16453738828672]
We introduce Artificial Kuramoto Oscillatory Neurons (AKOrN) as a dynamical alternative to threshold units.
We show that this idea provides performance improvements across a wide spectrum of tasks.
We believe that these empirical results show the importance of our assumptions at the most basic neuronal level of neural representation.
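The summary does not give the neuron dynamics. As background, here is a minimal NumPy sketch of the classic mean-field Kuramoto update (the textbook oscillator model underlying AKOrN, not the paper's exact formulation), showing phases synchronizing under coupling:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, dt = 32, 2.0, 0.05
theta = rng.uniform(0.0, 2.0 * np.pi, n)  # oscillator phases
omega = rng.normal(0.0, 0.1, n)           # natural frequencies

def order(theta):
    # Kuramoto order parameter r in [0, 1]; r near 1 means synchrony.
    return np.abs(np.exp(1j * theta).mean())

r0 = order(theta)
for _ in range(400):
    # Classic mean-field Kuramoto update, integrated with an Euler step:
    # d(theta_i)/dt = omega_i + (K / n) * sum_j sin(theta_j - theta_i)
    coupling = np.sin(theta[None, :] - theta[:, None]).mean(axis=1)
    theta = theta + dt * (omega + K * coupling)
r1 = order(theta)
```

With coupling strength well above the critical value, the initially incoherent phases lock together and the order parameter rises toward 1.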
arXiv Detail & Related papers (2024-10-17T17:47:54Z)
- Spherical World-Locking for Audio-Visual Localization in Egocentric Videos [53.658928180166534]
We propose Spherical World-Locking as a general framework for egocentric scene representation.
Compared to conventional head-locked egocentric representations with a 2D planar field-of-view, SWL effectively offsets challenges posed by self-motion.
We design a unified encoder-decoder transformer architecture that preserves the spherical structure of the scene representation.
arXiv Detail & Related papers (2024-08-09T22:29:04Z)
- Neural Concept Binder [22.074896812195437]
We introduce the Neural Concept Binder (NCB), a framework for deriving both discrete and continuous concept representations.
The structured nature of NCB's concept representations allows for intuitive inspection and the straightforward integration of external knowledge.
We validate the effectiveness of NCB through evaluations on our newly introduced CLEVR-Sudoku dataset.
arXiv Detail & Related papers (2024-06-14T11:52:09Z)
- Binding Dynamics in Rotating Features [72.80071820194273]
We propose an alternative "cosine binding" mechanism, which explicitly computes the alignment between features and adjusts weights accordingly.
This allows us to draw direct connections to self-attention and biological neural processes, and to shed light on the fundamental dynamics for object-centric representations to emerge in Rotating Features.
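As a rough illustration of the idea (a hypothetical NumPy sketch, not the paper's exact "cosine binding" formulation), one can gate each feature's contribution by its cosine alignment with the current aggregate orientation:

```python
import numpy as np

def cosine_binding(features, weights):
    """Hypothetical sketch of a cosine-binding step: each input's
    contribution is gated by its cosine alignment with the
    weighted-mean output orientation."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-8, None)
    ref = (weights[:, None] * features).sum(axis=0)
    ref = ref / max(np.linalg.norm(ref), 1e-8)
    align = unit @ ref                # cosine similarity per input feature
    gate = np.clip(align, 0.0, None)  # anti-aligned inputs are cut off
    return (gate * weights)[:, None] * features

# Two roughly aligned features and one anti-aligned feature:
# the anti-aligned feature's contribution is suppressed to zero.
f = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]])
w = np.ones(3)
out = cosine_binding(f, w)
```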
arXiv Detail & Related papers (2024-02-08T12:31:08Z)
- Computing a human-like reaction time metric from stable recurrent vision models [11.87006916768365]
We sketch a general-purpose methodology to construct computational accounts of reaction times from a stimulus-computable, task-optimized model.
We demonstrate that our metric aligns with patterns of human reaction times for stimulus manipulations across four disparate visual decision-making tasks.
This work paves the way for exploring the temporal alignment of model and human visual strategies in the context of various other cognitive tasks.
arXiv Detail & Related papers (2023-06-20T14:56:02Z)
- Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and they show that the proposed agents can learn to distinguish objects just by observing the video stream.
arXiv Detail & Related papers (2022-04-26T09:52:31Z)
- Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables.
We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph.
Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z)
- Fully Steerable 3D Spherical Neurons [14.86655504533083]
We propose a steerable feed-forward learning-based approach that consists of spherical decision surfaces and operates on point clouds.
Due to the inherent geometric 3D structure of our theory, we derive a 3D steerability constraint for its atomic parts.
We show how the model parameters are fully steerable at inference time.
arXiv Detail & Related papers (2021-06-02T16:30:02Z)
- Binding and Perspective Taking as Inference in a Generative Neural Network Model [1.0323063834827415]
A generative encoder-decoder architecture adapts its perspective and binds features by means of retrospective inference.
We show that the resulting gradient-based inference process solves the perspective taking and binding problem for known biological motion patterns.
arXiv Detail & Related papers (2020-12-09T16:43:26Z)
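Gradient-based retrospective inference can be illustrated with a toy example (a minimal sketch under simplifying assumptions, not the paper's architecture): infer an unknown 2D viewing rotation by gradient descent on the reconstruction error against a canonical pattern.

```python
import numpy as np

# Canonical point pattern and a view rotated by an unknown angle.
canonical = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]])
true_angle = 0.7
c, s = np.cos(true_angle), np.sin(true_angle)
observed = canonical @ np.array([[c, -s], [s, c]]).T

def loss_grad(angle):
    """Reconstruction loss and its analytic gradient w.r.t. the angle."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    dR = np.array([[-s, -c], [c, -s]])  # dR/d(angle)
    resid = canonical @ R.T - observed
    return (resid ** 2).sum(), 2.0 * (resid * (canonical @ dR.T)).sum()

# Retrospectively infer the perspective parameter by gradient descent.
angle = 0.0
for _ in range(200):
    _, g = loss_grad(angle)
    angle -= 0.1 * g
```

The inferred angle converges to the true viewing rotation, which is the spirit of deducing the canonical perspective from a novel viewpoint.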
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.