Capturing the objects of vision with neural networks
- URL: http://arxiv.org/abs/2109.03351v1
- Date: Tue, 7 Sep 2021 21:49:53 GMT
- Title: Capturing the objects of vision with neural networks
- Authors: Benjamin Peters, Nikolaus Kriegeskorte
- Abstract summary: Human visual perception carves a scene at its physical joints, decomposing the world into objects.
Deep neural network (DNN) models of visual object recognition, by contrast, remain largely tethered to the sensory input.
We review related work in both fields and examine how these fields can help each other.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human visual perception carves a scene at its physical joints, decomposing
the world into objects, which are selectively attended, tracked, and predicted
as we engage our surroundings. Object representations emancipate perception
from the sensory input, enabling us to keep in mind that which is out of sight
and to use perceptual content as a basis for action and symbolic cognition.
Human behavioral studies have documented how object representations emerge
through grouping, amodal completion, proto-objects, and object files. Deep
neural network (DNN) models of visual object recognition, by contrast, remain
largely tethered to the sensory input, despite achieving human-level
performance at labeling objects. Here, we review related work in both fields
and examine how these fields can help each other. The cognitive literature
provides a starting point for the development of new experimental tasks that
reveal mechanisms of human object perception and serve as benchmarks driving
development of deep neural network models that will put the object into object
recognition.
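The "object files" documented in the behavioral literature above are easy to make concrete in code. The following is a toy sketch, not anything from the paper: the class name `ObjectFile` and its `step` method are illustrative inventions, and the point is only the essential property the abstract emphasizes, that the representation persists and extrapolates when sensory input disappears.

```python
# Toy "object file": a persistent object representation that is updated from
# sensory input when the object is visible and extrapolated from memory when
# it is occluded. Illustrative invention, not code from the review.
from dataclasses import dataclass

@dataclass
class ObjectFile:
    index: int                    # stable identity, independent of appearance
    position: tuple               # last inferred (x, y) location
    velocity: tuple = (0.0, 0.0)  # simple motion model used for prediction
    visible: bool = True

    def step(self, observation=None):
        """Bind sensory input when available; otherwise predict from memory."""
        if observation is not None:       # object in view: update from input
            ox, oy = observation
            self.velocity = (ox - self.position[0], oy - self.position[1])
            self.position = (ox, oy)
            self.visible = True
        else:                             # out of sight: keep it in mind
            self.visible = False
            self.position = (self.position[0] + self.velocity[0],
                             self.position[1] + self.velocity[1])

# A ball rolls behind an occluder; the object file persists and extrapolates.
ball = ObjectFile(index=0, position=(0.0, 0.0))
ball.step((1.0, 0.0))   # visible: velocity inferred as (1.0, 0.0)
ball.step(None)         # occluded: predicted at (2.0, 0.0), visible=False
print(ball.position, ball.visible)
```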
Related papers
- Learning 3D object-centric representation through prediction [12.008668555280668]
We develop a novel network architecture that learns to 1) segment objects from discrete images, 2) infer their 3D locations, and 3) perceive depth.
The core idea is treating objects as latent causes of visual input which the brain uses to make efficient predictions of future scenes.
arXiv Detail & Related papers (2024-03-06T14:19:11Z)
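A minimal way to picture "objects as latent causes of visual input" from the entry above is an analysis-by-synthesis loop: latent object positions are adjusted until rendering them reproduces the observed frame. The sketch below is a schematic stand-in under that assumption (the Gaussian-blob renderer and numerical-gradient fit are inventions), not the paper's network.

```python
import numpy as np

H = W = 32

def render(slots):
    """Synthesis step: render each latent object as a Gaussian blob."""
    ys, xs = np.mgrid[0:H, 0:W].astype(float)
    frame = np.zeros((H, W))
    for cx, cy in slots:
        frame += np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / 8.0)
    return frame

def infer(frame, slots, lr=0.05, steps=300, eps=0.5):
    """Adjust latent object positions so the rendered prediction matches the
    observed frame (numerical-gradient descent on the pixel error)."""
    slots = [list(s) for s in slots]
    for _ in range(steps):
        for s in slots:
            for i in range(2):
                s[i] += eps
                hi = np.sum((render(slots) - frame) ** 2)
                s[i] -= 2 * eps
                lo = np.sum((render(slots) - frame) ** 2)
                s[i] += eps                       # restore, then step downhill
                s[i] -= lr * (hi - lo) / (2 * eps)
    return [tuple(round(v, 1) for v in s) for s in slots]

true_slots = [(10.0, 8.0), (24.0, 20.0)]
frame = render(true_slots)
# Starting from wrong guesses, prediction error pulls the latents toward the
# true object positions (up to a permutation of slots).
print(infer(frame, [(12.0, 12.0), (20.0, 24.0)]))
```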
- Towards A Unified Neural Architecture for Visual Recognition and Reasoning [40.938279131241764]
We propose a unified neural architecture for visual recognition and reasoning with a generic interface (e.g., tokens) for both.
Our framework enables the investigation of how different visual recognition tasks, datasets, and inductive biases can help enable principled temporal reasoning capabilities.
arXiv Detail & Related papers (2023-11-10T20:27:43Z)
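The "generic interface (e.g., tokens)" in the entry above can be pictured as a fixed-width token sequence that the recognition side emits and the reasoning side consumes without ever touching pixels. A minimal assumed sketch, with random projections standing in for learned encoders; this is not the proposed architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                                 # shared token width: the interface
W_embed = rng.normal(0, 0.1, (64, D))  # fixed random patch embedding (stand-in)

def recognize(image):
    """Recognition side: cut a 32x32 image into 8x8 patches, emit D-dim tokens."""
    patches = image.reshape(4, 8, 4, 8).transpose(0, 2, 1, 3).reshape(16, 64)
    return patches @ W_embed           # (16, D): nothing task-specific leaks out

def reason(tokens, query):
    """Reasoning side: one attention step over tokens; never sees pixels."""
    scores = tokens @ query / np.sqrt(D)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ tokens            # (D,) answer embedding

image = rng.random((32, 32))
tokens = recognize(image)
answer = reason(tokens, query=rng.normal(size=D))
print(tokens.shape, answer.shape)      # (16, 16) (16,)
```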
- The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects [51.22194706674366]
We introduce the ObjectFolder Benchmark, a benchmark suite of 10 tasks for multisensory object-centric learning.
We also introduce the ObjectFolder Real dataset, including multisensory measurements for 100 real-world household objects.
arXiv Detail & Related papers (2023-06-01T17:51:22Z)
- BI AVAN: Brain inspired Adversarial Visual Attention Network [67.05560966998559]
We propose a brain-inspired adversarial visual attention network (BI-AVAN) to characterize human visual attention directly from functional brain activity.
Our model imitates the biased-competition process between attended and neglected objects to identify and locate, in an unsupervised manner, the visual objects in a movie frame that the human brain focuses on.
arXiv Detail & Related papers (2022-10-27T22:20:36Z)
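The biased-competition process that BI-AVAN imitates can be caricatured in a few lines: candidate objects compete for limited attentional capacity, and a top-down bias tilts who wins. A toy illustration of the principle, not the published network:

```python
import numpy as np

def biased_competition(saliency, bias, steps=20, inhibition=0.5):
    """Each object's activation is excited by its own evidence and suppressed
    by the pooled activation of its rivals; a top-down bias tilts who wins.
    Toy illustration of the principle, not the BI-AVAN model."""
    a = saliency.astype(float).copy()
    for _ in range(steps):
        rivals = a.sum() - a                  # pooled activity of competitors
        a = np.maximum(0.0, saliency + bias - inhibition * rivals)
        if a.sum() > 0:
            a = a / a.sum()                   # normalize: limited capacity
    return a

saliency = np.array([0.9, 0.8, 0.3])          # bottom-up evidence per object
bias     = np.array([0.0, 0.6, 0.0])          # top-down goal favors object 1
print(biased_competition(saliency, bias))     # object 1 wins despite weaker
                                              # bottom-up saliency
```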
- Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, when consuming dynamic descriptors, achieves state-of-the-art prediction results and generalizes better to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
- Embodied vision for learning object representations [4.211128681972148]
We show that visual statistics mimicking those of a toddler improve object recognition accuracy in both familiar and novel environments.
We argue that this effect is caused by a reduction of features extracted from the background, a neural-network bias toward large image features, and a greater similarity between novel and familiar background regions.
arXiv Detail & Related papers (2022-05-12T16:36:27Z)
- Bi-directional Object-context Prioritization Learning for Saliency Ranking [60.62461793691836]
Existing approaches focus on learning either object-object or object-scene relations.
We observe that spatial attention works concurrently with object-based attention in the human visual recognition system.
We propose a novel bi-directional method to unify spatial attention and object-based attention for saliency ranking.
arXiv Detail & Related papers (2022-03-17T16:16:03Z)
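One crude way to marry spatial and object-based attention, as in the entry above, is to pool a spatial saliency map inside each object mask and rank the objects by the pooled score. The sketch below is a simplified stand-in for the idea, not the paper's bi-directional method:

```python
import numpy as np

def rank_objects(saliency_map, masks):
    """Rank object masks by mean spatial saliency inside each mask: a crude
    marriage of spatial attention (the map) and object-based attention (the
    masks). Simplified illustration, not the paper's bi-directional model."""
    scores = [saliency_map[m].mean() for m in masks]
    order = np.argsort(scores)[::-1]           # most salient object first
    return order, scores

H = W = 8
saliency = np.zeros((H, W))
saliency[2:4, 2:4] = 1.0                                   # a spatial hotspot
mask_a = np.zeros((H, W), bool); mask_a[2:4, 2:4] = True   # on the hotspot
mask_b = np.zeros((H, W), bool); mask_b[5:7, 5:7] = True   # off it
order, scores = rank_objects(saliency, [mask_a, mask_b])
print(order, scores)                           # object 0 ranked first
```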
- The Challenge of Appearance-Free Object Tracking with Feedforward Neural Networks [12.081808043723937]
PathTracker tests the ability of observers to learn to track objects solely by their motion.
We find that standard 3D-convolutional deep network models struggle to solve this task.
Strategies for appearance-free object tracking in biological vision can inspire solutions.
arXiv Detail & Related papers (2021-09-30T17:58:53Z)
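The appearance-free tracking task above becomes concrete with a toy stimulus: identical dots move, one is cued at the start, and the observer must report it at the end, so only motion carries identity. A miniature assumed generator (PathTracker itself differs in detail):

```python
import numpy as np

def make_trial(n_dots=4, n_frames=20, seed=0):
    """All dots share identical appearance; only the motion paths differ, so a
    tracker must carry identity through time. Toy version, not PathTracker."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n_dots, 2))                    # start positions in [0,1]^2
    frames = []
    for _ in range(n_frames):
        pos = (pos + 0.02 * rng.standard_normal((n_dots, 2))) % 1.0
        frames.append(pos.copy())
    target = 0                                       # cued on the first frame only
    return np.stack(frames), target

frames, target = make_trial()
print(frames.shape, frames[-1, target])              # (20, 4, 2), final position
```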
- ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations [52.226947570070784]
We present ObjectFolder, a dataset of 100 objects that addresses both challenges with two key innovations.
First, ObjectFolder encodes the visual, auditory, and tactile sensory data for all objects, enabling a number of multisensory object recognition tasks.
Second, ObjectFolder employs a uniform, object-centric, and implicit representation of each object's visual textures, acoustic simulations, and tactile readings, making the dataset flexible to use and easy to share.
arXiv Detail & Related papers (2021-09-16T14:00:59Z)
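"Implicit representation" in the entry above means storing each modality as a function from query coordinates to measurements rather than as raw arrays, so any resolution can be sampled on demand. A minimal sketch, with closed-form lambdas standing in for ObjectFolder's trained implicit networks:

```python
import numpy as np

class ImplicitObject:
    """One object stored as functions from query coordinates to measurements
    instead of raw arrays. The lambdas are toy closed-form stand-ins for the
    trained implicit networks used in ObjectFolder."""
    def __init__(self):
        self.visual  = lambda x, y: np.sin(x) * np.cos(y)       # texture value
        self.audio   = lambda t: np.sin(2 * np.pi * 440 * t)    # impact sound
        self.tactile = lambda x, y: np.exp(-(x**2 + y**2))      # gel deformation

obj = ImplicitObject()
print(obj.visual(0.1, 0.2), obj.audio(0.001), obj.tactile(0.0, 0.0))
# Any resolution can be sampled on demand, keeping the dataset compact to share.
```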
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations than visual-only ones.
Our experiments show that our "muscly-supervised" representation outperforms MoCo, a state-of-the-art visual-only method.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
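The "muscly-supervised" learning above can be read as contrastive learning in which two views linked by the same human interaction count as a positive pair. A toy InfoNCE step under that assumption (the pairing rule is an inference on our part; the paper builds on MoCo-style training):

```python
import numpy as np

def info_nce(anchor, positive, negatives, temp=0.1):
    """Toy contrastive objective: pull the anchor embedding toward the
    interaction-matched view, push it away from unmatched ones.
    Assumed illustration, not the paper's training objective."""
    def sim(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([sim(anchor, positive)]
                      + [sim(anchor, n) for n in negatives]) / temp
    m = logits.max()
    logsumexp = m + np.log(np.exp(logits - m).sum())
    return -(logits[0] - logsumexp)   # low when the matched pair is closest

rng = np.random.default_rng(0)
feat = rng.normal(size=8)
loss = info_nce(anchor=feat,
                positive=feat + 0.1 * rng.normal(size=8),          # same interaction
                negatives=[rng.normal(size=8) for _ in range(4)])  # other clips
print(loss)
```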
- Learning Intermediate Features of Object Affordances with a Convolutional Neural Network [1.52292571922932]
We train a deep convolutional neural network (CNN) to recognize affordances from images and to learn the underlying features and dimensionality of affordances.
We view this representational analysis as the first step towards a more formal account of how humans perceive and interact with the environment.
arXiv Detail & Related papers (2020-02-20T19:04:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.