Tracking Without Re-recognition in Humans and Machines
- URL: http://arxiv.org/abs/2105.13351v1
- Date: Thu, 27 May 2021 17:56:37 GMT
- Title: Tracking Without Re-recognition in Humans and Machines
- Authors: Drew Linsley, Girik Malik, Junkyung Kim, Lakshmi N Govindarajan, Ennio
Mingolla, and Thomas Serre
- Abstract summary: We investigate if state-of-the-art deep neural networks for visual tracking are capable of the same.
We introduce PathTracker, a synthetic visual challenge that asks human observers and machines to track a target object.
We model circuit mechanisms in biological brains that are implicated in tracking objects based on motion cues.
- Score: 12.591847867999636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imagine trying to track one particular fruitfly in a swarm of hundreds.
Higher biological visual systems have evolved to track moving objects by
relying on both appearance and motion features. We investigate if
state-of-the-art deep neural networks for visual tracking are capable of the
same. For this, we introduce PathTracker, a synthetic visual challenge that
asks human observers and machines to track a target object in the midst of
identical-looking "distractor" objects. While humans effortlessly learn
PathTracker and generalize to systematic variations in task design,
state-of-the-art deep networks struggle. To address this limitation, we
identify and model circuit mechanisms in biological brains that are implicated
in tracking objects based on motion cues. When instantiated as a recurrent
network, our circuit model learns to solve PathTracker with a robust visual
strategy that rivals human performance and explains a significant proportion of
their decision-making on the challenge. We also show that the success of this
circuit model extends to object tracking in natural videos. Adding it to a
transformer-based architecture for object tracking builds tolerance to visual
nuisances that affect object appearance, resulting in a new state-of-the-art
performance on the large-scale TrackingNet object tracking challenge. Our work
highlights the importance of building artificial vision models that can help us
better understand human vision and improve computer vision.
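To make the task concrete: in PathTracker, identical dots wander through an arena and the observer must decide whether a cued dot reaches a goal, so only motion, never appearance, links the cue to the answer. Below is a minimal, hypothetical sketch of such a stimulus generator; the dot count, arena size, motion statistics, and goal rule are illustrative assumptions, not the authors' parameters.

```python
import numpy as np

def make_pathtracker_clip(n_frames=64, n_dots=15, arena=32, speed=1.0, seed=0):
    """Toy PathTracker-style clip: identical dots follow jittered random
    walks; dot 0 is the cued target. The label records whether the target
    ends in a designated goal quadrant. Illustrative only -- not the
    stimulus code from the paper."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(2.0, arena - 2.0, size=(n_dots, 2))
    vel = rng.normal(0.0, speed, size=(n_dots, 2))
    clip = np.zeros((n_frames, arena, arena), dtype=np.float32)
    start = pos[0].copy()                      # cue: the target's start position
    for t in range(n_frames):
        vel = np.clip(vel + rng.normal(0.0, 0.3, size=vel.shape), -speed, speed)
        pos = np.clip(pos + vel, 0.0, arena - 1.0)
        rows, cols = pos.round().astype(int).T
        clip[t, rows, cols] = 1.0              # every dot is rendered identically
    goal = arena // 2
    label = int(pos[0, 0] >= goal and pos[0, 1] >= goal)
    return clip, start, label

clip, cue, label = make_pathtracker_clip()
print(clip.shape, cue, label)                  # (64, 32, 32), cue position, 0 or 1
```

Because every dot is rendered identically, a model that solves clips like these cannot rely on re-recognizing the target's appearance; only the motion trajectory carries identity, which is exactly the regime the paper probes.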
Related papers
- Tracking objects that change in appearance with phase synchrony [14.784044408031098]
We show that a novel deep learning circuit can learn to control attention to features separately from their location in the world through neural synchrony.
We compare object tracking in humans, the CV-RNN, and other deep neural networks (DNNs) using FeatureTracker, a large-scale challenge.
Our CV-RNN behaved similarly to humans on the challenge, providing a computational proof-of-concept for the role of phase synchronization.
arXiv Detail & Related papers (2024-10-02T23:30:05Z)
- Learning Object-Centric Representation via Reverse Hierarchy Guidance [73.05170419085796]
Object-Centric Learning (OCL) seeks to enable neural networks to identify individual objects in visual scenes.
RHGNet introduces a top-down pathway that works in different ways in the training and inference processes.
Our model achieves SOTA performance on several commonly used datasets.
arXiv Detail & Related papers (2024-05-17T07:48:27Z)
- Tracking through Containers and Occluders in the Wild [32.86030395660071]
We introduce TCOW, a new benchmark and model for visual tracking through heavy occlusion and containment.
We create a mixture of synthetic and annotated real datasets to support both supervised learning and structured evaluation of model performance.
We evaluate two recent transformer-based video models and find that, while they can be surprisingly capable of tracking targets under certain task variations, a considerable performance gap remains before a tracking model can be said to have acquired a true notion of object permanence.
arXiv Detail & Related papers (2023-05-04T17:59:58Z)
- BI AVAN: Brain inspired Adversarial Visual Attention Network [67.05560966998559]
We propose a brain-inspired adversarial visual attention network (BI-AVAN) to characterize human visual attention directly from functional brain activity.
Our model imitates the biased competition process between attention-related and neglected objects to identify and locate, in an unsupervised manner, the visual objects in a movie frame that the human brain focuses on.
arXiv Detail & Related papers (2022-10-27T22:20:36Z)
- Masked World Models for Visual Control [90.13638482124567]
We introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning.
We demonstrate that our approach achieves state-of-the-art performance on a variety of visual robotic tasks.
arXiv Detail & Related papers (2022-06-28T18:42:27Z)
- Learning What and Where -- Unsupervised Disentangling Location and Identity Tracking [0.44040106718326594]
We introduce Loci, an unsupervised LOCation and Identity tracking system.
Inspired by the dorsal-ventral pathways in the brain, Loci tackles the what-and-where binding problem by means of a self-supervised segregation mechanism.
Loci may set the stage for deeper, explanation-oriented video processing.
arXiv Detail & Related papers (2022-05-26T13:30:14Z)
- Single Object Tracking Research: A Survey [44.24280758718638]
This paper presents the rationale and representative works of the two most popular tracking frameworks of the past ten years.
We present deep-learning-based tracking methods, categorized by network structure.
We also introduce classical strategies for handling common challenges in the tracking problem.
arXiv Detail & Related papers (2022-04-25T02:59:15Z)
- The Right Spin: Learning Object Motion from Rotation-Compensated Flow Fields [61.664963331203666]
How humans perceive moving objects is a longstanding research question in computer vision.
One approach to the problem is to teach a deep network to model all of the effects that camera motion adds to the observed motion field.
We present a novel probabilistic model to estimate the camera's rotation given the motion field.
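The "rotation-compensated" part of the title can be unpacked with textbook geometry: the flow induced by pure camera rotation is independent of scene depth, so once the rotation is estimated it can be subtracted from the observed motion field, leaving translation and independently moving objects. Here is a hedged sketch of that compensation step, using the standard pinhole-model equations under one common sign convention; it is not the paper's probabilistic estimator.

```python
import numpy as np

def rotational_flow(h, w, f, omega):
    """Flow (u, v) induced by pure camera rotation omega = (wx, wy, wz),
    pinhole model with focal length f (pixels) and image-centered
    coordinates. Crucially, this component is depth-independent."""
    y, x = np.mgrid[0:h, 0:w].astype(np.float64)
    x -= w / 2.0
    y -= h / 2.0
    wx, wy, wz = omega
    u = (x * y / f) * wx - (f + x**2 / f) * wy + y * wz
    v = (f + y**2 / f) * wx - (x * y / f) * wy - x * wz
    return u, v

# Subtract the rotational component from an observed flow field; the
# residual reflects translation and object motion (placeholder values here).
h, w, f = 240, 320, 300.0
flow_u = np.zeros((h, w)); flow_v = np.zeros((h, w))   # observed flow
u_rot, v_rot = rotational_flow(h, w, f, omega=(0.0, 0.01, 0.0))
resid_u, resid_v = flow_u - u_rot, flow_v - v_rot
```

Estimating omega robustly from a noisy motion field is the part the paper's probabilistic model addresses.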
arXiv Detail & Related papers (2022-02-28T22:05:09Z)
- The Challenge of Appearance-Free Object Tracking with Feedforward Neural Networks [12.081808043723937]
PathTracker tests the ability of observers to learn to track objects solely by their motion.
We find that standard 3D-convolutional deep network models struggle to solve this task.
Strategies for appearance-free object tracking drawn from biological vision can inspire solutions.
arXiv Detail & Related papers (2021-09-30T17:58:53Z)
- 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z)
- Robust Visual Object Tracking with Two-Stream Residual Convolutional Networks [62.836429958476735]
We propose a Two-Stream Residual Convolutional Network (TS-RCN) for visual tracking.
Our TS-RCN can be integrated with existing deep learning based visual trackers.
To further improve tracking performance, we adopt the "wider" residual network ResNeXt as the feature-extraction backbone (see the sketch below).
arXiv Detail & Related papers (2020-05-13T19:05:42Z)
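To illustrate the two-stream idea in the TS-RCN entry above: one stream sees the current frame for appearance, the other a motion signal, and their features are fused. In this minimal sketch the fusion by concatenation, the use of a simple frame difference as the motion input, and all layer choices are assumptions for illustration; the paper specifies its own residual design.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class TwoStreamSketch(nn.Module):
    """Minimal two-stream feature extractor: an appearance stream on the
    current frame and a motion stream on a frame difference, fused by
    concatenation. Illustrative only -- TS-RCN's actual residual fusion
    and backbone configuration are specified in the paper."""
    def __init__(self):
        super().__init__()
        # ResNeXt backbones, truncated before the classification head.
        self.appearance = nn.Sequential(*list(models.resnext50_32x4d().children())[:-1])
        self.motion = nn.Sequential(*list(models.resnext50_32x4d().children())[:-1])

    def forward(self, frame_t, frame_tm1):
        app = self.appearance(frame_t).flatten(1)          # (B, 2048)
        mot = self.motion(frame_t - frame_tm1).flatten(1)  # (B, 2048)
        return torch.cat([app, mot], dim=1)                # fused (B, 4096)

net = TwoStreamSketch()
x_t = torch.randn(1, 3, 224, 224)
x_tm1 = torch.randn(1, 3, 224, 224)
print(net(x_t, x_tm1).shape)  # torch.Size([1, 4096])
```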