Seeing Objects in a Cluttered World: Computational Objectness from
Motion in Video
- URL: http://arxiv.org/abs/2402.01126v1
- Date: Fri, 2 Feb 2024 03:57:11 GMT
- Title: Seeing Objects in a Cluttered World: Computational Objectness from
Motion in Video
- Authors: Douglas Poland and Amar Saini
- Abstract summary: Perception of the visually disjoint surfaces of our world as whole objects, physically distinct from those overlapping them, forms the basis of our visual perception.
We present a simple but novel approach to infer objectness from phenomenology without object models.
We show that it delivers robust perception of individual attended objects in cluttered scenes, even with blur and camera shake.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Perception of the visually disjoint surfaces of our cluttered world as whole
objects, physically distinct from those overlapping them, is a cognitive
phenomenon called objectness that forms the basis of our visual perception.
Shared by all vertebrates and present at birth in humans, it enables
object-centric representation and reasoning about the visual world. We present
a computational approach to objectness that leverages motion cues and
spatio-temporal attention using a pair of supervised spatio-temporal
R(2+1)U-Nets. The first network detects motion boundaries and classifies the
pixels at those boundaries in terms of their local foreground-background sense.
This motion boundary sense (MBS) information is passed, along with a
spatio-temporal object attention cue, to an attentional surface perception
(ASP) module which infers the form of the attended object over a sequence of
frames and classifies its 'pixels' as visible or obscured. The spatial form of
the attention cue is flexible, but it must loosely track the attended object
which need not be visible. We demonstrate the ability of this simple but novel
approach to infer objectness from phenomenology without object models, and show
that it delivers robust perception of individual attended objects in cluttered
scenes, even with blur and camera shake. We show that our data diversity and
augmentation minimizes bias and facilitates transfer to real video. Finally, we
describe how this computational objectness capability can grow in
sophistication and anchor a robust modular video object perception framework.
Related papers
- Unsupervised Discovery of Object-Centric Neural Fields [21.223170092979498]
We study inferring 3D object-centric scene representations from a single image.
We propose Unsupervised discovery of Object-Centric neural Fields (uOCF).
arXiv Detail & Related papers (2024-02-12T02:16:59Z)
- ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection [70.11264880907652]
Recent camouflaged object detection (COD) attempts to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios.
We propose an effective unified collaborative pyramid network that mimics human behavior when observing vague images and camouflaged objects: zooming in and out.
Our framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks.
arXiv Detail & Related papers (2023-10-31T06:11:23Z)
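ZoomNeXt's "zooming in and out" can be illustrated, in a deliberately naive form, as multi-scale test-time inference: run a segmentation model on rescaled copies of the input and fuse the resized logits. This sketches the zooming intuition only; `multiscale_segment` and the placeholder `model` are hypothetical, and the actual ZoomNeXt fuses scales inside a collaborative pyramid network rather than by averaging.

```python
# Naive multi-scale inference illustrating the zoom-in/zoom-out idea.
import torch
import torch.nn.functional as F

def multiscale_segment(model, image, scales=(0.5, 1.0, 1.5)):
    """Run `model` on rescaled copies of `image` and average the logits."""
    _, _, h, w = image.shape
    fused = 0.0
    for s in scales:
        zoomed = F.interpolate(image, scale_factor=s, mode="bilinear", align_corners=False)
        logits = model(zoomed)  # (B, 1, h*s, w*s)
        fused = fused + F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)
    return fused / len(scales)

# Toy placeholder standing in for a real segmentation network.
model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)
mask_logits = multiscale_segment(model, torch.randn(1, 3, 128, 128))
print(mask_logits.shape)  # torch.Size([1, 1, 128, 128])
```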
- Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior [88.9319150230121]
Object-centric vision aims to construct an explicit representation of the objects in a scene.
We incorporate a spatial-locality prior into state-of-the-art object-centric vision models.
We obtain significant improvements in segmenting objects in both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-05-31T04:35:50Z)
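One generic way to realize a spatial-locality prior in slot-style object-centric attention is to bias each slot's attention logits by the squared distance to a per-slot spatial center, so slots prefer compact regions. The sketch below shows that generic idea only; `local_slot_attention` and its parameters are hypothetical and not the paper's exact mechanism.

```python
# Slot-style attention with a distance-based locality bias (a sketch).
import torch

def local_slot_attention(features, slot_queries, centers, tau=0.1):
    """features: (N, D) pixel features on a flattened HxW grid.
    slot_queries: (K, D); centers: (K, 2) slot centers in [0, 1]^2."""
    n = features.shape[0]
    side = int(n ** 0.5)
    ys, xs = torch.meshgrid(torch.linspace(0, 1, side),
                            torch.linspace(0, 1, side), indexing="ij")
    coords = torch.stack([ys.reshape(-1), xs.reshape(-1)], dim=-1)     # (N, 2)
    content = slot_queries @ features.T                                # (K, N)
    dist2 = ((centers[:, None, :] - coords[None, :, :]) ** 2).sum(-1)  # (K, N)
    attn = torch.softmax(content - dist2 / tau, dim=0)  # slots compete per pixel
    return attn @ features                              # (K, D) updated slots

feats = torch.randn(64 * 64, 32)
slots = torch.randn(4, 32)
centers = torch.tensor([[0.25, 0.25], [0.25, 0.75], [0.75, 0.25], [0.75, 0.75]])
print(local_slot_attention(feats, slots, centers).shape)  # torch.Size([4, 32])
```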
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Bi-directional Object-context Prioritization Learning for Saliency Ranking [60.62461793691836]
Existing approaches focus on learning either object-object or object-scene relations.
We observe that spatial attention works concurrently with object-based attention in the human visual recognition system.
We propose a novel bi-directional method to unify spatial attention and object-based attention for saliency ranking.
arXiv Detail & Related papers (2022-03-17T16:16:03Z)
- The Right Spin: Learning Object Motion from Rotation-Compensated Flow Fields [61.664963331203666]
How humans perceive moving objects is a longstanding research question in computer vision.
One approach to the problem is to teach a deep network to model the combined effects of camera and object motion.
We present a novel probabilistic model to estimate the camera's rotation given the motion field.
arXiv Detail & Related papers (2022-02-28T22:05:09Z)
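The compensation step that the title refers to can be written down directly: given a small camera rotation, subtract the rotation-induced motion field (the standard instantaneous motion-field equations) from the observed flow, leaving a residual dominated by independently moving objects. The paper's contribution is estimating that rotation probabilistically, which this sketch does not attempt; `rotational_flow` and `compensate` are illustrative helpers.

```python
# Rotation compensation of optical flow under a small-rotation model.
import numpy as np

def rotational_flow(h, w, omega, f=1.0):
    """Small-rotation motion field (Longuet-Higgins equations);
    omega = (wx, wy, wz) in rad/frame, normalized image coordinates."""
    ys, xs = np.mgrid[0:h, 0:w]
    x = (xs - w / 2) / f
    y = (ys - h / 2) / f
    wx, wy, wz = omega
    u = x * y * wx - (1 + x ** 2) * wy + y * wz
    v = (1 + y ** 2) * wx - x * y * wy - x * wz
    return np.stack([u, v], axis=-1) * f  # back to pixel units

def compensate(flow, omega, f=1.0):
    """Remove the camera-rotation component from an observed flow field."""
    h, w, _ = flow.shape
    return flow - rotational_flow(h, w, omega, f)

observed = np.random.randn(64, 64, 2)
residual = compensate(observed, omega=(0.0, 0.01, 0.0), f=200.0)
print(residual.shape)  # (64, 64, 2)
```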
- ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations [52.226947570070784]
We present ObjectFolder, a dataset of 100 objects that addresses both challenges with two key innovations.
First, ObjectFolder encodes the visual, auditory, and tactile sensory data for all objects, enabling a number of multisensory object recognition tasks.
Second, ObjectFolder employs a uniform, object-centric, and implicit representation for each object's visual textures, acoustic simulations, and tactile readings, making the dataset flexible to use and easy to share.
arXiv Detail & Related papers (2021-09-16T14:00:59Z)
- Capturing the objects of vision with neural networks [0.0]
Human visual perception carves a scene at its physical joints, decomposing the world into objects.
Deep neural network (DNN) models of visual object recognition, by contrast, remain largely tethered to the sensory input.
We review related work in both fields and examine how these fields can help each other.
arXiv Detail & Related papers (2021-09-07T21:49:53Z)
- A topological solution to object segmentation and tracking [0.951828574518325]
Current computer vision methods for segmentation and tracking that approach human performance all require learning.
Here, we show that the mathematical structure of light rays reflected from environment surfaces yields a natural representation of persistent surfaces.
We demonstrate that our approach can segment and invariantly track objects in cluttered synthetic video despite severe appearance changes, without requiring learning.
arXiv Detail & Related papers (2021-07-05T13:52:57Z)
- Unsupervised Discovery of 3D Physical Objects from Video [15.939924306990548]
We explore how physics, especially object interactions, facilitates disentangling of 3D geometry and position of objects from video, in an unsupervised manner.
Our Physical Object Discovery Network (POD-Net) uses both multi-scale pixel cues and physical motion cues to accurately segment observable and partially occluded objects of varying sizes.
arXiv Detail & Related papers (2020-07-24T04:46:21Z)