Learning a Fast 3D Spectral Approach to Object Segmentation and Tracking
over Space and Time
- URL: http://arxiv.org/abs/2212.08058v1
- Date: Thu, 15 Dec 2022 18:59:07 GMT
- Title: Learning a Fast 3D Spectral Approach to Object Segmentation and Tracking
over Space and Time
- Authors: Elena Burceanu and Marius Leordeanu
- Abstract summary: We pose video object segmentation as spectral graph clustering in space and time.
We introduce a novel and efficient method based on 3D filtering for approximating the spectral solution.
We extend the formulation of our approach beyond the segmentation task, into the realm of object tracking.
- Score: 21.130594354306815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We pose video object segmentation as spectral graph clustering in space and
time, with one graph node for each pixel and edges forming local space-time
neighborhoods. We claim that the strongest cluster in this video graph
represents the salient object. We start by introducing a novel and efficient
method based on 3D filtering for approximating the spectral solution, as the
principal eigenvector of the graph's adjacency matrix, without explicitly
building the matrix. This key property allows us to have a fast parallel
implementation on GPU, orders of magnitude faster than classical approaches for
computing the eigenvector. Our motivation for a spectral space-time clustering
approach, unique in video semantic segmentation literature, is that such
clustering is dedicated to preserving object consistency over time, which we
evaluate using our novel segmentation consistency measure. Further on, we show
how to efficiently learn the solution over multiple input feature channels.
Finally, we extend the formulation of our approach beyond the segmentation
task, into the realm of object tracking. In extensive experiments we show
significant improvements over top methods, as well as over powerful ensembles
that combine them, achieving state-of-the-art on multiple benchmarks, both for
tracking and segmentation.
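The central idea of the abstract, finding the principal eigenvector of the adjacency matrix without building that matrix, can be illustrated with power iteration in which each matrix-vector product is replaced by a 3D filtering pass over the video volume. The sketch below is not the authors' implementation. It assumes a simplified adjacency of the form M = diag(s) B diag(s), where `s` is a hypothetical per-pixel score volume of shape (T, H, W) and B connects every pixel to its (2r+1)^3 space-time neighbors. The function names and parameters are made up for illustration.

```python
import numpy as np


def box_filter_3d(x, radius=1):
    """Sum x over a (2*radius+1)^3 space-time window, zero-padded at the borders."""
    out = x
    for axis in range(3):
        pad = [(0, 0)] * 3
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad)
        n = out.shape[axis]
        # Separable filter: add up the shifted copies along one axis at a time.
        out = sum(
            np.take(padded, np.arange(k, k + n), axis=axis)
            for k in range(2 * radius + 1)
        )
    return out


def principal_eigenvector_3d(unary, radius=1, n_iters=100):
    """Power iteration for M = diag(s) B diag(s) without ever forming M.

    `unary` is an assumed positive per-pixel score volume of shape (T, H, W).
    The product M @ x is computed as s * filter(s * x), which is one 3D
    filtering pass per iteration.
    """
    s = np.asarray(unary, dtype=np.float64)
    x = np.ones_like(s)
    x /= np.linalg.norm(x)
    for _ in range(n_iters):
        x = s * box_filter_3d(s * x, radius)
        x /= np.linalg.norm(x)
    return x
```

Each iteration costs time linear in the number of pixels and maps naturally onto parallel hardware, whereas an explicit adjacency matrix would need (T·H·W)² entries. Thresholding the returned volume would give a soft-to-hard segmentation mask under these assumptions.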
Related papers
- Learning Spatial-Temporal Regularized Tensor Sparse RPCA for Background
Subtraction [6.825970634402847]
We present a spatial-temporal regularized tensor sparse RPCA algorithm for precise background subtraction.
Experiments are performed on six publicly available background subtraction datasets.
arXiv Detail & Related papers (2023-09-27T11:21:31Z)
- Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast
Contrastive Fusion [110.84357383258818]
We propose a novel approach to lift 2D segments to 3D and fuse them by means of a neural field representation.
The core of our approach is a slow-fast clustering objective function, which is scalable and well-suited for scenes with a large number of objects.
Our approach outperforms the state-of-the-art on challenging scenes from the ScanNet, Hypersim, and Replica datasets.
arXiv Detail & Related papers (2023-06-07T17:57:45Z)
- CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point
Cloud Learning [81.85951026033787]
We set transformers in this work and incorporate them into a hierarchical framework for shape classification and part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
- Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised
Semantic Segmentation and Localization [98.46318529630109]
We take inspiration from traditional spectral segmentation methods by reframing image decomposition as a graph partitioning problem.
We find that these eigenvectors already decompose an image into meaningful segments, and can be readily used to localize objects in a scene.
By clustering the features associated with these segments across a dataset, we can obtain well-delineated, nameable regions.
arXiv Detail & Related papers (2022-05-16T17:47:44Z)
- PointFlow: Flowing Semantics Through Points for Aerial Image
Segmentation [96.76882806139251]
We propose a point-wise affinity propagation module based on the Feature Pyramid Network (FPN) framework, named PointFlow.
Rather than dense affinity learning, a sparse affinity map is generated upon selected points between the adjacent features.
Experimental results on three different aerial segmentation datasets suggest that the proposed method is more effective and efficient than state-of-the-art general semantic segmentation methods.
arXiv Detail & Related papers (2021-03-11T09:42:32Z)
- Iterative Knowledge Exchange Between Deep Learning and Space-Time
Spectral Clustering for Unsupervised Segmentation in Videos [17.47403549514259]
We propose a dual system for unsupervised object segmentation in video.
The first module is a space-time graph that discovers objects in videos.
The second module is a deep network that learns powerful object features.
arXiv Detail & Related papers (2020-12-13T18:36:18Z)
- Learning Spatio-Appearance Memory Network for High-Performance Visual
Tracking [79.80401607146987]
Existing object tracking usually learns a bounding-box based template to match visual targets across frames, which cannot accurately learn a pixel-wise representation.
This paper presents a novel segmentation-based tracking architecture, which is equipped with a local-temporal memory network to learn accurate spatio-temporal correspondence.
arXiv Detail & Related papers (2020-09-21T08:12:02Z)
- Unsupervised Spatio-temporal Latent Feature Clustering for
Multiple-object Tracking and Segmentation [0.5591659577198183]
We propose a strategy that treats the temporal identification task as a heterogeneous spatio-temporal clustering problem.
We use a convolutional and fully connected autoencoder to learn discriminative features from segmentation masks and detection bounding boxes.
Our results show that our technique outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2020-07-14T16:47:56Z)
- Revisiting Sequence-to-Sequence Video Object Segmentation with
Multi-Task Loss and Skip-Memory [4.343892430915579]
Video Object Segmentation (VOS) is an active research area of the visual domain.
Current approaches lose objects in longer sequences, especially when the object is small or briefly occluded.
We build upon a sequence-to-sequence approach that employs an encoder-decoder architecture together with a memory module for exploiting the sequential data.
arXiv Detail & Related papers (2020-04-25T15:38:09Z)
- Spatial Pyramid Based Graph Reasoning for Semantic Segmentation [67.47159595239798]
We apply graph convolution to the semantic segmentation task and propose an improved Laplacian.
The graph reasoning is directly performed in the original feature space organized as a spatial pyramid.
We achieve comparable performance with advantages in computational and memory overhead.
arXiv Detail & Related papers (2020-03-23T12:28:07Z)