Unsupervised Spatio-temporal Latent Feature Clustering for
Multiple-object Tracking and Segmentation
- URL: http://arxiv.org/abs/2007.07175v3
- Date: Fri, 5 Nov 2021 02:19:37 GMT
- Title: Unsupervised Spatio-temporal Latent Feature Clustering for
Multiple-object Tracking and Segmentation
- Authors: Abubakar Siddique, Reza Jalil Mozhdehi, and Henry Medeiros
- Abstract summary: We propose a strategy that treats the temporal identification task as a spatio-temporal clustering problem.
We use a convolutional and fully connected autoencoder to learn discriminative features from segmentation masks and detection bounding boxes.
Our results show that our technique outperforms several state-of-the-art methods.
- Score: 0.5591659577198183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Assigning consistent temporal identifiers to multiple moving objects in a
video sequence is a challenging problem. A solution to that problem would have
immediate ramifications in multiple object tracking and segmentation problems.
We propose a strategy that treats the temporal identification task as a
spatio-temporal clustering problem. We propose an unsupervised learning
approach using a convolutional and fully connected autoencoder, which we call
deep heterogeneous autoencoder, to learn discriminative features from
segmentation masks and detection bounding boxes. We extract masks and their
corresponding bounding boxes from a pretrained instance segmentation network
and train the autoencoders jointly using task-dependent uncertainty weights to
generate common latent features. We then construct constraint graphs that
encourage associations among objects that satisfy a set of known temporal
conditions. The feature vectors and the constraint graphs are then provided to
the k-means clustering algorithm to separate the corresponding data points in
the latent space. We evaluate the performance of our method using challenging
synthetic and real-world multiple-object video datasets. Our results show that
our technique outperforms several state-of-the-art methods.
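The two training and clustering ideas named in the abstract can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation, and every function name here is hypothetical. The loss uses the common simplified uncertainty-weighting form sum_i exp(-s_i) * L_i + s_i with s_i = log(sigma_i^2) (Kendall et al., 2018); that this is exactly how the paper applies its task-dependent weights is an assumption. For the clustering step the sketch swaps in greedy COP-KMeans (Wagstaff et al., 2001) with only cannot-link constraints between detections in the same frame, an assumed stand-in for the paper's constraint graphs.

```python
import numpy as np


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses (e.g. mask and box reconstruction) with
    homoscedastic uncertainty: sum_i exp(-s_i) * L_i + s_i.

    In training, log_vars (s_i) would be learnable parameters optimized
    jointly with the autoencoders; here they are plain numbers.
    """
    task_losses = np.asarray(task_losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-log_vars) * task_losses + log_vars))


def same_frame_cannot_links(frame_ids):
    """Hypothetical constraint graph: two detections from the same frame
    cannot belong to the same object, so they get a cannot-link edge."""
    pairs = set()
    for i in range(len(frame_ids)):
        for j in range(i + 1, len(frame_ids)):
            if frame_ids[i] == frame_ids[j]:
                pairs.add((i, j))
    return pairs


def cop_kmeans(X, k, cannot_link, n_iter=50, seed=0):
    """Greedy COP-KMeans with cannot-link constraints only.

    X: (n, d) latent features; cannot_link: set of (i, j) index pairs.
    Returns an array of cluster labels (temporal identifiers), or None if
    some point cannot be placed without violating a constraint.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    neighbors = [set() for _ in range(n)]
    for i, j in cannot_link:
        neighbors[i].add(j)
        neighbors[j].add(i)
    # Initialize centers from k distinct data points.
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    labels = np.full(n, -1)
    for _ in range(n_iter):
        new_labels = np.full(n, -1)
        for i in range(n):
            # Try clusters from the nearest center to the farthest one and
            # take the first that no cannot-linked neighbor already uses.
            order = np.argsort(np.linalg.norm(centers - X[i], axis=1))
            for c in order:
                if all(new_labels[j] != c for j in neighbors[i]):
                    new_labels[i] = c
                    break
            if new_labels[i] == -1:
                return None
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = X[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels
```

In use, X would hold the autoencoder's latent vectors and each returned label would serve as a temporal identifier. Greedy COP-KMeans depends on point order and initialization, and it can fail (return None) when the constraints cannot be satisfied for the chosen k.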
Related papers
- Let-It-Flow: Simultaneous Optimization of 3D Flow and Object Clustering [2.763111962660262]
We study the problem of self-supervised 3D scene flow estimation from real large-scale raw point cloud sequences.
We propose a novel clustering approach that allows for the combination of overlapping soft clusters as well as non-overlapping rigid clusters.
Our method especially excels in resolving flow in complicated dynamic scenes with multiple independently moving objects close to each other.
arXiv (2024-04-12T10:04:03Z)
- Learning a Fast 3D Spectral Approach to Object Segmentation and Tracking over Space and Time [21.130594354306815]
We pose video object segmentation as spectral graph clustering in space and time.
We introduce a novel and efficient method based on 3D filtering for approximating the spectral solution.
We extend the formulation of our approach beyond the segmentation task into the realm of object tracking.
arXiv (2022-12-15T18:59:07Z)
- Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization [98.46318529630109]
We take inspiration from traditional spectral segmentation methods by reframing image decomposition as a graph partitioning problem.
We find that the eigenvectors of the resulting graph Laplacian already decompose an image into meaningful segments and can be readily used to localize objects in a scene.
By clustering the features associated with these segments across a dataset, we can obtain well-delineated, nameable regions.
arXiv (2022-05-16T17:47:44Z)
- Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning [53.74240452117145]
This paper presents a self-supervised method for learning reliable visual correspondence from unlabeled videos.
We formulate the correspondence as finding paths in a joint space-time graph, where nodes are grid patches sampled from frames, and are linked by two types of edges.
Our learned representation outperforms the state-of-the-art self-supervised methods on a variety of visual tasks.
arXiv (2021-09-28T05:40:01Z)
- RICE: Refining Instance Masks in Cluttered Environments with Graph Neural Networks [53.15260967235835]
We propose a novel framework that refines the output of instance segmentation methods by utilizing a graph-based representation of instance masks.
We train deep networks capable of sampling smart perturbations to the segmentations, and a graph neural network, which can encode relations between objects, to evaluate the segmentations.
We demonstrate an application that uses uncertainty estimates generated by our method to guide a manipulator, leading to efficient understanding of cluttered scenes.
arXiv (2021-06-29T20:29:29Z)
- Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient spatio-temporal segmentation.
We evaluate the proposed approach on DAVIS$_{17}$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods in both segmentation accuracy and inference speed.
arXiv (2021-04-10T14:39:44Z)
- Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation [96.67525775629444]
Action segmentation refers to inferring boundaries of semantically consistent visual concepts in videos.
We present a fully automatic and unsupervised approach for segmenting actions in a video that does not require any training.
Our proposal is an effective temporally-weighted hierarchical clustering algorithm that can group semantically consistent frames of the video.
arXiv (2021-03-20T23:30:01Z)
- Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos [159.02703673838639]
We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos.
We use our resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks.
The additional data provides substantially better generalization performance, leading to state-of-the-art results in both the VOS and the more challenging tracking domains.
arXiv (2021-01-06T18:56:24Z)
- Robust Instance Segmentation through Reasoning about Multi-Object Occlusion [9.536947328412198]
We propose a deep network for multi-object instance segmentation that is robust to occlusion.
Our work builds on Compositional Networks, which learn a generative model of neural feature activations to locate occluders.
In particular, we obtain feed-forward predictions of the object classes and their instance and occluder segmentations.
arXiv (2020-12-03T17:41:55Z)
- Video Anomaly Detection by Estimating Likelihood of Representations [21.879366166261228]
Video anomaly detection is a challenging task because it involves solving many sub-tasks such as motion representation, object localization and action recognition.
Traditionally, solutions to this task have focused on the mapping between video frames and their low-dimensional features, while ignoring the spatial connections of those features.
Recent solutions focus on analyzing these spatial connections by using hard clustering techniques, such as K-Means, or applying neural networks to map latent features to a general understanding.
To solve video anomaly detection in the latent feature space, we propose a deep probabilistic model that recasts this task as a density estimation problem.
arXiv (2020-12-02T19:16:22Z)
- Revisiting Sequence-to-Sequence Video Object Segmentation with Multi-Task Loss and Skip-Memory [4.343892430915579]
Video Object Segmentation (VOS) is an active research area of the visual domain.
Current approaches lose objects in longer sequences, especially when the object is small or briefly occluded.
We build upon a sequence-to-sequence approach that employs an encoder-decoder architecture together with a memory module for exploiting the sequential data.
arXiv (2020-04-25T15:38:09Z)