Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities
- URL: http://arxiv.org/abs/2306.04829v2
- Date: Fri, 8 Dec 2023 13:40:04 GMT
- Title: Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities
- Authors: Andrii Zadaianchuk and Maximilian Seitzer and Georg Martius
- Abstract summary: We propose a novel way to use pre-trained features in the form of a temporal feature similarity loss.
This loss encodes semantic and temporal correlations between image patches and is a natural way to introduce a motion bias for object discovery.
We demonstrate that this loss leads to state-of-the-art performance on the challenging synthetic MOVi datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised video-based object-centric learning is a promising avenue to
learn structured representations from large, unlabeled video collections, but
previous approaches have only managed to scale to real-world datasets in
restricted domains. Recently, it was shown that the reconstruction of
pre-trained self-supervised features leads to object-centric representations on
unconstrained real-world image datasets. Building on this approach, we propose
a novel way to use such pre-trained features in the form of a temporal feature
similarity loss. This loss encodes semantic and temporal correlations between
image patches and is a natural way to introduce a motion bias for object
discovery. We demonstrate that this loss leads to state-of-the-art performance
on the challenging synthetic MOVi datasets. When used in combination with the
feature reconstruction loss, our model is the first object-centric video model
that scales to unconstrained video datasets such as YouTube-VIS.
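To make the idea concrete, below is a minimal sketch of what such a temporal feature similarity loss could look like, assuming frozen self-supervised ViT patch features (e.g., DINO) as targets and a model that predicts per-patch affinity logits; the function names and the exact cross-entropy formulation are illustrative assumptions, not the authors' released code.

    import torch
    import torch.nn.functional as F

    def temporal_similarity_targets(feats_t, feats_tk, temperature=0.1):
        """Affinity of each patch at time t to all patches at time t+k,
        computed from frozen self-supervised features (e.g., DINO ViT).
        feats_t, feats_tk: (B, N, D) patch features for the two frames."""
        feats_t = F.normalize(feats_t, dim=-1)
        feats_tk = F.normalize(feats_tk, dim=-1)
        sim = torch.einsum("bnd,bmd->bnm", feats_t, feats_tk)  # cosine similarity
        return F.softmax(sim / temperature, dim=-1)  # (B, N, N) target distributions

    def temporal_similarity_loss(pred_logits, feats_t, feats_tk):
        """Cross-entropy between predicted and target affinity distributions.
        pred_logits: (B, N, N) affinities decoded from the model (e.g., from slots)."""
        with torch.no_grad():
            targets = temporal_similarity_targets(feats_t, feats_tk)
        log_pred = F.log_softmax(pred_logits, dim=-1)
        return -(targets * log_pred).sum(-1).mean()

Because the targets are similarities between patches at time t and time t+k, patches that move together share affinity patterns, which is one way a motion bias for object discovery can emerge.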
Related papers
- Pre-training for Action Recognition with Automatically Generated Fractal Datasets [23.686476742398973]
We present methods to automatically produce large-scale datasets of short synthetic video clips.
The generated clips exhibit notable variety, stemming from the innate ability of fractals to generate complex multi-scale structures.
Compared to standard Kinetics pre-training, the reported results are comparable overall and even superior on a subset of downstream datasets.
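As a rough, hedged illustration of how such clips might be produced, the sketch below animates an iterated function system (IFS) by slowly drifting its affine parameters across frames; this construction and all names in it are assumptions for illustration, not the paper's actual generation pipeline.

    import numpy as np

    def _contract(m, max_norm=0.8):
        """Keep an affine map contractive so the attractor stays bounded."""
        lin = m[:4].reshape(2, 2)
        norm = np.linalg.norm(lin, 2)  # spectral norm of the linear part
        if norm > max_norm:
            m[:4] = (lin * (max_norm / norm)).ravel()
        return m

    def render_ifs_frame(params, n_points=20000, size=64):
        """Render one fractal frame with the chaos game over affine maps."""
        pt = np.zeros(2)
        img = np.zeros((size, size), dtype=np.float32)
        for i in range(n_points):
            a, b, c, d, e, f = params[np.random.randint(len(params))]
            pt = np.array([a * pt[0] + b * pt[1] + e,
                           c * pt[0] + d * pt[1] + f])
            if i > 20:  # skip burn-in iterations
                x, y = ((pt + 1.5) / 3.0 * (size - 1)).clip(0, size - 1).astype(int)
                img[y, x] = 1.0
        return img

    def fractal_clip(n_frames=16, n_maps=3, drift=0.01, seed=0):
        """A short synthetic clip: drift the IFS parameters between frames."""
        rng = np.random.default_rng(seed)
        params = rng.uniform(-0.8, 0.8, size=(n_maps, 6))
        frames = []
        for _ in range(n_frames):
            params += rng.normal(0.0, drift, size=params.shape)
            params = np.stack([_contract(m) for m in params])
            frames.append(render_ifs_frame(params))
        return np.stack(frames)  # (T, H, W) frames of one clip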
arXiv Detail & Related papers (2024-11-26T16:51:11Z)
- Object-Centric Temporal Consistency via Conditional Autoregressive Inductive Biases [69.46487306858789]
Conditional Autoregressive Slot Attention (CA-SA) is a framework that enhances the temporal consistency of extracted object-centric representations in video-centric vision tasks.
We present qualitative and quantitative results showing that our proposed method outperforms the considered baselines on downstream tasks.
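One plausible reading of the conditional autoregressive idea is to initialize each frame's slots from a learned transform of the previous frame's slots, so slot identities persist over time. The sketch below illustrates that pattern; the slot_attention module and its signature are assumed for illustration and are not the authors' exact architecture.

    import torch.nn as nn

    class AutoregressiveSlotConditioner(nn.Module):
        """Map the slots of frame t-1 to an initialization for frame t."""
        def __init__(self, slot_dim, n_heads=4):
            super().__init__()
            self.transition = nn.TransformerEncoderLayer(
                d_model=slot_dim, nhead=n_heads, batch_first=True)

        def forward(self, prev_slots):  # prev_slots: (B, K, D)
            return self.transition(prev_slots)

    def run_video(slot_attention, conditioner, frame_features, init_slots):
        """frame_features: list of (B, N, D) per-frame encoder features.
        Assumes slot_attention(features, initial_slots) returns refined slots."""
        slots, outputs = init_slots, []
        for feats in frame_features:
            slots = slot_attention(feats, slots)  # refine slots on this frame
            outputs.append(slots)
            slots = conditioner(slots)  # autoregressive conditioning for t+1
        return outputs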
arXiv Detail & Related papers (2024-10-21T07:44:44Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
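A minimal sketch of what sequence-level exemplar selection could look like: score each flow-predicted mask by its agreement with temporal neighbours and keep the most consistent frames. This is an assumed simplification; a real method would presumably warp masks by the estimated flow before comparing, which is omitted here.

    import numpy as np

    def mask_iou(a, b):
        """IoU between two boolean masks."""
        union = np.logical_or(a, b).sum()
        return np.logical_and(a, b).sum() / union if union > 0 else 0.0

    def select_exemplars(masks, k=3):
        """masks: (T, H, W) boolean flow-predicted proposals.
        Returns indices of the k most temporally consistent frames."""
        T = len(masks)
        scores = np.zeros(T)
        for t in range(T):
            neighbours = [i for i in (t - 1, t + 1) if 0 <= i < T]
            if neighbours:
                scores[t] = np.mean([mask_iou(masks[t], masks[i]) for i in neighbours])
        return np.argsort(scores)[-k:][::-1]  # best-scoring frames as exemplars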
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition [35.4163266882568]
We introduce Self-Supervised Learning Over Sets (SOS) to pre-train a generic Objects In Contact (OIC) representation model.
Our OIC significantly boosts the performance of multiple state-of-the-art video classification models.
arXiv Detail & Related papers (2022-04-10T23:27:19Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Object-centric and memory-guided normality reconstruction for video anomaly detection [56.64792194894702]
This paper addresses the anomaly detection problem in video surveillance.
Due to the inherent rarity and heterogeneity of abnormal events, the problem is approached as one of normality modeling.
Our model learns object-centric normal patterns without seeing anomalous samples during training.
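As a generic, hedged instance of this normality modeling strategy (not the paper's specific model), one can train an autoencoder on per-object crops of normal data only and flag objects whose reconstruction error is high at test time:

    import torch
    import torch.nn as nn

    class ObjectAutoencoder(nn.Module):
        """Autoencoder over per-object crops; trained only on normal data,
        so a high reconstruction error at test time signals an anomaly."""
        def __init__(self, in_ch=3, latent=64):
            super().__init__()
            self.enc = nn.Sequential(
                nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, latent, 4, stride=2, padding=1), nn.ReLU())
            self.dec = nn.Sequential(
                nn.ConvTranspose2d(latent, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, in_ch, 4, stride=2, padding=1), nn.Sigmoid())

        def forward(self, x):
            return self.dec(self.enc(x))

    def anomaly_score(model, crops):
        """crops: (B, C, H, W) object crops; returns (B,) anomaly scores."""
        with torch.no_grad():
            recon = model(crops)
        return ((recon - crops) ** 2).mean(dim=(1, 2, 3))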
arXiv Detail & Related papers (2022-03-07T19:28:39Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
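A minimal sketch of a co-attention style fusion of low-level and high-level feature maps; the cross-affinity-plus-residual formulation below is an illustrative assumption, not the paper's exact module.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CoAttentionFusion(nn.Module):
        """Let low-level positions attend to high-level positions and fuse
        the attended context back in with a residual connection."""
        def __init__(self, dim):
            super().__init__()
            self.proj_low = nn.Conv2d(dim, dim, 1)
            self.proj_high = nn.Conv2d(dim, dim, 1)

        def forward(self, low, high):
            # low, high: (B, C, H, W); upsample the high-level map if needed
            if high.shape[-2:] != low.shape[-2:]:
                high = F.interpolate(high, size=low.shape[-2:],
                                     mode="bilinear", align_corners=False)
            q = self.proj_low(low).flatten(2)    # (B, C, HW)
            k = self.proj_high(high).flatten(2)  # (B, C, HW)
            affinity = torch.bmm(q.transpose(1, 2), k)  # (B, HW, HW)
            att = F.softmax(affinity, dim=-1)  # note: O(HW^2) memory
            v = high.flatten(2)  # (B, C, HW)
            attended = torch.bmm(v, att.transpose(1, 2))  # (B, C, HW)
            return low + attended.view_as(low)  # residual fusion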
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
- Unsupervised Video Decomposition using Spatio-temporal Iterative Inference [31.97227651679233]
Multi-object scene decomposition is a fast-emerging problem in representation learning.
We show that our model achieves high accuracy even without color information.
We demonstrate the decomposition and segmentation prediction capabilities of our model and show that it outperforms the state of the art on several benchmark datasets.
arXiv Detail & Related papers (2020-06-25T22:57:17Z)