A Self-supervised Learning System for Object Detection in Videos Using
Random Walks on Graphs
- URL: http://arxiv.org/abs/2011.05459v3
- Date: Tue, 24 Aug 2021 07:26:19 GMT
- Title: A Self-supervised Learning System for Object Detection in Videos Using
Random Walks on Graphs
- Authors: Juntao Tan, Changkyu Song, Abdeslam Boularias
- Abstract summary: This paper presents a new self-supervised system for learning to detect novel and previously unseen categories of objects in images.
The proposed system receives as input several unlabeled videos of scenes containing various objects.
The frames of the videos are segmented into objects using depth information, and the segments are tracked along each video.
- Score: 20.369646864364547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a new self-supervised system for learning to detect novel
and previously unseen categories of objects in images. The proposed system
receives as input several unlabeled videos of scenes containing various
objects. The frames of the videos are segmented into objects using depth
information, and the segments are tracked along each video. The system then
constructs a weighted graph that connects sequences based on the similarities
between the objects that they contain. The similarity between two sequences of
objects is measured by using generic visual features, after automatically
re-arranging the frames in the two sequences to align the viewpoints of the
objects. The graph is used to sample triplets of similar and dissimilar
examples by performing random walks. The triplet examples are finally used to
train a siamese neural network that projects the generic visual features into a
low-dimensional manifold. Experiments on three public datasets, YCB-Video,
CORe50 and RGBD-Object, show that the projected low-dimensional features
improve the accuracy of clustering unknown objects into novel categories, and
outperform several recent unsupervised clustering techniques.
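The triplet-sampling idea in the abstract — nodes are tracked object sequences, edge weights are visual similarities, and random walks on this graph yield similar (positive) and dissimilar (negative) examples for a siamese network — can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the toy similarity matrix, the walk lengths, and the heuristic of using a short walk for positives and a long walk for negatives are all assumptions made for the example.

```python
import numpy as np

def random_walk(sim, start, steps, rng):
    """Walk on the weighted graph, picking each next node with
    probability proportional to its edge weight (self-loops excluded)."""
    node = start
    for _ in range(steps):
        weights = sim[node].copy()
        weights[node] = 0.0
        node = rng.choice(len(sim), p=weights / weights.sum())
    return node

def sample_triplet(sim, anchor, rng, short=1, long=5):
    """Heuristic sketch: a short walk from the anchor tends to stay among
    similar nodes (positive); a longer walk is more likely to drift to a
    dissimilar node (negative)."""
    positive = random_walk(sim, anchor, short, rng)
    negative = random_walk(sim, anchor, long, rng)
    while negative in (anchor, positive):
        negative = random_walk(sim, anchor, long, rng)
    return anchor, positive, negative

# Toy similarity graph over 6 object sequences:
# nodes 0-2 form one visual cluster, nodes 3-5 another.
sim = np.array([
    [1.0, 0.9, 0.8, 0.1, 0.1, 0.1],
    [0.9, 1.0, 0.85, 0.1, 0.1, 0.1],
    [0.8, 0.85, 1.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.9, 0.8],
    [0.1, 0.1, 0.1, 0.9, 1.0, 0.85],
    [0.1, 0.1, 0.1, 0.8, 0.85, 1.0],
])
rng = np.random.default_rng(0)
a, p, n = sample_triplet(sim, anchor=0, rng=rng)
```

Triplets sampled this way would then feed a standard triplet loss to train the siamese network that projects the generic features into a low-dimensional manifold.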
Related papers
- UnsMOT: Unified Framework for Unsupervised Multi-Object Tracking with
Geometric Topology Guidance [6.577227592760559]
UnsMOT is a novel framework that combines appearance and motion features of objects with geometric information to provide more accurate tracking.
Experimental results show remarkable performance in terms of HOTA, IDF1, and MOTA metrics in comparison with state-of-the-art methods.
arXiv Detail & Related papers (2023-09-03T04:58:12Z) - Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D
Videos [11.40098981859033]
This work proposes a self-supervised learning system for segmenting rigid objects in RGB images.
The proposed pipeline is trained on unlabeled RGB-D videos of static objects, which can be captured with a camera carried by a mobile robot.
arXiv Detail & Related papers (2023-04-09T23:13:39Z) - Is an Object-Centric Video Representation Beneficial for Transfer? [86.40870804449737]
We introduce a new object-centric video recognition model on a transformer architecture.
We show that the object-centric model outperforms prior video representations.
arXiv Detail & Related papers (2022-07-20T17:59:44Z) - Tag-Based Attention Guided Bottom-Up Approach for Video Instance
Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple end-to-end trainable bottom-up approach to achieve instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has minimal run-time compared to other contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z) - Recent Trends in 2D Object Detection and Applications in Video Event
Recognition [0.76146285961466]
We discuss the pioneering works in object detection, followed by the recent breakthroughs that employ deep learning.
We highlight recent datasets for 2D object detection both in images and videos, and present a comparative performance summary of various state-of-the-art object detection techniques.
arXiv Detail & Related papers (2022-02-07T14:15:11Z) - Iterative Knowledge Exchange Between Deep Learning and Space-Time
Spectral Clustering for Unsupervised Segmentation in Videos [17.47403549514259]
We propose a dual system for unsupervised object segmentation in video.
The first module is a space-time graph that discovers objects in videos.
The second module is a deep network that learns powerful object features.
arXiv Detail & Related papers (2020-12-13T18:36:18Z) - CompFeat: Comprehensive Feature Aggregation for Video Instance
Segmentation [67.17625278621134]
Video instance segmentation is a complex task in which we need to detect, segment, and track each object for any given video.
Previous approaches only utilize single-frame features for the detection, segmentation, and tracking of objects.
We propose a novel comprehensive feature aggregation approach (CompFeat) to refine features at both frame-level and object-level with temporal and spatial context information.
arXiv Detail & Related papers (2020-12-07T00:31:42Z) - Self-supervised Video Representation Learning by Uncovering
Spatio-temporal Statistics [74.6968179473212]
This paper proposes a novel pretext task to address the self-supervised learning problem.
We compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion.
A neural network is built and trained to yield the statistical summaries given the video frames as inputs.
arXiv Detail & Related papers (2020-08-31T08:31:56Z) - DyStaB: Unsupervised Object Segmentation via Dynamic-Static
Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z) - OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features [14.115782214599015]
One-shot object detection consists of detecting objects defined by a single demonstration.
We build the one-stage system that performs localization and recognition jointly.
Experimental evaluation on several challenging domains shows that our method can detect unseen classes.
arXiv Detail & Related papers (2020-03-15T11:39:47Z) - Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks [150.5425122989146]
This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS).
AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges.
Experimental results on three video segmentation datasets show that AGNN sets a new state-of-the-art in each case.
arXiv Detail & Related papers (2020-01-19T10:45:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.