A Self-supervised Learning System for Object Detection in Videos Using
Random Walks on Graphs
- URL: http://arxiv.org/abs/2011.05459v3
- Date: Tue, 24 Aug 2021 07:26:19 GMT
- Title: A Self-supervised Learning System for Object Detection in Videos Using
Random Walks on Graphs
- Authors: Juntao Tan, Changkyu Song, Abdeslam Boularias
- Abstract summary: This paper presents a new self-supervised system for learning to detect novel and previously unseen categories of objects in images.
The proposed system receives as input several unlabeled videos of scenes containing various objects.
The frames of the videos are segmented into objects using depth information, and the segments are tracked along each video.
- Score: 20.369646864364547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a new self-supervised system for learning to detect novel
and previously unseen categories of objects in images. The proposed system
receives as input several unlabeled videos of scenes containing various
objects. The frames of the videos are segmented into objects using depth
information, and the segments are tracked along each video. The system then
constructs a weighted graph that connects sequences based on the similarities
between the objects that they contain. The similarity between two sequences of
objects is measured by using generic visual features, after automatically
re-arranging the frames in the two sequences to align the viewpoints of the
objects. The graph is used to sample triplets of similar and dissimilar
examples by performing random walks. The triplet examples are finally used to
train a siamese neural network that projects the generic visual features into a
low-dimensional manifold. Experiments on three public datasets, YCB-Video,
CORe50 and RGBD-Object, show that the projected low-dimensional features
improve the accuracy of clustering unknown objects into novel categories, and
outperform several recent unsupervised clustering techniques.
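The triplet-sampling idea in the abstract — nodes are tracked object sequences, edge weights are visual similarities, and random walks on this graph yield similar (positive) and dissimilar (negative) examples for a siamese network — can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the toy similarity matrix, the walk lengths, and the heuristic of using a short walk for positives and a long walk for negatives are all assumptions made for the example.

```python
import numpy as np

def random_walk(sim, start, steps, rng):
    """Walk on the weighted graph, picking each next node with
    probability proportional to its edge weight (self-loops excluded)."""
    node = start
    for _ in range(steps):
        weights = sim[node].copy()
        weights[node] = 0.0
        node = rng.choice(len(sim), p=weights / weights.sum())
    return node

def sample_triplet(sim, anchor, rng, short=1, long=5):
    """Heuristic sketch: a short walk from the anchor tends to stay among
    similar nodes (positive); a longer walk is more likely to drift to a
    dissimilar node (negative)."""
    positive = random_walk(sim, anchor, short, rng)
    negative = random_walk(sim, anchor, long, rng)
    while negative in (anchor, positive):
        negative = random_walk(sim, anchor, long, rng)
    return anchor, positive, negative

# Toy similarity graph over 6 object sequences:
# nodes 0-2 form one visual cluster, nodes 3-5 another.
sim = np.array([
    [1.0, 0.9, 0.8, 0.1, 0.1, 0.1],
    [0.9, 1.0, 0.85, 0.1, 0.1, 0.1],
    [0.8, 0.85, 1.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.9, 0.8],
    [0.1, 0.1, 0.1, 0.9, 1.0, 0.85],
    [0.1, 0.1, 0.1, 0.8, 0.85, 1.0],
])
rng = np.random.default_rng(0)
a, p, n = sample_triplet(sim, anchor=0, rng=rng)
```

Triplets sampled this way would then feed a standard triplet loss to train the siamese network that projects the generic features into a low-dimensional manifold.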
Related papers
- UnsMOT: Unified Framework for Unsupervised Multi-Object Tracking with
Geometric Topology Guidance [6.577227592760559]
UnsMOT is a novel framework that combines appearance and motion features of objects with geometric information to provide more accurate tracking.
Experimental results show remarkable performance in terms of HOTA, IDF1, and MOTA metrics in comparison with state-of-the-art methods.
arXiv Detail & Related papers (2023-09-03T04:58:12Z) - Self-Supervised Learning of Object Segmentation from Unlabeled RGB-D
Videos [11.40098981859033]
This work proposes a self-supervised learning system for segmenting rigid objects in RGB images.
The proposed pipeline is trained on unlabeled RGB-D videos of static objects, which can be captured with a camera carried by a mobile robot.
arXiv Detail & Related papers (2023-04-09T23:13:39Z) - Is an Object-Centric Video Representation Beneficial for Transfer? [86.40870804449737]
We introduce a new object-centric video recognition model on a transformer architecture.
We show that the object-centric model outperforms prior video representations.
arXiv Detail & Related papers (2022-07-20T17:59:44Z) - Tag-Based Attention Guided Bottom-Up Approach for Video Instance
Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple end-to-end trainable bottom-up approach to achieve instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has minimal run-time compared to other contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z) - Recent Trends in 2D Object Detection and Applications in Video Event
Recognition [0.76146285961466]
We discuss the pioneering works in object detection, followed by the recent breakthroughs that employ deep learning.
We highlight recent datasets for 2D object detection both in images and videos, and present a comparative performance summary of various state-of-the-art object detection techniques.
arXiv Detail & Related papers (2022-02-07T14:15:11Z) - Iterative Knowledge Exchange Between Deep Learning and Space-Time
Spectral Clustering for Unsupervised Segmentation in Videos [17.47403549514259]
We propose a dual system for unsupervised object segmentation in video.
The first module is a space-time graph that discovers objects in videos.
The second module is a deep network that learns powerful object features.
arXiv Detail & Related papers (2020-12-13T18:36:18Z) - CompFeat: Comprehensive Feature Aggregation for Video Instance
Segmentation [67.17625278621134]
Video instance segmentation is a complex task in which we need to detect, segment, and track each object for any given video.
Previous approaches only utilize single-frame features for the detection, segmentation, and tracking of objects.
We propose a novel comprehensive feature aggregation approach (CompFeat) to refine features at both frame-level and object-level with temporal and spatial context information.
arXiv Detail & Related papers (2020-12-07T00:31:42Z) - Self-supervised Video Representation Learning by Uncovering
Spatio-temporal Statistics [74.6968179473212]
This paper proposes a novel pretext task to address the self-supervised learning problem.
We compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion.
A neural network is built and trained to yield the statistical summaries given the video frames as inputs.
arXiv Detail & Related papers (2020-08-31T08:31:56Z) - DyStaB: Unsupervised Object Segmentation via Dynamic-Static
Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments.
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z) - OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features [14.115782214599015]
One-shot object detection consists of detecting objects defined by a single demonstration.
We build the one-stage system that performs localization and recognition jointly.
Experimental evaluation on several challenging domains shows that our method can detect unseen classes.
arXiv Detail & Related papers (2020-03-15T11:39:47Z) - Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks [150.5425122989146]
This work proposes a novel attentive graph neural network (AGNN) for zero-shot video object segmentation (ZVOS).
AGNN builds a fully connected graph to efficiently represent frames as nodes, and relations between arbitrary frame pairs as edges.
Experimental results on three video segmentation datasets show that AGNN sets a new state-of-the-art in each case.
arXiv Detail & Related papers (2020-01-19T10:45:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.