Modelling Neighbor Relation in Joint Space-Time Graph for Video
Correspondence Learning
- URL: http://arxiv.org/abs/2109.13499v1
- Date: Tue, 28 Sep 2021 05:40:01 GMT
- Title: Modelling Neighbor Relation in Joint Space-Time Graph for Video
Correspondence Learning
- Authors: Zixu Zhao, Yueming Jin, Pheng-Ann Heng
- Abstract summary: This paper presents a self-supervised method for learning reliable visual correspondence from unlabeled videos.
We formulate the correspondence as finding paths in a joint space-time graph, where nodes are grid patches sampled from frames, and are linked by two types of edges.
Our learned representation outperforms the state-of-the-art self-supervised methods on a variety of visual tasks.
- Score: 53.74240452117145
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents a self-supervised method for learning reliable visual
correspondence from unlabeled videos. We formulate the correspondence as
finding paths in a joint space-time graph, where nodes are grid patches sampled
from frames, and are linked by two types of edges: (i) neighbor relations that
determine the aggregation strength from intra-frame neighbors in space, and
(ii) similarity relations that indicate the transition probability of
inter-frame paths across time. Leveraging the cycle-consistency in videos, our
contrastive learning objective discriminates dynamic objects from both their
neighboring views and temporal views. Compared with prior works, our approach
actively explores the neighbor relations of central instances to learn a latent
association between center-neighbor pairs (e.g., "hand -- arm") across time,
thus improving the instance discrimination. Without fine-tuning, our learned
representation outperforms the state-of-the-art self-supervised methods on a
variety of visual tasks including video object propagation, part propagation,
and pose keypoint tracking. Our self-supervised method also surpasses some
fully supervised algorithms designed for the specific tasks.
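As a concrete reading of this formulation, below is a minimal PyTorch sketch of the two edge types and the cycle-consistency objective. It is our illustration of the abstract, not the authors' implementation; the function names, the temperature, and the exact aggregation rule are all assumptions.

```python
# Minimal sketch of the joint space-time graph as described in the abstract:
# intra-frame "neighbor relation" edges set aggregation strength, inter-frame
# "similarity relation" edges set transition probability, and a palindrome
# walk supplies a cycle-consistency target. Names are ours, not the authors'.
import torch
import torch.nn.functional as F

def neighbor_aggregate(feats, tau=0.07):
    # feats: (N, D) patch embeddings of one frame. Each patch aggregates its
    # intra-frame neighbors with softmax weights (the "neighbor relation").
    n = feats.size(0)
    sim = feats @ feats.t() / tau
    sim = sim.masked_fill(torch.eye(n, dtype=torch.bool), float('-inf'))
    w = sim.softmax(dim=-1)                        # aggregation strengths
    return F.normalize(feats + w @ feats, dim=-1)  # center + neighbor context

def transition(a, b, tau=0.07):
    # Inter-frame "similarity relation": row-stochastic transition matrix
    # from the patches of frame a to the patches of frame b.
    return (a @ b.t() / tau).softmax(dim=-1)

def cycle_loss(frames, tau=0.07):
    # Walk forward through the frames and back again; cycle-consistency asks
    # each patch to return to itself, a contrastive target on the diagonal.
    feats = [neighbor_aggregate(F.normalize(f, dim=-1), tau) for f in frames]
    hops = list(zip(feats[:-1], feats[1:]))
    hops += [(b, a) for a, b in reversed(hops)]    # palindrome walk
    walk = None
    for a, b in hops:
        step = transition(a, b, tau)
        walk = step if walk is None else walk @ step
    target = torch.arange(walk.size(0))
    return F.nll_loss(torch.log(walk + 1e-8), target)

# Toy usage: 3 frames, 16 grid patches each, 64-dim features.
loss = cycle_loss([torch.randn(16, 64) for _ in range(3)])
```

The palindrome walk makes the diagonal of the composed transition matrix the positive class, so the objective needs no labels.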
Related papers
- Self-Supervised Correspondence Estimation via Multiview Registration [88.99287381176094]
Video provides us with the spatio-temporal consistency needed for visual learning.
Recent approaches have utilized this signal to learn correspondence estimation from close-by frame pairs.
We propose a self-supervised approach for correspondence estimation that learns from multiview consistency in short RGB-D video sequences (see the sketch below).
arXiv Detail & Related papers (2022-12-06T18:59:02Z)
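As a toy illustration of the multiview-consistency signal in the entry above, the sketch below composes pairwise rigid registrations around a three-frame loop and penalizes deviation from the identity; the function and the 4x4 pose representation are our assumptions, not the paper's pipeline.

```python
import torch

def loop_consistency(T_01, T_12, T_20):
    # T_ij: (4, 4) estimated rigid transform taking frame i's camera to
    # frame j's. Composing around the 3-frame loop should give identity.
    loop = T_20 @ T_12 @ T_01
    return ((loop - torch.eye(4)) ** 2).sum()

# If the pairwise estimates are mutually consistent, the loss is zero:
print(loop_consistency(torch.eye(4), torch.eye(4), torch.eye(4)))  # tensor(0.)
```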
- Multiple Object Tracking with Correlation Learning [16.959379957515974]
We propose to exploit a local correlation module to model the topological relationship between targets and their surrounding environment.
Specifically, we establish dense correspondences between each spatial location and its context, and explicitly constrain the correlation volumes through self-supervised learning (see the sketch below).
Our approach demonstrates the effectiveness of correlation learning with superior performance, obtaining state-of-the-art MOTA of 76.5% and IDF1 of 73.6% on MOT17.
arXiv Detail & Related papers (2021-04-08T06:48:02Z)
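The entry above builds correlation volumes between each location and its local context; below is a minimal, hedged sketch of such a local correlation volume. The unfold-based layout and all names are ours, not the paper's module.

```python
import torch
import torch.nn.functional as F

def local_correlation(fa, fb, r=3):
    # fa, fb: (B, C, H, W) feature maps. Returns (B, (2r+1)**2, H, W): the
    # similarity of each location in fa to its (2r+1)x(2r+1) window in fb.
    B, C, H, W = fa.shape
    k = 2 * r + 1
    fb_win = F.unfold(fb, kernel_size=k, padding=r)   # (B, C*k*k, H*W)
    fb_win = fb_win.view(B, C, k * k, H, W)
    return (fa.unsqueeze(2) * fb_win).sum(1) / C ** 0.5

# Toy usage: batch of 2, 32 channels, 24x24 maps, 7x7 context window.
fa, fb = torch.randn(2, 32, 24, 24), torch.randn(2, 32, 24, 24)
vol = local_correlation(fa, fb)   # (2, 49, 24, 24)
```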
- Temporal Contrastive Graph Learning for Video Action Recognition and Retrieval [83.56444443849679]
This work takes advantage of the temporal dependencies within videos and proposes a novel self-supervised method named Temporal Contrastive Graph Learning (TCGL).
Our TCGL is rooted in a hybrid graph contrastive learning strategy that jointly regards the inter-snippet and intra-snippet temporal dependencies as self-supervision signals for temporal representation learning (a generic sketch follows below).
Experimental results demonstrate the superiority of our TCGL over the state-of-the-art methods on large-scale action recognition and video retrieval benchmarks.
arXiv Detail & Related papers (2021-01-04T08:11:39Z)
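The TCGL entry above treats inter- and intra-snippet temporal dependencies as self-supervision; the sketch below shows a generic snippet-level InfoNCE objective in that spirit. It is a common contrastive form, not the paper's hybrid-graph formulation.

```python
import torch
import torch.nn.functional as F

def snippet_infonce(anchors, positives, tau=0.1):
    # anchors, positives: (B, D) snippet embeddings; positives[i] is a
    # temporally related snippet from the same video as anchors[i], and
    # all other rows in the batch serve as negatives.
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / tau                  # (B, B) similarity matrix
    target = torch.arange(a.size(0))
    return F.cross_entropy(logits, target)    # diagonal entries are positives
```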
- Contrastive Transformation for Self-supervised Correspondence Learning [120.62547360463923]
We study the self-supervised learning of visual correspondence using unlabeled videos in the wild.
Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence estimation.
Our framework outperforms the recent self-supervised correspondence methods on a range of visual tasks.
arXiv Detail & Related papers (2020-12-09T14:05:06Z)
- Space-Time Correspondence as a Contrastive Random Walk [47.40711876423659]
We cast correspondence as prediction of links in a space-time graph constructed from video.
We learn a representation in which pairwise similarity defines the transition probabilities of a random walk.
We demonstrate that a technique we call edge dropout (sketched below), as well as self-supervised adaptation at test time, further improves transfer for object-centric correspondence.
arXiv Detail & Related papers (2020-06-25T17:56:05Z)
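The random-walk entry above defines transitions by pairwise similarity and regularizes them with edge dropout; here is a small sketch of that idea as we understand it (zeroing a random fraction of transition edges and renormalizing; the paper's exact scheme may differ).

```python
import torch

def transition_with_edge_dropout(a, b, tau=0.07, p_drop=0.1):
    # a, b: (N, D) patch embeddings of two frames. Pairwise similarity
    # defines the transition probabilities; a random fraction of edges
    # is zeroed and each row is renormalized.
    A = (a @ b.t() / tau).softmax(dim=-1)
    keep = (torch.rand_like(A) > p_drop).float()
    A = A * keep
    return A / A.sum(dim=-1, keepdim=True).clamp_min(1e-8)
```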
- Co-Saliency Spatio-Temporal Interaction Network for Person Re-Identification in Videos [85.6430597108455]
We propose a novel Co-Saliency Spatio-Temporal Interaction Network (CSTNet) for person re-identification in videos.
It captures the common salient foreground regions among video frames and explores the long-range spatial-temporal context interdependency of such regions.
Multiple spatial-temporal interaction modules are proposed within CSTNet, which exploit the long-range spatial and temporal context interdependencies of such features and their spatial-temporal correlation.
arXiv Detail & Related papers (2020-04-10T10:23:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.