GOCA: Guided Online Cluster Assignment for Self-Supervised Video
Representation Learning
- URL: http://arxiv.org/abs/2207.10158v1
- Date: Wed, 20 Jul 2022 19:26:55 GMT
- Title: GOCA: Guided Online Cluster Assignment for Self-Supervised Video
Representation Learning
- Authors: Huseyin Coskun, Alireza Zareian, Joshua L. Moore, Federico Tombari,
Chen Wang
- Abstract summary: Clustering is a ubiquitous tool in unsupervised learning.
Most of the existing self-supervised representation learning methods typically cluster samples based on visually dominant features.
We propose a principled way to combine two views. Specifically, we propose a novel clustering strategy where we use the initial cluster assignment of each view as prior to guide the final cluster assignment of the other view.
- Score: 49.69279760597111
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clustering is a ubiquitous tool in unsupervised learning. Most of the
existing self-supervised representation learning methods typically cluster
samples based on visually dominant features. While this works well for
image-based self-supervision, it often fails for videos, which require
understanding motion rather than focusing on background. Using optical flow as
complementary information to RGB can alleviate this problem. However, we
observe that a naive combination of the two views does not provide meaningful
gains. In this paper, we propose a principled way to combine two views.
Specifically, we propose a novel clustering strategy where we use the initial
cluster assignment of each view as prior to guide the final cluster assignment
of the other view. This idea will enforce similar cluster structures for both
views, and the formed clusters will be semantically abstract and robust to
noisy inputs coming from each individual view. Additionally, we propose a novel
regularization strategy to address the feature collapse problem, which is
common in cluster-based self-supervised learning methods. Our extensive
evaluation shows the effectiveness of our learned representations on downstream
tasks, e.g., video retrieval and action recognition. Specifically, we
outperform the state of the art by 7% on UCF and 4% on HMDB for video
retrieval, and 5% on UCF and 6% on HMDB for video classification.
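
The guided assignment idea in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not say how assignments are computed, so the sketch assumes a Sinkhorn-Knopp style balanced assignment (as commonly used in online clustering methods) and assumes the prior from the other view enters as an additive log term. The names `sinkhorn` and `guided_assignments`, and the values of `epsilon` and `n_iters`, are hypothetical.

```python
import numpy as np


def sinkhorn(scores, prior=None, epsilon=0.05, n_iters=3):
    """Balanced soft cluster assignment (assumed Sinkhorn-Knopp variant).

    scores: (N, K) similarities between N samples and K cluster prototypes.
    prior:  optional (N, K) soft assignments used to bias the result
            (assumption: added as a log-prior to the scaled scores).
    Returns an (N, K) matrix whose rows each sum to 1.
    """
    logits = scores / epsilon
    if prior is not None:
        logits = logits + np.log(prior + 1e-8)
    logits = logits - logits.max()      # numerical stability before exp
    q = np.exp(logits).T                # shape (K, N)
    q /= q.sum()
    n_clusters, n_samples = q.shape
    for _ in range(n_iters):
        # Normalize each cluster (row) so clusters receive equal total mass.
        q /= q.sum(axis=1, keepdims=True)
        q /= n_clusters
        # Normalize each sample (column) so every sample has equal mass.
        q /= q.sum(axis=0, keepdims=True)
        q /= n_samples
    return (q * n_samples).T            # rows sum to 1


def guided_assignments(rgb_scores, flow_scores):
    """Use each view's initial assignment as a prior for the other view."""
    q_rgb_init = sinkhorn(rgb_scores)
    q_flow_init = sinkhorn(flow_scores)
    q_rgb = sinkhorn(rgb_scores, prior=q_flow_init)
    q_flow = sinkhorn(flow_scores, prior=q_rgb_init)
    return q_rgb, q_flow
```

Calling `guided_assignments(rgb_scores, flow_scores)` with two `(N, K)` score matrices returns one soft assignment matrix per view. Under these assumptions, the cross-view prior pulls the two views toward a shared cluster structure, which is the effect the abstract describes.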
Related papers
- Deep Structure and Attention Aware Subspace Clustering [29.967881186297582]
We propose a novel Deep Structure and Attention aware Subspace Clustering (DSASC).
We use a vision transformer to extract features, and the extracted features are divided into two parts: structure features and content features.
Our method significantly outperforms state-of-the-art methods.
(2023-12-25T01:19:47Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, a clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
(2023-08-13T18:12:28Z)
- CrOC: Cross-View Online Clustering for Dense Visual Representation Learning [39.12950211289954]
We propose a Cross-view consistency objective with an Online Clustering mechanism (CrOC) to discover and segment the semantics of the views.
In the absence of hand-crafted priors, the resulting method is more generalizable and does not require a cumbersome pre-processing step.
We demonstrate excellent performance on linear and unsupervised segmentation transfer tasks on various datasets.
(2023-03-23T13:24:16Z)
- Graph Representation Learning via Contrasting Cluster Assignments [57.87743170674533]
We propose a novel unsupervised graph representation model by contrasting cluster assignments, called GRCCA.
It is motivated to make good use of local and global information by combining clustering algorithms and contrastive learning.
GRCCA is strongly competitive on most tasks.
(2021-12-15T07:28:58Z)
- Unsupervised Visual Representation Learning by Online Constrained K-Means [44.38989920488318]
Cluster discrimination is an effective pretext task for unsupervised representation learning.
We propose a novel clustering-based pretext task with online Constrained K-means (CoKe).
Our online assignment method has a theoretical guarantee to approach the global optimum.
(2021-05-24T20:38:32Z)
- Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation [96.67525775629444]
Action segmentation refers to inferring boundaries of semantically consistent visual concepts in videos.
We present a fully automatic and unsupervised approach for segmenting actions in a video that does not require any training.
Our proposal is an effective temporally-weighted hierarchical clustering algorithm that can group semantically consistent frames of the video.
(2021-03-20T23:30:01Z)
- Consensus Clustering With Unsupervised Representation Learning [4.164845768197489]
We study the clustering ability of Bootstrap Your Own Latent (BYOL) and observe that features learnt using BYOL may not be optimal for clustering.
We propose a novel consensus clustering based loss function, and train BYOL with the proposed loss in an end-to-end way that improves the clustering ability and outperforms similar clustering based methods.
(2020-10-03T01:16:46Z)
- Unsupervised Learning of Video Representations via Dense Trajectory Clustering [86.45054867170795]
This paper addresses the task of unsupervised learning of representations for action recognition in videos.
We first propose to adapt two top performing objectives in this class: instance recognition and local aggregation.
We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns.
(2020-06-28T22:23:03Z)
- SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach where feature learning and clustering are decoupled.
A self-supervised task from representation learning is employed to obtain semantically meaningful features.
We obtain promising results on ImageNet, and outperform several semi-supervised learning methods in the low-data regime.
(2020-05-25T18:12:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.