Eigen-Cluster VIS: Improving Weakly-supervised Video Instance Segmentation by Leveraging Spatio-temporal Consistency
- URL: http://arxiv.org/abs/2408.16661v1
- Date: Thu, 29 Aug 2024 16:05:05 GMT
- Title: Eigen-Cluster VIS: Improving Weakly-supervised Video Instance Segmentation by Leveraging Spatio-temporal Consistency
- Authors: Farnoosh Arefi, Amir M. Mansourian, Shohreh Kasaei,
- Abstract summary: This work introduces a novel weakly-supervised method called Eigen-cluster VIS.
It achieves competitive accuracy compared to other VIS approaches without requiring any mask annotations.
It is evaluated on the YouTube-VIS21 and OVIS 2019/20 datasets.
- Score: 9.115508086522887
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The performance of Video Instance Segmentation (VIS) methods has improved significantly with the advent of transformer networks. However, these networks often face challenges in training due to the high annotation cost. To address this, unsupervised and weakly-supervised methods have been developed to reduce the dependency on annotations. This work introduces a novel weakly-supervised method called Eigen-cluster VIS that, without requiring any mask annotations, achieves competitive accuracy compared to other VIS approaches. This method is based on two key innovations: a Temporal Eigenvalue Loss (TEL) and a clip-level Quality Cluster Coefficient (QCC). The TEL ensures temporal coherence by leveraging the eigenvalues of the Laplacian matrix derived from graph adjacency matrices. By minimizing the mean absolute error (MAE) between the eigenvalues of adjacent frames, this loss function promotes smooth transitions and stable segmentation boundaries over time, reducing temporal discontinuities and improving overall segmentation quality. The QCC employs the K-means method to ensure the quality of spatio-temporal clusters without relying on ground truth masks. Using the Davies-Bouldin score, the QCC provides an unsupervised measure of feature discrimination, allowing the model to self-evaluate and adapt to varying object distributions, enhancing robustness during the testing phase. These enhancements are computationally efficient and straightforward, offering significant performance gains without additional annotated data. The proposed Eigen-Cluster VIS method is evaluated on the YouTube-VIS 2019/2021 and OVIS datasets, demonstrating that it effectively narrows the performance gap between the fully-supervised and weakly-supervised VIS approaches. The code is available on: https://github.com/farnooshar/EigenClusterVIS
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Modularity aided consistent attributed graph clustering via coarsening [6.522020196906943]
Graph clustering is an important unsupervised learning technique for partitioning graphs with attributes and detecting communities.
We propose a loss function incorporating log-determinant, smoothness, and modularity components using a block majorization-minimization technique.
Our algorithm seamlessly integrates graph neural networks (GNNs) and variational graph autoencoders (VGAEs) to learn enhanced node features and deliver exceptional clustering performance.
arXiv Detail & Related papers (2024-07-09T10:42:19Z) - DynaSeg: A Deep Dynamic Fusion Method for Unsupervised Image Segmentation Incorporating Feature Similarity and Spatial Continuity [0.5755004576310334]
We introduce DynaSeg, an innovative unsupervised image segmentation approach.
Unlike traditional methods, DynaSeg employs a dynamic weighting scheme.
It adapts flexibly to image characteristics, and facilitates easy integration with other segmentation networks.
arXiv Detail & Related papers (2024-05-09T00:30:45Z) - Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation [76.68301884987348]
We propose a simple yet effective approach for self-supervised video object segmentation (VOS)
Our key insight is that the inherent structural dependencies present in DINO-pretrained Transformers can be leveraged to establish robust-temporal segmentation correspondences in videos.
Our method demonstrates state-of-the-art performance across multiple unsupervised VOS benchmarks and excels in complex real-world multi-object video segmentation tasks.
arXiv Detail & Related papers (2023-11-29T18:47:17Z) - Learning Prompt-Enhanced Context Features for Weakly-Supervised Video
Anomaly Detection [37.99031842449251]
Video anomaly detection under weak supervision presents significant challenges.
We present a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability.
Our approach significantly improves the detection accuracy of certain anomaly sub-classes, underscoring its practical value and efficacy.
arXiv Detail & Related papers (2023-06-26T06:45:16Z) - Skip-Attention: Improving Vision Transformers by Paying Less Attention [55.47058516775423]
Vision computation transformers (ViTs) use expensive self-attention operations in every layer.
We propose SkipAt, a method to reuse self-attention from preceding layers to approximate attention at one or more subsequent layers.
We show the effectiveness of our method in image classification and self-supervised learning on ImageNet-1K, semantic segmentation on ADE20K, image denoising on SIDD, and video denoising on DAVIS.
arXiv Detail & Related papers (2023-01-05T18:59:52Z) - On the Eigenvalues of Global Covariance Pooling for Fine-grained Visual
Recognition [65.67315418971688]
We show that truncating small eigenvalues of the Global Covariance Pooling (GCP) can attain smoother gradient.
On fine-grained datasets, truncating the small eigenvalues would make the model fail to converge.
Inspired by this observation, we propose a network branch dedicated to magnifying the importance of small eigenvalues.
arXiv Detail & Related papers (2022-05-26T11:41:36Z) - Improved Dual Correlation Reduction Network [40.792587861237166]
We propose a novel deep graph clustering algorithm termed Improved Dual Correlation Reduction Network (IDCRN)
By approximating the cross-view feature correlation matrix to an identity matrix, we reduce the redundancy between different dimensions of features.
We also avoid the collapsed representation caused by the over-smoothing issue in Graph Convolutional Networks (GCNs) through an introduced propagation regularization term.
arXiv Detail & Related papers (2022-02-25T07:48:32Z) - Adversarial Feature Augmentation and Normalization for Visual
Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z) - Self-supervised Equivariant Attention Mechanism for Weakly Supervised
Semantic Segmentation [93.83369981759996]
We propose a self-supervised equivariant attention mechanism (SEAM) to discover additional supervision and narrow the gap.
Our method is based on the observation that equivariance is an implicit constraint in fully supervised semantic segmentation.
We propose consistency regularization on predicted CAMs from various transformed images to provide self-supervision for network learning.
arXiv Detail & Related papers (2020-04-09T14:57:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.