Image as Set of Points
- URL: http://arxiv.org/abs/2303.01494v1
- Date: Thu, 2 Mar 2023 18:56:39 GMT
- Title: Image as Set of Points
- Authors: Xu Ma, Yuqian Zhou, Huan Wang, Can Qin, Bin Sun, Chang Liu, Yun Fu
- Abstract summary: Context clusters (CoCs) view an image as a set of unorganized points and extract features via a simplified clustering algorithm.
Our CoCs are convolution- and attention-free, and rely only on a clustering algorithm for spatial interaction.
- Score: 60.30495338399321
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What is an image and how to extract latent features? Convolutional Networks
(ConvNets) consider an image as organized pixels in a rectangular shape and
extract features via convolutional operation in local region; Vision
Transformers (ViTs) treat an image as a sequence of patches and extract
features via attention mechanism in a global range. In this work, we introduce
a straightforward and promising paradigm for visual representation, which is
called Context Clusters. Context clusters (CoCs) view an image as a set of
unorganized points and extract features via a simplified clustering algorithm. In
detail, each point includes the raw feature (e.g., color) and positional
information (e.g., coordinates), and a simplified clustering algorithm is
employed to group and extract deep features hierarchically. Our CoCs are
convolution- and attention-free, and rely only on a clustering algorithm for
spatial interaction. Owing to the simple design, we show that CoCs offer gratifying
interpretability via visualization of the clustering process. Our CoCs aim at
providing a new perspective on image and visual representation, which may enjoy
broad applications in different domains and exhibit profound insights. Even
though we are not targeting SOTA performance, CoCs still achieve comparable or
even better results than ConvNets or ViTs on several benchmarks. Code is
available at: https://github.com/ma-xu/Context-Cluster.
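The abstract's description of an image as a set of points, each carrying a raw feature plus positional information, can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that). The function names `image_to_points` and `cluster_once`, the coordinate scaling, the random choice of centers, the cosine-similarity hard assignment, and the mean aggregation are all assumptions made here for illustration.

```python
import numpy as np


def image_to_points(image):
    """Flatten an (H, W, C) image into H*W points of [features..., x, y]."""
    h, w, _ = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Assumption: coordinates are scaled to lie in [-0.5, 0.5).
    coords = np.stack([xs / w - 0.5, ys / h - 0.5], axis=-1)
    points = np.concatenate([image.astype(np.float64), coords], axis=-1)
    return points.reshape(h * w, -1)


def _unit_rows(a):
    """Scale each row to unit length (epsilon guards against zero rows)."""
    return a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)


def cluster_once(points, num_centers, seed=0):
    """One illustrative clustering step over an unordered point set."""
    rng = np.random.default_rng(seed)
    # Assumption: centers are drawn at random from the points themselves.
    picked = rng.choice(len(points), size=num_centers, replace=False)
    centers = points[picked]
    # Cosine similarity between every point and every center: shape (N, K).
    similarity = _unit_rows(points) @ _unit_rows(centers).T
    # Hard assignment: each point joins its most similar center.
    assignment = similarity.argmax(axis=1)
    # Aggregate each cluster by averaging its members; an empty cluster
    # keeps its original center.
    aggregated = centers.copy()
    for k in range(num_centers):
        members = points[assignment == k]
        if len(members) > 0:
            aggregated[k] = members.mean(axis=0)
    return assignment, aggregated


if __name__ == "__main__":
    demo = np.random.default_rng(1).random((8, 8, 3))  # toy RGB image
    pts = image_to_points(demo)
    labels, feats = cluster_once(pts, num_centers=4)
    print(pts.shape, labels.shape, feats.shape)
```

Because the points are stored as a flat array, the clustering step never consults the pixel grid: spatial information enters only through the two coordinate channels appended to each point.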
Related papers
- Neural Clustering based Visual Representation Learning [61.72646814537163]
Clustering is one of the most classic approaches in machine learning and data analysis.
We propose feature extraction with clustering (FEC), which views feature extraction as a process of selecting representatives from data.
FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives.
arXiv Detail & Related papers (2024-03-26T06:04:50Z)
- Deep Structure and Attention Aware Subspace Clustering [29.967881186297582]
We propose a novel Deep Structure and Attention aware Subspace Clustering (DSASC).
We use a vision transformer to extract features, and the extracted features are divided into two parts: structure features and content features.
Our method significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-25T01:19:47Z)
- Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering [37.15595383168132]
We propose a jigsaw-based strategy for image clustering called Grid Jigsaw Representation (GJR), with a systematic exposition, from pixel to feature, of the discrepancy between human and computer.
GJR modules are appended to a variety of deep convolutional networks and tested with significant improvements on a wide range of benchmark datasets.
Experimental results show effectiveness on the clustering task with respect to three metrics (ACC, NMI, and ARI), as well as very fast convergence.
arXiv Detail & Related papers (2023-10-27T03:07:05Z)
- CoC-GAN: Employing Context Cluster for Unveiling a New Pathway in Image Generation [12.211795836214112]
We propose a unique image generation process premised on the perspective of converting images into a set of point clouds.
Our methodology leverages a simple clustering method named Context Clustering (CoC) to generate images from unordered point sets.
We introduce this model with the novel structure as the Context Clustering Generative Adversarial Network (CoC-GAN).
arXiv Detail & Related papers (2023-08-23T01:19:58Z)
- Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework.
In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
- Adaptively Clustering Neighbor Elements for Image-Text Generation [78.82346492527425]
We propose a novel Transformer-based image-to-text generation model termed ACF.
ACF adaptively clusters vision patches into object regions and language words into phrases to implicitly learn object-phrase alignments.
Experimental results demonstrate the effectiveness of ACF, which outperforms most SOTA captioning and VQA models.
arXiv Detail & Related papers (2023-01-05T08:37:36Z)
- DeepCut: Unsupervised Segmentation using Graph Neural Networks Clustering [6.447863458841379]
This study introduces a lightweight Graph Neural Network (GNN) to replace classical clustering methods.
Unlike existing methods, our GNN takes both the pair-wise affinities between local image features and the raw features as input.
We demonstrate how classical clustering objectives can be formulated as self-supervised loss functions for training an image segmentation GNN.
arXiv Detail & Related papers (2022-12-12T12:31:46Z)
- GroupViT: Semantic Segmentation Emerges from Text Supervision [82.02467579704091]
Grouping and recognition are important components of visual scene understanding.
We propose a hierarchical Grouping Vision Transformer (GroupViT).
GroupViT learns to group together semantic regions and successfully transfers to the task of semantic segmentation in a zero-shot manner.
arXiv Detail & Related papers (2022-02-22T18:56:04Z)
- Learning Spatial Context with Graph Neural Network for Multi-Person Pose Grouping [71.59494156155309]
Bottom-up approaches for image-based multi-person pose estimation consist of two stages: keypoint detection and grouping.
In this work, we formulate the grouping task as a graph partitioning problem, where we learn the affinity matrix with a Graph Neural Network (GNN).
The learned geometry-based affinity is further fused with appearance-based affinity to achieve robust keypoint association.
arXiv Detail & Related papers (2021-04-06T09:21:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.