Related papers: Image as Set of Points

URL: http://arxiv.org/abs/2303.01494v1
Date: Thu, 2 Mar 2023 18:56:39 GMT
Title: Image as Set of Points
Authors: Xu Ma, Yuqian Zhou, Huan Wang, Can Qin, Bin Sun, Chang Liu, Yun Fu
Abstract summary: Context clusters (CoCs) view an image as a set of unorganized points and extract features via a simplified clustering algorithm. CoCs are convolution- and attention-free, and rely only on a clustering algorithm for spatial interaction.
Score: 60.30495338399321
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: What is an image, and how should latent features be extracted? Convolutional Networks (ConvNets) consider an image as organized pixels in a rectangular shape and extract features via convolutional operations in local regions; Vision Transformers (ViTs) treat an image as a sequence of patches and extract features via an attention mechanism over a global range. In this work, we introduce a straightforward and promising paradigm for visual representation, called Context Clusters. Context clusters (CoCs) view an image as a set of unorganized points and extract features via a simplified clustering algorithm. In detail, each point includes the raw feature (e.g., color) and positional information (e.g., coordinates), and a simplified clustering algorithm is employed to group and extract deep features hierarchically. Our CoCs are convolution- and attention-free, and rely only on a clustering algorithm for spatial interaction. Owing to the simple design, we show that CoCs offer gratifying interpretability through visualization of the clustering process. Our CoCs aim to provide a new perspective on image and visual representation, which may enjoy broad applications in different domains and exhibit profound insights. Even though we are not targeting SOTA performance, CoCs still achieve comparable or even better results than ConvNets or ViTs on several benchmarks. Code is available at: https://github.com/ma-xu/Context-Cluster.
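
The "image as a set of points" idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see their repository for that). The function names, the random choice of centers, the cosine-similarity assignment, the mean aggregation, and the residual dispatch step are all assumptions made here for illustration. The sketch only shows how pixels become points (raw feature plus coordinates) and how one grouping step can mix information without convolution or attention.

```python
import numpy as np


def image_to_points(image):
    """Turn an (H, W, C) image into (H*W, C+2) points.

    Each point holds the raw feature (e.g., color) plus its (x, y)
    coordinates, rescaled to [-0.5, 0.5] (an assumed normalization).
    """
    h, w, _ = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([xs / max(w - 1, 1), ys / max(h - 1, 1)], axis=-1) - 0.5
    return np.concatenate([image, coords], axis=-1).reshape(h * w, -1)


def cluster_step(points, num_centers, seed=0):
    """One illustrative grouping step over a set of points.

    Assumptions (not taken from the abstract): centers are sampled at
    random from the points, similarity is cosine similarity, each point
    joins its most similar center, and cluster features are plain means.
    """
    rng = np.random.default_rng(seed)
    center_idx = rng.choice(len(points), size=num_centers, replace=False)
    centers = points[center_idx]

    # Cosine similarity between every point and every center: shape (N, K).
    eps = 1e-8
    p_unit = points / (np.linalg.norm(points, axis=1, keepdims=True) + eps)
    c_unit = centers / (np.linalg.norm(centers, axis=1, keepdims=True) + eps)
    sim = p_unit @ c_unit.T

    # Hard assignment: each point goes to its most similar center.
    assign = sim.argmax(axis=1)

    # Aggregate: average the members of each cluster.
    aggregated = centers.copy()
    for k in range(num_centers):
        members = points[assign == k]
        if len(members) > 0:
            aggregated[k] = members.mean(axis=0)

    # Dispatch: add the cluster feature back to each member,
    # weighted by that point's similarity to its center.
    weight = sim[np.arange(len(points)), assign][:, None]
    updated = points + weight * aggregated[assign]
    return assign, updated


if __name__ == "__main__":
    image = np.random.default_rng(1).random((8, 8, 3))
    pts = image_to_points(image)
    assign, updated = cluster_step(pts, num_centers=4)
    print(pts.shape, assign.shape, updated.shape)
```

Plotting `assign.reshape(8, 8)` as an image gives a rough view of which pixels were grouped together, in the spirit of the interpretability claim in the abstract.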

Related papers

Structural-Spectral Graph Convolution with Evidential Edge Learning for Hyperspectral Image Clustering [59.24638672786966]
Hyperspectral image (HSI) clustering assigns similar pixels to the same class without any annotations. Existing graph neural networks (GNNs) cannot fully exploit the spectral information of the input HSI. We propose a structural-spectral graph convolutional operator (SSGCO) tailored for graph-structured HSI superpixels.
arXiv Detail & Related papers (2025-06-11T16:41:34Z)
I Spy With My Little Eye: A Minimum Cost Multicut Investigation of Dataset Frames [12.177674038614658]
Visual framing analysis is a key method in social sciences for determining common themes and concepts in a discourse. In this work, we phrase the clustering task as a Minimum Cost Multicut Problem [MP]. Solutions to the MP have been shown to provide clusterings that maximize the posterior probability, solely from provided local, pairwise probabilities of two images belonging to the same cluster. Our insights into embedding space differences, in combination with the optimal clustering, by definition advance automated visual frame detection.
arXiv Detail & Related papers (2024-12-02T09:09:47Z)
Neural Clustering based Visual Representation Learning [61.72646814537163]
Clustering is one of the most classic approaches in machine learning and data analysis. We propose feature extraction with clustering (FEC), which views feature extraction as a process of selecting representatives from data. FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives.
arXiv Detail & Related papers (2024-03-26T06:04:50Z)
Deep Structure and Attention Aware Subspace Clustering [29.967881186297582]
We propose a novel Deep Structure and Attention aware Subspace Clustering (DSASC). We use a vision transformer to extract features, and the extracted features are divided into two parts: structure features and content features. Our method significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-25T01:19:47Z)
Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering [37.15595383168132]
We propose a jigsaw-based strategy for image clustering called Grid Jigsaw Representation (GJR), with a systematic exposition, from pixel to feature, of the discrepancy between human and computer. GJR modules are appended to a variety of deep convolutional networks and tested with significant improvements on a wide range of benchmark datasets. Experimental results show effectiveness on the clustering task with respect to three metrics (ACC, NMI, and ARI) and very fast convergence.
arXiv Detail & Related papers (2023-10-27T03:07:05Z)
CoC-GAN: Employing Context Cluster for Unveiling a New Pathway in Image Generation [12.211795836214112]
We propose a unique image generation process premised on the perspective of converting images into a set of point clouds. Our methodology leverages a simple clustering method named Context Clustering (CoC) to generate images from unordered point sets. We introduce this model with the novel structure as the Context Clustering Generative Adversarial Network (CoC-GAN).
arXiv Detail & Related papers (2023-08-23T01:19:58Z)
Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering. In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework. In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
Adaptively Clustering Neighbor Elements for Image-Text Generation [78.82346492527425]
We propose a novel Transformer-based image-to-text generation model termed ACF. ACF adaptively clusters vision patches into object regions and language words into phrases to implicitly learn object-phrase alignments. Experimental results demonstrate the effectiveness of ACF, which outperforms most SOTA captioning and VQA models.
arXiv Detail & Related papers (2023-01-05T08:37:36Z)
DeepCut: Unsupervised Segmentation using Graph Neural Networks Clustering [6.447863458841379]
This study introduces a lightweight Graph Neural Network (GNN) to replace classical clustering methods. Unlike existing methods, our GNN takes both the pair-wise affinities between local image features and the raw features as input. We demonstrate how classical clustering objectives can be formulated as self-supervised loss functions for training an image segmentation GNN.
arXiv Detail & Related papers (2022-12-12T12:31:46Z)
GroupViT: Semantic Segmentation Emerges from Text Supervision [82.02467579704091]
Grouping and recognition are important components of visual scene understanding. We propose a hierarchical Grouping Vision Transformer (GroupViT). GroupViT learns to group together semantic regions and successfully transfers to the task of semantic segmentation in a zero-shot manner.
arXiv Detail & Related papers (2022-02-22T18:56:04Z)
Learning Spatial Context with Graph Neural Network for Multi-Person Pose Grouping [71.59494156155309]
Bottom-up approaches for image-based multi-person pose estimation consist of two stages: keypoint detection and grouping. In this work, we formulate the grouping task as a graph partitioning problem, where we learn the affinity matrix with a Graph Neural Network (GNN). The learned geometry-based affinity is further fused with appearance-based affinity to achieve robust keypoint association.
arXiv Detail & Related papers (2021-04-06T09:21:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.