GraphVL: Graph-Enhanced Semantic Modeling via Vision-Language Models for Generalized Class Discovery
- URL: http://arxiv.org/abs/2411.02074v2
- Date: Sat, 16 Nov 2024 09:05:56 GMT
- Title: GraphVL: Graph-Enhanced Semantic Modeling via Vision-Language Models for Generalized Class Discovery
- Authors: Bhupendra Solanki, Ashwin Nair, Mainak Singha, Souradeep Mukhopadhyay, Ankit Jha, Biplab Banerjee,
- Abstract summary: We introduce GraphVL, a novel approach for vision-language modeling in Generalized Category Discovery (GCD)
Our method integrates a graph convolutional network (GCN) with CLIP's text encoder to preserve class neighborhood structure.
Our experiments on seven benchmark datasets consistently demonstrate the superiority of GraphVL when integrated with the CLIP backbone.
- Score: 11.006059998223908
- License:
- Abstract: Generalized Category Discovery (GCD) aims to cluster unlabeled images into known and novel categories using labeled images from known classes. To address the challenge of transferring features from known to unknown classes while mitigating model bias, we introduce GraphVL, a novel approach for vision-language modeling in GCD, leveraging CLIP. Our method integrates a graph convolutional network (GCN) with CLIP's text encoder to preserve class neighborhood structure. We also employ a lightweight visual projector for image data, ensuring discriminative features through margin-based contrastive losses for image-text mapping. This neighborhood preservation criterion effectively regulates the semantic space, making it less sensitive to known classes. Additionally, we learn textual prompts from known classes and align them to create a more contextually meaningful semantic feature space for the GCN layer using a contextual similarity loss. Finally, we represent unlabeled samples based on their semantic distance to class prompts from the GCN, enabling semi-supervised clustering for class discovery and minimizing errors. Our experiments on seven benchmark datasets consistently demonstrate the superiority of GraphVL when integrated with the CLIP backbone.
Related papers
- Deep Contrastive Graph Learning with Clustering-Oriented Guidance [61.103996105756394]
Graph Convolutional Network (GCN) has exhibited remarkable potential in improving graph-based clustering.
Models estimate an initial graph beforehand to apply GCN.
Deep Contrastive Graph Learning (DCGL) model is proposed for general data clustering.
arXiv Detail & Related papers (2024-02-25T07:03:37Z) - Local Structure-aware Graph Contrastive Representation Learning [12.554113138406688]
We propose a Local Structure-aware Graph Contrastive representation Learning method (LS-GCL) to model the structural information of nodes from multiple views.
For the local view, the semantic subgraph of each target node is input into a shared GNN encoder to obtain the target node embeddings at the subgraph-level.
For the global view, considering the original graph preserves indispensable semantic information of nodes, we leverage the shared GNN encoder to learn the target node embeddings at the global graph-level.
arXiv Detail & Related papers (2023-08-07T03:23:46Z) - Learning Semi-supervised Gaussian Mixture Models for Generalized
Category Discovery [36.01459228175808]
We propose an EM-like framework that alternates between representation learning and class number estimation.
We evaluate our framework on both generic image classification datasets and challenging fine-grained object recognition datasets.
arXiv Detail & Related papers (2023-05-10T13:47:38Z) - GLCC: A General Framework for Graph-level Clustering [5.069852282550117]
This paper studies the problem of graph-level clustering, which is a novel yet challenging task.
We propose a general graph-level clustering framework named Graph-Level Contrastive Clustering (GLCC)
Experiments on a range of well-known datasets demonstrate the superiority of our proposed GLCC over competitive baselines.
arXiv Detail & Related papers (2022-10-21T11:08:10Z) - No Token Left Behind: Explainability-Aided Image Classification and
Generation [79.4957965474334]
We present a novel explainability-based approach, which adds a loss term to ensure that CLIP focuses on all relevant semantic parts of the input.
Our method yields an improvement in the recognition rate, without additional training or fine-tuning.
arXiv Detail & Related papers (2022-04-11T07:16:39Z) - Learning Hierarchical Graph Neural Networks for Image Clustering [81.5841862489509]
We propose a hierarchical graph neural network (GNN) model that learns how to cluster a set of images into an unknown number of identities.
Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.
arXiv Detail & Related papers (2021-07-03T01:28:42Z) - Weakly-Supervised Image Semantic Segmentation Using Graph Convolutional
Networks [9.066817971329899]
weakly-supervised image semantic segmentation based on image-level class labels.
One common approach to this task is to propagate the activation scores of Class Activation Maps (CAMs) using a random-walk mechanism.
We propose a Graph Convolutional Network (GCN)-based feature propagation framework.
arXiv Detail & Related papers (2021-03-31T02:05:01Z) - Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z) - Deep Active Learning for Joint Classification & Segmentation with Weak
Annotator [22.271760669551817]
CNN visualization and interpretation methods, like class-activation maps (CAMs), are typically used to highlight the image regions linked to class predictions.
We propose an active learning framework, which progressively integrates pixel-level annotations during training.
Our results indicate that, by simply using random sample selection, the proposed approach can significantly outperform state-of-the-art CAMs and AL methods.
arXiv Detail & Related papers (2020-10-10T03:25:54Z) - Hierarchical Image Classification using Entailment Cone Embeddings [68.82490011036263]
We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier.
We empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance.
arXiv Detail & Related papers (2020-04-02T10:22:02Z) - High-Order Information Matters: Learning Relation and Topology for
Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms state-of-the-art by6.5%mAP scores on Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.