Related papers: Vec2GC -- A Graph Based Clustering Method for Text Representations

Related papers

Graph-Convolutional Networks: Named Entity Recognition and Large Language Model Embedding in Document Clustering [9.929301228994095]
This paper proposes a novel approach that integrates Named Entity Recognition (NER) and Large Language Models (LLMs) embeddings within a graph-based framework for document clustering. The method builds a graph with nodes representing documents and edges weighted by named entity similarity, optimized using a graph-convolutional network (GCN) Experimental results indicate that our approach outperforms conventional co-occurrence-based methods in clustering, notably for documents rich in named entities.
arXiv Detail & Related papers (2024-12-19T14:03:22Z)
Information-Theoretic Generative Clustering of Documents [24.56214029342293]
We present generative clustering (GC) for clustering a set of documents, $mathrmX$. Because large language models (LLMs) provide probability distributions, the similarity between two documents can be rigorously defined. We show GC achieves the state-of-the-art performance, outperforming any previous clustering method often by a large margin.
arXiv Detail & Related papers (2024-12-18T06:21:21Z)
Generalized Category Discovery with Clustering Assignment Consistency [56.92546133591019]
Generalized category discovery (GCD) is a recently proposed open-world task. We propose a co-training-based framework that encourages clustering consistency. Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets.
arXiv Detail & Related papers (2023-10-30T00:32:47Z)
Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering. In our proposed method, cluster number determination and unsupervised representation learning are unified into a uniform framework. In order to conduct feedback actions, the clustering-oriented reward function is proposed to enhance the cohesion of the same clusters and separate the different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z)
CLIP-GCD: Simple Language Guided Generalized Category Discovery [21.778676607030253]
Generalized Category Discovery (GCD) requires a model to both classify known categories and cluster unknown categories in unlabeled data. Prior methods leveraged self-supervised pre-training combined with supervised fine-tuning on the labeled data, followed by simple clustering methods. We propose to leverage multi-modal (vision and language) models, in two complementary ways.
arXiv Detail & Related papers (2023-05-17T17:55:33Z)
Unified Line and Paragraph Detection by Graph Convolutional Networks [5.298581058536571]
We formulate the task of detecting lines and paragraphs in a document into a unified two-level clustering problem. We use a graph convolutional network to predict the relations between text detection boxes and then build both levels of clusters from these predictions. Experimentally, we demonstrate that the unified approach can be highly efficient while still achieving state-of-the-art quality for detecting paragraphs in public benchmarks and real-world images.
arXiv Detail & Related papers (2022-03-17T22:27:12Z)
Self-supervised Contrastive Attributed Graph Clustering [110.52694943592974]
We propose a novel attributed graph clustering network, namely Self-supervised Contrastive Attributed Graph Clustering (SCAGC) In SCAGC, by leveraging inaccurate clustering labels, a self-supervised contrastive loss, are designed for node representation learning. For the OOS nodes, SCAGC can directly calculate their clustering labels.
arXiv Detail & Related papers (2021-10-15T03:25:28Z)
Learning Hierarchical Graph Neural Networks for Image Clustering [81.5841862489509]
We propose a hierarchical graph neural network (GNN) model that learns how to cluster a set of images into an unknown number of identities. Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.
arXiv Detail & Related papers (2021-07-03T01:28:42Z)
DocSCAN: Unsupervised Text Classification via Learning from Neighbors [2.2082422928825145]
We introduce DocSCAN, a completely unsupervised text classification approach using Semantic Clustering by Adopting Nearest-Neighbors (SCAN) For each document, we obtain semantically informative vectors from a large pre-trained language model. Similar documents have proximate vectors, so neighbors in the representation space tend to share topic labels. Our learnable clustering approach uses pairs of neighboring datapoints as a weak learning signal. The proposed approach learns to assign classes to the whole dataset without provided ground-truth labels.
arXiv Detail & Related papers (2021-05-09T21:20:31Z)
Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed. We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data. Our method uses the self-expressiveness of samples to capture the global structure and adaptive neighbor approach to respect the local structure. Our model is equivalent to a combination of kernel k-means and k-means methods under certain condition.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)
Enhancement of Short Text Clustering by Iterative Classification [0.0]
iterative classification applies outlier removal to obtain outlier-free clusters. It trains a classification algorithm using the non-outliers based on their cluster distributions. By repeating this several times, we obtain a much improved clustering of texts.
arXiv Detail & Related papers (2020-01-31T02:12:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.