Contrastive Learning Subspace for Text Clustering
- URL: http://arxiv.org/abs/2408.14119v1
- Date: Mon, 26 Aug 2024 09:08:26 GMT
- Title: Contrastive Learning Subspace for Text Clustering
- Authors: Qian Yong, Chen Chen, Xiabing Zhou,
- Abstract summary: We propose a novel text clustering approach called Subspace Contrastive Learning (SCL)
The proposed SCL consists of two main modules: (1) a self-expressive module that constructs virtual positive samples and (2) a contrastive learning module that further learns a discriminative subspace to capture task-specific cluster-wise relationships among texts.
Experimental results show that the proposed SCL method not only has achieved superior results on multiple task clustering datasets but also has less complexity in positive sample construction.
- Score: 4.065026352441705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning has been frequently investigated to learn effective representations for text clustering tasks. While existing contrastive learning-based text clustering methods only focus on modeling instance-wise semantic similarity relationships, they ignore contextual information and underlying relationships among all instances that needs to be clustered. In this paper, we propose a novel text clustering approach called Subspace Contrastive Learning (SCL) which models cluster-wise relationships among instances. Specifically, the proposed SCL consists of two main modules: (1) a self-expressive module that constructs virtual positive samples and (2) a contrastive learning module that further learns a discriminative subspace to capture task-specific cluster-wise relationships among texts. Experimental results show that the proposed SCL method not only has achieved superior results on multiple task clustering datasets but also has less complexity in positive sample construction.
Related papers
- Visual Prompt Selection for In-Context Learning Segmentation [77.15684360470152]
In this paper, we focus on rethinking and improving the example selection strategy.
We first demonstrate that ICL-based segmentation models are sensitive to different contexts.
Furthermore, empirical evidence indicates that the diversity of contextual prompts plays a crucial role in guiding segmentation.
arXiv Detail & Related papers (2024-07-14T15:02:54Z) - Text Clustering with LLM Embeddings [0.0]
The effectiveness of text clustering largely depends on the selection of textual embeddings and clustering algorithms.
Recent advancements in large language models (LLMs) have the potential to enhance this task.
Findings indicate that LLM embeddings are superior at capturing subtleties in structured language.
arXiv Detail & Related papers (2024-03-22T11:08:48Z) - CEIL: A General Classification-Enhanced Iterative Learning Framework for
Text Clustering [16.08402937918212]
We propose a novel Classification-Enhanced Iterative Learning framework for short text clustering.
In each iteration, we first adopt a language model to retrieve the initial text representations.
After strict data filtering and aggregation processes, samples with clean category labels are retrieved, which serve as supervision information.
Finally, the updated language model with improved representation ability is used to enhance clustering in the next iteration.
arXiv Detail & Related papers (2023-04-20T14:04:31Z) - ClusTop: An unsupervised and integrated text clustering and topic
extraction framework [3.3073775218038883]
We propose an unsupervised text clustering and topic extraction framework (ClusTop)
Our framework includes four components: enhanced language model training, dimensionality reduction, clustering and topic extraction.
Experiments on two datasets demonstrate the effectiveness of our framework.
arXiv Detail & Related papers (2023-01-03T03:26:26Z) - Improving Event Representation via Simultaneous Weakly Supervised
Contrastive Learning and Clustering [31.841780703374955]
We present SWCC: a Simultaneous Weakly supervised Contrastive learning and Clustering framework for event representation learning.
For model training, SWCC learns representations by simultaneously performing weakly supervised contrastive learning and prototype-based clustering.
arXiv Detail & Related papers (2022-03-15T04:12:00Z) - A Proposition-Level Clustering Approach for Multi-Document Summarization [82.4616498914049]
We revisit the clustering approach, grouping together propositions for more precise information alignment.
Our method detects salient propositions, clusters them into paraphrastic clusters, and generates a representative sentence for each cluster by fusing its propositions.
Our summarization method improves over the previous state-of-the-art MDS method in the DUC 2004 and TAC 2011 datasets.
arXiv Detail & Related papers (2021-12-16T10:34:22Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z) - Graph Contrastive Clustering [131.67881457114316]
We propose a novel graph contrastive learning framework, which is then applied to the clustering task and we come up with the Graph Constrastive Clustering(GCC) method.
Specifically, on the one hand, the graph Laplacian based contrastive loss is proposed to learn more discriminative and clustering-friendly features.
On the other hand, a novel graph-based contrastive learning strategy is proposed to learn more compact clustering assignments.
arXiv Detail & Related papers (2021-04-03T15:32:49Z) - Deep Clustering by Semantic Contrastive Learning [67.28140787010447]
We introduce a novel variant called Semantic Contrastive Learning (SCL)
It explores the characteristics of both conventional contrastive learning and deep clustering.
It can amplify the strengths of contrastive learning and deep clustering in a unified approach.
arXiv Detail & Related papers (2021-03-03T20:20:48Z) - Neural Text Classification by Jointly Learning to Cluster and Align [5.969960391685054]
We extend the neural text clustering approach to text classification tasks by inducing cluster centers via a latent variable model and interacting with distributional word embeddings.
The proposed method jointly learns word clustering centroids and clustering-token alignments, achieving the state of the art results on multiple benchmark datasets.
arXiv Detail & Related papers (2020-11-24T16:07:18Z) - Multidirectional Associative Optimization of Function-Specific Word
Representations [86.87082468226387]
We present a neural framework for learning associations between interrelated groups of words.
Our model induces a joint function-specific word vector space, where vectors of e.g. plausible SVO compositions lie close together.
The model retains information about word group membership even in the joint space, and can thereby effectively be applied to a number of tasks reasoning over the SVO structure.
arXiv Detail & Related papers (2020-05-11T17:07:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.