Goal-Driven Explainable Clustering via Language Descriptions
- URL: http://arxiv.org/abs/2305.13749v2
- Date: Mon, 13 Nov 2023 18:27:21 GMT
- Title: Goal-Driven Explainable Clustering via Language Descriptions
- Authors: Zihan Wang, Jingbo Shang, Ruiqi Zhong
- Abstract summary: We propose a new task formulation, "Goal-Driven Clustering with Explanations" (GoalEx).
GoalEx represents both the goal and the explanations as free-form language descriptions.
Our method produces more accurate and goal-related explanations than prior methods.
- Score: 50.980832345025334
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised clustering is widely used to explore large corpora, but existing
formulations neither consider the users' goals nor explain clusters' meanings.
We propose a new task formulation, "Goal-Driven Clustering with Explanations"
(GoalEx), which represents both the goal and the explanations as free-form
language descriptions. For example, to categorize the errors made by a
summarization system, the input to GoalEx is a corpus of annotator-written
comments for system-generated summaries and a goal description "cluster the
comments based on why the annotators think the summary is imperfect."; the
outputs are text clusters each with an explanation ("this cluster mentions that
the summary misses important context information."), which relates to the goal
and precisely explains which comments should (not) belong to a cluster. To
tackle GoalEx, we prompt a language model with "[corpus subset] + [goal] +
Brainstorm a list of explanations each representing a cluster."; then we
classify whether each sample belongs to a cluster based on its explanation;
finally, we use integer linear programming to select a subset of candidate
clusters to cover most samples while minimizing overlaps. Under both automatic
and human evaluation on corpora with or without labels, our method produces
more accurate and goal-related explanations than prior methods. We release our
data and implementation at https://github.com/ZihanWangKi/GoalEx.
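The final step of the pipeline described above (selecting candidate clusters with integer linear programming) can be illustrated with a small set-cover-style program. The sketch below is a minimal illustration, not the authors' released implementation: the PuLP modeling library, the `max_clusters` cap, the `overlap_weight` penalty, and the toy assignment matrix are all assumptions introduced for illustration; the exact objective and constraints used by GoalEx are in the linked repository.

```python
# Illustrative sketch of the ILP cluster-selection step: given a binary matrix
# assign[i][c] saying whether sample i satisfies candidate explanation c
# (produced by the LM classification step), pick a subset of explanations
# that covers most samples while penalizing overlap.
# Requires: pip install pulp
import pulp


def select_clusters(assign, max_clusters=8, overlap_weight=0.5):
    """assign: list of lists of 0/1 with shape (num_samples, num_candidates)."""
    n, m = len(assign), len(assign[0])
    prob = pulp.LpProblem("goalex_cluster_selection", pulp.LpMinimize)

    # x[c] = 1 if candidate explanation c is kept in the final clustering.
    x = [pulp.LpVariable(f"x_{c}", cat=pulp.LpBinary) for c in range(m)]
    # u[i] = 1 if sample i is covered by no selected cluster.
    u = [pulp.LpVariable(f"u_{i}", cat=pulp.LpBinary) for i in range(n)]
    # o[i] >= number of extra clusters covering sample i beyond the first.
    o = [pulp.LpVariable(f"o_{i}", lowBound=0) for i in range(n)]

    for i in range(n):
        coverage = pulp.lpSum(assign[i][c] * x[c] for c in range(m))
        prob += u[i] >= 1 - coverage   # uncovered if no selected cluster matches
        prob += o[i] >= coverage - 1   # overlap beyond one matching cluster

    prob += pulp.lpSum(x) <= max_clusters  # assumed cap on the number of clusters
    # Objective: leave as few samples uncovered as possible, lightly penalize overlap.
    prob += pulp.lpSum(u) + overlap_weight * pulp.lpSum(o)

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [c for c in range(m) if pulp.value(x[c]) > 0.5]


# Toy usage: 4 samples, 3 candidate explanations.
if __name__ == "__main__":
    assign = [[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 1, 1]]
    print(select_clusters(assign, max_clusters=2))  # -> [0, 1], full coverage, no overlap
```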
Related papers
- OMH: Structured Sparsity via Optimally Matched Hierarchy for Unsupervised Semantic Segmentation [69.37484603556307]
Unsupervised Semantic Segmentation (USS) involves segmenting images without relying on predefined labels.
We introduce a novel approach called Optimally Matched Hierarchy (OMH) to simultaneously address the above issues.
Our OMH yields better unsupervised segmentation performance compared to existing USS methods.
arXiv Detail & Related papers (2024-03-11T09:46:41Z) - Hierarchical Indexing for Retrieval-Augmented Opinion Summarization [60.5923941324953]
We propose a method for unsupervised abstractive opinion summarization that combines the attributability and scalability of extractive approaches with the coherence and fluency of Large Language Models (LLMs).
Our method, HIRO, learns an index structure that maps sentences to a path through a semantically organized discrete hierarchy.
At inference time, we populate the index and use it to identify and retrieve clusters of sentences containing popular opinions from input reviews.
arXiv Detail & Related papers (2024-03-01T10:38:07Z) - Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification.
In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary.
We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
arXiv Detail & Related papers (2023-08-24T17:56:46Z) - Reinforcement Graph Clustering with Unknown Cluster Number [91.4861135742095]
We propose a new deep graph clustering method termed Reinforcement Graph Clustering.
In our proposed method, cluster number determination and unsupervised representation learning are unified into a single framework.
To provide feedback actions, a clustering-oriented reward function is proposed to enhance cohesion within the same cluster and separation between different clusters.
arXiv Detail & Related papers (2023-08-13T18:12:28Z) - Interpretable Deep Clustering for Tabular Data [7.972599673048582]
Clustering is a fundamental learning task widely used in data analysis.
We propose a new deep-learning framework that predicts interpretable cluster assignments at the instance and cluster levels.
We show that the proposed method can reliably predict cluster assignments in biological, text, image, and physics datasets.
arXiv Detail & Related papers (2023-06-07T21:08:09Z) - Cluster Explanation via Polyhedral Descriptions [0.0]
Clustering is an unsupervised learning problem that aims to partition unlabelled data points into groups with similar features.
Traditional clustering algorithms provide limited insight into the groups they find as their main focus is accuracy and not the interpretability of the group assignments.
We introduce a new approach to explain clusters by constructing polyhedra around each cluster while minimizing either the complexity of the resulting polyhedra or the number of features used in the description.
arXiv Detail & Related papers (2022-10-17T07:26:44Z) - Providing Insights for Open-Response Surveys via End-to-End Context-Aware Clustering [2.6094411360258185]
In this work, we present a novel end-to-end context-aware framework that extracts, aggregates, and abbreviates embedded semantic patterns in open-response survey data.
Our framework relies on a pre-trained natural language model in order to encode the textual data into semantic vectors.
Our framework reduces the costs at-scale by automating the process of extracting the most insightful information pieces from survey data.
arXiv Detail & Related papers (2022-03-02T18:24:10Z) - You Never Cluster Alone [150.94921340034688]
We extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data assigned to the same cluster contribute to a unified representation.
We define a set of categorical variables as clustering assignment confidence, which links the instance-level learning track with the cluster-level one.
By reparametrizing the assignment variables, TCC is trained end-to-end, requiring no alternating steps.
arXiv Detail & Related papers (2021-06-03T14:59:59Z) - Deep Descriptive Clustering [24.237000220172906]
This paper explores a novel setting for performing clustering on complex data while simultaneously generating explanations using interpretable tags.
We form good clusters by maximizing the mutual information between the empirical distribution of the inputs and the induced clustering labels as the clustering objective.
Experimental results on public data demonstrate that our model outperforms competitive baselines in clustering performance.
arXiv Detail & Related papers (2021-05-24T21:40:16Z) - Open Intent Discovery through Unsupervised Semantic Clustering and Dependency Parsing [44.99113692679489]
This paper proposes an unsupervised two-stage approach to discover intents and generate intent labels automatically from a collection of unlabeled utterances.
We empirically show that the proposed unsupervised approach can generate meaningful intent labels automatically and achieves high precision and recall in utterance clustering and intent discovery.
arXiv Detail & Related papers (2021-04-25T09:36:23Z)