Optimal Decision Trees For Interpretable Clustering with Constraints
(Extended Version)
- URL: http://arxiv.org/abs/2301.12671v2
- Date: Tue, 16 May 2023 14:24:36 GMT
- Title: Optimal Decision Trees For Interpretable Clustering with Constraints
(Extended Version)
- Authors: Pouya Shati, Eldan Cohen, Sheila McIlraith
- Abstract summary: Constrained clustering is a semi-supervised task that employs a limited amount of labelled data, formulated as constraints.
We present a novel SAT-based framework for interpretable clustering that supports clustering constraints.
We also present new insight into the trade-off between interpretability and satisfaction of such user-provided constraints.
- Score: 7.799182201815762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Constrained clustering is a semi-supervised task that employs a limited
amount of labelled data, formulated as constraints, to incorporate
domain-specific knowledge and to significantly improve clustering accuracy.
Previous work has considered exact optimization formulations that can guarantee
optimal clustering while satisfying all constraints; however, these approaches
lack interpretability. Recently, decision trees have been used to produce
inherently interpretable clustering solutions; however, existing approaches do
not support clustering constraints and do not provide strong theoretical
guarantees on solution quality. In this work, we present a novel SAT-based
framework for interpretable clustering that supports clustering constraints and
that also provides strong theoretical guarantees on solution quality. We also
present new insight into the trade-off between interpretability and
satisfaction of such user-provided constraints. Our framework is the first
approach for interpretable and constrained clustering. Experiments with a range
of real-world and synthetic datasets demonstrate that our approach can produce
high-quality and interpretable constrained clustering solutions.
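The notion of user-provided clustering constraints can be made concrete with a small sketch. The Python snippet below is purely illustrative and is not the paper's SAT-based formulation: it distils a k-means clustering into a shallow scikit-learn decision tree (a common interpretable-clustering baseline) and then counts how many hypothetical must-link / cannot-link pairs the tree's cluster assignment violates.

```python
# Illustrative sketch only: a shallow decision tree fit to k-means labels,
# followed by a check of user-provided pairwise constraints.
# This is NOT the SAT-based formulation from the paper.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Toy data and a reference (non-interpretable) clustering
X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# A shallow tree distils the clustering into axis-aligned, human-readable rules
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, kmeans_labels)
tree_labels = tree.predict(X)

# Hypothetical user-provided constraints: pairs of sample indices
must_link = [(0, 1), (10, 11)]      # pairs that must share a cluster
cannot_link = [(0, 100), (5, 150)]  # pairs that must be in different clusters

def violations(labels, must_link, cannot_link):
    """Count how many pairwise constraints a cluster assignment violates."""
    ml = sum(labels[i] != labels[j] for i, j in must_link)
    cl = sum(labels[i] == labels[j] for i, j in cannot_link)
    return ml + cl

print("violations by tree clustering:", violations(tree_labels, must_link, cannot_link))
```

In the framework described above, such pairwise constraints are encoded directly into the SAT model, so the returned tree either satisfies them or makes the trade-off explicit, rather than being checked only after the fact as in this sketch.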
Related papers
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
The K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- ConstraintMatch for Semi-constrained Clustering [32.92933231199262]
Constrained clustering allows the training of classification models using pairwise constraints only, which are weak and relatively easy to mine.
We consider a semi-supervised context whereby a large amount of unconstrained data is available alongside a smaller set of constraints, and propose ConstraintMatch to leverage such unconstrained data.
arXiv Detail & Related papers (2023-11-26T19:31:52Z)
- Semi-Supervised Constrained Clustering: An In-Depth Overview, Ranked Taxonomy and Future Research Directions [2.5957372084704238]
The research area of constrained clustering has grown significantly over the years.
No unifying overview is available to easily understand the wide variety of available methods, constraints and benchmarks.
This study presents in-detail the background of constrained clustering and provides a novel ranked taxonomy of the types of constraints that can be used in constrained clustering.
arXiv Detail & Related papers (2023-02-28T17:46:31Z)
- Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient [51.37720227675476]
We introduce a new variant of the Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts.
We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al.
Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.
arXiv Detail & Related papers (2023-01-19T18:24:08Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- Algorithm for Constrained Markov Decision Process with Linear Convergence [55.41644538483948]
An agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its costs.
A new dual approach is proposed with the integration of two ingredients: entropy regularized policy and Vaidya's dual.
The proposed approach is shown to converge (with linear rate) to the global optimum.
arXiv Detail & Related papers (2022-06-03T16:26:38Z)
- Clustering to the Fewest Clusters Under Intra-Cluster Dissimilarity Constraints [0.0]
Equiwide clustering relies neither on density nor on a predefined number of expected classes, but on a dissimilarity threshold.
We review and evaluate suitable clustering algorithms to identify trade-offs between the various practical solutions for this clustering problem.
arXiv Detail & Related papers (2021-09-28T12:02:18Z)
- Deep Fair Discriminative Clustering [24.237000220172906]
We study a general notion of group-level fairness for binary and multi-state protected status variables (PSVs).
We propose a refinement learning algorithm to combine the clustering goal with the fairness objective to learn fair clusters adaptively.
Our framework shows promising results for novel clustering tasks including flexible fairness constraints, multi-state PSVs and predictive clustering.
arXiv Detail & Related papers (2021-05-28T23:50:48Z)
- Fairness, Semi-Supervised Learning, and More: A General Framework for Clustering with Stochastic Pairwise Constraints [32.19047459493177]
We introduce a novel family of stochastic pairwise constraints, which we incorporate into several essential clustering objectives.
We show that these constraints can succinctly model an intriguing collection of applications, including Individual Fairness in clustering and Must-link constraints in semi-supervised learning.
arXiv Detail & Related papers (2021-03-02T20:27:58Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- An Integer Linear Programming Framework for Mining Constraints from Data [81.60135973848125]
We present a general framework for mining constraints from data.
In particular, we consider the inference in structured output prediction as an integer linear programming (ILP) problem.
We show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules.
arXiv Detail & Related papers (2020-06-18T20:09:53Z)