Using Constraints to Discover Sparse and Alternative Subgroup Descriptions
- URL: http://arxiv.org/abs/2406.01411v1
- Date: Mon, 3 Jun 2024 15:10:01 GMT
- Title: Using Constraints to Discover Sparse and Alternative Subgroup Descriptions
- Authors: Jakob Bach
- Abstract summary: Subgroup-discovery methods allow users to obtain simple descriptions of interesting regions in a dataset.
We focus on two types of constraints: First, we limit the number of features used in subgroup descriptions, making the latter sparse.
Second, we propose the novel optimization problem of finding alternative subgroup descriptions, which cover a similar set of data objects as a given subgroup but use different features.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Subgroup-discovery methods allow users to obtain simple descriptions of interesting regions in a dataset. Using constraints in subgroup discovery can enhance interpretability even further. In this article, we focus on two types of constraints: First, we limit the number of features used in subgroup descriptions, making the latter sparse. Second, we propose the novel optimization problem of finding alternative subgroup descriptions, which cover a similar set of data objects as a given subgroup but use different features. We describe how to integrate both constraint types into heuristic subgroup-discovery methods. Further, we propose a novel Satisfiability Modulo Theories (SMT) formulation of subgroup discovery as a white-box optimization problem, which allows solver-based search for subgroups and is open to a variety of constraint types. Additionally, we prove that both constraint types lead to an NP-hard optimization problem. Finally, we employ 27 binary-classification datasets to compare heuristic and solver-based search for unconstrained and constrained subgroup discovery. We observe that heuristic search methods often yield high-quality subgroups within a short runtime, also in scenarios with constraints.
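The two constraint types from the abstract can be illustrated with a small, self-contained sketch. This is not the paper's actual method or code: the greedy refinement search, the WRAcc quality measure, the binary-feature selectors, and all function names are illustrative assumptions chosen to make the idea concrete.

```python
from itertools import combinations, product

def wracc(covered, labels):
    """Weighted Relative Accuracy: coverage rate * (precision - base rate)."""
    n = len(labels)
    cov = sum(covered)
    if cov == 0:
        return 0.0
    pos_in_cov = sum(1 for c, y in zip(covered, labels) if c and y)
    return (cov / n) * (pos_in_cov / cov - sum(labels) / n)

def coverage(X, selector):
    """selector maps feature index -> required value (binary features)."""
    return [int(all(row[f] == v for f, v in selector.items())) for row in X]

def greedy_sparse_subgroup(X, y, max_features):
    """Sparsity constraint: greedily add conditions, at most max_features of them."""
    selector, best_q = {}, 0.0
    while len(selector) < max_features:
        best_step = None
        for f in range(len(X[0])):
            if f in selector:
                continue
            for v in (0, 1):
                cand = {**selector, f: v}
                q = wracc(coverage(X, cand), y)
                if q > best_q:
                    best_q, best_step = q, cand
        if best_step is None:  # no refinement improves quality
            break
        selector = best_step
    return selector, best_q

def jaccard(a, b):
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0

def alternative_description(X, original_selector, max_features):
    """Alternative-description constraint: use only features NOT in the original
    selector, maximizing Jaccard similarity of coverage to the original subgroup."""
    target = coverage(X, original_selector)
    free = [f for f in range(len(X[0])) if f not in original_selector]
    best_sel, best_sim = {}, -1.0
    for k in range(1, max_features + 1):
        for feats in combinations(free, k):
            for vals in product((0, 1), repeat=k):
                cand = dict(zip(feats, vals))
                sim = jaccard(coverage(X, cand), target)
                if sim > best_sim:
                    best_sim, best_sel = sim, cand
    return best_sel, best_sim
```

On a toy dataset where feature 0 predicts the label and feature 1 is a redundant copy, the sparse search picks a one-feature description on feature 0, and the alternative search (banned from feature 0) recovers the same coverage via feature 1. The paper's SMT formulation would instead encode selector variables and these constraints declaratively for a solver; the greedy loop above stands in for its heuristic baselines.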
Related papers
- Clustered Orienteering Problem with Subgroups
We introduce the Clustered Orienteering Problem with Subgroups (COPS) and show that our new formulation can model and solve two previous well-known variants.
arXiv Detail & Related papers (2023-12-26T18:28:25Z)
- Composite Feature Selection using Deep Ensembles
We investigate the problem of discovering groups of predictive features without predefined grouping.
We introduce a novel deep learning architecture that uses an ensemble of feature selection models to find predictive groups.
We propose a new metric to measure similarity between discovered groups and the ground truth.
arXiv Detail & Related papers (2022-11-01T17:49:40Z)
- Cluster Explanation via Polyhedral Descriptions
Clustering is an unsupervised learning problem that aims to partition unlabelled data points into groups with similar features.
Traditional clustering algorithms provide limited insight into the groups they find as their main focus is accuracy and not the interpretability of the group assignments.
We introduce a new approach to explain clusters by constructing polyhedra around each cluster while minimizing either the complexity of the resulting polyhedra or the number of features used in the description.
arXiv Detail & Related papers (2022-10-17T07:26:44Z)
- Interpretable Clustering via Multi-Polytope Machines
We propose a novel approach for interpretable clustering that both clusters data points and constructs polytopes around the discovered clusters to explain them.
We benchmark our approach on a suite of synthetic and real-world clustering problems, where our algorithm outperforms state-of-the-art interpretable and non-interpretable clustering algorithms.
arXiv Detail & Related papers (2021-12-10T16:36:32Z)
- Local versions of sum-of-norms clustering
We show that our method can separate arbitrarily close balls in the ball model.
We prove a quantitative bound on the error incurred in the clustering of disjoint connected sets.
arXiv Detail & Related papers (2021-09-20T14:45:29Z)
- Fuzzy Clustering with Similarity Queries
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
- Certifiably Polynomial Algorithm for Best Group Subset Selection
Best group subset selection aims to choose a small number of non-overlapping groups of features that best explain the response variable.
We propose a group-splicing algorithm that iteratively detects effective groups and excludes ineffective ones.
We demonstrate the efficiency and accuracy of our proposal by comparison with state-of-the-art algorithms on both synthetic and real-world datasets.
arXiv Detail & Related papers (2021-04-23T03:05:11Z)
- Robust subgroup discovery
We formalize the problem of optimal robust subgroup discovery using the Minimum Description Length principle.
We propose RSD, a greedy heuristic that finds good subgroup lists and guarantees that the most significant subgroup is added in each iteration.
We empirically show on 54 datasets that RSD outperforms previous subgroup set discovery methods in terms of quality and subgroup list size.
arXiv Detail & Related papers (2021-03-25T09:04:13Z)
- Unsupervised Multi-view Clustering by Squeezing Hybrid Knowledge from Cross View and Each View
This paper proposes a new multi-view clustering method, low-rank subspace multi-view clustering based on adaptive graph regularization.
Experimental results for five widely used multi-view benchmarks show that our proposed algorithm surpasses other state-of-the-art methods by a clear margin.
arXiv Detail & Related papers (2020-08-23T08:25:06Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhood-based and population-based optimization metaheuristics.
On a set of 16 data tables generated by a quasi-Monte Carlo experiment, we compare one of the aggregation criteria under L1 dissimilarity against hierarchical clustering and a version of k-means, namely partitioning around medoids (PAM).
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
- Optimal Clustering from Noisy Binary Feedback
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.