Shallow decision trees for explainable $k$-means clustering
- URL: http://arxiv.org/abs/2112.14718v1
- Date: Wed, 29 Dec 2021 18:22:28 GMT
- Title: Shallow decision trees for explainable $k$-means clustering
- Authors: Eduardo Laber, Lucas Murtinho, Felipe Oliveira
- Abstract summary: We propose an efficient algorithm that takes into account metrics related to the depths of the leaves in a decision tree.
In experiments on 16 datasets, our algorithm yields better results than existing decision-tree clustering algorithms, typically achieving lower or equivalent costs with considerably shallower trees.
- Score: 1.2891210250935146
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A number of recent works have employed decision trees for the construction of
explainable partitions that aim to minimize the $k$-means cost function. These
works, however, largely ignore metrics related to the depths of the leaves in
the resulting tree, which is perhaps surprising considering how the
explainability of a decision tree depends on these depths. To fill this gap in
the literature, we propose an efficient algorithm that takes into account these
metrics. In experiments on 16 datasets, our algorithm yields better results
than decision-tree clustering algorithms such as the ones presented in
Dasgupta et al. (2020), Frost et al. (2020), Laber and Murtinho (2021),
and Makarychev and Shan (2021), typically achieving lower or
equivalent costs with considerably shallower trees. We also show, through a
simple adaptation of existing techniques, that the problem of building
explainable partitions induced by binary trees for the $k$-means cost function
does not admit a $(1+\epsilon)$-approximation in polynomial time unless
$P=NP$, which justifies the quest for approximation algorithms and/or
heuristics.
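To make the setup concrete, below is a minimal Python sketch of the generic "threshold tree" approach this line of work builds on; it is not the authors' algorithm, and the cut-selection rule and all names are illustrative. Given the $k$ centers of a standard $k$-means solution, it recursively separates them with axis-aligned cuts until each leaf isolates one center, so the $k$ leaves induce an explainable partition; the papers above differ precisely in how cuts are chosen to control the $k$-means cost and the depths of the leaves.

```python
import numpy as np

def build_threshold_tree(centers):
    """Recursively separate k-means centers with axis-aligned cuts.

    Returns a nested dict: internal nodes hold (feature, threshold),
    leaves hold the index of the single remaining center.
    Assumes pairwise-distinct centers.
    """
    def split(ids):
        if len(ids) == 1:
            return {"leaf": int(ids[0])}
        sub = centers[ids]
        # Illustrative cut rule: take the feature with the widest spread
        # among the remaining centers and cut at its midpoint. Real
        # algorithms pick cuts to bound the cost and/or the leaf depths.
        f = int(np.argmax(sub.max(axis=0) - sub.min(axis=0)))
        t = (sub[:, f].min() + sub[:, f].max()) / 2.0
        left, right = ids[centers[ids, f] <= t], ids[centers[ids, f] > t]
        return {"feature": f, "threshold": t,
                "left": split(left), "right": split(right)}

    return split(np.arange(len(centers)))

def assign(tree, x):
    """Route a point down the tree to its (explainable) cluster."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]
```

The centers can come from any standard solver (e.g., scikit-learn's `KMeans(...).cluster_centers_`); each point is then assigned by at most depth-many single-feature comparisons, which is why leaf depth is a natural proxy for explainability.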
Related papers
- LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - Perturb-and-Project: Differentially Private Similarities and Marginals [73.98880839337873]
We revisit the input perturbations framework for differential privacy, where noise is added to the input $A \in \mathcal{S}$.
We first design novel efficient algorithms to privately release pair-wise cosine similarities.
We derive a novel algorithm to compute $k$-way marginal queries over $n$ features.
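As a point of reference, here is a minimal sketch of the generic input-perturbation idea the summary alludes to, not the paper's mechanism: perturb the exact pairwise cosine-similarity matrix with Gaussian noise before release. The scale `sigma` is a placeholder; a real mechanism calibrates it to the privacy parameters via a sensitivity analysis.

```python
import numpy as np

def noisy_cosine_similarities(X, sigma, rng=None):
    """Release the pairwise cosine similarities of the rows of X,
    perturbed with i.i.d. Gaussian noise of scale sigma (placeholder;
    not calibrated to any specific (epsilon, delta) guarantee)."""
    rng = np.random.default_rng() if rng is None else rng
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    U = X / np.clip(norms, 1e-12, None)  # unit-normalize rows
    S = U @ U.T                          # exact cosine similarities
    return S + rng.normal(0.0, sigma, size=S.shape)
```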
arXiv Detail & Related papers (2024-06-07T12:07:16Z) - On Computing Optimal Tree Ensembles [7.424944196676223]
Random forests and, more generally, (decision-)tree ensembles are widely used methods for classification and regression.
Recent algorithmic advances make it possible to compute decision trees that are optimal for various measures such as their size or depth.
We provide two novel algorithms and corresponding lower bounds.
arXiv Detail & Related papers (2023-06-07T13:30:43Z) - Unbiased and Efficient Sampling of Dependency Trees [0.0]
Most treebanks require that every valid dependency tree has a single edge coming out of the ROOT node.
Zmigrod et al. have recently proposed algorithms for sampling with and without replacement from the single-root dependency tree distribution.
We show that their fastest algorithm for sampling with replacement, Wilson-RC, is in fact biased.
arXiv Detail & Related papers (2022-05-25T09:57:28Z) - Active-LATHE: An Active Learning Algorithm for Boosting the Error Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least 40% when $\rho$ is at least $0.8$.
Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z) - On Finding the $K$-best Non-projective Dependency Trees [39.5549669100436]
We provide a simplification of the $K$-best spanning tree algorithm of Camerini et al. (1980).
We present a novel extension of the algorithm for decoding the $K$-best dependency trees of a graph which are subject to a root constraint.
arXiv Detail & Related papers (2021-06-01T20:23:41Z) - Exact and Approximate Hierarchical Clustering Using A* [51.187990314731344]
We introduce a new approach based on A* search for clustering.
We overcome the prohibitively large search space by combining A* with a novel trellis data structure.
We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks.
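For readers unfamiliar with the building block, below is a minimal sketch of classical A* best-first search; the trellis data structure and the clustering-specific search space of the paper are beyond this illustration, and `neighbors` and `heuristic` are placeholders supplied by the caller.

```python
import heapq
import itertools

def a_star(start, goal, neighbors, heuristic):
    """Return (path, cost) of a cheapest path from start to goal.

    neighbors(n) yields (successor, edge_cost) pairs; if heuristic(n)
    never overestimates the remaining cost, the result is optimal.
    """
    tie = itertools.count()  # tie-breaker so nodes are never compared
    frontier = [(heuristic(start), next(tie), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        _, _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry, superseded by a cheaper one
        for nxt, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(
                    frontier, (g2 + heuristic(nxt), next(tie), g2, nxt, path + [nxt])
                )
    return None, float("inf")
```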
arXiv Detail & Related papers (2021-04-14T18:15:27Z) - SGA: A Robust Algorithm for Partial Recovery of Tree-Structured Graphical Models with Noisy Samples [75.32013242448151]
We consider learning Ising tree models when the observations from the nodes are corrupted by independent but non-identically distributed noise.
Katiyar et al. (2020) showed that although the exact tree structure cannot be recovered, one can recover a partial tree structure.
We propose Symmetrized Geometric Averaging (SGA), a more statistically robust algorithm for partial tree recovery.
arXiv Detail & Related papers (2021-01-22T01:57:35Z) - On the price of explainability for some clustering problems [1.52292571922932]
We provide upper and lower bounds for a natural model where explainability is achieved via decision trees.
For the $k$-means and $k$-medians problems our upper bounds improve those obtained by [Moshkovitz et al., ICML 2020] for low dimensions.
Another contribution is a simple and efficient algorithm for building explainable clusterings for the $k$-means problem.
arXiv Detail & Related papers (2021-01-05T15:08:25Z) - On $\ell_p$-norm Robustness of Ensemble Stumps and Trees [83.81523991945018]
We develop an efficient dynamic-programming-based algorithm for sound verification of ensemble stumps.
We demonstrate the first certified defense method for training ensemble stumps and trees with respect to $\ell_p$-norm perturbations.
arXiv Detail & Related papers (2020-08-20T03:42:40Z) - Explainable $k$-Means and $k$-Medians Clustering [25.513261099927163]
We consider using a small decision tree to partition a data set into clusters, so that clusters can be characterized in a straightforward manner.
We show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost.
We design an efficient algorithm that produces explainable clusters using a tree with $k$ leaves.
arXiv Detail & Related papers (2020-02-28T04:21:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.