GPT-HTree: A Decision Tree Framework Integrating Hierarchical Clustering and Large Language Models for Explainable Classification
- URL: http://arxiv.org/abs/2501.13743v1
- Date: Thu, 23 Jan 2025 15:18:22 GMT
- Title: GPT-HTree: A Decision Tree Framework Integrating Hierarchical Clustering and Large Language Models for Explainable Classification
- Authors: Te Pei, Fuat Alican, Aaron Ontoyin Yin, Yigit Ihlamur
- Abstract summary: GPT-HTree is a framework combining hierarchical clustering, decision trees, and large language models (LLMs). LLMs enhance the framework by generating human-readable cluster descriptions, bridging quantitative analysis with actionable insights.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces GPT-HTree, a framework combining hierarchical clustering, decision trees, and large language models (LLMs) to address the challenge of balancing accuracy and interpretability in classification. By leveraging hierarchical clustering to segment individuals based on salient features, resampling techniques to balance class distributions, and decision trees to tailor classification paths within each cluster, GPT-HTree ensures both accuracy and interpretability. LLMs enhance the framework by generating human-readable cluster descriptions, bridging quantitative analysis with actionable insights.
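The pipeline the abstract describes (cluster, rebalance classes within each cluster, fit a per-cluster tree) can be sketched in miniature. This is not the authors' implementation: clustering is reduced to a fixed nearest-centroid assignment, the per-cluster tree to a one-feature decision stump, and the LLM cluster-description step is omitted. All function names are hypothetical.

```python
# Toy sketch of a cluster-then-tree pipeline, in the spirit of the
# GPT-HTree abstract. Not the paper's code: fixed centroids stand in
# for hierarchical clustering, naive oversampling for resampling,
# and a decision stump for the per-cluster tree.
from collections import Counter
import random

def assign_clusters(X, centroids):
    """Assign each point to the nearest centroid (squared Euclidean)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(range(len(centroids)), key=lambda c: dist2(x, centroids[c]))
            for x in X]

def oversample(pairs, rng):
    """Duplicate minority-class samples until classes are balanced."""
    by_label = {}
    for x, y in pairs:
        by_label.setdefault(y, []).append(x)
    n_max = max(len(v) for v in by_label.values())
    out = []
    for y, xs in by_label.items():
        out += [(x, y) for x in xs]
        out += [(rng.choice(xs), y) for _ in range(n_max - len(xs))]
    return out

def fit_stump(pairs):
    """Best single-feature threshold split by misclassification count."""
    best = None
    for f in range(len(pairs[0][0])):
        for thr in sorted({x[f] for x, _ in pairs}):
            left = [y for x, y in pairs if x[f] <= thr]
            right = [y for x, y in pairs if x[f] > thr]
            pred_l = Counter(left).most_common(1)[0][0] if left else 0
            pred_r = Counter(right).most_common(1)[0][0] if right else pred_l
            errs = sum(y != pred_l for y in left) + sum(y != pred_r for y in right)
            if best is None or errs < best[0]:
                best = (errs, f, thr, pred_l, pred_r)
    return best[1:]

def gpt_htree_fit(X, y, centroids, rng):
    """One stump per cluster, trained on rebalanced in-cluster data."""
    clusters = assign_clusters(X, centroids)
    model = {}
    for c in range(len(centroids)):
        pairs = [(x, yy) for x, yy, cc in zip(X, y, clusters) if cc == c]
        if pairs:
            model[c] = fit_stump(oversample(pairs, rng))
    return model

def gpt_htree_predict(x, centroids, model):
    """Route the sample to its cluster, then apply that cluster's stump."""
    c = assign_clusters([x], centroids)[0]
    f, thr, pred_l, pred_r = model[c]
    return pred_l if x[f] <= thr else pred_r

rng = random.Random(0)
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
y = [0, 0, 1, 1, 1, 0]
centroids = [(0.1, 0.1), (5.0, 5.0)]
model = gpt_htree_fit(X, y, centroids, rng)
print(gpt_htree_predict((0.05, 0.1), centroids, model))
```

Because each cluster gets its own stump, the two regions of this toy dataset learn different thresholds, which is the interpretability angle the abstract emphasizes: every prediction is explained by one cluster membership plus one split.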
Related papers
- Learning Order Forest for Qualitative-Attribute Data Clustering [52.612779710298526]
This paper discovers a tree-like distance structure to flexibly represent the local order relationship among intra-attribute qualitative values. A joint learning mechanism is proposed to iteratively obtain more appropriate tree structures and clusters. Experiments demonstrate that the joint learning adapts the forest to the clustering task to yield accurate results.
arXiv Detail & Related papers (2026-03-03T07:49:50Z)
- From Tags to Trees: Structuring Fine-Grained Knowledge for Controllable Data Selection in LLM Instruction Tuning [31.186300383302708]
Tree-aware Aligned Global Sampling (TAGS) is a unified framework that leverages a knowledge tree built from fine-grained tags. Our controllable sampling strategy maximizes tree-level information gain and enforces leaf-level alignment via KL-divergence for specific domains.
arXiv Detail & Related papers (2026-01-20T14:06:51Z)
- TreeAdv: Tree-Structured Advantage Redistribution for Group-Based RL [7.149629501486536]
Reinforcement learning with group-based objectives is a common framework for aligning large language models on complex reasoning tasks. Standard GRPO treats each rollout trajectory as an independent flat sequence and assigns a single sequence-level advantage to all tokens. We introduce TreeAdv, which makes the tree structure of group rollouts explicit for both exploration and advantage assignment.
arXiv Detail & Related papers (2026-01-07T08:42:14Z)
- Decomposing Visual Classification: Assessing Tree-Based Reasoning in VLMs [1.4231678631753704]
Vision language models (VLMs) excel at zero-shot visual classification, but their performance on fine-grained tasks and large hierarchical label spaces is understudied. This paper investigates whether structured, tree-based reasoning can enhance VLM performance.
arXiv Detail & Related papers (2025-09-10T13:08:03Z)
- Divide-Then-Rule: A Cluster-Driven Hierarchical Interpolator for Attribute-Missing Graphs [51.13363550716544]
Deep graph clustering is an unsupervised task aimed at partitioning nodes with incomplete attributes into distinct clusters. Existing imputation methods for attribute-missing graphs often fail to account for the varying amounts of information available across node neighborhoods. We propose Divide-Then-Rule Graph Completion (DTRGC) to address this issue.
arXiv Detail & Related papers (2025-07-12T03:33:19Z)
- HERCULES: Hierarchical Embedding-based Recursive Clustering Using LLMs for Efficient Summarization [0.0]
HERCULES is an algorithm and Python package designed for hierarchical k-means clustering of diverse data types. It generates semantically rich titles and descriptions for clusters at each level of the hierarchy. An interactive visualization tool facilitates thorough analysis and understanding of the clustering results.
arXiv Detail & Related papers (2025-06-24T20:22:00Z)
- ODTE -- An ensemble of multi-class SVM-based oblique decision trees [0.7182449176083623]
ODTE is a new ensemble that uses oblique decision trees as base classifiers.
We introduce STree, the base algorithm for growing oblique decision trees.
ODTE ranks consistently above its competitors.
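An oblique split of the kind these trees grow can be illustrated with a toy linear node: instead of thresholding a single feature, the node thresholds a linear combination w·x + b. The sketch below trains the hyperplane with a simple perceptron rather than the SVM the paper uses, so it is illustrative only; all names are made up for the example.

```python
# Hedged sketch of an *oblique* decision-tree split: the node routes
# samples by the sign of w.x + b instead of a single-feature threshold.
# The hyperplane here comes from a toy perceptron, not the paper's SVM.
def perceptron(X, y, epochs=50, lr=0.1):
    """Learn a linear separator; labels must be in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if t * score <= 0:  # misclassified (or on boundary): update
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b

def oblique_split(x, w, b):
    """Route a sample to the -1 or +1 side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# A diagonal class boundary (points above vs. below the line y = x):
# no single axis-aligned threshold separates these labels, but one
# oblique split does.
X = [(0, 1), (1, 2), (2, 3), (1, 0), (2, 1), (3, 2)]
y = [1, 1, 1, -1, -1, -1]
w, b = perceptron(X, y)
print([oblique_split(x, w, b) for x in X])  # -> [1, 1, 1, -1, -1, -1]
```

The example shows why oblique trees can be shallower than axis-aligned ones: a boundary like y = x costs one oblique node but a staircase of many axis-aligned splits.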
arXiv Detail & Related papers (2024-11-20T14:58:32Z)
- scGHSOM: Hierarchical clustering and visualization of single-cell and CRISPR data using growing hierarchical SOM [0.8452349885923507]
We propose a comprehensive gene-cell dependency visualization via unsupervised clustering, the Growing Hierarchical Self-Organizing Map (GHSOM).
GHSOM clusters samples into a hierarchical structure whose self-growing organization satisfies the required between-cluster and within-cluster variation.
We present two innovative visualization tools: Cluster Feature Map and Cluster Distribution Map.
arXiv Detail & Related papers (2024-07-24T04:01:09Z)
- Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees.
We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z)
- CueGCL: Cluster-aware Personalized Self-Training for Unsupervised Graph Contrastive Learning [49.88192702588169]
We propose a Cluster-aware Graph Contrastive Learning Framework (CueGCL) to jointly learn clustering results and node representations. Specifically, we design a personalized self-training (PeST) strategy for unsupervised scenarios, which enables our model to capture precise cluster-level personalized information. We theoretically demonstrate the effectiveness of our model, showing it yields an embedding space with a significantly discernible cluster structure.
arXiv Detail & Related papers (2023-11-18T13:45:21Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model.
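The recommended merge rule is simple to prototype: repeatedly merge the pair of clusters whose cross-pairs have the highest average dot product. The naive sketch below is a quadratic-per-step toy for intuition, not the paper's algorithm or an efficient implementation; function names are illustrative.

```python
# Toy agglomerative clustering with the merge rule the paper
# recommends: merge the pair of clusters with the *maximum average
# pairwise dot product*, rather than minimum distance. Illustrative
# only; a real implementation would cache cluster sums.
def avg_dot(A, B):
    """Mean dot product over all cross pairs of two clusters."""
    total = sum(sum(ai * bi for ai, bi in zip(a, b)) for a in A for b in B)
    return total / (len(A) * len(B))

def dot_product_agglomerate(points):
    """Merge greedily by max average dot product; return merge history."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        i, j = max(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: avg_dot(clusters[ij[0]], clusters[ij[1]]))
        merges.append(sorted(clusters[i] + clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

# Two "directions" in the plane: points along the x-axis have large
# dot products with each other but zero with points along the y-axis,
# so each direction merges internally before the final root merge.
pts = [(2, 0), (3, 0), (0, 2), (0, 3)]
print(dot_product_agglomerate(pts))
```

On this toy input the first two merges join the x-axis pair and the y-axis pair (average cross-dot 6 within each direction, 0 across), and only the root merge mixes them, mirroring the paper's point that dot-product linkage groups points sharing a direction in feature space.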
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- Hierarchical clustering by aggregating representatives in sub-minimum-spanning-trees [5.877624540482919]
We propose a novel hierarchical clustering algorithm, in which, while building the clustering dendrogram, we can effectively detect the representative point.
Under our analysis, the proposed algorithm has O(n log n) time complexity and O(log n) space complexity, indicating that it scales to massive data.
arXiv Detail & Related papers (2021-11-11T07:36:55Z)
- Attention-driven Graph Clustering Network [49.040136530379094]
We propose a novel deep clustering method named Attention-driven Graph Clustering Network (AGCN).
AGCN exploits a heterogeneous-wise fusion module to dynamically fuse the node attribute feature and the topological graph feature.
AGCN can jointly perform feature learning and cluster assignment in an unsupervised fashion.
arXiv Detail & Related papers (2021-08-12T02:30:38Z)
- Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
- Interactive Steering of Hierarchical Clustering [30.371250297444703]
We present an interactive steering method to visually supervise constrained hierarchical clustering by utilizing both public knowledge (e.g., Wikipedia) and private knowledge from users.
The novelty of our approach includes 1) automatically constructing constraints for hierarchical clustering using knowledge (knowledge-driven) and intrinsic data distribution (data-driven).
To clearly convey the hierarchical clustering results, an uncertainty-aware tree visualization has been developed to enable users to quickly locate the most uncertain sub-hierarchies.
arXiv Detail & Related papers (2020-09-21T05:26:07Z)
- Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and an adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)
- Scalable Hierarchical Clustering with Tree Grafting [66.68869706310208]
Grinch is a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions.
Grinch is motivated by a new notion of separability for clustering with linkage functions.
arXiv Detail & Related papers (2019-12-31T20:56:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.