Active Learning for Decision Trees with Provable Guarantees
- URL: http://arxiv.org/abs/2601.20775v1
- Date: Wed, 28 Jan 2026 17:02:25 GMT
- Title: Active Learning for Decision Trees with Provable Guarantees
- Authors: Arshia Soltani Moakhar, Tanapoom Laoaron, Faraz Ghahremani, Kiarash Banihashem, MohammadTaghi Hajiaghayi,
- Abstract summary: We provide the first analysis of the disagreement coefficient for decision trees.<n>We present the first general active learning algorithm for binary classification that achieves a multiplicative error guarantee.<n>We establish a label complexity lower bound, showing our algorithm's dependence on the error tolerance $$ is close to optimal.
- Score: 21.408034432864614
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper advances the theoretical understanding of active learning label complexity for decision trees as binary classifiers. We make two main contributions. First, we provide the first analysis of the disagreement coefficient for decision trees-a key parameter governing active learning label complexity. Our analysis holds under two natural assumptions required for achieving polylogarithmic label complexity, (i) each root-to-leaf path queries distinct feature dimensions, and (ii) the input data has a regular, grid-like structure. We show these assumptions are essential, as relaxing them leads to polynomial label complexity. Second, we present the first general active learning algorithm for binary classification that achieves a multiplicative error guarantee, producing a $(1+ε)$-approximate classifier. By combining these results, we design an active learning algorithm for decision trees that uses only a polylogarithmic number of label queries in the dataset size, under the stated assumptions. Finally, we establish a label complexity lower bound, showing our algorithm's dependence on the error tolerance $ε$ is close to optimal.
Related papers
- On Improving Neurosymbolic Learning by Exploiting the Representation Space [54.16389421332958]
We study the problem of learning neural classifiers in a neurosymbolic setting where the hidden gold labels of input instances must satisfy a logical formula.<n>One challenge is that the space of label combinations can grow exponentially, making learning difficult.<n>We propose a technique that prunes this space by exploiting the intuition that instances with similar latent representations are likely to share the same label.
arXiv Detail & Related papers (2026-02-08T13:56:47Z) - Efficient and Scalable Neural Symbolic Search for Knowledge Graph Complex Query Answering [50.1887329564127]
We propose an efficient and scalable symbolic search framework for complex queries.<n>Our framework reduces the computational load of symbolic methods by 90% while maintaining nearly the same performance.
arXiv Detail & Related papers (2025-05-13T01:24:09Z) - Optimal Decision Tree Pruning Revisited: Algorithms and Complexity [22.57063332430197]
We focus on fundamental pruning operations of subtree replacement and raising.<n>While optimal pruning can be performed in time for subtree replacement, the problem is NP-complete for subtree raising.<n>For example, while subtree raising is hard for small domain size, it can be solved in $D2d cdot |I|O(1)$ time, where $|I|$ is the input size.
arXiv Detail & Related papers (2025-03-05T15:02:46Z) - Bidirectional Logits Tree: Pursuing Granularity Reconcilement in Fine-Grained Classification [89.20477310885731]
This paper addresses the challenge of Granularity Competition in fine-grained classification tasks.<n>Existing approaches typically develop independent hierarchy-aware models based on shared features extracted from a common base encoder.<n>We propose a novel framework called the Bidirectional Logits Tree (BiLT) for Granularity Reconcilement.
arXiv Detail & Related papers (2024-12-17T10:42:19Z) - Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning [89.89857766491475]
We propose a curriculum-based logical-aware instruction tuning framework, named LACT.<n>Specifically, we augment the arbitrary first-order logical queries via binary tree decomposition.<n> Experiments across widely used datasets demonstrate that LACT has substantial improvements(brings an average +5.5% MRR score) over advanced methods, achieving the new state-of-the-art.
arXiv Detail & Related papers (2024-05-02T18:12:08Z) - TreeDQN: Learning to minimize Branch-and-Bound tree [78.52895577861327]
Branch-and-Bound is a convenient approach to solving optimization tasks in the form of Mixed Linear Programs.
The efficiency of the solver depends on the branchning used to select a variable for splitting.
We propose a reinforcement learning method that can efficiently learn the branching.
arXiv Detail & Related papers (2023-06-09T14:01:26Z) - Tree Learning: Optimal Algorithms and Sample Complexity [10.638365461509]
We study the problem of learning a hierarchical tree representation of data from labeled samples taken from an arbitrary distribution.
We present optimal sample bounds for this problem in several learning settings, including (agnostic) learning and online learning.
arXiv Detail & Related papers (2023-02-09T08:35:17Z) - Exact learning and test theory [0.0]
We study arbitrary infinite binary information systems each of which consists of an infinite set of elements.
We consider the notion of a problem over information system, which is described by a finite number of attributes.
arXiv Detail & Related papers (2022-01-12T15:10:22Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_2,p$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Measure Inducing Classification and Regression Trees for Functional Data [0.0]
We propose a tree-based algorithm for classification and regression problems in the context of functional data analysis.
This is achieved by learning a weighted functional $L2$ space by means of constrained convex optimization.
arXiv Detail & Related papers (2020-10-30T18:49:53Z) - Probabilistic Label Trees for Extreme Multi-label Classification [8.347190888362194]
Problems of extreme multi-label classification (XMLC) are efficiently handled by organizing labels as a tree.
PLTs can be treated as a generalization of hierarchical softmax for multi-label problems.
We introduce the model and discuss training and inference procedures and their computational costs.
We prove a specific equivalence between the fully online algorithm and an algorithm with a tree structure given in advance.
arXiv Detail & Related papers (2020-09-23T15:30:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.