FUTURE: Flexible Unlearning for Tree Ensemble
- URL: http://arxiv.org/abs/2508.21181v1
- Date: Thu, 28 Aug 2025 19:45:36 GMT
- Title: FUTURE: Flexible Unlearning for Tree Ensemble
- Authors: Ziheng Chen, Jin Huang, Jiali Cheng, Yuchan Guo, Mengjie Wang, Lalitesh Morishetti, Kaushiki Nag, Hadi Amiri,
- Abstract summary: Tree ensembles are widely recognized for their effectiveness in classification tasks, achieving state-of-the-art performance across diverse domains.<n>With increasing emphasis on data privacy and the textitright to be forgotten, several unlearning algorithms have been proposed to enable tree ensembles to forget sensitive information.<n>We propose FUTURE, a novel unlearning algorithm for tree ensembles.
- Score: 23.336396189756574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tree ensembles are widely recognized for their effectiveness in classification tasks, achieving state-of-the-art performance across diverse domains, including bioinformatics, finance, and medical diagnosis. With increasing emphasis on data privacy and the \textit{right to be forgotten}, several unlearning algorithms have been proposed to enable tree ensembles to forget sensitive information. However, existing methods are often tailored to a particular model or rely on the discrete tree structure, making them difficult to generalize to complex ensembles and inefficient for large-scale datasets. To address these limitations, we propose FUTURE, a novel unlearning algorithm for tree ensembles. Specifically, we formulate the problem of forgetting samples as a gradient-based optimization task. In order to accommodate non-differentiability of tree ensembles, we adopt the probabilistic model approximations within the optimization framework. This enables end-to-end unlearning in an effective and efficient manner. Extensive experiments on real-world datasets show that FUTURE yields significant and successful unlearning performance.
Related papers
- RO-FIGS: Efficient and Expressive Tree-Based Ensembles for Tabular Data [10.610270769561811]
Tree-based models are robust to uninformative features and can accurately capture non-smooth, complex decision boundaries.<n>We propose Random oblique Fast Interpretable Greedy-Tree Sums (RO-FIGS)<n>RO-FIGS builds on Fast Interpretable Greedy-Tree Sums, and extends it by learning trees with oblique or multivariate splits.<n>We evaluate RO-FIGS on 22 real-world datasets, demonstrating superior performance and much smaller models over other tree- and neural network-based methods.
arXiv Detail & Related papers (2025-04-09T14:35:24Z) - Learning Decision Trees as Amortized Structure Inference [59.65621207449269]
We propose a hybrid amortized structure inference approach to learn predictive decision tree ensembles given data.<n>We show that our approach, DT-GFN, outperforms state-of-the-art decision tree and deep learning methods on standard classification benchmarks.
arXiv Detail & Related papers (2025-03-10T07:05:07Z) - Structural Entropy Guided Probabilistic Coding [52.01765333755793]
We propose a novel structural entropy-guided probabilistic coding model, named SEPC.<n>We incorporate the relationship between latent variables into the optimization by proposing a structural entropy regularization loss.<n> Experimental results across 12 natural language understanding tasks, including both classification and regression tasks, demonstrate the superior performance of SEPC.
arXiv Detail & Related papers (2024-12-12T00:37:53Z) - BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving [11.596474985695679]
We release the StructuredOR dataset, annotated with comprehensive labels that capture the complete mathematical modeling process.<n>We propose BPP-Search, an algorithm that integrates reinforcement learning into a tree-of-thought structure.
arXiv Detail & Related papers (2024-11-26T13:05:53Z) - A Closer Look at Deep Learning Methods on Tabular Datasets [78.61845513154502]
We present an extensive study on TALENT, a collection of 300+ datasets spanning broad ranges of size.<n>Our evaluation shows that ensembling benefits both tree-based and neural approaches.
arXiv Detail & Related papers (2024-07-01T04:24:07Z) - Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a novel framework that utilizes large language models (LLMs) to identify effective feature generation rules.
We use decision trees to convey this reasoning information, as they can be easily represented in natural language.
OCTree consistently enhances the performance of various prediction models across diverse benchmarks.
arXiv Detail & Related papers (2024-06-12T08:31:34Z) - Learning accurate and interpretable tree-based models [27.203303726977616]
We develop approaches to design tree-based learning algorithms given repeated access to data from the same domain.<n>We propose novel parameterized classes of node splitting criteria in top-down algorithms, which interpolate between popularly used entropy and Gini impurity based criteria.<n>We extend our results to tuning popular tree-based ensembles, including random forests and gradient-boosted trees.
arXiv Detail & Related papers (2024-05-24T20:10:10Z) - Improving Data Quality with Training Dynamics of Gradient Boosting
Decision Trees [1.5605040219256345]
We propose a method based on metrics from training dynamics of Gradient Boosting Decision Trees (GBDTs) to assess the behavior of each training example.
We show results on detecting noisy labels in order clean datasets, improving models' metrics in synthetic and real public datasets, as well as on a industry case in which we deployed a model based on the proposed solution.
arXiv Detail & Related papers (2022-10-20T15:02:49Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - High-Dimensional Bayesian Optimization via Tree-Structured Additive
Models [40.497123136157946]
We consider generalized additive models in which low-dimensional functions with overlapping subsets of variables are composed to model a high-dimensional target function.
Our goal is to lower the computational resources required and facilitate faster model learning.
We demonstrate and discuss the efficacy of our approach via a range of experiments on synthetic functions and real-world datasets.
arXiv Detail & Related papers (2020-12-24T03:56:44Z) - Deep Reinforcement Learning of Graph Matching [63.469961545293756]
Graph matching (GM) under node and pairwise constraints has been a building block in areas from optimization to computer vision.
We present a reinforcement learning solver for GM i.e. RGM that seeks the node correspondence between pairwise graphs.
Our method differs from the previous deep graph matching model in the sense that they are focused on the front-end feature extraction and affinity function learning.
arXiv Detail & Related papers (2020-12-16T13:48:48Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.