Optimal or Greedy Decision Trees? Revisiting their Objectives, Tuning, and Performance
- URL: http://arxiv.org/abs/2409.12788v1
- Date: Thu, 19 Sep 2024 13:55:29 GMT
- Title: Optimal or Greedy Decision Trees? Revisiting their Objectives, Tuning, and Performance
- Authors: Jacobus G. M. van der Linden, Daniël Vos, Mathijs M. de Weerdt, Sicco Verwer, Emir Demirović
- Abstract summary: Recently there has been a surge of interest in optimal decision tree (ODT) methods that globally optimize accuracy directly.
We identify two relatively unexplored aspects of ODTs: the objective function used in training trees and tuning techniques.
The value of optimal methods is not well understood yet, as the literature provides conflicting results.
- Score: 9.274054218991528
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decision trees are traditionally trained using greedy heuristics that locally optimize an impurity or information metric. Recently there has been a surge of interest in optimal decision tree (ODT) methods that globally optimize accuracy directly. We identify two relatively unexplored aspects of ODTs: the objective function used in training trees and tuning techniques. Additionally, the value of optimal methods is not well understood yet, as the literature provides conflicting results, with some demonstrating superior out-of-sample performance of ODTs over greedy approaches, while others show the exact opposite. In this paper, we address these three questions: what objective to optimize in ODTs; how to tune ODTs; and how do optimal and greedy methods compare? Our experimental evaluation examines 13 objective functions, including four novel objectives resulting from our analysis, seven tuning methods, and six claims from the literature on optimal and greedy methods on 165 real and synthetic data sets. Through our analysis, both conceptually and experimentally, we discover new non-concave objectives, highlight the importance of proper tuning, support and refute several claims from the literature, and provide clear recommendations for researchers and practitioners on the usage of greedy and optimal methods, and code for future comparisons.
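To make the contrast between the two training objectives concrete, here is a minimal Python sketch (not the paper's code; the toy data, helper names, and depth-1 setting are illustrative assumptions). It scores the same candidate splits under a greedy impurity objective (Gini gain) and under direct training accuracy, the kind of objective ODT methods optimize globally.

```python
# Minimal sketch: score one candidate split of a binary-labeled dataset
# under two training objectives: greedy Gini impurity reduction versus
# direct training accuracy of the resulting depth-1 tree.
import numpy as np

def gini(y):
    """Gini impurity of a binary label vector."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    return 2.0 * p * (1.0 - p)

def gini_gain(y, mask):
    """Impurity reduction from splitting y into y[mask] and y[~mask]."""
    left, right = y[mask], y[~mask]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
    return gini(y) - weighted

def split_accuracy(y, mask):
    """Training accuracy of a depth-1 tree predicting each side's majority."""
    left, right = y[mask], y[~mask]
    correct = max(np.sum(left), len(left) - np.sum(left)) \
            + max(np.sum(right), len(right) - np.sum(right))
    return correct / len(y)

# Toy data: one numeric feature, binary labels.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([0, 0, 1, 0, 1, 1, 1, 1])

# A greedy learner picks the threshold with the best impurity reduction;
# an accuracy-driven learner picks the threshold with the best accuracy.
thresholds = (x[:-1] + x[1:]) / 2
best_gini = max(thresholds, key=lambda t: gini_gain(y, x <= t))
best_acc = max(thresholds, key=lambda t: split_accuracy(y, x <= t))
print(f"best split by Gini gain: x <= {best_gini}")
print(f"best split by accuracy:  x <= {best_acc}")
```

Because impurity reduction and accuracy need not rank candidate splits the same way, the two criteria can commit to different trees, which is one reason the choice of objective function matters in the paper's analysis.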
Related papers
- Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness [27.43137305486112]
We propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss.
The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods to achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-09-26T12:37:26Z)
- Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring [16.38771834692938]
We propose a novel framework capable of generating more faithful rationales and, more importantly, matching performance with black-box scoring systems.
We first mimic the human assessment process by querying Large Language Models (LLMs) to generate a thought tree.
We then summarise intermediate assessment decisions from each thought tree path for creating synthetic rationale data and rationale preference data.
arXiv Detail & Related papers (2024-06-28T14:33:05Z)
- Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
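The adaptive blending described in the entry above can be illustrated with a short, hedged PyTorch sketch. This is not the exact DiscoPOP loss (its discovered gating differs in detail); it only shows the general pattern of a sigmoid gate on the log-ratio margin mixing a logistic (DPO-style) loss with an exponential loss. The function name and the `beta` and `tau` hyperparameters are illustrative assumptions.

```python
# Hedged sketch of a preference loss that adaptively blends a logistic loss
# and an exponential loss as a function of the log-ratio margin.
import torch

def blended_preference_loss(margin: torch.Tensor,
                            beta: float = 0.1,
                            tau: float = 0.05) -> torch.Tensor:
    """margin = (log pi(y_w|x) - log pi_ref(y_w|x))
              - (log pi(y_l|x) - log pi_ref(y_l|x))."""
    z = beta * margin
    logistic = torch.nn.functional.softplus(-z)   # log(1 + exp(-z)), DPO-style
    exponential = torch.exp(-z)                   # exponential loss
    gate = torch.sigmoid(z / tau)                 # mixes the two losses
    return (gate * logistic + (1.0 - gate) * exponential).mean()
```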
- Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
- Visualizing the Relationship Between Encoded Linguistic Information and Task Performance [53.223789395577796]
We study the dynamic relationship between the encoded linguistic information and task performance from the viewpoint of Pareto Optimality.
We conduct experiments on two popular NLP tasks, i.e., machine translation and language modeling, and investigate the relationship between several kinds of linguistic information and task performances.
Our empirical findings suggest that some syntactic information is helpful for NLP tasks whereas encoding more syntactic information does not necessarily lead to better performance.
arXiv Detail & Related papers (2022-03-29T19:03:10Z)
- Optimal Policy Trees [0.6015898117103068]
We propose an approach for learning optimal tree-based prescription policies directly from data.
We combine methods for counterfactual estimation from the causal inference literature with recent advances in training globally-optimal decision trees.
The resulting method, Optimal Policy Trees, yields interpretable prescription policies.
arXiv Detail & Related papers (2020-12-03T21:41:22Z)
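A hedged sketch of the pipeline the entry above describes: estimate counterfactual outcomes per treatment with a direct-method regression, then fit an interpretable tree that prescribes the estimated-best treatment. Optimal Policy Trees train globally optimal trees on such estimates; a greedy scikit-learn tree stands in here so the sketch stays self-contained, and the function and parameter names are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeClassifier

def fit_policy_tree(X, treatment, outcome, max_depth=2):
    treatments = np.unique(treatment)
    # Direct-method counterfactual estimates: one outcome model per arm.
    preds = []
    for t in treatments:
        model = RandomForestRegressor(n_estimators=100, random_state=0)
        model.fit(X[treatment == t], outcome[treatment == t])
        preds.append(model.predict(X))
    # Label each sample with the arm whose estimated outcome is best,
    # then fit a shallow, interpretable tree to those labels.
    best_arm = treatments[np.argmax(np.stack(preds, axis=1), axis=1)]
    policy = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    policy.fit(X, best_arm)
    return policy  # policy.predict(X_new) prescribes a treatment
```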
- Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z)
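The splitting idea in the entry above, scoring candidate splits by downstream decision cost rather than prediction error, can be sketched on a toy newsvendor problem. The decision task, costs, and function names are assumptions for illustration, not taken from the paper: each side of a candidate split commits to its own cost-minimizing order quantity, and the split with the lowest resulting cost wins.

```python
import numpy as np

def newsvendor_cost(q, demand, over=1.0, under=3.0):
    """Average cost of ordering q units against realized demand."""
    return np.mean(over * np.maximum(q - demand, 0)
                   + under * np.maximum(demand - q, 0))

def leaf_decision(demand, over=1.0, under=3.0):
    """Optimal order quantity for a leaf: the critical-ratio quantile."""
    return np.quantile(demand, under / (over + under))

def split_decision_cost(x, demand, threshold):
    """Decision cost if each side of the split commits to its own
    optimal decision; this is the quantity the criterion minimizes."""
    mask = x <= threshold
    total = 0.0
    for side in (demand[mask], demand[~mask]):
        if len(side) == 0:
            return np.inf
        total += len(side) * newsvendor_cost(leaf_decision(side), side)
    return total / len(demand)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
demand = 10 + 5 * (x > 0.5) + rng.normal(0, 1, 200)  # demand shifts with x
thresholds = np.linspace(0.1, 0.9, 17)
best = min(thresholds, key=lambda t: split_decision_cost(x, demand, t))
print(f"split chosen by decision cost: x <= {best:.2f}")
```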
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
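The dynamic-programming core behind such optimal-tree methods can be sketched as a memoized recursion over instance subsets: the best depth-d tree for a subset either stops as a majority-vote leaf or splits on some binary feature and solves both sides with depth d-1. MurTree adds substantially more machinery (caching, bounds, a specialized depth-2 solver); this hedged Python sketch shows only the recursion.

```python
# Minimum training misclassifications achievable by any tree of the given
# depth over binary features, via memoized recursion on instance subsets.
from functools import lru_cache

def optimal_tree_errors(X, y, depth):
    """X: tuple of binary feature tuples, y: tuple of 0/1 labels."""
    n_features = len(X[0])

    @lru_cache(maxsize=None)
    def solve(rows, d):
        ones = sum(y[i] for i in rows)
        leaf = min(ones, len(rows) - ones)  # majority-vote leaf errors
        if d == 0 or leaf == 0:
            return leaf
        best = leaf
        for f in range(n_features):
            left = tuple(i for i in rows if X[i][f] == 0)
            right = tuple(i for i in rows if X[i][f] == 1)
            if left and right:
                best = min(best, solve(left, d - 1) + solve(right, d - 1))
        return best

    return solve(tuple(range(len(y))), depth)

# Example: optimal_tree_errors(((0, 1), (1, 0), (1, 1)), (0, 1, 1), depth=2)
```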
- Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers [29.624308090226375]
In this work, we aim to replace these anecdotes, if not with a conclusive ranking, then at least with evidence-backed heuristics.
To do so, we perform an extensive, standardized benchmark of fifteen particularly popular deep learning optimizers.
Our open-sourced results are available as challenging and well-tuned baselines for more meaningful evaluations of novel optimization methods.
arXiv Detail & Related papers (2020-07-03T08:19:36Z)
- Generalized and Scalable Optimal Sparse Decision Trees [56.35541305670828]
We present techniques that produce optimal decision trees over a variety of objectives.
We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables.
arXiv Detail & Related papers (2020-06-15T19:00:11Z)
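The kind of sparsity-regularized objective such methods optimize can be illustrated with a short Python sketch: misclassification loss plus a per-leaf penalty, so an extra leaf must reduce training error by at least `lam` to be worth adding. The function name and the value of `lam` are illustrative, not taken from the paper.

```python
# Regularized objective: normalized training error plus a per-leaf penalty.
def sparse_tree_objective(y_true, y_pred, n_leaves, lam=0.01):
    errors = sum(t != p for t, p in zip(y_true, y_pred))
    return errors / len(y_true) + lam * n_leaves
```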
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.