A GPU-Accelerated Moving-Horizon Algorithm for Training Deep
Classification Trees on Large Datasets
- URL: http://arxiv.org/abs/2311.06952v1
- Date: Sun, 12 Nov 2023 20:34:00 GMT
- Title: A GPU-Accelerated Moving-Horizon Algorithm for Training Deep
Classification Trees on Large Datasets
- Authors: Jiayang Ren, Valentín Osuna-Enciso, Morimasa Okamoto, Qiangqiang
Mao, Chaojie Ji, Liang Cao, Kaixun Hua, Yankai Cao
- Abstract summary: We introduce a moving-horizon differential evolution algorithm for classification trees with continuous features (MH-DEOCT)
Our approach consists of a discrete tree decoding method that eliminates duplicated searches between adjacent samples.
MH-DEOCT achieves near-optimal performance on training and testing.
- Score: 13.02279265869312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision trees are essential, yet training them optimally is
NP-complete, prompting the widespread use of heuristic methods such as CART,
which suffers from sub-optimal performance due to its greedy nature. Recently, breakthroughs in
finding optimal decision trees have emerged; however, these methods still face
significant computational costs and struggle with continuous features in
large-scale datasets and deep trees. To address these limitations, we introduce
a moving-horizon differential evolution algorithm for classification trees with
continuous features (MH-DEOCT). Our approach consists of a discrete tree
decoding method that eliminates duplicated searches between adjacent samples, a
GPU-accelerated implementation that significantly reduces running time, and a
moving-horizon strategy that iteratively trains shallow subtrees at each node
to balance lookahead (vision) against the optimizer's search capability. Comprehensive studies on 68 UCI
datasets demonstrate that our approach outperforms the heuristic method CART on
training and testing accuracy by an average of 3.44% and 1.71%, respectively.
Moreover, these numerical studies empirically demonstrate that MH-DEOCT
achieves near-optimal performance (only 0.38% and 0.06% worse than the global
optimal method on training and testing, respectively), while it offers
remarkable scalability for deep trees (e.g., depth=8) and large-scale datasets
(e.g., ten million samples).
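
To make the moving-horizon idea concrete, below is a minimal, hypothetical Python/NumPy sketch of how such a scheme could look: at each node a shallow (horizon-limited) subtree is trained with a small differential-evolution (DE) loop over (feature, threshold) encodings, only the root split is kept, and the procedure recurses on the two children. All function names, the encoding, and the hyper-parameters are illustrative assumptions, not the authors' implementation; the actual MH-DEOCT additionally relies on a discrete tree decoding step that avoids duplicated searches between adjacent sorted samples and on a GPU-accelerated implementation, neither of which is reproduced here.

```python
# Illustrative sketch only; class labels are assumed to be integers 0..K-1.
import numpy as np

def decode(vec, n_features, lows, highs):
    """Decode a flat DE vector into (feature, threshold) pairs,
    one pair per internal node of the shallow subtree (heap order)."""
    nodes = []
    for i in range(0, len(vec), 2):
        f = int(np.clip(round(float(vec[i])), 0, n_features - 1))
        t = lows[f] + (highs[f] - lows[f]) * float(np.clip(vec[i + 1], 0.0, 1.0))
        nodes.append((f, t))
    return nodes

def shallow_errors(nodes, X, y, horizon):
    """Misclassification count of a depth-`horizon` subtree whose leaves
    predict the majority class of the samples routed to them."""
    leaf = np.zeros(len(X), dtype=int)   # leaf index each sample ends up in
    node = np.zeros(len(X), dtype=int)   # current internal node (heap order)
    for _ in range(horizon):
        feat = np.array([nodes[j][0] for j in node])
        thr = np.array([nodes[j][1] for j in node])
        right = X[np.arange(len(X)), feat] > thr
        leaf = leaf * 2 + right
        node = node * 2 + 1 + right      # heap children: 2j+1 / 2j+2
    errors = 0
    for l in np.unique(leaf):
        labels = y[leaf == l]
        errors += len(labels) - np.bincount(labels).max()
    return errors

def de_train_subtree(X, y, horizon=2, pop=30, gens=40, F=0.8, CR=0.9, seed=0):
    """Tiny DE loop over the subtree encoding; returns the best node list."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    lows, highs = X.min(axis=0), X.max(axis=0)
    dim = 2 * (2 ** horizon - 1)         # one (feature, threshold) pair per node
    P = rng.random((pop, dim))
    P[:, 0::2] *= n_features - 1         # spread feature slots over all features
    fit = np.array([shallow_errors(decode(v, n_features, lows, highs), X, y, horizon)
                    for v in P])
    for _ in range(gens):
        for i in range(pop):
            a, b, c = P[rng.choice(pop, 3, replace=False)]
            trial = np.where(rng.random(dim) < CR, a + F * (b - c), P[i])
            e = shallow_errors(decode(trial, n_features, lows, highs), X, y, horizon)
            if e <= fit[i]:
                P[i], fit[i] = trial, e
    return decode(P[np.argmin(fit)], n_features, lows, highs)

def moving_horizon_tree(X, y, depth, horizon=2):
    """Keep only the root split of each DE-trained shallow subtree,
    then recurse on the two children until `depth` is exhausted."""
    if depth == 0 or len(np.unique(y)) == 1:
        return {"leaf": int(np.bincount(y).argmax())}
    f, t = de_train_subtree(X, y, horizon=min(horizon, depth))[0]
    right = X[:, f] > t
    if right.all() or not right.any():   # degenerate split -> make a leaf
        return {"leaf": int(np.bincount(y).argmax())}
    return {"feature": int(f), "threshold": float(t),
            "left": moving_horizon_tree(X[~right], y[~right], depth - 1, horizon),
            "right": moving_horizon_tree(X[right], y[right], depth - 1, horizon)}

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((400, 4))
    y = (X[:, 0] + X[:, 2] > 1.0).astype(int)   # toy labels in {0, 1}
    print(moving_horizon_tree(X, y, depth=3))
```

In this naive form a call such as moving_horizon_tree(X, y, depth=8) would be far slower than the batched GPU evaluation described in the abstract; the sketch only illustrates the control flow of the moving-horizon recursion, not the paper's decoding scheme or its performance.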
Related papers
- Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method [76.31185707649227]
We propose a Deep Tree-based Retriever (DTR) for efficient recommendation.
DTR frames the training task as a softmax-based multi-class classification over tree nodes at the same level.
To mitigate the suboptimality induced by the labeling of non-leaf nodes, we propose a rectification method for the loss function.
arXiv Detail & Related papers (2024-08-21T05:09:53Z)
- Massive Dimensions Reduction and Hybridization with Meta-heuristics in Deep Learning [0.24578723416255746]
Histogram-based Differential Evolution (HBDE) hybridizes gradient-based and gradient-free algorithms to optimize parameters.
HBDE outperforms baseline gradient-based and parent gradient-free DE algorithms evaluated on CIFAR-10 and CIFAR-100 datasets.
arXiv Detail & Related papers (2024-08-13T20:28:20Z)
- Rolling Lookahead Learning for Optimal Classification Trees [0.0]
We propose a rolling subtree lookahead algorithm that combines the relative scalability of the myopic approaches with the foresight of the optimal approaches in constructing trees.
We show that our approach outperforms optimal and myopic approaches in 808 out of 1330 problem instances, improving the out-of-sample accuracy by up to 23.6% and 14.4%, respectively.
arXiv Detail & Related papers (2023-04-21T09:17:00Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that weights trained on the synthetic data are robust against accumulated-error perturbations when regularized towards a flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- A bi-level encoding scheme for the clustered shortest-path tree problem in multifactorial optimization [1.471992435706872]
The Clustered Shortest-Path Tree Problem (CluSPT) plays an important role in many real-life optimization problems.
Recently, several Multifactorial Evolutionary Algorithms (MFEAs) have been introduced to deal with the CluSPT.
This paper describes an MFEA-based approach to solve the CluSPT.
arXiv Detail & Related papers (2021-02-12T13:36:07Z)
- Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm that employs a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
- Towards Efficient and Scalable Acceleration of Online Decision Tree Learning on FPGA [20.487660974785943]
In the era of big data, traditional decision tree induction algorithms are not suitable for learning from large-scale datasets.
We introduce a new quantile-based algorithm to improve the induction of the Hoeffding tree, one of the state-of-the-art online learning models.
We present a high-performance, hardware-efficient and scalable online decision tree learning system on a field-programmable gate array.
arXiv Detail & Related papers (2020-09-03T03:23:43Z)
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
- Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)