RS-ORT: A Reduced-Space Branch-and-Bound Algorithm for Optimal Regression Trees
- URL: http://arxiv.org/abs/2510.23901v1
- Date: Mon, 27 Oct 2025 22:17:09 GMT
- Title: RS-ORT: A Reduced-Space Branch-and-Bound Algorithm for Optimal Regression Trees
- Authors: Cristobal Heredia, Pedro Chumpitaz-Flores, Kaixun Hua,
- Abstract summary: Mixed-integer programming (MIP) has emerged as a powerful framework for learning optimal decision trees. Naively binarizing continuous features sacrifices global optimality and often yields needlessly deep trees. We recast optimal regression-tree training as a two-stage optimization problem and propose Reduced-Space Optimal Regression Trees (RS-ORT), a specialized branch-and-bound (BB) algorithm that branches exclusively on tree-structural variables.
- Score: 2.612627266839037
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixed-integer programming (MIP) has emerged as a powerful framework for learning optimal decision trees. Yet, existing MIP approaches for regression tasks are either limited to purely binary features or become computationally intractable when continuous, large-scale data are involved. Naively binarizing continuous features sacrifices global optimality and often yields needlessly deep trees. We recast optimal regression-tree training as a two-stage optimization problem and propose Reduced-Space Optimal Regression Trees (RS-ORT), a specialized branch-and-bound (BB) algorithm that branches exclusively on tree-structural variables. This design guarantees the algorithm's convergence and its independence from the number of training samples. Leveraging the model's structure, we introduce several bound-tightening techniques - closed-form leaf prediction, empirical threshold discretization, and exact depth-1 subtree parsing - that combine with decomposable upper- and lower-bounding strategies to accelerate training. The BB node-wise decomposition enables trivial parallel execution, further alleviating computational intractability even for million-size datasets. In empirical studies on several regression benchmarks containing both binary and continuous features, RS-ORT delivers superior training and testing performance compared with state-of-the-art methods. Notably, on datasets with up to 2,000,000 samples with continuous features, RS-ORT can obtain guaranteed training performance with a simpler tree structure and better generalization ability within four hours.
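Two of the bound-tightening ideas named in the abstract have simple concrete readings under squared error: the optimal leaf prediction is the mean of the samples routed to the leaf (closed form), and split thresholds need only range over midpoints between consecutive distinct observed feature values (empirical discretization), which together yield an exact depth-1 subtree solver by enumeration. A minimal sketch under these assumptions, not the authors' implementation:

```python
import numpy as np

def leaf_prediction(y):
    """Closed-form optimal leaf value under squared error: the sample mean."""
    return y.mean()

def leaf_sse(y):
    """Sum of squared errors at the optimal (mean) leaf prediction."""
    return float(((y - y.mean()) ** 2).sum())

def candidate_thresholds(x):
    """Empirical discretization: only midpoints between consecutive distinct
    observed values can change how training samples are routed."""
    v = np.unique(x)
    return (v[:-1] + v[1:]) / 2.0

def best_depth1_split(X, y):
    """Exact depth-1 subtree by enumeration: scan every feature and every
    empirical threshold, returning the split with minimum total SSE."""
    best = (np.inf, None, None)  # (sse, feature index, threshold)
    for j in range(X.shape[1]):
        for t in candidate_thresholds(X[:, j]):
            left = X[:, j] <= t
            sse = leaf_sse(y[left]) + leaf_sse(y[~left])
            if sse < best[0]:
                best = (sse, j, t)
    return best
```

Restricting to empirical thresholds is lossless for training error, since any threshold between the same two consecutive values routes the samples identically.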
Related papers
- Hinge Regression Tree: A Newton Method for Oblique Regression Tree Splitting [18.562483381753804]
We present the Hinge Regression Tree (HRT), which reframes each split as a non-linear least-squares problem over two linear predictors. We analyze this node-level optimization and, for a backtracking line-search variant, prove that the local objective decreases monotonically and converges. We show on synthetic and real-world benchmarks that HRT matches or outperforms single-tree baselines with more compact structures.
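The node-level problem can be made concrete with a toy sketch: a single hinge node predicting the pointwise maximum of two affine predictors, updated by a gradient step with backtracking so the local objective never increases. This is an illustrative guess at the model form, not the paper's method; `hinge_predict`, `node_objective`, and `backtracking_step` are names invented here.

```python
import numpy as np

def hinge_predict(X, w1, w2):
    """One hinge node: pointwise maximum of two linear predictors
    (one plausible form of the node model)."""
    return np.maximum(X @ w1, X @ w2)

def node_objective(X, y, w1, w2):
    """Non-linear least-squares objective at a single node."""
    r = hinge_predict(X, w1, w2) - y
    return 0.5 * float(r @ r)

def backtracking_step(X, y, w1, w2, step=1.0, beta=0.5, max_halvings=30):
    """One descent step with backtracking: each sample's residual feeds the
    gradient of whichever predictor is currently active; the step is halved
    until the objective strictly decreases (monotone decrease is the property
    claimed for the paper's line-search variant)."""
    f0 = node_objective(X, y, w1, w2)
    p1, p2 = X @ w1, X @ w2
    active1 = p1 >= p2
    r = np.maximum(p1, p2) - y
    g1 = X.T @ (r * active1)
    g2 = X.T @ (r * ~active1)
    for _ in range(max_halvings):
        n1, n2 = w1 - step * g1, w2 - step * g2
        if node_objective(X, y, n1, n2) < f0:
            return n1, n2
        step *= beta
    return w1, w2  # no decrease found: keep current weights
```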
arXiv Detail & Related papers (2026-02-05T06:49:01Z) - Tree Training: Accelerating Agentic LLMs Training via Shared Prefix Reuse [21.642997639835396]
We propose Tree Training, a paradigm that computes each shared prefix only once and reuses its intermediate results across related branches during both forward and backward passes. Experiments on multiple open-source models demonstrate up to a 3.9x reduction in total training time, enabling more efficient agentic LLM SFT and RL training.
arXiv Detail & Related papers (2025-11-01T05:56:49Z) - TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling [65.46347858249295]
TreePO is a self-guided rollout algorithm that views sequence generation as a tree-structured search process. TreePO essentially reduces the per-update compute burden while preserving or enhancing exploration diversity.
arXiv Detail & Related papers (2025-08-24T16:52:37Z) - Progressive Binarization with Semi-Structured Pruning for LLMs [36.91249209658632]
We propose Progressive Binarization with Semi-Structured Pruning (PBS$2$P), a novel post-training framework that seamlessly integrates binarization and semi-structured pruning. We show that PBS$2$P consistently outperforms state-of-the-art (SOTA) binary post-training quantization methods in both perplexity and downstream accuracy.
arXiv Detail & Related papers (2025-02-03T13:30:29Z) - Soft regression trees: a model variant and a decomposition training algorithm [0.24578723416255752]
We propose a new variant of soft multivariate regression trees (SRTs) where, for every input vector, the prediction is defined as the linear regression associated with a single leaf node. SRTs exhibit the conditional computation property, i.e., each prediction depends on a small number of nodes. Experiments on 15 well-known datasets indicate that our SRTs and decomposition algorithm yield higher accuracy and robustness compared with traditional soft regression trees.
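The "single leaf" prediction rule in the summary can be sketched as follows: soft sigmoid splits give every leaf a routing probability, but the output comes from the most probable leaf's linear model alone. This is a minimal sketch of one plausible reading, with an invented parameter layout for a depth-2 tree.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_single_leaf(x, nodes, leaves):
    """Soft splits assign each leaf a routing probability, but the prediction
    uses only the single most probable leaf's linear model. Layout assumed
    here: `nodes` holds (w, b) for the 3 internal nodes of a depth-2 tree
    (root, left child, right child); `leaves` holds (w, b) linear models for
    the 4 leaves, left to right."""
    p_root = sigmoid(nodes[0][0] @ x + nodes[0][1])  # P(go left at root)
    p_l = sigmoid(nodes[1][0] @ x + nodes[1][1])     # P(go left at left child)
    p_r = sigmoid(nodes[2][0] @ x + nodes[2][1])     # P(go left at right child)
    probs = np.array([
        p_root * p_l,                  # leaf 0
        p_root * (1.0 - p_l),          # leaf 1
        (1.0 - p_root) * p_r,          # leaf 2
        (1.0 - p_root) * (1.0 - p_r),  # leaf 3
    ])
    k = int(np.argmax(probs))
    w, b = leaves[k]
    return float(w @ x + b), k
```

Because only one leaf's linear model is evaluated per input, each prediction touches just the nodes on one root-to-leaf path plus the routing computations, matching the conditional computation property described above.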
arXiv Detail & Related papers (2025-01-10T13:06:36Z) - Des-q: a quantum algorithm to provably speedup retraining of decision trees [2.7262923206583136]
We introduce Des-q, a novel quantum algorithm to construct and retrain decision trees for regression and binary classification tasks. We benchmark the simulated version of Des-q against state-of-the-art classical methods on multiple data sets. Our algorithm exhibits similar performance to state-of-the-art decision trees while significantly speeding up periodic tree retraining.
arXiv Detail & Related papers (2023-09-18T17:56:08Z) - Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z) - Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories [72.15369769265398]
Machine learning has emerged as a promising paradigm for branching.
We propose retro branching; a simple yet effective approach to RL for branching.
We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables.
arXiv Detail & Related papers (2022-05-28T06:08:07Z) - Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for the semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z) - Improved Branch and Bound for Neural Network Verification via Lagrangian Decomposition [161.09660864941603]
We improve the scalability of Branch and Bound (BaB) algorithms for formally proving input-output properties of neural networks.
We present a novel activation-based branching strategy and a BaB framework, named Branch and Dual Network Bound (BaDNB)
BaDNB outperforms previous complete verification systems by a large margin, cutting average verification times by factors up to 50 on adversarial properties.
arXiv Detail & Related papers (2021-04-14T09:22:42Z) - Generalized and Scalable Optimal Sparse Decision Trees [56.35541305670828]
We present techniques that produce optimal decision trees over a variety of objectives.
We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables.
arXiv Detail & Related papers (2020-06-15T19:00:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.