Related papers: Hinge Regression Tree: A Newton Method for Oblique Regression Tree Splitting

Hinge Regression Tree: A Newton Method for Oblique Regression Tree Splitting

URL: http://arxiv.org/abs/2602.05371v1
Date: Thu, 05 Feb 2026 06:49:01 GMT
Title: Hinge Regression Tree: A Newton Method for Oblique Regression Tree Splitting
Authors: Hongyi Li, Han Lin, Jun Xu,
Abstract summary: We present the Hinge Regression Tree (HRT), which reframes each split as a non-linear least-squares problem over two linear predictors.<n>We analyze this node-level optimization and, for a backtracking line-search variant, prove that the local objective decreases monotonically and converges.<n>We show on synthetic and real-world benchmarks that HRT matches or outperforms single-tree baselines with more compact structures.
Score: 18.562483381753804
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Oblique decision trees combine the transparency of trees with the power of multivariate decision boundaries, but learning high-quality oblique splits is NP-hard, and practical methods still rely on slow search or theory-free heuristics. We present the Hinge Regression Tree (HRT), which reframes each split as a non-linear least-squares problem over two linear predictors whose max/min envelope induces ReLU-like expressive power. The resulting alternating fitting procedure is exactly equivalent to a damped Newton (Gauss-Newton) method within fixed partitions. We analyze this node-level optimization and, for a backtracking line-search variant, prove that the local objective decreases monotonically and converges; in practice, both fixed and adaptive damping yield fast, stable convergence and can be combined with optional ridge regularization. We further prove that HRT's model class is a universal approximator with an explicit $O(δ^2)$ approximation rate, and show on synthetic and real-world benchmarks that it matches or outperforms single-tree baselines with more compact structures.

Related papers

Unregularized Linear Convergence in Zero-Sum Game from Preference Feedback [50.89125374999765]
We provide the first convergence guarantee for Optimistic Multiplicative Weights Update ($mathtOMWU$) in NLHF.<n>Our analysis identifies a novel marginal convergence behavior, where the probability of rarely played actions grows exponentially from exponentially small values.
arXiv Detail & Related papers (2025-12-31T12:08:29Z)
RS-ORT: A Reduced-Space Branch-and-Bound Algorithm for Optimal Regression Trees [2.612627266839037]
Mixed-integer programming (MIP) has emerged as a powerful framework for learning optimal decision trees.<n>Naively binarizing continuous features sacrifices global optimality and often yields needlessly deep trees.<n>We recast the optimal regression-tree training as a two-stage optimization problem and propose Reduced-Space Optimal Regression Trees (RS-ORT)<n> RS-ORT is a specialized branch-and-bound (BB) algorithm that branches exclusively on tree-structural variables.
arXiv Detail & Related papers (2025-10-27T22:17:09Z)
Soft regression trees: a model variant and a decomposition training algorithm [0.24578723416255752]
We propose a new variant of soft multivariate regression trees (SRTs) where, for every input vector, the prediction is defined as a linear regression associated to a single leaf node.<n>SRTs exhibit the conditional computational property, i.e., each prediction depends on a small number of nodes.<n> Experiments on 15 wellknown datasets indicate that our SRTs and decomposition algorithm yield higher accuracy and robustness compared with traditional soft regression trees.
arXiv Detail & Related papers (2025-01-10T13:06:36Z)
Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training. We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $epsilon$-stationary points with $O(epsilon-4)$ gradient complexity under far more realistic conditions. We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(epsilon-3)$.
arXiv Detail & Related papers (2023-04-27T06:27:37Z)
A cautionary tale on fitting decision trees to data from additive models: generalization lower bounds [9.546094657606178]
We study the generalization performance of decision trees with respect to different generative regression models. This allows us to elicit their inductive bias, that is, the assumptions the algorithms make (or do not make) to generalize to new data. We prove a sharp squared error generalization lower bound for a large class of decision tree algorithms fitted to sparse additive models.
arXiv Detail & Related papers (2021-10-18T21:22:40Z)
Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer. In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for semi-definite programming relaxation (SDR) of a binary graph. Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z)
Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature [61.22680308681648]
We show that global convergence is statistically intractable even for one-layer neural net bandit with a deterministic reward. For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL)
arXiv Detail & Related papers (2021-02-08T12:41:56Z)
Convex Polytope Trees [57.56078843831244]
convex polytope trees (CPT) are proposed to expand the family of decision trees by an interpretable generalization of their decision boundary. We develop a greedy method to efficiently construct CPT and scalable end-to-end training algorithms for the tree parameters when the tree structure is given.
arXiv Detail & Related papers (2020-10-21T19:38:57Z)
Stochastic tree ensembles for regularized nonlinear regression [0.913755431537592]
This paper develops a novel tree ensemble method for nonlinear regression, which we refer to as XBART. By combining regularization and search strategies from Bayesian modeling with computationally efficient techniques, the new method attains state-of-the-art performance.
arXiv Detail & Related papers (2020-02-09T14:37:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.