Simplification of Forest Classifiers and Regressors
- URL: http://arxiv.org/abs/2212.07103v1
- Date: Wed, 14 Dec 2022 08:49:02 GMT
- Title: Simplification of Forest Classifiers and Regressors
- Authors: Atsuyoshi Nakamura, Kento Sakurada
- Abstract summary: We study the problem of sharing as many branching conditions of a given forest classifier or regressor as possible.
We propose an algorithm for the original problem that builds on an efficient algorithm for the resulting subproblem on intervals.
The effectiveness of our method is demonstrated through comprehensive experiments using 21 datasets.
- Score: 1.8275108630751844
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of sharing as many branching conditions of a given forest classifier or regressor as possible while preserving classification performance. As a constraint to prevent accuracy degradation, we first require that the decision paths of all the given feature vectors must not change. For a branching condition of the form "the value of a certain feature is at most a given threshold", the set of threshold values satisfying this constraint can be represented as an interval. The problem is thus reduced, for each set of branching conditions on the same feature, to finding a minimum set of points that intersects all the constraint-satisfying intervals. We propose an algorithm for the original problem that uses an efficient algorithm for this subproblem. The constraint is later relaxed to promote further sharing of branching conditions, either by allowing the decision paths of a certain ratio of the given feature vectors to change or by allowing a certain number of constraint-satisfying intervals to remain non-intersected. We also extend our algorithm to handle both relaxations. The effectiveness of our method is demonstrated through comprehensive experiments on 21 datasets (13 classification and 8 regression datasets from the UCI Machine Learning Repository) and 4 classifiers/regressors (random forest, extremely randomized trees, AdaBoost, and gradient boosting).
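The reduction described above is the classic minimum piercing-set problem on intervals, which a greedy sweep over right endpoints solves exactly. The sketch below illustrates only that subproblem, not the authors' full simplification procedure; the function name and the toy interval values are illustrative placeholders.

```python
# Minimal sketch of the interval piercing subproblem (a standard greedy),
# not the paper's full simplification algorithm. The interval endpoints stand
# in for the constraint-satisfying threshold ranges computed per feature.
from typing import List, Tuple

def min_piercing_points(intervals: List[Tuple[float, float]]) -> List[float]:
    """Return a minimum set of points such that every [lo, hi] contains at least one."""
    points: List[float] = []
    last = float("-inf")
    # Greedy: scan intervals in increasing order of right endpoint.
    for lo, hi in sorted(intervals, key=lambda iv: iv[1]):
        if lo > last:           # current interval is not pierced yet
            last = hi           # its right endpoint covers as many later intervals as possible
            points.append(last)
    return points

# Toy example: three constraint-satisfying intervals on one feature
# can be covered by two shared thresholds.
print(min_piercing_points([(0.2, 0.6), (0.5, 0.9), (1.1, 1.4)]))  # -> [0.6, 1.4]
```

Each returned point would then serve as a shared threshold for all branching conditions whose constraint-satisfying intervals it pierces.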
Related papers
- Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
Convergence guarantees and generalizability of the unrolled networks, however, remain open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
arXiv Detail & Related papers (2023-12-25T18:51:23Z) - Global and Preference-based Optimization with Mixed Variables using Piecewise Affine Surrogates [0.6083861980670925]
This paper proposes a novel surrogate-based global optimization algorithm to solve linearly constrained mixed-variable problems.
We assume the objective function is black-box and expensive to evaluate, while the linear constraints are quantifiable, unrelaxable, and known a priori.
We introduce two types of exploration functions to efficiently search the feasible domain via mixed-integer linear programming solvers.
arXiv Detail & Related papers (2023-02-09T15:04:35Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit connections between first-order algorithms for constrained optimization and non-smooth systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - On Dynamic Regret and Constraint Violations in Constrained Online Convex Optimization [17.412117389855222]
We propose an algorithm that follows projected gradient descent over a suitably chosen set around the current action.
We show that both the dynamic regret and the constraint violation are order-wise bounded by the path-length, i.e., the sum of the distances between consecutive optimal actions.
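As a rough illustration of such an update (the "suitably chosen set" is assumed here to be a Euclidean ball around the current action, which the summary does not specify; the function name and parameters are illustrative):

```python
import numpy as np

def projected_gd_step(x_t, grad, eta, radius):
    """One projected-gradient step: descend, then project back onto a ball of
    the given radius centred at the current action x_t. The ball is only a
    stand-in for the paper's suitably chosen set."""
    y = x_t - eta * grad                     # gradient step
    d = np.linalg.norm(y - x_t)
    if d > radius:                           # outside the ball: project back onto it
        y = x_t + (radius / d) * (y - x_t)
    return y
```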
arXiv Detail & Related papers (2023-01-24T04:22:13Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
The proposed algorithm (BOC) includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as a subroutine.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Iterative Weak Learnability and Multi-Class AdaBoost [0.0]
We construct an efficient ensemble algorithm for the multi-class classification problem inspired by SAMME.
In contrast to SAMME, our algorithm's final hypothesis converges to the correct label with probability 1.
The sum of the training error and an additional term that depends only on the sample size bounds the generalization error of our algorithm, as in the Adaptive Boosting algorithm.
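For reference, one reweighting round of the standard SAMME algorithm that this paper contrasts against (the paper's own modification is not reproduced here) looks roughly as follows; the function name and clamping constant are illustrative:

```python
import numpy as np

def samme_round(weights, miss, n_classes):
    """One standard SAMME (multi-class AdaBoost) reweighting round.
    weights: current sample weights summing to 1; miss: boolean array marking
    the weak learner's misclassified examples."""
    err = float(np.clip(np.sum(weights[miss]), 1e-12, 1 - 1e-12))  # weighted training error
    alpha = np.log((1.0 - err) / err) + np.log(n_classes - 1)      # weak-learner weight
    weights = weights * np.exp(alpha * miss)                       # upweight the mistakes
    return weights / weights.sum(), alpha
```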
arXiv Detail & Related papers (2021-01-26T03:30:30Z) - Modeling Text with Decision Forests using Categorical-Set Splits [2.434796198711328]
In axis-aligned decision forests, the "decision" to route an input example is the result of the evaluation of a condition on a single dimension in the feature space.
We define a condition that is specific to categorical-set features and present an algorithm to learn it.
Our algorithm is efficient during training and the resulting conditions are fast to evaluate with our extension of the QuickScorer inference algorithm.
arXiv Detail & Related papers (2020-09-21T16:16:35Z) - Non-Adaptive Adaptive Sampling on Turnstile Streams [57.619901304728366]
We give the first relative-error algorithms for column subset selection, subspace approximation, projective clustering, and volume maximization on turnstile streams that use space sublinear in $n$.
Our adaptive sampling procedure has a number of applications to various data summarization problems that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model.
arXiv Detail & Related papers (2020-04-23T05:00:21Z) - Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z) - Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.