Deep differentiable forest with sparse attention for the tabular data
- URL: http://arxiv.org/abs/2003.00223v1
- Date: Sat, 29 Feb 2020 09:47:13 GMT
- Title: Deep differentiable forest with sparse attention for the tabular data
- Authors: Yingshi Chen
- Abstract summary: The differentiable forest has the advantages of both trees and neural networks.
It has full differentiability and all variables are learnable parameters.
We find and analyze the attention mechanism in the differentiable forest.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a general architecture of deep differentiable forest and its
sparse attention mechanism. The differentiable forest has the advantages of
both trees and neural networks. Its structure is a simple binary tree, easy to
use and understand. It has full differentiability and all variables are
learnable parameters. We train it with gradient-based optimization,
which has shown great power in the training of deep CNNs. We find and
analyze the attention mechanism in the differentiable forest. That is, each
decision depends on only a few important features, and others are irrelevant.
The attention is always sparse. Based on this observation, we improve its
sparsity by data-aware initialization. We use the attribute importance to
initialize the attention weight. The learned weight is then much sparser than
that from random initialization. Our experiments on some large tabular datasets
show that the differentiable forest has higher accuracy than GBDT, which is the
state-of-the-art algorithm for tabular datasets. The source code is available at
https://github.com/closest-git/QuantumForest
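The attention idea in the abstract can be illustrated with a minimal sketch of a single soft decision node. This is not the author's implementation. The softmax normalization, the sigmoid gate, the helper names (`node_route_right`, `importance_init`), the `sharpness` parameter, and the importance values are all assumptions made for illustration. The sketch only shows how initializing attention logits from attribute importance gives a more concentrated (sparser-looking) attention vector than a small random initialization.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def node_route_right(x, attn_logits, threshold):
    """Soft decision of one internal node: probability of sending x to the right child."""
    attention = softmax(attn_logits)        # non-negative feature weights that sum to 1
    score = attention @ x - threshold       # attention-weighted feature value vs. threshold
    return 1.0 / (1.0 + np.exp(-score))     # sigmoid gate, differentiable in every parameter

def importance_init(importance, sharpness=5.0):
    """Hypothetical data-aware init: logits proportional to normalized attribute importance."""
    imp = np.asarray(importance, dtype=float)
    return sharpness * imp / imp.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=6)                                        # one sample with 6 features
importance = np.array([0.70, 0.20, 0.05, 0.03, 0.01, 0.01])   # made-up attribute importances

random_attn = softmax(rng.normal(scale=0.1, size=6))  # small random init: near-uniform attention
aware_attn = softmax(importance_init(importance))     # importance-based init: mass on feature 0
p_right = node_route_right(x, importance_init(importance), threshold=0.0)

print("random init attention:    ", np.round(random_attn, 3))
print("data-aware init attention:", np.round(aware_attn, 3))
print("P(route right):           ", float(p_right))
```

In a full model, the attention logits and thresholds of every node would be trained jointly by gradient descent; here they are fixed so that the effect of the initialization alone is visible.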
Related papers
- TreeGrad-Ranker: Feature Ranking via $O(L)$-Time Gradients for Decision Trees [73.0940890296463]
Probabilistic values are used to rank features for explaining local predicted values of decision trees. TreeGrad computes the gradients of the multilinear extension of the joint objective in $O(L)$ time for decision trees with $L$ leaves. TreeGrad-Ranker aggregates the gradients while optimizing the joint objective to produce feature rankings. TreeGrad-Shap is a numerically stable algorithm for computing Beta Shapley values with integral parameters.
arXiv Detail & Related papers (2026-02-12T06:17:12Z)
- Scaling Up Forest Vision with Synthetic Data [0.0]
Accurate tree segmentation is a key step in extracting individual tree metrics from forest laser scans. In place of expensive field data collection and annotation, we use synthetic data during pretraining. We have produced a comprehensive, diverse, annotated 3D forest dataset on an unprecedented scale.
arXiv Detail & Related papers (2025-09-14T10:00:59Z)
- TreeLearn: A Comprehensive Deep Learning Method for Segmenting Individual Trees from Ground-Based LiDAR Forest Point Clouds [42.87502453001109]
We propose TreeLearn, a deep learning-based approach for tree instance segmentation of forest point clouds.
TreeLearn is trained on already segmented point clouds in a data-driven manner, making it less reliant on predefined features and algorithms.
We trained TreeLearn on forest point clouds of 6665 trees, labeled using the Lidar360 software.
arXiv Detail & Related papers (2023-09-15T15:20:16Z)
- Bayesian post-hoc regularization of random forests [0.0]
Random Forests are powerful ensemble learning algorithms widely used in various machine learning tasks.
We propose post-hoc regularization to leverage the reliable patterns captured by leaf nodes closer to the root.
We have evaluated the performance of our method on various machine learning data sets.
arXiv Detail & Related papers (2023-06-06T14:15:29Z)
- Provable Data Subset Selection For Efficient Neural Network Training [73.34254513162898]
We introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network.
We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets.
arXiv Detail & Related papers (2023-03-09T10:08:34Z)
- What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z)
- Shrub Ensembles for Online Classification [7.057937612386993]
Decision Tree (DT) ensembles provide excellent performance while adapting to changes in the data, but they are not resource efficient.
We propose a novel memory-efficient online classification ensemble called shrub ensembles for resource-constraint systems.
Our algorithm trains small to medium-sized decision trees on small windows and uses gradient descent to learn the ensemble weights of these shrubs.
arXiv Detail & Related papers (2021-12-07T14:22:43Z)
- Active-LATHE: An Active Learning Algorithm for Boosting the Error Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least 40% when $\rho$ is at least $0.8$.
Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z)
- To Boost or not to Boost: On the Limits of Boosted Neural Networks [67.67776094785363]
Boosting is a method for learning an ensemble of classifiers.
While boosting has been shown to be very effective for decision trees, its impact on neural networks has not been extensively studied.
We find that a single neural network usually generalizes better than a boosted ensemble of smaller neural networks with the same total number of parameters.
arXiv Detail & Related papers (2021-07-28T19:10:03Z)
- Cherry-Picking Gradients: Learning Low-Rank Embeddings of Visual Data via Differentiable Cross-Approximation [53.95297550117153]
We propose an end-to-end trainable framework that processes large-scale visual data tensors by looking at a fraction of their entries only.
The proposed approach is particularly useful for large-scale multidimensional grid data, and for tasks that require context over a large receptive field.
arXiv Detail & Related papers (2021-05-29T08:39:57Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a., soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve better or comparable performance than [1],[3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
- Attention augmented differentiable forest for tabular data [0.0]
A differentiable forest is an ensemble of decision trees with full differentiability.
We propose the tree attention block (TAB) in the framework of the differentiable forest.
arXiv Detail & Related papers (2020-10-02T11:42:33Z)
- Is deeper better? It depends on locality of relevant features [5.33024001730262]
We investigate the effect of increasing the depth within an overparameterized regime.
Experiments show that deeper is better for local labels, whereas shallower is better for global labels.
It is shown that the neural kernel does not correctly capture the depth dependence of the generalization performance.
arXiv Detail & Related papers (2020-05-26T02:44:18Z)