LARF: Two-level Attention-based Random Forests with a Mixture of
Contamination Models
- URL: http://arxiv.org/abs/2210.05168v1
- Date: Tue, 11 Oct 2022 06:14:12 GMT
- Title: LARF: Two-level Attention-based Random Forests with a Mixture of
Contamination Models
- Authors: Andrei V. Konstantinov and Lev V. Utkin
- Abstract summary: New models of attention-based random forests, called LARF (Leaf Attention-based Random Forest), are proposed.
The first idea is to introduce a two-level attention mechanism, where one level is the "leaf" attention, applied to every leaf of the trees.
The second idea is to replace the softmax operation in the attention with a weighted sum of softmax operations with different parameters.
- Score: 5.482532589225552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: New models of attention-based random forests, called LARF (Leaf
Attention-based Random Forest), are proposed. The first idea behind the models
is to introduce a two-level attention mechanism: one level is the "leaf"
attention, in which the attention mechanism is applied to every leaf of the
trees; the second level is the tree attention, which depends on the "leaf"
attention. The second idea is to replace the softmax operation in the attention
with a weighted sum of softmax operations with different parameters. This is
implemented by applying a mixture of Huber's contamination models and can be
regarded as an analog of multi-head attention, with "heads" defined by the
chosen values of the softmax parameter. The attention parameters are trained by
solving a quadratic optimization problem. To simplify tuning of the models, it
is proposed to treat the contamination parameters, which would otherwise be
tuning hyperparameters, as trainable and to compute them by solving the
quadratic optimization problem as well. Numerous numerical experiments with
real datasets are performed to study LARFs. The code of the proposed algorithms
is available at https://github.com/andruekonst/leaf-attention-forest.
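The short sketch below illustrates, under stated assumptions, how the two ideas in the abstract could be combined at prediction time: leaf-level attention over the training points that share a leaf with the input, tree-level attention over the resulting per-tree predictions, and a weighted sum of softmax operations with different temperatures in place of a single softmax. The helper names (mixture_softmax, larf_style_predict), the Euclidean similarity, the fixed temperatures and mixture weights, and the use of the largest in-leaf similarity as the tree-level score are illustrative assumptions, not the authors' implementation; in the paper the attention and contamination parameters are obtained by solving a quadratic optimization problem, which is omitted here.

```python
# Minimal, illustrative sketch of two-level (leaf + tree) attention with a
# weighted sum of softmax operations -- NOT the authors' implementation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def mixture_softmax(scores, taus, w):
    """Weighted sum of softmax operations with different temperature parameters ('heads')."""
    out = np.zeros_like(scores, dtype=float)
    for tau, wk in zip(taus, w):
        z = (scores - scores.max()) / tau
        e = np.exp(z)
        out += wk * e / e.sum()
    return out

def larf_style_predict(forest, X_train, y_train, x, taus=(0.5, 1.0, 2.0), w=(1/3, 1/3, 1/3)):
    tree_preds, tree_scores = [], []
    for tree in forest.estimators_:
        leaf_of_x = tree.apply(x.reshape(1, -1))[0]
        in_leaf = tree.apply(X_train) == leaf_of_x           # training points in x's leaf
        if not in_leaf.any():
            continue
        # "Leaf" attention: weight the in-leaf training points by their closeness to x
        sims = -np.linalg.norm(X_train[in_leaf] - x, axis=1)
        alpha = mixture_softmax(sims, taus, w)
        tree_preds.append(float(alpha @ y_train[in_leaf]))
        tree_scores.append(sims.max())                       # score fed to the tree-level attention
    # Tree-level attention over the per-tree (leaf-attended) predictions
    beta = mixture_softmax(np.array(tree_scores), taus, w)
    return float(beta @ np.array(tree_preds))

# Example usage (hypothetical data):
# forest = RandomForestRegressor(n_estimators=100).fit(X_train, y_train)
# y_hat = larf_style_predict(forest, X_train, y_train, X_test[0])
```

In this sketch the fixed temperatures play the role of the "heads", and the equal mixture weights stand in for the trainable contamination parameters described in the abstract.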
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
- Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method [76.31185707649227]
We propose a Deep Tree-based Retriever (DTR) for efficient recommendation.
DTR frames the training task as a softmax-based multi-class classification over tree nodes at the same level.
To mitigate the suboptimality induced by the labeling of non-leaf nodes, we propose a rectification method for the loss function.
arXiv Detail & Related papers (2024-08-21T05:09:53Z)
- Improving Dual-Encoder Training through Dynamic Indexes for Negative Mining [61.09807522366773]
We introduce an algorithm that approximates the softmax with provable bounds and that dynamically maintains the tree.
In our study on datasets with over twenty million targets, our approach reduces error by half relative to oracle brute-force negative mining.
arXiv Detail & Related papers (2023-03-27T15:18:32Z)
- Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost [53.746169882193456]
Recent works have proposed various sparse attention modules to overcome the quadratic cost of self-attention.
We propose a model that resolves both problems by endowing each attention head with a mixed-membership Block Model.
Our model outperforms previous efficient variants as well as the original Transformer with full attention.
arXiv Detail & Related papers (2022-10-27T15:30:52Z)
- Improved Anomaly Detection by Using the Attention-Based Isolation Forest [4.640835690336653]
The Attention-Based Isolation Forest (ABIForest) for solving the anomaly detection problem is proposed.
The main idea is to assign attention weights, with learnable parameters, to each tree path, where the weights depend on the instances and on the trees themselves.
ABIForest can be viewed as the first modification of Isolation Forest that incorporates the attention mechanism in a simple way, without applying gradient-based algorithms (an illustrative sketch in this spirit appears after this list).
arXiv Detail & Related papers (2022-10-05T20:58:57Z)
- Attention and Self-Attention in Random Forests [5.482532589225552]
New models of random forests jointly using the attention and self-attention mechanisms are proposed.
The self-attention aims to capture dependencies among the tree predictions and to remove noise or anomalous predictions in the random forest.
arXiv Detail & Related papers (2022-07-09T16:15:53Z)
- An Approximation Method for Fitted Random Forests [0.0]
We study methods that approximate each fitted tree in the Random Forests model using a multinomial allocation of the data points to the leaves.
Specifically, we begin by studying whether fitting a multinomial logistic regression helps reduce the size while preserving the prediction quality.
arXiv Detail & Related papers (2022-07-05T17:28:52Z)
- Attention-based Random Forest and Contamination Model [5.482532589225552]
The main idea behind the proposed ABRF models is to assign attention weights with trainable parameters to decision trees in a specific way.
The weights depend on the distance between an instance, which falls into a corresponding leaf of a tree, and the instances that fall into the same leaf.
arXiv Detail & Related papers (2022-01-08T19:35:57Z)
- Predicting Attention Sparsity in Transformers [0.9786690381850356]
We propose Sparsefinder, a model trained to identify the sparsity pattern of entmax attention before computing it.
Our work provides a new angle for studying model efficiency through an extensive analysis of the tradeoff between the sparsity and recall of the predicted attention graph.
arXiv Detail & Related papers (2021-09-24T20:51:21Z)
- Learning Noise-Aware Encoder-Decoder from Noisy Labels by Alternating Back-Propagation for Saliency Detection [54.98042023365694]
We propose a noise-aware encoder-decoder framework to disentangle a clean saliency predictor from noisy training examples.
The proposed model consists of two sub-models parameterized by neural networks.
arXiv Detail & Related papers (2020-07-23T18:47:36Z)
- Set Based Stochastic Subsampling [85.5331107565578]
We propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network.
We show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification.
arXiv Detail & Related papers (2020-06-25T07:36:47Z)
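As a companion to the "Improved Anomaly Detection by Using the Attention-Based Isolation Forest" entry above (see the forward reference there), the sketch below shows, under illustrative assumptions, how an attention-weighted average of per-tree path lengths could replace the plain average used by Isolation Forest. The per-tree key (the mean of the training points reaching the same leaf as the input), the Euclidean similarity, the fixed temperature tau, and the function name attention_weighted_path_length are assumptions; the gradient-free fitting of the attention parameters described in that paper is not reproduced.

```python
# Illustrative sketch of attention-weighted Isolation Forest scoring -- assumptions only,
# not the ABIForest algorithm itself.
import numpy as np
from sklearn.ensemble import IsolationForest

def attention_weighted_path_length(iforest, X_train, x, tau=1.0):
    depths, sims = [], []
    for tree in iforest.estimators_:
        depths.append(tree.decision_path(x.reshape(1, -1)).sum() - 1)  # path length of x in this tree
        leaf = tree.apply(x.reshape(1, -1))[0]
        same_leaf = X_train[tree.apply(X_train) == leaf]
        key = same_leaf.mean(axis=0) if len(same_leaf) else x          # the tree's "key" for x (assumption)
        sims.append(-np.linalg.norm(x - key))
    sims = np.array(sims)
    w = np.exp((sims - sims.max()) / tau)
    w /= w.sum()                                                       # softmax attention over trees
    # Smaller weighted path length => more anomalous; the usual c(n) normalization is omitted.
    return float(w @ np.array(depths))

# Example usage (hypothetical data):
# iforest = IsolationForest(n_estimators=100).fit(X_train)
# score = attention_weighted_path_length(iforest, X_train, X_test[0])
```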