LARF: Two-level Attention-based Random Forests with a Mixture of
Contamination Models
- URL: http://arxiv.org/abs/2210.05168v1
- Date: Tue, 11 Oct 2022 06:14:12 GMT
- Title: LARF: Two-level Attention-based Random Forests with a Mixture of
Contamination Models
- Authors: Andrei V. Konstantinov and Lev V. Utkin
- Abstract summary: New models of attention-based random forests, called LARF (Leaf Attention-based Random Forest), are proposed.
The first idea is to introduce a two-level attention mechanism, where one level is the "leaf" attention, applied to every leaf of the trees.
The second idea is to replace the softmax operation in the attention with a weighted sum of softmax operations with different parameters.
- Score: 5.482532589225552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: New models of attention-based random forests, called LARF (Leaf
Attention-based Random Forest), are proposed. The first idea behind the models
is to introduce a two-level attention mechanism: one level is the "leaf"
attention, in which the attention mechanism is applied to every leaf of the
trees; the second level is the tree attention, which depends on the "leaf"
attention. The second idea is to replace the softmax operation in the attention
with a weighted sum of softmax operations with different parameters. This is
implemented by applying a mixture of Huber's contamination models and can be
regarded as an analog of multi-head attention, with "heads" defined by the
chosen values of the softmax parameter. The attention parameters are trained by
solving a quadratic optimization problem. To simplify tuning of the models, it
is proposed to treat the contamination parameters, which would otherwise be
tuning hyperparameters, as trainable and to compute them by solving the
quadratic optimization problem as well. Numerous numerical experiments with
real datasets are performed to study LARFs. The code of the proposed algorithms
is available at https://github.com/andruekonst/leaf-attention-forest.
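The short sketch below illustrates, under stated assumptions, how the two ideas in the abstract could be combined at prediction time: leaf-level attention over the training points that share a leaf with the input, tree-level attention over the resulting per-tree predictions, and a weighted sum of softmax operations with different temperatures in place of a single softmax. The helper names (mixture_softmax, larf_style_predict), the Euclidean similarity, the fixed temperatures and mixture weights, and the use of the largest in-leaf similarity as the tree-level score are illustrative assumptions, not the authors' implementation; in the paper the attention and contamination parameters are obtained by solving a quadratic optimization problem, which is omitted here.

```python
# Minimal, illustrative sketch of two-level (leaf + tree) attention with a
# weighted sum of softmax operations -- NOT the authors' implementation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def mixture_softmax(scores, taus, w):
    """Weighted sum of softmax operations with different temperature parameters ('heads')."""
    out = np.zeros_like(scores, dtype=float)
    for tau, wk in zip(taus, w):
        z = (scores - scores.max()) / tau
        e = np.exp(z)
        out += wk * e / e.sum()
    return out

def larf_style_predict(forest, X_train, y_train, x, taus=(0.5, 1.0, 2.0), w=(1/3, 1/3, 1/3)):
    tree_preds, tree_scores = [], []
    for tree in forest.estimators_:
        leaf_of_x = tree.apply(x.reshape(1, -1))[0]
        in_leaf = tree.apply(X_train) == leaf_of_x           # training points in x's leaf
        if not in_leaf.any():
            continue
        # "Leaf" attention: weight the in-leaf training points by their closeness to x
        sims = -np.linalg.norm(X_train[in_leaf] - x, axis=1)
        alpha = mixture_softmax(sims, taus, w)
        tree_preds.append(float(alpha @ y_train[in_leaf]))
        tree_scores.append(sims.max())                       # score fed to the tree-level attention
    # Tree-level attention over the per-tree (leaf-attended) predictions
    beta = mixture_softmax(np.array(tree_scores), taus, w)
    return float(beta @ np.array(tree_preds))

# Example usage (hypothetical data):
# forest = RandomForestRegressor(n_estimators=100).fit(X_train, y_train)
# y_hat = larf_style_predict(forest, X_train, y_train, X_test[0])
```

In this sketch the fixed temperatures play the role of the "heads", and the equal mixture weights stand in for the trainable contamination parameters described in the abstract.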
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
- Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method [76.31185707649227]
We propose a Deep Tree-based Retriever (DTR) for efficient recommendation.
DTR frames the training task as a softmax-based multi-class classification over tree nodes at the same level.
To mitigate the suboptimality induced by the labeling of non-leaf nodes, we propose a rectification method for the loss function.
arXiv Detail & Related papers (2024-08-21T05:09:53Z)
- Improving Dual-Encoder Training through Dynamic Indexes for Negative Mining [61.09807522366773]
We introduce an algorithm that approximates the softmax with provable bounds and that dynamically maintains the tree.
In our study on datasets with over twenty million targets, our approach reduces error by half relative to oracle brute-force negative mining.
arXiv Detail & Related papers (2023-03-27T15:18:32Z)
- Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost [53.746169882193456]
Recent works have proposed various sparse attention modules to overcome the quadratic cost of self-attention.
We propose a model that resolves both problems by endowing each attention head with a mixed-membership Block Model.
Our model outperforms previous efficient variants as well as the original Transformer with full attention.
arXiv Detail & Related papers (2022-10-27T15:30:52Z)
- Improved Anomaly Detection by Using the Attention-Based Isolation Forest [4.640835690336653]
The Attention-Based Isolation Forest (ABIForest) for solving the anomaly detection problem is proposed.
The main idea is to assign attention weights, with learnable parameters, to each tree path, where the weights depend on the instances and on the trees themselves.
ABIForest can be viewed as the first modification of Isolation Forest that incorporates the attention mechanism in a simple way, without applying gradient-based algorithms (an illustrative sketch in this spirit appears after this list).
arXiv Detail & Related papers (2022-10-05T20:58:57Z)
- Attention and Self-Attention in Random Forests [5.482532589225552]
New models of random forests jointly using the attention and self-attention mechanisms are proposed.
The self-attention aims to capture dependencies among the tree predictions and to remove noise or anomalous predictions in the random forest.
arXiv Detail & Related papers (2022-07-09T16:15:53Z)
- An Approximation Method for Fitted Random Forests [0.0]
We study methods that approximate each fitted tree in the Random Forests model using a multinomial allocation of the data points to the leaves.
Specifically, we begin by studying whether fitting a multinomial logistic regression helps reduce the size while preserving the prediction quality.
arXiv Detail & Related papers (2022-07-05T17:28:52Z)
- Attention-based Random Forest and Contamination Model [5.482532589225552]
The main idea behind the proposed ABRF models is to assign attention weights with trainable parameters to decision trees in a specific way.
The weights depend on the distance between an instance, which falls into a corresponding leaf of a tree, and the instances that fall into the same leaf.
arXiv Detail & Related papers (2022-01-08T19:35:57Z)
- Predicting Attention Sparsity in Transformers [0.9786690381850356]
We propose Sparsefinder, a model trained to identify the sparsity pattern of entmax attention before computing it.
Our work provides a new angle for studying model efficiency through an extensive analysis of the tradeoff between the sparsity and recall of the predicted attention graph.
arXiv Detail & Related papers (2021-09-24T20:51:21Z)
- Learning Noise-Aware Encoder-Decoder from Noisy Labels by Alternating Back-Propagation for Saliency Detection [54.98042023365694]
We propose a noise-aware encoder-decoder framework to disentangle a clean saliency predictor from noisy training examples.
The proposed model consists of two sub-models parameterized by neural networks.
arXiv Detail & Related papers (2020-07-23T18:47:36Z)
- Set Based Stochastic Subsampling [85.5331107565578]
We propose a set-based two-stage end-to-end neural subsampling model that is jointly optimized with an arbitrary downstream task network.
We show that it outperforms the relevant baselines under low subsampling rates on a variety of tasks including image classification, image reconstruction, function reconstruction and few-shot classification.
arXiv Detail & Related papers (2020-06-25T07:36:47Z)
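As a companion to the "Improved Anomaly Detection by Using the Attention-Based Isolation Forest" entry above (see the forward reference there), the sketch below shows, under illustrative assumptions, how an attention-weighted average of per-tree path lengths could replace the plain average used by Isolation Forest. The per-tree key (the mean of the training points reaching the same leaf as the input), the Euclidean similarity, the fixed temperature tau, and the function name attention_weighted_path_length are assumptions; the gradient-free fitting of the attention parameters described in that paper is not reproduced.

```python
# Illustrative sketch of attention-weighted Isolation Forest scoring -- assumptions only,
# not the ABIForest algorithm itself.
import numpy as np
from sklearn.ensemble import IsolationForest

def attention_weighted_path_length(iforest, X_train, x, tau=1.0):
    depths, sims = [], []
    for tree in iforest.estimators_:
        depths.append(tree.decision_path(x.reshape(1, -1)).sum() - 1)  # path length of x in this tree
        leaf = tree.apply(x.reshape(1, -1))[0]
        same_leaf = X_train[tree.apply(X_train) == leaf]
        key = same_leaf.mean(axis=0) if len(same_leaf) else x          # the tree's "key" for x (assumption)
        sims.append(-np.linalg.norm(x - key))
    sims = np.array(sims)
    w = np.exp((sims - sims.max()) / tau)
    w /= w.sum()                                                       # softmax attention over trees
    # Smaller weighted path length => more anomalous; the usual c(n) normalization is omitted.
    return float(w @ np.array(depths))

# Example usage (hypothetical data):
# iforest = IsolationForest(n_estimators=100).fit(X_train)
# score = attention_weighted_path_length(iforest, X_train, X_test[0])
```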