Boosting gets full Attention for Relational Learning
- URL: http://arxiv.org/abs/2402.14926v1
- Date: Thu, 22 Feb 2024 19:16:01 GMT
- Title: Boosting gets full Attention for Relational Learning
- Authors: Mathieu Guillame-Bert and Richard Nock
- Abstract summary: We introduce an attention mechanism for structured data that blends well with tree-based models in the training context of (gradient) boosting.
Experiments on simulated and real-world domains demonstrate the competitiveness of our method against state-of-the-art baselines, both tree-based and neural net-based.
- Score: 27.82663283409287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: More often than not in benchmark supervised ML, tabular data is flat, i.e. it consists of a single $m \times d$ (rows, columns) file, but cases abound in the real world where observations are described by a set of tables with structural relationships. Neural net-based deep models are a natural fit for incorporating general topological dependence among description features (pixels, words, etc.), but their suboptimality relative to tree-based models on tabular data is still well documented. In this paper, we introduce an attention mechanism for structured data that blends well with tree-based models in the training context of (gradient) boosting. Each aggregated model is a tree whose training involves two steps: first, simple tabular models are learned by descending the tables top-down, fitting boosting's class residuals on each table's features. Second, what has been learned propagates back bottom-up via attention and aggregation mechanisms, progressively crafting new features that, at the end, complete the set of observation features over which a single tree is learned; boosting's iteration clock is then incremented and new class residuals are computed. Experiments on simulated and real-world domains demonstrate the competitiveness of our method against state-of-the-art baselines, both tree-based and neural net-based.
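To make the two-step loop in the abstract concrete, below is a minimal sketch of one plausible reading of it, not the authors' implementation: the toy one-to-many schema, the least-squares per-table model, the softmax pooling in `attention_pool`, and the use of per-row scores as both attention keys and values are all illustrative assumptions.

```python
# Illustrative sketch only: a simplified reading of the abstract's
# two-step boosting round, NOT the authors' actual algorithm.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy relational data: each observation owns a variable-size set of rows
# in one auxiliary table (a one-to-many relationship).
n_obs, d_flat, d_aux = 200, 4, 3
X_flat = rng.normal(size=(n_obs, d_flat))
aux_rows = [rng.normal(size=(rng.integers(1, 6), d_aux)) for _ in range(n_obs)]
y = rng.normal(size=n_obs)

def attention_pool(scores, values):
    """Softmax-attention aggregation of a set of per-row values."""
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return float(w @ values)

F = np.zeros(n_obs)          # boosted ensemble prediction
lr, n_rounds = 0.1, 20
for t in range(n_rounds):
    r = y - F                # class residuals (squared-loss case)
    # Step 1 (top-down): a simple tabular model on the auxiliary table,
    # fit against residuals broadcast to each observation's owned rows.
    Xa = np.vstack(aux_rows)
    ra = np.concatenate([np.full(len(a), ri) for a, ri in zip(aux_rows, r)])
    w_aux = np.linalg.lstsq(Xa, ra, rcond=None)[0]
    # Step 2 (bottom-up): attention-weighted aggregation of per-row scores
    # crafts one new observation-level feature (scores double as values here).
    scores = [a @ w_aux for a in aux_rows]
    new_feat = np.array([attention_pool(s, s) for s in scores])
    # A single shallow tree is learned over the completed feature set,
    # then boosting's clock advances and residuals are refreshed.
    X_aug = np.column_stack([X_flat, new_feat])
    tree = DecisionTreeRegressor(max_depth=3).fit(X_aug, r)
    F += lr * tree.predict(X_aug)
```

The essential structure is what matters: per round, a simple model descends to the auxiliary table on the current residuals, an attention-style aggregation climbs back up to craft a new observation-level feature, and one shallow tree is fit on the augmented features.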
Related papers
- Escaping the Forest: Sparse Interpretable Neural Networks for Tabular Data [0.0]
We show that our sparse interpretable models, Sparse TABular NET (sTAB-Net) with attention mechanisms, are more effective than tree-based models.
They achieve better performance than post-hoc explanation methods like SHAP.
arXiv Detail & Related papers (2024-10-23T10:50:07Z)
- TRESTLE: A Model of Concept Formation in Structured Domains [4.399333421690168]
We present TRESTLE, an incremental account of probabilistic concept formation in structured domains.
We evaluate TRESTLE's performance on a supervised learning task and an unsupervised clustering task.
arXiv Detail & Related papers (2024-10-14T15:00:43Z)
- Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later [59.88557193062348]
We revisit the classic Neighborhood Component Analysis (NCA), designed to learn a linear projection that captures semantic similarities between instances.
We find that minor modifications, such as adjustments to the learning objectives and the integration of deep learning architectures, significantly enhance NCA's performance.
We also introduce a neighbor sampling strategy that improves both the efficiency and predictive accuracy of our proposed ModernNCA (a minimal sketch of the NCA objective appears after this list).
arXiv Detail & Related papers (2024-07-03T16:38:57Z)
- Hierarchical clustering with dot products recovers hidden tree structure [53.68551192799585]
In this paper we offer a new perspective on the well established agglomerative clustering algorithm, focusing on recovery of hierarchical structure.
We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product and not, for example, by minimum distance or within-cluster variance.
We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model (a minimal sketch of the merge rule also appears after this list).
arXiv Detail & Related papers (2023-05-24T11:05:12Z)
- Part-Based Models Improve Adversarial Robustness [57.699029966800644]
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks.
Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts and classify them.
Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations.
arXiv Detail & Related papers (2022-09-15T15:41:47Z)
- Why do tree-based models still outperform deep learning on tabular data? [0.0]
We show that tree-based models remain state-of-the-art on medium-sized data.
We conduct an empirical investigation into the differing inductive biases of tree-based models and Neural Networks (NNs).
arXiv Detail & Related papers (2022-07-18T08:36:08Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- Improving Label Quality by Jointly Modeling Items and Annotators [68.8204255655161]
We propose a fully Bayesian framework for learning ground truth labels from noisy annotators.
Our framework ensures scalability by factoring a generative, Bayesian soft clustering model over label distributions into the classic Dawid and Skene joint annotator-data model.
arXiv Detail & Related papers (2021-06-20T02:15:20Z)
- Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and an adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)
- Tensor Decompositions in Recursive Neural Networks for Tree-Structured Data [12.069862650316262]
We introduce two new aggregation functions to encode structural knowledge from tree-structured data.
We test them on two tree classification tasks, showing the advantage of the proposed models when tree outdegree increases.
arXiv Detail & Related papers (2020-06-18T15:40:32Z)
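As referenced in the ModernNCA entry above, here is a minimal sketch of the classic NCA objective with stochastic neighbor sampling; the sampling scheme, the sample size, and the fixed linear projection `A` are illustrative assumptions (ModernNCA would learn a deep embedding rather than keep `A` fixed).

```python
# Illustrative sketch of the NCA objective with sampled neighbors;
# NOT ModernNCA's exact sampling strategy or architecture.
import numpy as np

def nca_loss(Z, y, rng, n_neighbors=16):
    """Average negative log-probability that each point selects a
    same-class neighbor, with neighbors sampled per anchor instead of
    using all n-1 candidates (the efficiency idea behind sampling)."""
    n = len(Z)
    total = 0.0
    for i in range(n):
        cand = rng.choice(np.delete(np.arange(n), i),
                          size=min(n_neighbors, n - 1), replace=False)
        d2 = np.sum((Z[cand] - Z[i]) ** 2, axis=1)
        p = np.exp(-d2 - np.logaddexp.reduce(-d2))  # softmax over sampled neighbors
        p_same = p[y[cand] == y[i]].sum()
        total -= np.log(p_same + 1e-12)
    return total / n

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.integers(0, 3, size=100)
A = rng.normal(size=(8, 4)) * 0.1  # fixed linear projection for the demo
print(nca_loss(X @ A, y, rng))     # what a trainer would minimize over A
```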
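And as referenced in the hierarchical-clustering entry, a minimal sketch of agglomerative clustering that merges by maximum average dot product rather than minimum distance; the naive greedy implementation is for clarity, not for the paper's theory or efficiency.

```python
# Illustrative sketch: agglomerative merging by maximum average dot product.
import numpy as np

def dot_product_linkage(X):
    """Greedily merge the cluster pair with the largest average pairwise
    dot product; returns the merge sequence (a dendrogram)."""
    clusters = {i: [i] for i in range(len(X))}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best, best_pair = -np.inf, None
        for a in range(len(keys)):
            for b in range(a + 1, len(keys)):
                ka, kb = keys[a], keys[b]
                # average dot product over all cross-cluster pairs
                s = (X[clusters[ka]] @ X[clusters[kb]].T).mean()
                if s > best:
                    best, best_pair = s, (ka, kb)
        ka, kb = best_pair
        merges.append((clusters[ka][:], clusters[kb][:], best))
        clusters[ka] = clusters[ka] + clusters.pop(kb)
    return merges

X = np.random.default_rng(1).normal(size=(12, 5))
tree = dot_product_linkage(X)
print(tree[-1])  # final merge: the two top-level subtrees
```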