When Do Neural Nets Outperform Boosted Trees on Tabular Data?
- URL: http://arxiv.org/abs/2305.02997v4
- Date: Mon, 15 Jul 2024 19:00:47 GMT
- Title: When Do Neural Nets Outperform Boosted Trees on Tabular Data?
- Authors: Duncan McElfresh, Sujay Khandagale, Jonathan Valverde, Vishak Prasad C, Benjamin Feuer, Chinmay Hegde, Ganesh Ramakrishnan, Micah Goldblum, Colin White
- Abstract summary: We take a step back and question the importance of the 'NN vs. GBDT' debate.
For a surprisingly high number of datasets, the performance difference between GBDTs and NNs is negligible.
We analyze dozens of metafeatures to determine what properties of a dataset make NNs or GBDTs better-suited to perform well.
Our insights act as a guide for practitioners to determine which techniques may work best on their dataset.
- Score: 65.30290020731825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tabular data is one of the most commonly used types of data in machine learning. Despite recent advances in neural nets (NNs) for tabular data, there is still an active discussion on whether or not NNs generally outperform gradient-boosted decision trees (GBDTs) on tabular data, with several recent works arguing either that GBDTs consistently outperform NNs on tabular data, or vice versa. In this work, we take a step back and question the importance of this debate. To this end, we conduct the largest tabular data analysis to date, comparing 19 algorithms across 176 datasets, and we find that the 'NN vs. GBDT' debate is overemphasized: for a surprisingly high number of datasets, either the performance difference between GBDTs and NNs is negligible, or light hyperparameter tuning on a GBDT is more important than choosing between NNs and GBDTs. A remarkable exception is the recently-proposed prior-data fitted network, TabPFN: although it is effectively limited to training sets of size 3000, we find that it outperforms all other algorithms on average, even when randomly sampling 3000 training datapoints. Next, we analyze dozens of metafeatures to determine what properties of a dataset make NNs or GBDTs better-suited to perform well. For example, we find that GBDTs are much better than NNs at handling skewed or heavy-tailed feature distributions and other forms of dataset irregularities. Our insights act as a guide for practitioners to determine which techniques may work best on their dataset. Finally, with the goal of accelerating tabular data research, we release the TabZilla Benchmark Suite: a collection of the 36 'hardest' of the datasets we study. Our benchmark suite, codebase, and all raw results are available at https://github.com/naszilla/tabzilla.
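As a rough illustration of two points from the abstract, the sketch below computes simple "irregularity" metafeatures (per-feature skewness and excess kurtosis, the kind of heavy-tail measures the paper finds favor GBDTs over NNs) and randomly subsamples 3,000 training rows, the regime in which TabPFN is evaluated. The function names and the synthetic data are illustrative assumptions, not the paper's actual analysis code.

```python
# Illustrative sketch only: simple dataset metafeatures and TabPFN-style subsampling.
import numpy as np
from scipy.stats import skew, kurtosis

def irregularity_metafeatures(X: np.ndarray) -> dict:
    """Summarize how skewed / heavy-tailed the numeric features are."""
    return {
        "mean_abs_skew": float(np.mean(np.abs(skew(X, axis=0)))),
        "mean_excess_kurtosis": float(np.mean(kurtosis(X, axis=0))),
    }

def subsample_for_tabpfn(X: np.ndarray, y: np.ndarray, max_rows: int = 3000, seed: int = 0):
    """Randomly draw at most 3,000 rows, mirroring how TabPFN is benchmarked in the paper."""
    rng = np.random.default_rng(seed)
    if len(X) <= max_rows:
        return X, y
    idx = rng.choice(len(X), size=max_rows, replace=False)
    return X[idx], y[idx]

# Example: heavily skewed (log-normal) features push the metafeatures up, which --
# per the paper's analysis -- tends to favor GBDTs over NNs on that dataset.
X = np.random.lognormal(size=(5000, 10))
y = (X.sum(axis=1) > np.median(X.sum(axis=1))).astype(int)
print(irregularity_metafeatures(X))
X_small, y_small = subsample_for_tabpfn(X, y)
print(X_small.shape)  # (3000, 10)
```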
Related papers
- Team up GBDTs and DNNs: Advancing Efficient and Effective Tabular Prediction with Tree-hybrid MLPs [20.67800392863432]
Tabular datasets play a crucial role in various applications.
Two prominent model types, Gradient-Boosted Decision Trees (GBDTs) and Deep Neural Networks (DNNs), have demonstrated performance advantages on distinct prediction tasks.
This paper proposes a new framework that amalgamates the advantages of both GBDTs and DNNs, resulting in a DNN algorithm that is as efficient as GBDTs and is competitively effective regardless of dataset preferences.
arXiv Detail & Related papers (2024-07-13T07:13:32Z) - TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks [90.00817095558094]
Prior-data fitted networks (PFNs) make use of pretraining and in-context learning to achieve strong performance on new tasks in a single forward pass.
We introduce TuneTables, a parameter-efficient fine-tuning strategy for PFNs that compresses large datasets into a smaller learned context.
We show that TuneTables can be used as an interpretability tool and can even be used to mitigate biases by optimizing a fairness objective.
arXiv Detail & Related papers (2024-02-17T00:02:23Z) - REST: Enhancing Group Robustness in DNNs through Reweighted Sparse Training [49.581884130880944]
Deep neural networks (DNNs) have proven effective in various domains.
However, they often struggle to perform well on certain minority groups during inference.
arXiv Detail & Related papers (2023-12-05T16:27:54Z) - TabR: Tabular Deep Learning Meets Nearest Neighbors in 2023 [33.70333110327871]
We present TabR -- essentially, a feed-forward network with a custom k-Nearest-Neighbors-like component in the middle.
On a set of public benchmarks with datasets up to several million objects, TabR demonstrates the best average performance.
In addition to the much higher performance, TabR is simple and significantly more efficient.
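As a toy illustration of the idea described above (a feed-forward network whose middle block retrieves the k most similar training rows and mixes their encoded targets into the query representation), here is a minimal PyTorch sketch. It is only loosely inspired by this summary, not the authors' TabR architecture; the class name, layer sizes, dot-product similarity, and softmax weighting are all illustrative assumptions.

```python
# Toy sketch of a retrieval-augmented feed-forward network; not the TabR implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RetrievalAugmentedMLP(nn.Module):
    def __init__(self, num_features: int, hidden: int = 64, k: int = 8):
        super().__init__()
        self.k = k
        self.encoder = nn.Sequential(nn.Linear(num_features, hidden), nn.ReLU())
        self.label_encoder = nn.Linear(1, hidden)  # embeds neighbor targets
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x, context_x, context_y):
        q = self.encoder(x)                       # (B, H) query encodings
        c = self.encoder(context_x)               # (N, H) candidate encodings
        sim = q @ c.T                             # (B, N) dot-product similarity
        topk = sim.topk(self.k, dim=1)            # k nearest candidates per query
        weights = F.softmax(topk.values, dim=1)   # (B, k) attention over neighbors
        neigh = self.label_encoder(context_y[topk.indices])          # (B, k, H)
        retrieved = (weights.unsqueeze(-1) * neigh).sum(dim=1)       # (B, H)
        return self.head(q + retrieved)           # (B, 1) predictions

# Usage on random data (regression):
train_x, train_y = torch.randn(500, 10), torch.randn(500, 1)
model = RetrievalAugmentedMLP(num_features=10)
print(model(torch.randn(32, 10), train_x, train_y).shape)  # torch.Size([32, 1])
```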
arXiv Detail & Related papers (2023-07-26T17:58:07Z) - HyperTab: Hypernetwork Approach for Deep Learning on Small Tabular Datasets [3.9870413777302027]
We introduce HyperTab, a hypernetwork-based approach to solving small-sample problems on tabular datasets.
By combining the advantages of Random Forests and neural networks, HyperTab generates an ensemble of neural networks.
We show that HyperTab consistently outranks other methods on small data and performs comparably to them on larger datasets.
arXiv Detail & Related papers (2023-04-07T08:48:07Z) - Why do tree-based models still outperform deep learning on tabular data? [0.0]
We show that tree-based models remain state-of-the-art on medium-sized data.
We conduct an empirical investigation into the differing inductive biases of tree-based models and neural networks (NNs).
arXiv Detail & Related papers (2022-07-18T08:36:08Z) - Transfer Learning with Deep Tabular Models [66.67017691983182]
We show that upstream data gives tabular neural networks a decisive advantage over GBDT models.
We propose a realistic medical diagnosis benchmark for tabular transfer learning.
We propose a pseudo-feature method for cases where the upstream and downstream feature sets differ.
arXiv Detail & Related papers (2022-06-30T14:24:32Z) - A Robust Stacking Framework for Training Deep Graph Models with Multifaceted Node Features [61.92791503017341]
Graph Neural Networks (GNNs) with numerical node features and graph structure as inputs have demonstrated superior performance on various supervised learning tasks with graph data.
However, the best models for such tabular features in standard supervised learning settings with IID (non-graph) data are not easily incorporated into a GNN.
Here we propose a robust stacking framework that fuses graph-aware propagation with arbitrary models intended for IID data.
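A minimal sketch of this general recipe follows, under the assumption that "fusing graph-aware propagation with an IID model" can be illustrated by fitting an off-the-shelf tabular classifier on node features and then smoothing its predicted probabilities over the row-normalized adjacency matrix. This is not the paper's actual stacking framework; the function name and the number of propagation steps are arbitrary illustrative choices.

```python
# Illustrative sketch: an IID tabular model plus simple graph-aware smoothing.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def propagate_predictions(adj: np.ndarray, features: np.ndarray,
                          labels: np.ndarray, train_idx: np.ndarray,
                          steps: int = 2) -> np.ndarray:
    """adj: dense (N, N) adjacency matrix; returns smoothed (N, C) class probabilities."""
    base = GradientBoostingClassifier().fit(features[train_idx], labels[train_idx])
    probs = base.predict_proba(features)          # IID predictions from node features alone
    # Row-normalize the adjacency (with self-loops) so each step averages over neighbors.
    a = adj + np.eye(adj.shape[0])
    a = a / a.sum(axis=1, keepdims=True)
    for _ in range(steps):
        probs = a @ probs                         # graph-aware smoothing of predictions
    return probs
```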
arXiv Detail & Related papers (2022-06-16T22:46:33Z) - Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.
To make full use of the training data, we propose a full data learning method for speech enhancement.
arXiv Detail & Related papers (2020-11-11T06:32:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.