Hopular: Modern Hopfield Networks for Tabular Data
- URL: http://arxiv.org/abs/2206.00664v1
- Date: Wed, 1 Jun 2022 17:57:44 GMT
- Title: Hopular: Modern Hopfield Networks for Tabular Data
- Authors: Bernhard Schäfl, Lukas Gruber, Angela Bitto-Nemling, Sepp Hochreiter
- Abstract summary: We suggest "Hopular", a novel Deep Learning architecture for medium- and small-sized datasets.
Hopular uses stored data to identify feature-feature, feature-target, and sample-sample dependencies.
In experiments on small-sized datasets with less than 1,000 samples, Hopular surpasses Gradient Boosting, Random Forests, SVMs, and in particular several Deep Learning methods.
- Score: 5.470026407471584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Deep Learning excels in structured data as encountered in vision and
natural language processing, it has failed to meet expectations on tabular
data. For tabular data, Support Vector Machines (SVMs), Random Forests, and
Gradient Boosting are the best performing techniques with Gradient Boosting in
the lead. Recently, we saw a surge of Deep Learning methods that were tailored
to tabular data but still underperform compared to Gradient Boosting on
small-sized datasets. We suggest "Hopular", a novel Deep Learning architecture
for medium- and small-sized datasets, where each layer is equipped with
continuous modern Hopfield networks. The modern Hopfield networks use stored
data to identify feature-feature, feature-target, and sample-sample
dependencies. Hopular's novelty is that every layer can directly access the
original input as well as the whole training set via stored data in the
Hopfield networks. Therefore, Hopular can step-wise update its current model
and the resulting prediction at every layer like standard iterative learning
algorithms. In experiments on small-sized tabular datasets with less than 1,000
samples, Hopular surpasses Gradient Boosting, Random Forests, SVMs, and in
particular several Deep Learning methods. In experiments on medium-sized
tabular data with about 10,000 samples, Hopular outperforms XGBoost, CatBoost,
LightGBM and a state-of-the-art Deep Learning method designed for tabular data.
Thus, Hopular is a strong alternative to these methods on tabular data.
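As a rough illustration of the retrieval mechanism the abstract describes, here is a minimal numpy sketch of one continuous modern Hopfield update, in which a query pattern attends over stored patterns (standing in for the stored training set). The function names and the single-matrix setup are illustrative assumptions; Hopular's actual layers embed features and targets and learn projections of the stored data.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hopfield_retrieve(xi, X, beta=1.0, n_steps=1):
    """Continuous modern Hopfield update(s).

    xi : (d,) query (state) pattern, e.g. an embedded sample.
    X  : (N, d) stored patterns, e.g. the embedded training set.
    Each update is xi <- X^T softmax(beta * X @ xi), i.e. attention
    of the query over all stored patterns.
    """
    for _ in range(n_steps):
        xi = X.T @ softmax(beta * (X @ xi))
    return xi

# toy usage: retrieve from 100 stored 8-dim patterns
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
query = X[3] + 0.1 * rng.normal(size=8)   # noisy version of pattern 3
print(hopfield_retrieve(query, X, beta=4.0))
```

With a large beta, the update sharpens toward the single closest stored pattern; smaller beta blends several patterns, which is how such a layer can pick up sample-sample dependencies.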
Related papers
- Mambular: A Sequential Model for Tabular Deep Learning [0.7184556517162347]
We introduce Mambular, an adaptation of the Mamba architecture optimized for tabular data.
We benchmark Mambular against state-of-the-art models, including neural networks and tree-based methods.
Our analysis shows that interpreting features as a sequence and passing them through Mamba layers results in surprisingly performant models.
arXiv Detail & Related papers (2024-08-12T16:57:57Z)
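A minimal sketch of the "features as a sequence" idea from the Mambular entry above: each scalar feature of a row is embedded as one token so that a sequence model can consume the table row. The per-feature linear embedding and all names here are hypothetical simplifications; the Mamba layers themselves are omitted.

```python
import numpy as np

def featurize_as_sequence(x_row, W_embed, b_embed):
    """Embed each scalar feature of a tabular row as one 'token',
    yielding an (n_features, d_model) sequence suitable for a
    sequence model (Mamba layers in Mambular's case)."""
    # x_row: (n_features,); W_embed, b_embed: (n_features, d_model)
    return x_row[:, None] * W_embed + b_embed

rng = np.random.default_rng(0)
n_features, d_model = 12, 16
W = rng.normal(size=(n_features, d_model))
b = rng.normal(size=(n_features, d_model))
seq = featurize_as_sequence(rng.normal(size=n_features), W, b)
print(seq.shape)  # (12, 16) -- one token per feature
```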
- Not Everything is All You Need: Toward Low-Redundant Optimization for Large Language Model Alignment [126.34547428473968]
Large language models (LLMs) still struggle to align with human preferences in complex tasks and scenarios.
We propose a low-redundant alignment method named ALLO, focusing on optimizing the most related neurons with the most useful supervised signals.
Experimental results on 10 datasets have shown the effectiveness of ALLO.
arXiv Detail & Related papers (2024-06-18T13:34:40Z)
- BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model [6.888608574535993]
BiSHop handles the two major challenges of deep tabular learning: non-rotationally invariant data structure and feature sparsity in data.
BiSHop uses a dual-component approach, sequentially processing data both column-wise and row-wise.
We show that BiSHop surpasses current SOTA methods with significantly fewer HPO runs.
arXiv Detail & Related papers (2024-04-04T23:13:32Z)
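The BiSHop entry above describes sequentially processing the table column-wise and row-wise. Below is a rough numpy sketch of that dual sweep using plain attention-style updates; BiSHop's actual cells use a generalized sparse modern Hopfield model with learned parameters, so this only conveys the data flow.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_block(E, beta=1.0):
    """One column-then-row sweep over an embedded table.

    E: (n_samples, n_features, d) embedded table. First mix
    information across features within each sample (column-wise),
    then across samples for each feature (row-wise)."""
    # column-wise: features attend to each other within a sample
    A_col = softmax(beta * np.einsum('sfd,sgd->sfg', E, E), axis=-1)
    E = np.einsum('sfg,sgd->sfd', A_col, E)
    # row-wise: samples attend to each other for a given feature
    A_row = softmax(beta * np.einsum('sfd,tfd->sft', E, E), axis=-1)
    E = np.einsum('sft,tfd->sfd', A_row, E)
    return E

rng = np.random.default_rng(0)
E = rng.normal(size=(6, 4, 8))       # 6 samples, 4 features, dim 8
print(bidirectional_block(E).shape)  # (6, 4, 8)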
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method produces models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
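A minimal sketch of the projected-gradient idea behind the PGU entry above: updates computed on the "forget" data are projected away from directions important to the retained data, so the unlearning step interferes as little as possible with remaining knowledge. The SVD-based basis construction here is an illustrative assumption, not the paper's exact procedure.

```python
import numpy as np

def project_out(grad, basis):
    """Remove from `grad` its components along an orthonormal `basis`.

    basis: (k, d) rows spanning directions important to the retained
    data (e.g. top singular vectors of retained-data gradients)."""
    return grad - basis.T @ (basis @ grad)

# toy usage: build a basis from retained-data gradients via SVD
rng = np.random.default_rng(0)
G_retain = rng.normal(size=(32, 10))       # 32 gradient samples, d=10
U, S, Vt = np.linalg.svd(G_retain, full_matrices=False)
basis = Vt[:4]                             # top-4 retained directions
g_forget = rng.normal(size=10)             # gradient on 'forget' data
step = project_out(g_forget, basis)        # interference-reduced update
print(np.abs(basis @ step).max())          # ~0: orthogonal to basis
```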
- Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees [11.732842929815401]
Tabular data is hard to acquire and is subject to missing values.
This paper introduces a novel approach for generating and imputing mixed-type (continuous and categorical) data.
In contrast to prior methods that rely on neural networks to learn the score function or the vector field, we adopt XGBoost.
arXiv Detail & Related papers (2023-09-18T17:49:09Z)
- HyperTab: Hypernetwork Approach for Deep Learning on Small Tabular Datasets [3.9870413777302027]
We introduce HyperTab, a hypernetwork-based approach to solving small-sample problems on tabular datasets.
By combining the advantages of Random Forests and neural networks, HyperTab generates an ensemble of neural networks.
We show that HyperTab consistently outperforms other methods on small data and scores comparably to them on larger datasets.
arXiv Detail & Related papers (2023-04-07T08:48:07Z)
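A toy sketch of the hypernetwork-ensemble idea from the HyperTab entry above: a shared hypernetwork maps each random feature subset (echoing Random Forests' feature subsampling) to the weights of a small target network, and the per-subset predictions are averaged. The linear networks and every name here are simplifying assumptions; HyperTab's networks are deeper and trained end-to-end.

```python
import numpy as np

def hypernet(mask, W_h, d_in, d_out):
    """Hypothetical linear hypernetwork: maps a binary feature mask
    to the flattened weights of a target network for that subset."""
    theta = W_h @ mask
    return theta.reshape(d_out, d_in)

def hypertab_like_predict(x, masks, W_h):
    """Average an ensemble of target nets, one per feature subset,
    each with weights generated by the shared hypernetwork."""
    d_in = x.shape[0]
    preds = []
    for m in masks:
        W_t = hypernet(m, W_h, d_in, d_out=1)
        preds.append(W_t @ (x * m))      # zero out unused features
    return np.mean(preds)

rng = np.random.default_rng(0)
d_in, n_nets = 10, 5
masks = (rng.random((n_nets, d_in)) < 0.5).astype(float)
W_h = rng.normal(size=(d_in * 1, d_in))  # hypernetwork parameters
print(hypertab_like_predict(rng.normal(size=d_in), masks, W_h))
```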
- Is margin all you need? An extensive empirical study of active learning on tabular data [66.18464006872345]
We analyze the performance of a variety of active learning algorithms on 69 real-world datasets from the OpenML-CC18 benchmark.
Surprisingly, we find that the classical margin sampling technique matches or outperforms all others, including the current state of the art.
arXiv Detail & Related papers (2022-10-07T21:18:24Z)
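Margin sampling, the classical technique the study above finds so strong, fits in a few lines: score each unlabeled point by the gap between its two highest predicted class probabilities and query the smallest gaps.

```python
import numpy as np

def margin_scores(proba):
    """Margin sampling score: difference between the top-2 class
    probabilities per row; small margins mean ambiguous points."""
    srt = np.sort(proba, axis=1)
    return srt[:, -1] - srt[:, -2]

def query_indices(proba, n_query):
    """Indices of the n_query most ambiguous points to label next."""
    return np.argsort(margin_scores(proba))[:n_query]

# toy usage: pick the 2 most ambiguous of 5 unlabeled points
proba = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25],
                  [0.34, 0.33, 0.33],
                  [0.70, 0.20, 0.10],
                  [0.50, 0.45, 0.05]])
print(query_indices(proba, 2))  # rows with the smallest top-2 gap
```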
- Why do tree-based models still outperform deep learning on tabular data? [0.0]
We show that tree-based models remain state-of-the-art on medium-sized data.
We conduct an empirical investigation into the differing inductive biases of tree-based models and Neural Networks (NNs).
arXiv Detail & Related papers (2022-07-18T08:36:08Z)
- Transfer Learning with Deep Tabular Models [66.67017691983182]
We show that upstream data gives tabular neural networks a decisive advantage over GBDT models.
We propose a realistic medical diagnosis benchmark for tabular transfer learning.
We propose a pseudo-feature method for cases where the upstream and downstream feature sets differ.
arXiv Detail & Related papers (2022-06-30T14:24:32Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., a feedforward neural net) as a lower model that takes features as input and outputs predicted labels; 2) a graph neural network as an upper model that learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
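A rough sketch of the message-passing step in the feature-extrapolation entry above: a new feature's embedding is aggregated from the embeddings of the samples in which it is observed. The value-weighted mean aggregation and all names here are hypothetical stand-ins for the paper's learned graph neural network.

```python
import numpy as np

def extrapolate_feature_embedding(x_new, sample_emb):
    """One message-passing step on a feature-data graph.

    x_new: (n_samples,) observed values of the new feature
           (nan where unobserved); sample_emb: (n_samples, d)
    embeddings produced by the backbone. Returns a (d,) embedding
    for the new feature, aggregated from its observing samples."""
    mask = ~np.isnan(x_new)
    w = x_new[mask]
    return (w[:, None] * sample_emb[mask]).sum(0) / (np.abs(w).sum() + 1e-8)

rng = np.random.default_rng(0)
sample_emb = rng.normal(size=(50, 16))   # backbone sample embeddings
x_new = rng.normal(size=50)
x_new[rng.random(50) < 0.4] = np.nan     # partially observed feature
print(extrapolate_feature_embedding(x_new, sample_emb).shape)  # (16,)
```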
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To reduce the training burden introduced by this enlarged dataset, we apply a dataset distillation strategy that compresses it into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.