Hopular: Modern Hopfield Networks for Tabular Data
- URL: http://arxiv.org/abs/2206.00664v1
- Date: Wed, 1 Jun 2022 17:57:44 GMT
- Title: Hopular: Modern Hopfield Networks for Tabular Data
- Authors: Bernhard Schäfl, Lukas Gruber, Angela Bitto-Nemling, Sepp Hochreiter
- Abstract summary: We suggest "Hopular", a novel Deep Learning architecture for medium- and small-sized datasets.
Hopular uses stored data to identify feature-feature, feature-target, and sample-sample dependencies.
In experiments on small-sized datasets with less than 1,000 samples, Hopular surpasses Gradient Boosting, Random Forests, SVMs, and in particular several Deep Learning methods.
- Score: 5.470026407471584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Deep Learning excels in structured data as encountered in vision and
natural language processing, it has failed to meet expectations on tabular
data. For tabular data, Support Vector Machines (SVMs), Random Forests, and
Gradient Boosting are the best performing techniques with Gradient Boosting in
the lead. Recently, we saw a surge of Deep Learning methods that were tailored
to tabular data but still underperform compared to Gradient Boosting on
small-sized datasets. We suggest "Hopular", a novel Deep Learning architecture
for medium- and small-sized datasets, where each layer is equipped with
continuous modern Hopfield networks. The modern Hopfield networks use stored
data to identify feature-feature, feature-target, and sample-sample
dependencies. Hopular's novelty is that every layer can directly access the
original input as well as the whole training set via stored data in the
Hopfield networks. Therefore, Hopular can step-wise update its current model
and the resulting prediction at every layer like standard iterative learning
algorithms. In experiments on small-sized tabular datasets with less than 1,000
samples, Hopular surpasses Gradient Boosting, Random Forests, SVMs, and in
particular several Deep Learning methods. In experiments on medium-sized
tabular data with about 10,000 samples, Hopular outperforms XGBoost, CatBoost,
LightGBM and a state-of-the-art Deep Learning method designed for tabular data.
Thus, Hopular is a strong alternative to these methods on tabular data.
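As a rough illustration of the retrieval mechanism the abstract describes, here is a minimal numpy sketch of one continuous modern Hopfield update, in which a query pattern attends over stored patterns (standing in for the stored training set). The function names and the single-matrix setup are illustrative assumptions; Hopular's actual layers embed features and targets and learn projections of the stored data.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hopfield_retrieve(xi, X, beta=1.0, n_steps=1):
    """Continuous modern Hopfield update(s).

    xi : (d,) query (state) pattern, e.g. an embedded sample.
    X  : (N, d) stored patterns, e.g. the embedded training set.
    Each update is xi <- X^T softmax(beta * X @ xi), i.e. attention
    of the query over all stored patterns.
    """
    for _ in range(n_steps):
        xi = X.T @ softmax(beta * (X @ xi))
    return xi

# toy usage: retrieve from 100 stored 8-dim patterns
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
query = X[3] + 0.1 * rng.normal(size=8)   # noisy version of pattern 3
print(hopfield_retrieve(query, X, beta=4.0))
```

With a large beta, the update sharpens toward the single closest stored pattern; smaller beta blends several patterns, which is how such a layer can pick up sample-sample dependencies.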
Related papers
- Mambular: A Sequential Model for Tabular Deep Learning [0.7184556517162347]
We introduce Mambular, an adaptation of the Mamba architecture optimized for tabular data.
We benchmark Mambular against state-of-the-art models, including neural networks and tree-based methods.
Our analysis shows that interpreting features as a sequence and passing them through Mamba layers results in surprisingly performant models.
arXiv Detail & Related papers (2024-08-12T16:57:57Z)
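A minimal sketch of the "features as a sequence" idea from the Mambular entry above: each scalar feature of a row is embedded as one token so that a sequence model can consume the table row. The per-feature linear embedding and all names here are hypothetical simplifications; the Mamba layers themselves are omitted.

```python
import numpy as np

def featurize_as_sequence(x_row, W_embed, b_embed):
    """Embed each scalar feature of a tabular row as one 'token',
    yielding an (n_features, d_model) sequence suitable for a
    sequence model (Mamba layers in Mambular's case)."""
    # x_row: (n_features,); W_embed, b_embed: (n_features, d_model)
    return x_row[:, None] * W_embed + b_embed

rng = np.random.default_rng(0)
n_features, d_model = 12, 16
W = rng.normal(size=(n_features, d_model))
b = rng.normal(size=(n_features, d_model))
seq = featurize_as_sequence(rng.normal(size=n_features), W, b)
print(seq.shape)  # (12, 16) -- one token per feature
```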
- Not Everything is All You Need: Toward Low-Redundant Optimization for Large Language Model Alignment [126.34547428473968]
Large language models (LLMs) still struggle to align with human preferences in complex tasks and scenarios.
We propose a low-redundant alignment method named ALLO, focusing on optimizing the most related neurons with the most useful supervised signals.
Experimental results on 10 datasets have shown the effectiveness of ALLO.
arXiv Detail & Related papers (2024-06-18T13:34:40Z)
- BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model [6.888608574535993]
BiSHop handles the two major challenges of deep tabular learning: non-rotationally invariant data structure and feature sparsity in data.
BiSHop uses a dual-component approach, sequentially processing data both column-wise and row-wise.
We show that BiSHop surpasses current SOTA methods with significantly fewer HPO runs.
arXiv Detail & Related papers (2024-04-04T23:13:32Z)
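The BiSHop entry above describes sequentially processing the table column-wise and row-wise. Below is a rough numpy sketch of that dual sweep using plain attention-style updates; BiSHop's actual cells use a generalized sparse modern Hopfield model with learned parameters, so this only conveys the data flow.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_block(E, beta=1.0):
    """One column-then-row sweep over an embedded table.

    E: (n_samples, n_features, d) embedded table. First mix
    information across features within each sample (column-wise),
    then across samples for each feature (row-wise)."""
    # column-wise: features attend to each other within a sample
    A_col = softmax(beta * np.einsum('sfd,sgd->sfg', E, E), axis=-1)
    E = np.einsum('sfg,sgd->sfd', A_col, E)
    # row-wise: samples attend to each other for a given feature
    A_row = softmax(beta * np.einsum('sfd,tfd->sft', E, E), axis=-1)
    E = np.einsum('sft,tfd->sfd', A_row, E)
    return E

rng = np.random.default_rng(0)
E = rng.normal(size=(6, 4, 8))       # 6 samples, 4 features, dim 8
print(bidirectional_block(E).shape)  # (6, 4, 8)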
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method produces models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
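A minimal sketch of the projected-gradient idea behind the PGU entry above: updates computed on the "forget" data are projected away from directions important to the retained data, so the unlearning step interferes as little as possible with remaining knowledge. The SVD-based basis construction here is an illustrative assumption, not the paper's exact procedure.

```python
import numpy as np

def project_out(grad, basis):
    """Remove from `grad` its components along an orthonormal `basis`.

    basis: (k, d) rows spanning directions important to the retained
    data (e.g. top singular vectors of retained-data gradients)."""
    return grad - basis.T @ (basis @ grad)

# toy usage: build a basis from retained-data gradients via SVD
rng = np.random.default_rng(0)
G_retain = rng.normal(size=(32, 10))       # 32 gradient samples, d=10
U, S, Vt = np.linalg.svd(G_retain, full_matrices=False)
basis = Vt[:4]                             # top-4 retained directions
g_forget = rng.normal(size=10)             # gradient on 'forget' data
step = project_out(g_forget, basis)        # interference-reduced update
print(np.abs(basis @ step).max())          # ~0: orthogonal to basis
```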
- Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees [11.732842929815401]
Tabular data is hard to acquire and is subject to missing values.
This paper introduces a novel approach for generating and imputing mixed-type (continuous and categorical) data.
In contrast to prior methods that rely on neural networks to learn the score function or the vector field, we adopt XGBoost.
arXiv Detail & Related papers (2023-09-18T17:49:09Z)
- HyperTab: Hypernetwork Approach for Deep Learning on Small Tabular Datasets [3.9870413777302027]
We introduce HyperTab, a hypernetwork-based approach to solving small-sample problems on tabular datasets.
By combining the advantages of Random Forests and neural networks, HyperTab generates an ensemble of neural networks.
We show that HyperTab consistently outperforms other methods on small data and scores comparably to them on larger datasets.
arXiv Detail & Related papers (2023-04-07T08:48:07Z)
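A toy sketch of the hypernetwork-ensemble idea from the HyperTab entry above: a shared hypernetwork maps each random feature subset (echoing Random Forests' feature subsampling) to the weights of a small target network, and the per-subset predictions are averaged. The linear networks and every name here are simplifying assumptions; HyperTab's networks are deeper and trained end-to-end.

```python
import numpy as np

def hypernet(mask, W_h, d_in, d_out):
    """Hypothetical linear hypernetwork: maps a binary feature mask
    to the flattened weights of a target network for that subset."""
    theta = W_h @ mask
    return theta.reshape(d_out, d_in)

def hypertab_like_predict(x, masks, W_h):
    """Average an ensemble of target nets, one per feature subset,
    each with weights generated by the shared hypernetwork."""
    d_in = x.shape[0]
    preds = []
    for m in masks:
        W_t = hypernet(m, W_h, d_in, d_out=1)
        preds.append(W_t @ (x * m))      # zero out unused features
    return np.mean(preds)

rng = np.random.default_rng(0)
d_in, n_nets = 10, 5
masks = (rng.random((n_nets, d_in)) < 0.5).astype(float)
W_h = rng.normal(size=(d_in * 1, d_in))  # hypernetwork parameters
print(hypertab_like_predict(rng.normal(size=d_in), masks, W_h))
```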
- Is margin all you need? An extensive empirical study of active learning on tabular data [66.18464006872345]
We analyze the performance of a variety of active learning algorithms on 69 real-world datasets from the OpenML-CC18 benchmark.
Surprisingly, we find that the classical margin sampling technique matches or outperforms all others, including the current state of the art.
arXiv Detail & Related papers (2022-10-07T21:18:24Z)
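Margin sampling, the classical technique the study above finds so strong, fits in a few lines: score each unlabeled point by the gap between its two highest predicted class probabilities and query the smallest gaps.

```python
import numpy as np

def margin_scores(proba):
    """Margin sampling score: difference between the top-2 class
    probabilities per row; small margins mean ambiguous points."""
    srt = np.sort(proba, axis=1)
    return srt[:, -1] - srt[:, -2]

def query_indices(proba, n_query):
    """Indices of the n_query most ambiguous points to label next."""
    return np.argsort(margin_scores(proba))[:n_query]

# toy usage: pick the 2 most ambiguous of 5 unlabeled points
proba = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25],
                  [0.34, 0.33, 0.33],
                  [0.70, 0.20, 0.10],
                  [0.50, 0.45, 0.05]])
print(query_indices(proba, 2))  # rows with the smallest top-2 gap
```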
- Why do tree-based models still outperform deep learning on tabular data? [0.0]
We show that tree-based models remain state-of-the-art on medium-sized data.
We conduct an empirical investigation into the differing inductive biases of tree-based models and Neural Networks (NNs).
arXiv Detail & Related papers (2022-07-18T08:36:08Z)
- Transfer Learning with Deep Tabular Models [66.67017691983182]
We show that upstream data gives tabular neural networks a decisive advantage over GBDT models.
We propose a realistic medical diagnosis benchmark for tabular transfer learning.
We propose a pseudo-feature method for cases where the upstream and downstream feature sets differ.
arXiv Detail & Related papers (2022-06-30T14:24:32Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., a feedforward neural net) as a lower model that takes features as input and outputs predicted labels; 2) a graph neural network as an upper model that learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
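A rough sketch of the message-passing step in the feature-extrapolation entry above: a new feature's embedding is aggregated from the embeddings of the samples in which it is observed. The value-weighted mean aggregation and all names here are hypothetical stand-ins for the paper's learned graph neural network.

```python
import numpy as np

def extrapolate_feature_embedding(x_new, sample_emb):
    """One message-passing step on a feature-data graph.

    x_new: (n_samples,) observed values of the new feature
           (nan where unobserved); sample_emb: (n_samples, d)
    embeddings produced by the backbone. Returns a (d,) embedding
    for the new feature, aggregated from its observing samples."""
    mask = ~np.isnan(x_new)
    w = x_new[mask]
    return (w[:, None] * sample_emb[mask]).sum(0) / (np.abs(w).sum() + 1e-8)

rng = np.random.default_rng(0)
sample_emb = rng.normal(size=(50, 16))   # backbone sample embeddings
x_new = rng.normal(size=50)
x_new[rng.random(50) < 0.4] = np.nan     # partially observed feature
print(extrapolate_feature_embedding(x_new, sample_emb).shape)  # (16,)
```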
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To reduce the training burden introduced by this enlarged dataset, we apply a dataset distillation strategy that compresses it into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.