MotherNet: A Foundational Hypernetwork for Tabular Classification
- URL: http://arxiv.org/abs/2312.08598v1
- Date: Thu, 14 Dec 2023 01:48:58 GMT
- Title: MotherNet: A Foundational Hypernetwork for Tabular Classification
- Authors: Andreas Müller, Carlo Curino, Raghu Ramakrishnan
- Abstract summary: We propose a hypernetwork architecture that we call MotherNet, trained on millions of classification tasks.
MotherNet replaces training on specific datasets with in-context learning through a single forward pass.
The child network generated by MotherNet using in-context learning outperforms neural networks trained using gradient descent on small datasets.
- Score: 1.9643748953805937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of Foundation Models is transforming machine learning across many
modalities (e.g., language, images, videos) with prompt engineering replacing
training in many settings. Recent work on tabular data (e.g., TabPFN) hints at
a similar opportunity to build Foundation Models for classification for
numerical data. In this paper, we go one step further and propose a
hypernetwork architecture that we call MotherNet, trained on millions of
classification tasks, that, once prompted with a never-seen-before training set,
generates the weights of a trained "child" neural network. Like other
Foundation Models, MotherNet replaces training on specific datasets with
in-context learning through a single forward pass. In contrast to existing
hypernetworks that were either task-specific or trained for relatively
constrained multi-task settings, MotherNet is trained to generate networks to
perform multiclass classification on arbitrary tabular datasets without any
dataset-specific gradient descent.
The child network generated by MotherNet using in-context learning
outperforms neural networks trained using gradient descent on small datasets,
and is competitive with predictions by TabPFN and standard ML methods like
Gradient Boosting. Unlike a direct application of transformer models like
TabPFN, MotherNet generated networks are highly efficient at inference time.
This methodology opens up a new approach to building predictive models on
tabular data that is both efficient and robust, without any dataset-specific
training.
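To make the single-forward-pass mechanism concrete, below is a minimal NumPy sketch of the hypernetwork pattern: a "mother" model encodes a labeled training set, emits a flat parameter vector, and that vector is reshaped into a small "child" MLP that classifies test rows without any gradient descent. The set encoder, the weight-generation head, and the child layout are illustrative assumptions (random projections stand in for the trained transformer), not the architecture from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def child_forward(x, params, d_in, d_hid, n_classes):
    """Run the generated 'child' MLP: one hidden layer, softmax output."""
    i = 0
    W1 = params[i:i + d_in * d_hid].reshape(d_in, d_hid); i += d_in * d_hid
    b1 = params[i:i + d_hid]; i += d_hid
    W2 = params[i:i + d_hid * n_classes].reshape(d_hid, n_classes); i += d_hid * n_classes
    b2 = params[i:i + n_classes]
    h = np.tanh(x @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mothernet_forward(X_train, y_train, n_classes, d_hid=16):
    """Toy 'mother' network: embed each (x, y) pair, pool over the set,
    and map the pooled context to a flat child-parameter vector.
    Random projections stand in for the trained transformer."""
    d_in = X_train.shape[1]
    n_params = d_in * d_hid + d_hid + d_hid * n_classes + n_classes
    y_onehot = np.eye(n_classes)[y_train]
    pairs = np.concatenate([X_train, y_onehot], axis=1)     # (n, d_in + C)
    W_embed = rng.normal(size=(pairs.shape[1], 64)) * 0.1   # stand-in set encoder
    context = np.tanh(pairs @ W_embed).mean(axis=0)         # permutation-invariant pooling
    W_head = rng.normal(size=(64, n_params)) * 0.1          # stand-in weight-generation head
    return context @ W_head                                 # flat child parameters

# Usage: "train" on a toy dataset with one forward pass, then predict.
X_tr = rng.normal(size=(100, 5)); y_tr = rng.integers(0, 3, size=100)
X_te = rng.normal(size=(10, 5))
child_params = mothernet_forward(X_tr, y_tr, n_classes=3)
probs = child_forward(X_te, child_params, d_in=5, d_hid=16, n_classes=3)
print(probs.shape)  # (10, 3): class probabilities with no dataset-specific training
```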
Related papers
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method produces models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
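As one reading of the projected-gradient idea (a sketch, not the authors' exact PGU algorithm), the unlearning gradient can be projected onto the subspace orthogonal to directions that carry retained-data knowledge, so forgetting updates disturb the remaining knowledge as little as possible:

```python
import numpy as np

def project_out(grad, retain_basis):
    """Remove from `grad` its component inside the span of `retain_basis`.

    retain_basis: (d, k) matrix with orthonormal columns approximating gradient
    directions important for the retained data (assumed given here).
    """
    return grad - retain_basis @ (retain_basis.T @ grad)

rng = np.random.default_rng(1)
d, k = 10, 3
# Orthonormalize a few random directions to stand in for retained-data gradients.
retain_basis, _ = np.linalg.qr(rng.normal(size=(d, k)))
forget_grad = rng.normal(size=d)               # gradient of the unlearning objective
update = project_out(forget_grad, retain_basis)
# The projected update is (numerically) orthogonal to every retained direction.
print(np.abs(retain_basis.T @ update).max())
```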
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters [19.358670728803336]
Most deep neural networks are trained under fixed network architectures and require retraining when the architecture changes.
To avoid this, one can grow from a small network by adding random weights over time to gradually achieve the target network size.
This naive approach falls short in practice as it brings too much noise to the growing process.
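For reference, a minimal sketch of that naive growth strategy: pad a learned weight matrix with small random values to reach the target size (MixtureGrowth's actual recipe of recombining learned parameter templates is not shown here).

```python
import numpy as np

def grow_layer(W, new_in, new_out, scale=0.01, rng=None):
    """Naively grow a dense layer's weights from W.shape to (new_in, new_out)
    by padding the new rows/columns with small random values."""
    rng = rng or np.random.default_rng(0)
    old_in, old_out = W.shape
    grown = scale * rng.normal(size=(new_in, new_out))
    grown[:old_in, :old_out] = W          # keep the learned block intact
    return grown

W_small = np.ones((4, 8))                  # pretend these weights were trained
W_big = grow_layer(W_small, new_in=6, new_out=12)
print(W_big.shape)                         # (6, 12); the original 4x8 corner is preserved
```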
arXiv Detail & Related papers (2023-11-07T11:37:08Z) - StitchNet: Composing Neural Networks from Pre-Trained Fragments [3.638431342539701]
We propose StitchNet, a novel neural network creation paradigm.
It stitches together fragments from multiple pre-trained neural networks.
We show that these fragments can be stitched together to create neural networks with accuracy comparable to that of traditionally trained networks.
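A minimal sketch of a stitching step, under the assumption that a linear map fit by least squares on a small calibration batch aligns fragment A's output activations with the activations fragment B expects; the paper's actual stitching procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
W_a = rng.normal(size=(10, 32))      # last layer of pretrained fragment A (stand-in)
W_b = rng.normal(size=(10, 24))      # input layer of pretrained fragment B (stand-in)

def fit_stitching_layer(acts_a, acts_b):
    """Fit a linear map S so that acts_a @ S approximates acts_b (least squares)."""
    S, *_ = np.linalg.lstsq(acts_a, acts_b, rcond=None)
    return S

calib = rng.normal(size=(64, 10))            # small batch of calibration inputs
acts_a = np.tanh(calib @ W_a)                # activations coming out of fragment A
acts_b = np.tanh(calib @ W_b)                # activations fragment B was trained to receive
S = fit_stitching_layer(acts_a, acts_b)      # (32, 24) stitching layer
print((acts_a @ S).shape)                    # feed this into the rest of fragment B
```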
arXiv Detail & Related papers (2023-01-05T08:02:30Z) - Meta-learning Pathologies from Radiology Reports using Variance Aware
Prototypical Networks [3.464871689508835]
We propose a simple extension of the Prototypical Networks for few-shot text classification.
Our main idea is to replace the class prototypes by Gaussians and introduce a regularization term that encourages the examples to be clustered near the appropriate class centroids.
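A minimal NumPy sketch of the variance-aware prototypes described above: each class is summarized by a diagonal Gaussian over its support embeddings, and queries are scored by Gaussian log-likelihood instead of plain distance to a centroid. The diagonal-covariance choice and the names are illustrative assumptions; the clustering regularizer added to the training loss is omitted.

```python
import numpy as np

def gaussian_prototypes(support, labels, n_classes, eps=1e-6):
    """Per-class mean and (diagonal) variance of the support embeddings."""
    means, variances = [], []
    for c in range(n_classes):
        emb = support[labels == c]
        means.append(emb.mean(axis=0))
        variances.append(emb.var(axis=0) + eps)
    return np.stack(means), np.stack(variances)

def log_likelihood(queries, means, variances):
    """Diagonal-Gaussian log-likelihood of each query under each class, shape (n_q, C)."""
    diff = queries[:, None, :] - means[None, :, :]
    return -0.5 * ((diff ** 2) / variances[None] + np.log(2 * np.pi * variances[None])).sum(-1)

rng = np.random.default_rng(3)
support = rng.normal(size=(30, 8))
labels = np.repeat(np.arange(3), 10)          # 10 support embeddings per class
queries = rng.normal(size=(5, 8))
means, variances = gaussian_prototypes(support, labels, n_classes=3)
pred = log_likelihood(queries, means, variances).argmax(axis=1)
print(pred)
```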
arXiv Detail & Related papers (2022-10-22T05:22:29Z) - A new hope for network model generalization [66.5377859849467]
Generalizing machine learning models for network traffic dynamics tends to be considered a lost cause.
An ML architecture called Transformer has enabled previously unimaginable generalization in other domains.
We propose a Network Traffic Transformer (NTT) to learn network dynamics from packet traces.
arXiv Detail & Related papers (2022-07-12T21:16:38Z) - TabPFN: A Transformer That Solves Small Tabular Classification Problems
in a Second [48.87527918630822]
We present TabPFN, a trained Transformer that can do supervised classification for small datasets in less than a second.
TabPFN performs in-context learning (ICL): it learns to make predictions from sequences of labeled examples.
We show that our method clearly outperforms boosted trees and performs on par with complex state-of-the-art AutoML systems, with up to a 230× speedup.
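To make the ICL setup concrete, here is a small sketch of how a prediction request can be framed: labeled training rows and unlabeled query rows are packed into one sequence, and a trained transformer (not shown) would read the whole sequence in a single forward pass and emit class probabilities at the query positions. This illustrates the interface only, not TabPFN's internal code.

```python
import numpy as np

def build_icl_sequence(X_train, y_train, X_test, n_classes):
    """Pack labeled and query rows into one sequence: [features | one-hot label | is_query].
    Query rows get an all-zero label slot and is_query = 1."""
    lab = np.concatenate([X_train, np.eye(n_classes)[y_train],
                          np.zeros((len(X_train), 1))], axis=1)
    qry = np.concatenate([X_test, np.zeros((len(X_test), n_classes)),
                          np.ones((len(X_test), 1))], axis=1)
    return np.concatenate([lab, qry], axis=0)

rng = np.random.default_rng(4)
X_tr, y_tr = rng.normal(size=(50, 6)), rng.integers(0, 2, size=50)
X_te = rng.normal(size=(8, 6))
seq = build_icl_sequence(X_tr, y_tr, X_te, n_classes=2)
print(seq.shape)   # (58, 6 + 2 + 1); a trained transformer would map the last 8
                   # positions to class probabilities in one forward pass
```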
arXiv Detail & Related papers (2022-07-05T07:17:43Z) - Train your classifier first: Cascade Neural Networks Training from upper
layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
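One way to realize such a top-down schedule, sketched in PyTorch on a toy MLP (the layer sizes, data, and schedule are illustrative assumptions, not the paper's ASR or language models): train the top classifier layer first with everything below frozen, then progressively unfreeze and train lower layers.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layers = nn.ModuleList([nn.Linear(8, 16), nn.Linear(16, 16), nn.Linear(16, 3)])
X, y = torch.randn(64, 8), torch.randint(0, 3, (64,))

def forward(x):
    for i, layer in enumerate(layers):
        x = layer(x)
        if i < len(layers) - 1:
            x = torch.relu(x)
    return x

loss_fn = nn.CrossEntropyLoss()
# Top-down schedule: start with only the classifier trainable, then unfreeze downwards.
for depth in range(len(layers) - 1, -1, -1):
    for i, layer in enumerate(layers):
        layer.requires_grad_(i >= depth)          # layers at or above `depth` are trainable
    trainable = [p for p in layers.parameters() if p.requires_grad]
    opt = torch.optim.SGD(trainable, lr=0.1)
    for _ in range(20):                           # a few steps per stage
        opt.zero_grad()
        loss = loss_fn(forward(X), y)
        loss.backward()
        opt.step()
    print(f"stage unfreezing layer {depth}: loss={loss.item():.3f}")
```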
arXiv Detail & Related papers (2021-02-09T08:19:49Z) - The Lottery Ticket Hypothesis for Pre-trained BERT Networks [137.99328302234338]
In natural language processing (NLP), enormous pre-trained models like BERT have become the standard starting point for training.
In parallel, work on the lottery ticket hypothesis has shown that models for NLP and computer vision contain smaller matching subnetworks capable of training in isolation to full accuracy.
We combine these observations to assess whether such trainable, transferrable subnetworks exist in pre-trained BERT models.
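Matching subnetworks in this line of work are typically found by magnitude pruning; as a reference point, here is a minimal NumPy sketch of a one-shot magnitude-pruning mask (the BERT study's full iterative procedure with training and weight rewinding is omitted).

```python
import numpy as np

def magnitude_prune_mask(W, sparsity):
    """Binary mask keeping the (1 - sparsity) fraction of largest-magnitude weights."""
    k = int(round(sparsity * W.size))
    threshold = np.sort(np.abs(W), axis=None)[k - 1] if k > 0 else -np.inf
    return (np.abs(W) > threshold).astype(W.dtype)

rng = np.random.default_rng(5)
W = rng.normal(size=(128, 64))                # a trained weight matrix (stand-in)
mask = magnitude_prune_mask(W, sparsity=0.9)  # prune 90% of the weights
subnetwork = W * mask                         # candidate "winning ticket" weights
print(mask.mean())                            # ~0.1 of the weights survive
```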
arXiv Detail & Related papers (2020-07-23T19:35:39Z) - Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z) - Learning across label confidence distributions using Filtered Transfer
Learning [0.44040106718326594]
We propose a transfer learning approach to improve predictive power in noisy data systems with large, variable-confidence datasets.
We propose a deep neural network method called Filtered Transfer Learning (FTL) that defines multiple tiers of data confidence as separate tasks.
We demonstrate that using FTL to learn stepwise, across the label confidence distribution, results in higher performance compared to deep neural network models trained on a single confidence range.
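One way to read the stepwise scheme is as sequential fine-tuning across confidence tiers: partition the data by label confidence, fit on the highest-confidence tier first, and keep fine-tuning the same network on progressively lower tiers. The PyTorch sketch below is an interpretation of this summary with toy data, not the authors' FTL implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(300, 12)
y = (X[:, 0] > 0).long()                     # toy binary labels
confidence = torch.rand(300)                 # per-example label confidence in [0, 1]

# Three confidence tiers: high, medium, low.
tiers = [confidence > 0.66,
         (confidence > 0.33) & (confidence <= 0.66),
         confidence <= 0.33]

model = nn.Sequential(nn.Linear(12, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Stepwise training: start from the most trusted labels, then transfer downwards.
for tier_id, mask in enumerate(tiers):
    Xt, yt = X[mask], y[mask]
    for _ in range(50):
        opt.zero_grad()
        loss = loss_fn(model(Xt), yt)
        loss.backward()
        opt.step()
    print(f"tier {tier_id}: n={mask.sum().item()}, loss={loss.item():.3f}")
```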
arXiv Detail & Related papers (2020-06-03T21:00:11Z)