MotherNet: A Foundational Hypernetwork for Tabular Classification
- URL: http://arxiv.org/abs/2312.08598v1
- Date: Thu, 14 Dec 2023 01:48:58 GMT
- Title: MotherNet: A Foundational Hypernetwork for Tabular Classification
- Authors: Andreas Müller, Carlo Curino, Raghu Ramakrishnan
- Abstract summary: We propose a hypernetwork architecture that we call MotherNet, trained on millions of classification tasks.
MotherNet replaces training on specific datasets with in-context learning through a single forward pass.
The child network generated by MotherNet using in-context learning outperforms neural networks trained using gradient descent on small datasets.
- Score: 1.9643748953805937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of Foundation Models is transforming machine learning across many
modalities (e.g., language, images, videos) with prompt engineering replacing
training in many settings. Recent work on tabular data (e.g., TabPFN) hints at
a similar opportunity to build Foundation Models for classification for
numerical data. In this paper, we go one step further and propose a
hypernetwork architecture that we call MotherNet, trained on millions of
classification tasks, that, once prompted with a never-seen-before training set,
generates the weights of a trained "child" neural network. Like other
Foundation Models, MotherNet replaces training on specific datasets with
in-context learning through a single forward pass. In contrast to existing
hypernetworks that were either task-specific or trained for relatively
constrained multi-task settings, MotherNet is trained to generate networks to
perform multiclass classification on arbitrary tabular datasets without any
dataset-specific gradient descent.
The child network generated by MotherNet using in-context learning
outperforms neural networks trained using gradient descent on small datasets,
and is competitive with predictions by TabPFN and standard ML methods like
Gradient Boosting. Unlike a direct application of transformer models like
TabPFN, MotherNet generated networks are highly efficient at inference time.
This methodology opens up a new approach to building predictive models on
tabular data that is both efficient and robust, without any dataset-specific
training.
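
The abstract describes the core mechanism: a transformer-style hypernetwork reads an in-context training set and emits, in a single forward pass, the parameters of a small "child" network that then makes predictions on its own. Below is a minimal sketch of that idea in PyTorch; the module sizes, the set-pooling step, and the one-hidden-layer child architecture are illustrative assumptions rather than the paper's exact design.

```python
# Minimal sketch of a MotherNet-style hypernetwork (illustrative assumptions,
# not the paper's exact architecture): a transformer encoder consumes the
# in-context training set (x_i, y_i) and emits the weights of a small child MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotherNetSketch(nn.Module):
    def __init__(self, n_features=10, n_classes=3, hidden=64, d_model=128):
        super().__init__()
        self.n_features, self.n_classes, self.hidden = n_features, n_classes, hidden
        self.embed = nn.Linear(n_features + n_classes, d_model)  # embed (x, one-hot y) pairs
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # head that emits all child-network parameters as one flat vector
        n_child_params = hidden * n_features + hidden + n_classes * hidden + n_classes
        self.to_weights = nn.Linear(d_model, n_child_params)

    def forward(self, x_train, y_train):
        # x_train: (n, n_features), y_train: (n,) integer labels
        tokens = torch.cat([x_train, F.one_hot(y_train, self.n_classes).float()], dim=-1)
        h = self.encoder(self.embed(tokens).unsqueeze(0)).mean(dim=1).squeeze(0)  # pool over the set
        theta = self.to_weights(h)
        # unpack the flat vector into the child MLP's weights and biases
        f, hd, c = self.n_features, self.hidden, self.n_classes
        W1, rest = theta[:hd * f].view(hd, f), theta[hd * f:]
        b1, rest = rest[:hd], rest[hd:]
        W2, b2 = rest[:c * hd].view(c, hd), rest[c * hd:]
        return W1, b1, W2, b2

def child_predict(params, x):
    """Run the generated child network; no dataset-specific gradient descent is involved."""
    W1, b1, W2, b2 = params
    return F.linear(torch.relu(F.linear(x, W1, b1)), W2, b2)
```

At inference time only child_predict runs, which is the efficiency argument made above: the training set does not have to be passed through a transformer for every prediction, in contrast to a direct application of TabPFN-style in-context learning.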
Related papers
- Bridging Neural Networks and Dynamic Time Warping for Adaptive Time Series Classification [2.443957114877221]
We develop a versatile model that adapts to cold-start conditions; as a neural network, it becomes trainable once sufficient labeled data is available, while still retaining DTW's inherent interpretability.
arXiv Detail & Related papers (2025-07-13T23:15:21Z) - Private Training & Data Generation by Clustering Embeddings [74.00687214400021]
Differential privacy (DP) provides a robust framework for protecting individual data. We introduce a novel principled method for DP synthetic image embedding generation. Empirically, a simple two-layer neural network trained on synthetically generated embeddings achieves state-of-the-art (SOTA) classification accuracy.
arXiv Detail & Related papers (2025-06-20T00:17:14Z) - A Pipeline of Augmentation and Sequence Embedding for Classification of Imbalanced Network Traffic [0.0]
We propose a pipeline to balance the dataset and classify it using a robust and accurate embedding technique. We demonstrate that the proposed augmentation pipeline, combined with FS-Embedding, increases convergence speed and leads to a significant reduction in the number of model parameters.
arXiv Detail & Related papers (2025-02-26T07:55:24Z) - Adapt-$\infty$: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$\infty$ is a new multi-way and adaptive data selection approach for lifelong instruction tuning.
We construct pseudo-skill clusters by grouping gradient-based sample vectors.
We select the best-performing data selector for each skill cluster from a pool of selector experts.
This data selector samples a subset of the most important samples from each skill cluster for training.
arXiv Detail & Related papers (2024-10-14T15:48:09Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method produces models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible. (A generic gradient-projection sketch appears after this list.)
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters [19.358670728803336]
Most deep neural networks are trained under fixed network architectures and require retraining when the architecture changes.
To avoid this, one can grow from a small network by adding random weights over time to gradually achieve the target network size.
This naive approach falls short in practice as it brings too much noise to the growing process.
arXiv Detail & Related papers (2023-11-07T11:37:08Z) - StitchNet: Composing Neural Networks from Pre-Trained Fragments [3.638431342539701]
We propose StitchNet, a novel neural network creation paradigm.
It stitches together fragments from multiple pre-trained neural networks.
We show that these fragments can be stitched together to create neural networks with accuracy comparable to that of traditionally trained networks.
arXiv Detail & Related papers (2023-01-05T08:02:30Z) - Meta-learning Pathologies from Radiology Reports using Variance Aware
Prototypical Networks [3.464871689508835]
We propose a simple extension of the Prototypical Networks for few-shot text classification.
Our main idea is to replace the class prototypes by Gaussians and introduce a regularization term that encourages the examples to be clustered near the appropriate class centroids.
arXiv Detail & Related papers (2022-10-22T05:22:29Z) - A new hope for network model generalization [66.5377859849467]
Generalizing machine learning models for network traffic dynamics tends to be considered a lost cause.
An ML architecture called the _Transformer_ has enabled previously unimaginable generalization in other domains.
We propose a Network Traffic Transformer (NTT) to learn network dynamics from packet traces.
arXiv Detail & Related papers (2022-07-12T21:16:38Z) - TabPFN: A Transformer That Solves Small Tabular Classification Problems
in a Second [48.87527918630822]
We present TabPFN, a trained Transformer that can do supervised classification for small datasets in less than a second.
TabPFN performs in-context learning (ICL): it learns to make predictions from sequences of labeled examples.
We show that our method clearly outperforms boosted trees and performs on par with complex state-of-the-art AutoML systems with up to a 230$\times$ speedup. (A hedged usage sketch appears after this list.)
arXiv Detail & Related papers (2022-07-05T07:17:43Z) - Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z) - Rank-R FNN: A Tensor-Based Learning Model for High-Order Data
Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z) - Task-Adaptive Neural Network Retrieval with Meta-Contrastive Learning [34.27089256930098]
We propose a novel neural network retrieval method, which retrieves the optimal pre-trained network for a given task.
We train this framework by meta-learning a cross-modal latent space with contrastive loss, to maximize the similarity between a dataset and a network.
We validate the efficacy of our method on ten real-world datasets, against existing NAS baselines.
arXiv Detail & Related papers (2021-03-02T06:30:51Z) - Train your classifier first: Cascade Neural Networks Training from upper
layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
arXiv Detail & Related papers (2021-02-09T08:19:49Z) - AgEBO-Tabular: Joint Neural Architecture and Hyperparameter Search with
Autotuned Data-Parallel Training for Tabular Data [11.552769149674544]
Development of high-performing predictive models for large data sets is a challenging task.
Recent automated machine learning (AutoML) is emerging as a promising approach to automate predictive model development.
We have developed AgEBO-Tabular, an approach to combine aging evolution (AgE) and a parallel NAS method that searches over neural architecture space.
arXiv Detail & Related papers (2020-10-30T16:28:48Z) - The Lottery Ticket Hypothesis for Pre-trained BERT Networks [137.99328302234338]
In natural language processing (NLP), enormous pre-trained models like BERT have become the standard starting point for training.
In parallel, work on the lottery ticket hypothesis has shown that models for NLP and computer vision contain smaller matching subnetworks capable of training in isolation to full accuracy.
We combine these observations to assess whether such trainable, transferrable subnetworks exist in pre-trained BERT models.
arXiv Detail & Related papers (2020-07-23T19:35:39Z) - Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z) - Learning across label confidence distributions using Filtered Transfer
Learning [0.44040106718326594]
We propose a transfer learning approach to improve predictive power in noisy data systems with large variable confidence datasets.
We propose a deep neural network method called Filtered Transfer Learning (FTL) that defines multiple tiers of data confidence as separate tasks.
We demonstrate that using FTL to learn stepwise, across the label confidence distribution, results in higher performance compared to deep neural network models trained on a single confidence range.
arXiv Detail & Related papers (2020-06-03T21:00:11Z)
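
Two of the entries above point to sketches; both are hedged illustrations under stated assumptions, not the cited papers' exact methods.

For the TabPFN entry, a usage sketch assuming the open-source tabpfn package's scikit-learn-style interface (constructor defaults may differ across versions):

```python
# Hedged usage sketch, assuming the `tabpfn` package's scikit-learn-style API.
# Note that score()/predict() re-run the transformer over the stored training
# set (the in-context prompt), unlike a MotherNet-generated child MLP.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()   # no dataset-specific gradient descent
clf.fit(X_tr, y_tr)        # stores the training set as the context
print(clf.score(X_te, y_te))
```

For the Projected-Gradient Unlearning entry, a generic gradient-projection sketch (our simplification of the idea, not the exact PGU update):

```python
# Keep only the component of the forget-set gradient that is orthogonal to the
# retained-set gradient, so the unlearning step barely disturbs retained knowledge.
import torch

def orthogonal_unlearning_step(g_forget: torch.Tensor, g_retain: torch.Tensor) -> torch.Tensor:
    coeff = torch.dot(g_forget, g_retain) / (g_retain.norm() ** 2 + 1e-12)
    return g_forget - coeff * g_retain

# The projected gradient is then used in whatever unlearning update the method
# prescribes, e.g. ascent on the forget-set loss:
#   theta += lr * orthogonal_unlearning_step(grad_forget, grad_retain)
```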
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.