MotherNet: A Foundational Hypernetwork for Tabular Classification
- URL: http://arxiv.org/abs/2312.08598v1
- Date: Thu, 14 Dec 2023 01:48:58 GMT
- Title: MotherNet: A Foundational Hypernetwork for Tabular Classification
- Authors: Andreas Müller, Carlo Curino, Raghu Ramakrishnan
- Abstract summary: We propose a hypernetwork architecture that we call MotherNet, trained on millions of classification tasks.
MotherNet replaces training on specific datasets with in-context learning through a single forward pass.
The child network generated by MotherNet using in-context learning outperforms neural networks trained using gradient descent on small datasets.
- Score: 1.9643748953805937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of Foundation Models is transforming machine learning across many
modalities (e.g., language, images, videos) with prompt engineering replacing
training in many settings. Recent work on tabular data (e.g., TabPFN) hints at
a similar opportunity to build Foundation Models for classification for
numerical data. In this paper, we go one step further and propose a
hypernetwork architecture that we call MotherNet, trained on millions of
classification tasks, that, once prompted with a never-seen-before training set,
generates the weights of a trained "child" neural network. Like other
Foundation Models, MotherNet replaces training on specific datasets with
in-context learning through a single forward pass. In contrast to existing
hypernetworks that were either task-specific or trained for relatively
constrained multi-task settings, MotherNet is trained to generate networks to
perform multiclass classification on arbitrary tabular datasets without any
dataset-specific gradient descent.
The child network generated by MotherNet using in-context learning
outperforms neural networks trained using gradient descent on small datasets,
and is competitive with predictions by TabPFN and standard ML methods like
Gradient Boosting. Unlike a direct application of transformer models like
TabPFN, MotherNet generated networks are highly efficient at inference time.
This methodology opens up a new approach to building predictive models on
tabular data that is both efficient and robust, without any dataset-specific
training.
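To make the single-forward-pass mechanism concrete, below is a minimal NumPy sketch of the hypernetwork pattern: a "mother" model encodes a labeled training set, emits a flat parameter vector, and that vector is reshaped into a small "child" MLP that classifies test rows without any gradient descent. The set encoder, the weight-generation head, and the child layout are illustrative assumptions (random projections stand in for the trained transformer), not the architecture from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def child_forward(x, params, d_in, d_hid, n_classes):
    """Run the generated 'child' MLP: one hidden layer, softmax output."""
    i = 0
    W1 = params[i:i + d_in * d_hid].reshape(d_in, d_hid); i += d_in * d_hid
    b1 = params[i:i + d_hid]; i += d_hid
    W2 = params[i:i + d_hid * n_classes].reshape(d_hid, n_classes); i += d_hid * n_classes
    b2 = params[i:i + n_classes]
    h = np.tanh(x @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mothernet_forward(X_train, y_train, n_classes, d_hid=16):
    """Toy 'mother' network: embed each (x, y) pair, pool over the set,
    and map the pooled context to a flat child-parameter vector.
    Random projections stand in for the trained transformer."""
    d_in = X_train.shape[1]
    n_params = d_in * d_hid + d_hid + d_hid * n_classes + n_classes
    y_onehot = np.eye(n_classes)[y_train]
    pairs = np.concatenate([X_train, y_onehot], axis=1)     # (n, d_in + C)
    W_embed = rng.normal(size=(pairs.shape[1], 64)) * 0.1   # stand-in set encoder
    context = np.tanh(pairs @ W_embed).mean(axis=0)         # permutation-invariant pooling
    W_head = rng.normal(size=(64, n_params)) * 0.1          # stand-in weight-generation head
    return context @ W_head                                 # flat child parameters

# Usage: "train" on a toy dataset with one forward pass, then predict.
X_tr = rng.normal(size=(100, 5)); y_tr = rng.integers(0, 3, size=100)
X_te = rng.normal(size=(10, 5))
child_params = mothernet_forward(X_tr, y_tr, n_classes=3)
probs = child_forward(X_te, child_params, d_in=5, d_hid=16, n_classes=3)
print(probs.shape)  # (10, 3): class probabilities with no dataset-specific training
```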
Related papers
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method produces models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
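As one reading of the projected-gradient idea (a sketch, not the authors' exact PGU algorithm), the unlearning gradient can be projected onto the subspace orthogonal to directions that carry retained-data knowledge, so forgetting updates disturb the remaining knowledge as little as possible:

```python
import numpy as np

def project_out(grad, retain_basis):
    """Remove from `grad` its component inside the span of `retain_basis`.

    retain_basis: (d, k) matrix with orthonormal columns approximating gradient
    directions important for the retained data (assumed given here).
    """
    return grad - retain_basis @ (retain_basis.T @ grad)

rng = np.random.default_rng(1)
d, k = 10, 3
# Orthonormalize a few random directions to stand in for retained-data gradients.
retain_basis, _ = np.linalg.qr(rng.normal(size=(d, k)))
forget_grad = rng.normal(size=d)               # gradient of the unlearning objective
update = project_out(forget_grad, retain_basis)
# The projected update is (numerically) orthogonal to every retained direction.
print(np.abs(retain_basis.T @ update).max())
```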
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters [19.358670728803336]
Most deep neural networks are trained under fixed network architectures and require retraining when the architecture changes.
To avoid this, one can grow from a small network by adding random weights over time to gradually achieve the target network size.
This naive approach falls short in practice as it brings too much noise to the growing process.
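For reference, a minimal sketch of that naive growth strategy: pad a learned weight matrix with small random values to reach the target size (MixtureGrowth's actual recipe of recombining learned parameter templates is not shown here).

```python
import numpy as np

def grow_layer(W, new_in, new_out, scale=0.01, rng=None):
    """Naively grow a dense layer's weights from W.shape to (new_in, new_out)
    by padding the new rows/columns with small random values."""
    rng = rng or np.random.default_rng(0)
    old_in, old_out = W.shape
    grown = scale * rng.normal(size=(new_in, new_out))
    grown[:old_in, :old_out] = W          # keep the learned block intact
    return grown

W_small = np.ones((4, 8))                  # pretend these weights were trained
W_big = grow_layer(W_small, new_in=6, new_out=12)
print(W_big.shape)                         # (6, 12); the original 4x8 corner is preserved
```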
arXiv Detail & Related papers (2023-11-07T11:37:08Z) - StitchNet: Composing Neural Networks from Pre-Trained Fragments [3.638431342539701]
We propose StitchNet, a novel neural network creation paradigm.
It stitches together fragments from multiple pre-trained neural networks.
We show that these fragments can be stitched together to create neural networks with accuracy comparable to that of traditionally trained networks.
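A minimal sketch of a stitching step, under the assumption that a linear map fit by least squares on a small calibration batch aligns fragment A's output activations with the activations fragment B expects; the paper's actual stitching procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(2)
W_a = rng.normal(size=(10, 32))      # last layer of pretrained fragment A (stand-in)
W_b = rng.normal(size=(10, 24))      # input layer of pretrained fragment B (stand-in)

def fit_stitching_layer(acts_a, acts_b):
    """Fit a linear map S so that acts_a @ S approximates acts_b (least squares)."""
    S, *_ = np.linalg.lstsq(acts_a, acts_b, rcond=None)
    return S

calib = rng.normal(size=(64, 10))            # small batch of calibration inputs
acts_a = np.tanh(calib @ W_a)                # activations coming out of fragment A
acts_b = np.tanh(calib @ W_b)                # activations fragment B was trained to receive
S = fit_stitching_layer(acts_a, acts_b)      # (32, 24) stitching layer
print((acts_a @ S).shape)                    # feed this into the rest of fragment B
```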
arXiv Detail & Related papers (2023-01-05T08:02:30Z) - Meta-learning Pathologies from Radiology Reports using Variance Aware
Prototypical Networks [3.464871689508835]
We propose a simple extension of the Prototypical Networks for few-shot text classification.
Our main idea is to replace the class prototypes by Gaussians and introduce a regularization term that encourages the examples to be clustered near the appropriate class centroids.
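A minimal NumPy sketch of the variance-aware prototypes described above: each class is summarized by a diagonal Gaussian over its support embeddings, and queries are scored by Gaussian log-likelihood instead of plain distance to a centroid. The diagonal-covariance choice and the names are illustrative assumptions; the clustering regularizer added to the training loss is omitted.

```python
import numpy as np

def gaussian_prototypes(support, labels, n_classes, eps=1e-6):
    """Per-class mean and (diagonal) variance of the support embeddings."""
    means, variances = [], []
    for c in range(n_classes):
        emb = support[labels == c]
        means.append(emb.mean(axis=0))
        variances.append(emb.var(axis=0) + eps)
    return np.stack(means), np.stack(variances)

def log_likelihood(queries, means, variances):
    """Diagonal-Gaussian log-likelihood of each query under each class, shape (n_q, C)."""
    diff = queries[:, None, :] - means[None, :, :]
    return -0.5 * ((diff ** 2) / variances[None] + np.log(2 * np.pi * variances[None])).sum(-1)

rng = np.random.default_rng(3)
support = rng.normal(size=(30, 8))
labels = np.repeat(np.arange(3), 10)          # 10 support embeddings per class
queries = rng.normal(size=(5, 8))
means, variances = gaussian_prototypes(support, labels, n_classes=3)
pred = log_likelihood(queries, means, variances).argmax(axis=1)
print(pred)
```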
arXiv Detail & Related papers (2022-10-22T05:22:29Z) - A new hope for network model generalization [66.5377859849467]
Generalizing machine learning models for network traffic dynamics tends to be considered a lost cause.
An ML architecture called Transformer has enabled previously unimaginable generalization in other domains.
We propose a Network Traffic Transformer (NTT) to learn network dynamics from packet traces.
arXiv Detail & Related papers (2022-07-12T21:16:38Z) - TabPFN: A Transformer That Solves Small Tabular Classification Problems
in a Second [48.87527918630822]
We present TabPFN, a trained Transformer that can do supervised classification for small datasets in less than a second.
TabPFN performs in-context learning (ICL): it learns to make predictions from sequences of labeled examples.
We show that our method clearly outperforms boosted trees and performs on par with complex state-of-the-art AutoML systems, with up to a 230× speedup.
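To make the ICL setup concrete, here is a small sketch of how a prediction request can be framed: labeled training rows and unlabeled query rows are packed into one sequence, and a trained transformer (not shown) would read the whole sequence in a single forward pass and emit class probabilities at the query positions. This illustrates the interface only, not TabPFN's internal code.

```python
import numpy as np

def build_icl_sequence(X_train, y_train, X_test, n_classes):
    """Pack labeled and query rows into one sequence: [features | one-hot label | is_query].
    Query rows get an all-zero label slot and is_query = 1."""
    lab = np.concatenate([X_train, np.eye(n_classes)[y_train],
                          np.zeros((len(X_train), 1))], axis=1)
    qry = np.concatenate([X_test, np.zeros((len(X_test), n_classes)),
                          np.ones((len(X_test), 1))], axis=1)
    return np.concatenate([lab, qry], axis=0)

rng = np.random.default_rng(4)
X_tr, y_tr = rng.normal(size=(50, 6)), rng.integers(0, 2, size=50)
X_te = rng.normal(size=(8, 6))
seq = build_icl_sequence(X_tr, y_tr, X_te, n_classes=2)
print(seq.shape)   # (58, 6 + 2 + 1); a trained transformer would map the last 8
                   # positions to class probabilities in one forward pass
```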
arXiv Detail & Related papers (2022-07-05T07:17:43Z) - Train your classifier first: Cascade Neural Networks Training from upper
layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
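One way to realize such a top-down schedule, sketched in PyTorch on a toy MLP (the layer sizes, data, and schedule are illustrative assumptions, not the paper's ASR or language models): train the top classifier layer first with everything below frozen, then progressively unfreeze and train lower layers.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layers = nn.ModuleList([nn.Linear(8, 16), nn.Linear(16, 16), nn.Linear(16, 3)])
X, y = torch.randn(64, 8), torch.randint(0, 3, (64,))

def forward(x):
    for i, layer in enumerate(layers):
        x = layer(x)
        if i < len(layers) - 1:
            x = torch.relu(x)
    return x

loss_fn = nn.CrossEntropyLoss()
# Top-down schedule: start with only the classifier trainable, then unfreeze downwards.
for depth in range(len(layers) - 1, -1, -1):
    for i, layer in enumerate(layers):
        layer.requires_grad_(i >= depth)          # layers at or above `depth` are trainable
    trainable = [p for p in layers.parameters() if p.requires_grad]
    opt = torch.optim.SGD(trainable, lr=0.1)
    for _ in range(20):                           # a few steps per stage
        opt.zero_grad()
        loss = loss_fn(forward(X), y)
        loss.backward()
        opt.step()
    print(f"stage unfreezing layer {depth}: loss={loss.item():.3f}")
```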
arXiv Detail & Related papers (2021-02-09T08:19:49Z) - The Lottery Ticket Hypothesis for Pre-trained BERT Networks [137.99328302234338]
In natural language processing (NLP), enormous pre-trained models like BERT have become the standard starting point for training.
In parallel, work on the lottery ticket hypothesis has shown that models for NLP and computer vision contain smaller matching subnetworks capable of training in isolation to full accuracy.
We combine these observations to assess whether such trainable, transferrable subnetworks exist in pre-trained BERT models.
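Matching subnetworks in this line of work are typically found by magnitude pruning; as a reference point, here is a minimal NumPy sketch of a one-shot magnitude-pruning mask (the BERT study's full iterative procedure with training and weight rewinding is omitted).

```python
import numpy as np

def magnitude_prune_mask(W, sparsity):
    """Binary mask keeping the (1 - sparsity) fraction of largest-magnitude weights."""
    k = int(round(sparsity * W.size))
    threshold = np.sort(np.abs(W), axis=None)[k - 1] if k > 0 else -np.inf
    return (np.abs(W) > threshold).astype(W.dtype)

rng = np.random.default_rng(5)
W = rng.normal(size=(128, 64))                # a trained weight matrix (stand-in)
mask = magnitude_prune_mask(W, sparsity=0.9)  # prune 90% of the weights
subnetwork = W * mask                         # candidate "winning ticket" weights
print(mask.mean())                            # ~0.1 of the weights survive
```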
arXiv Detail & Related papers (2020-07-23T19:35:39Z) - Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z) - Learning across label confidence distributions using Filtered Transfer
Learning [0.44040106718326594]
We propose a transfer learning approach to improve predictive power in noisy data systems with large, variable-confidence datasets.
We propose a deep neural network method called Filtered Transfer Learning (FTL) that defines multiple tiers of data confidence as separate tasks.
We demonstrate that using FTL to learn stepwise, across the label confidence distribution, results in higher performance compared to deep neural network models trained on a single confidence range.
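One way to read the stepwise scheme is as sequential fine-tuning across confidence tiers: partition the data by label confidence, fit on the highest-confidence tier first, and keep fine-tuning the same network on progressively lower tiers. The PyTorch sketch below is an interpretation of this summary with toy data, not the authors' FTL implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(300, 12)
y = (X[:, 0] > 0).long()                     # toy binary labels
confidence = torch.rand(300)                 # per-example label confidence in [0, 1]

# Three confidence tiers: high, medium, low.
tiers = [confidence > 0.66,
         (confidence > 0.33) & (confidence <= 0.66),
         confidence <= 0.33]

model = nn.Sequential(nn.Linear(12, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Stepwise training: start from the most trusted labels, then transfer downwards.
for tier_id, mask in enumerate(tiers):
    Xt, yt = X[mask], y[mask]
    for _ in range(50):
        opt.zero_grad()
        loss = loss_fn(model(Xt), yt)
        loss.backward()
        opt.step()
    print(f"tier {tier_id}: n={mask.sum().item()}, loss={loss.item():.3f}")
```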
arXiv Detail & Related papers (2020-06-03T21:00:11Z)