Tabular Few-Shot Generalization Across Heterogeneous Feature Spaces
- URL: http://arxiv.org/abs/2311.10051v1
- Date: Thu, 16 Nov 2023 17:45:59 GMT
- Title: Tabular Few-Shot Generalization Across Heterogeneous Feature Spaces
- Authors: Max Zhu, Katarzyna Kobalczyk, Andrija Petrovic, Mladen Nikolic,
Mihaela van der Schaar, Boris Delibasic, Petro Lio
- Abstract summary: We propose a novel approach to few-shot learning involving knowledge sharing between datasets with heterogeneous feature spaces.
FLAT learns low-dimensional embeddings of datasets and their individual columns, which facilitate knowledge transfer and generalization to previously unseen datasets.
A decoder network parametrizes the predictive target network, implemented as a Graph Attention Network, to accommodate the heterogeneous nature of tabular datasets.
- Score: 43.67453625260335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the prevalence of tabular datasets, few-shot learning remains
under-explored within this domain. Existing few-shot methods are not directly
applicable to tabular datasets due to varying column relationships, meanings,
and permutational invariance. To address these challenges, we propose FLAT, a
novel approach to tabular few-shot learning, encompassing knowledge sharing
between datasets with heterogeneous feature spaces. Utilizing an encoder
inspired by Dataset2Vec, FLAT learns low-dimensional embeddings of datasets and
their individual columns, which facilitate knowledge transfer and
generalization to previously unseen datasets. A decoder network parametrizes
the predictive target network, implemented as a Graph Attention Network, to
accommodate the heterogeneous nature of tabular datasets. Experiments on a
diverse collection of 118 UCI datasets demonstrate FLAT's successful
generalization to new tabular datasets and a considerable improvement over the
baselines.
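The hypernetwork idea at the core of the abstract — a decoder that emits the weights of a per-dataset target network from a learned dataset embedding — can be sketched in pure Python. All names, shapes, and the weight-generation rule below are illustrative assumptions, not FLAT's actual architecture (which uses a Dataset2Vec-style encoder and a Graph Attention Network target):

```python
# Toy hypernetwork sketch: a "decoder" maps a dataset embedding to the
# weights of a tiny linear target network, which then scores a row.
# The decoding rule here is a placeholder, not FLAT's decoder.

def decode_weights(dataset_embedding, n_features):
    """Derive one weight per feature from the dataset embedding (toy rule)."""
    scale = sum(dataset_embedding) / len(dataset_embedding)
    return [scale * (i + 1) for i in range(n_features)]

def target_network(weights, row):
    """Apply the generated linear target network to a single input row."""
    return sum(w * x for w, x in zip(weights, row))

embedding = [0.5, 1.5]                             # pretend: dataset encoder output
weights = decode_weights(embedding, n_features=3)  # -> [1.0, 2.0, 3.0]
score = target_network(weights, [1.0, 0.0, 2.0])   # -> 7.0
```

The point of the construction is that the target network's parameters are a *function of the dataset embedding*, so datasets with different feature spaces each get their own predictor without retraining from scratch.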
Related papers
- LaTable: Towards Large Tabular Models [63.995130144110156]
Tabular generative foundation models are hard to build due to the heterogeneous feature spaces of different datasets.
LaTable is a novel diffusion model that addresses these challenges and can be trained across different datasets.
We find that LaTable outperforms baselines on in-distribution generation, and that finetuning LaTable can generate out-of-distribution datasets better with fewer samples.
arXiv Detail & Related papers (2024-06-25T16:03:50Z)
- Cross-Table Pretraining towards a Universal Function Space for Heterogeneous Tabular Data [35.61663559675556]
Cross-dataset pretraining has shown notable success in various fields.
In this study, we introduce a cross-table pretrained Transformer, XTFormer, for versatile downstream tabular prediction tasks.
Our methodology pretrains XTFormer to establish a "meta-function" space that encompasses all potential feature-target mappings.
arXiv Detail & Related papers (2024-06-01T03:24:31Z)
- UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction [93.77809355002591]
We introduce UniTraj, a comprehensive framework that unifies various datasets, models, and evaluation criteria.
We conduct extensive experiments and find that model performance significantly drops when transferred to other datasets.
We provide insights into dataset characteristics to explain these findings.
arXiv Detail & Related papers (2024-03-22T10:36:50Z)
- Training-Free Generalization on Heterogeneous Tabular Data via Meta-Representation [67.30538142519067]
We propose Tabular data Pre-Training via Meta-representation (TabPTM).
A deep neural network is then trained to associate these meta-representations with dataset-specific classification confidences.
Experiments validate that TabPTM achieves promising performance in new datasets, even under few-shot scenarios.
arXiv Detail & Related papers (2023-10-31T18:03:54Z)
- TablEye: Seeing small Tables through the Lens of Images [1.4398570436349933]
We propose an innovative framework called TablEye, which aims to overcome the difficulty of forming prior knowledge for tabular data by adopting domain transformation.
This approach harnesses rigorously tested few-shot learning algorithms and embedding functions to acquire and apply prior knowledge.
TablEye demonstrated superior performance, outstripping TabLLM in a 4-shot task by up to 0.11 AUC and STUNT in a 1-shot setting, where it led by 3.17% accuracy on average.
arXiv Detail & Related papers (2023-07-04T02:45:59Z)
- Learning Representations without Compositional Assumptions [79.12273403390311]
We propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges.
We also introduce LEGATO, a novel hierarchical graph autoencoder that learns a smaller, latent graph to aggregate information from multiple views dynamically.
arXiv Detail & Related papers (2023-05-31T10:36:10Z)
- infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
- SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning [5.5616364225463055]
In this paper, we introduce a new framework, Subsetting features of Tabular data (SubTab).
We argue that reconstructing the data from the subset of its features rather than its corrupted version in an autoencoder setting can better capture its underlying representation.
arXiv Detail & Related papers (2021-10-08T20:11:09Z)
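SubTab's central move — training on overlapping subsets of a table's columns rather than corrupted full rows — can be illustrated with a minimal pure-Python sketch. The subset count, overlap size, and contiguous-block split below are assumed parameters for illustration, not the paper's defaults:

```python
# Toy illustration of feature subsetting: divide column indices into
# overlapping contiguous blocks. In SubTab, each subset would feed an
# autoencoder trained to reconstruct the *full* feature vector.

def feature_subsets(n_features, n_subsets, overlap):
    """Split [0, n_features) into n_subsets contiguous blocks, each
    extended leftward by `overlap` columns from the previous block."""
    base = n_features // n_subsets
    subsets = []
    for k in range(n_subsets):
        start = max(0, k * base - overlap)
        stop = min(n_features, (k + 1) * base)
        subsets.append(list(range(start, stop)))
    return subsets

# 6 columns, 3 subsets, overlap of 1 -> [[0, 1], [1, 2, 3], [3, 4, 5]]
```

Reconstructing the whole row from each partial view forces the encoder to capture cross-column structure, which is the representation-learning signal the entry describes.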
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.