TablEye: Seeing small Tables through the Lens of Images
- URL: http://arxiv.org/abs/2307.02491v1
- Date: Tue, 4 Jul 2023 02:45:59 GMT
- Title: TablEye: Seeing small Tables through the Lens of Images
- Authors: Seung-eon Lee and Sang-Chul Lee
- Abstract summary: We propose an innovative framework called TablEye, which aims to overcome the difficulty of forming prior knowledge for tabular data by adopting domain transformation.
This approach harnesses rigorously tested few-shot learning algorithms and embedding functions to acquire and apply prior knowledge.
TablEye demonstrated superior performance, outperforming TabLLM in a 4-shot task by up to 0.11 AUC and STUNT in a 1-shot setting by an average of 3.17% accuracy.
- Score: 1.4398570436349933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The exploration of few-shot tabular learning becomes imperative. Tabular data
is a versatile representation that captures diverse information, yet it is not
exempt from limitations stemming from data properties and model size. Labeling
extensive tabular data can be challenging, and it may not be feasible to
capture every important feature. Few-shot tabular learning, however, remains
relatively unexplored, primarily due to the scarcity of shared information
among independent datasets and the inherent ambiguity in defining boundaries
within tabular data.
To the best of our knowledge, no meaningful and unrestricted few-shot tabular
learning techniques have been developed without imposing constraints on the
dataset. In this paper, we propose an innovative framework called TablEye,
which aims to overcome the difficulty of forming prior knowledge for tabular data by
adopting domain transformation. It facilitates domain transformation by
generating tabular images, which effectively conserve the intrinsic semantics
of the original tabular data. This approach harnesses rigorously tested
few-shot learning algorithms and embedding functions to acquire and apply prior
knowledge. Leveraging shared data domains allows us to utilize this prior
knowledge, originally learned from the image domain. Specifically, TablEye
demonstrated superior performance, outperforming TabLLM in the 4-shot task by
up to 0.11 AUC and STUNT in the 1-shot setting by an average of 3.17%
accuracy.
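The abstract does not spell out the transformation itself; the snippet below is a rough, hypothetical sketch of the tabular-to-image idea: reshape a standardized feature vector into a grayscale grid, embed it with a pretrained image encoder, and classify few-shot queries by nearest class prototype. The encoder choice (ResNet-18) and the prototype classifier are stand-ins, not TablEye's actual components.

```python
# Hypothetical sketch only, not TablEye's actual pipeline: turn a standardized
# feature vector into a "tabular image", embed it with a pretrained CNN
# (image-domain prior knowledge), and classify by nearest class prototype.
import math
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

def row_to_image(row: torch.Tensor, side: int = 224) -> torch.Tensor:
    """Zero-pad a feature vector to a square grid and upsample to CNN size."""
    k = math.ceil(math.sqrt(row.numel()))
    grid = torch.zeros(k * k)
    grid[: row.numel()] = row
    img = grid.view(1, 1, k, k)
    img = F.interpolate(img, size=(side, side), mode="nearest")
    return img.repeat(1, 3, 1, 1)  # replicate to 3 channels for the CNN

@torch.no_grad()
def proto_predict(support_x, support_y, query_x, encoder):
    """Few-shot prediction by distance to class prototypes in embedding space."""
    embed = lambda X: torch.cat([encoder(row_to_image(r)) for r in X])
    s, q = embed(support_x), embed(query_x)
    classes = support_y.unique()
    protos = torch.stack([s[support_y == c].mean(0) for c in classes])
    return classes[torch.cdist(q, protos).argmin(dim=1)]

encoder = resnet18(weights=ResNet18_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()  # expose the 512-d penultimate features
encoder.eval()
```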
Related papers
- TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data [6.414759311130015]
We propose TIP, a novel framework for learning multimodal representations robust to incomplete data.
Specifically, TIP investigates a self-supervised learning (SSL) strategy, including a masked reconstruction task for tackling data missingness.
TIP outperforms state-of-the-art supervised/SSL image/multimodal algorithms in both complete and incomplete data scenarios.
arXiv Detail & Related papers (2024-07-10T12:16:15Z)
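A minimal sketch of the masked-reconstruction pretext task TIP's summary describes, assuming a plain MLP autoencoder and a 30% mask rate (both assumptions, not the paper's design):

```python
# Minimal sketch of a masked-reconstruction pretext task for tabular features,
# in the spirit of TIP's SSL strategy; architecture and mask rate are assumed.
import torch
import torch.nn as nn

class MaskedTabularReconstructor(nn.Module):
    def __init__(self, n_features: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, n_features)

    def forward(self, x: torch.Tensor, mask_rate: float = 0.3):
        mask = torch.rand_like(x) < mask_rate  # also mimics data missingness
        x_masked = x.masked_fill(mask, 0.0)
        recon = self.decoder(self.encoder(x_masked))
        # Penalize reconstruction only on the masked (treated-as-missing) entries.
        return ((recon - x) ** 2)[mask].mean()

model = MaskedTabularReconstructor(n_features=16)
loss = model(torch.randn(32, 16))
loss.backward()
```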
- LaTable: Towards Large Tabular Models [63.995130144110156]
Tabular generative foundation models are hard to build due to the heterogeneous feature spaces of different datasets.
LaTable is a novel diffusion model that addresses these challenges and can be trained across different datasets.
We find that LaTable outperforms baselines on in-distribution generation, and that finetuning LaTable can generate out-of-distribution datasets better with fewer samples.
arXiv Detail & Related papers (2024-06-25T16:03:50Z)
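The summary names a diffusion model but not its architecture; as a generic, hedged illustration, one DDPM-style epsilon-prediction training step on tabular rows looks like the following (the noise schedule, denoiser, and all sizes are assumptions):

```python
# Generic denoising-diffusion training step on tabular rows; a stand-in for
# the kind of model LaTable's summary names, not its actual architecture.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

denoiser = nn.Sequential(nn.Linear(16 + 1, 128), nn.ReLU(), nn.Linear(128, 16))

def diffusion_loss(x0: torch.Tensor) -> torch.Tensor:
    t = torch.randint(0, T, (x0.shape[0],))
    a = alphas_bar[t].unsqueeze(1)
    eps = torch.randn_like(x0)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * eps  # forward noising of the rows
    t_feat = (t.float() / T).unsqueeze(1)      # crude timestep conditioning
    return ((denoiser(torch.cat([xt, t_feat], dim=1)) - eps) ** 2).mean()

loss = diffusion_loss(torch.randn(32, 16))
loss.backward()
```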
- Making Pre-trained Language Models Great on Tabular Prediction [50.70574370855663]
The transferability of deep neural networks (DNNs) has made significant progress in image and language processing.
We present TP-BERTa, a specifically pre-trained LM for tabular data prediction.
A novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names.
arXiv Detail & Related papers (2024-03-04T08:38:56Z)
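A toy rendering of the relative magnitude tokenization idea: each scalar feature value is binned into a shared vocabulary of discrete magnitude tokens that can then be paired with feature names. The quantile binning and [MAG_i] naming here are assumptions, not TP-BERTa's exact scheme.

```python
# Toy sketch: scalar feature values become discrete tokens from a shared
# magnitude vocabulary. Binning and token naming are assumed, not TP-BERTa's.
import numpy as np

def magnitude_tokenize(values: np.ndarray, n_bins: int = 32) -> list:
    """Map scalar feature values to tokens [MAG_0] .. [MAG_{n_bins-1}]."""
    # Quantile edges keep token frequencies roughly uniform within a column.
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return [f"[MAG_{b}]" for b in np.digitize(values, edges)]

ages = np.array([22.0, 37.0, 59.0, 41.0, 18.0])
print(magnitude_tokenize(ages, n_bins=4))
# ['[MAG_1]', '[MAG_2]', '[MAG_3]', '[MAG_3]', '[MAG_0]']
# Each token is then paired with its feature name (e.g. "age [MAG_2]") so that
# intra-feature attention can integrate values with feature names.
```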
- Tabular Few-Shot Generalization Across Heterogeneous Feature Spaces [43.67453625260335]
We propose a novel approach to few-shot learning involving knowledge sharing between datasets with heterogeneous feature spaces.
FLAT learns low-dimensional embeddings of datasets and their individual columns, which facilitate knowledge transfer and generalization to previously unseen datasets.
A decoder network parametrizes the predictive target network, implemented as a Graph Attention Network, to accommodate the heterogeneous nature of tabular datasets.
arXiv Detail & Related papers (2023-11-16T17:45:59Z)
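A compressed sketch of FLAT's "decoder parametrizes the target network" idea: a dataset embedding is decoded into the weights of a small predictor (a hypernetwork). A linear predictor stands in for the paper's Graph Attention Network, and every dimension here is an assumption.

```python
# Compressed hypernetwork sketch: decode a dataset embedding into the weights
# of a small predictor. A linear head stands in for FLAT's Graph Attention
# Network; all dimensions are assumptions.
import torch
import torch.nn as nn

class WeightDecoder(nn.Module):
    def __init__(self, emb_dim: int, n_features: int, n_classes: int):
        super().__init__()
        self.n_features, self.n_classes = n_features, n_classes
        # Decode the embedding into a linear predictor's weights and bias.
        self.to_weights = nn.Linear(emb_dim, n_features * n_classes + n_classes)

    def forward(self, dataset_emb: torch.Tensor, x: torch.Tensor):
        params = self.to_weights(dataset_emb)
        W = params[: self.n_features * self.n_classes]
        b = params[self.n_features * self.n_classes :]
        return x @ W.view(self.n_features, self.n_classes) + b

decoder = WeightDecoder(emb_dim=64, n_features=10, n_classes=2)
logits = decoder(torch.randn(64), torch.randn(8, 10))  # (8, 2) predictions
```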
- Training-Free Generalization on Heterogeneous Tabular Data via Meta-Representation [67.30538142519067]
We propose Tabular data Pre-Training via Meta-representation (TabPTM).
A deep neural network is then trained to associate these meta-representations with dataset-specific classification confidences.
Experiments validate that TabPTM achieves promising performance in new datasets, even under few-shot scenarios.
arXiv Detail & Related papers (2023-10-31T18:03:54Z)
- Learning Representations without Compositional Assumptions [79.12273403390311]
We propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges.
We also introduce LEGATO, a novel hierarchical graph autoencoder that learns a smaller, latent graph to aggregate information from multiple views dynamically.
arXiv Detail & Related papers (2023-05-31T10:36:10Z)
- STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables [64.0903766169603]
We propose a framework for few-shot semi-supervised learning, coined Self-generated Tasks from UNlabeled Tables (STUNT).
Our key idea is to self-generate diverse few-shot tasks by treating randomly chosen columns as a target label.
We then employ a meta-learning scheme to learn generalizable knowledge with the constructed tasks.
arXiv Detail & Related papers (2023-03-02T02:37:54Z)
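A minimal sketch of STUNT-style task self-generation as the summary describes it: pick a random column of an unlabeled table, derive pseudo-labels from it, and sample a few-shot support set from the remaining columns. The median-split discretization and sampling details are assumptions, not the paper's exact procedure.

```python
# Minimal sketch: a randomly chosen column becomes the pseudo-target (median
# split is an assumed discretization), and a k-shot support set is sampled
# over the remaining columns.
import numpy as np

def self_generate_task(table: np.ndarray, k_shot: int = 1, rng=None):
    rng = rng or np.random.default_rng()
    target_col = int(rng.integers(table.shape[1]))
    X = np.delete(table, target_col, axis=1)
    # Binary pseudo-labels from a median split of the chosen column.
    y = (table[:, target_col] > np.median(table[:, target_col])).astype(int)
    support = np.concatenate(
        [rng.choice(np.flatnonzero(y == c), k_shot, replace=False)
         for c in (0, 1)])
    return X[support], y[support]  # one self-generated 2-way k-shot task

table = np.random.default_rng(0).normal(size=(100, 8))  # unlabeled table
sx, sy = self_generate_task(table)  # episodes like this feed the meta-learner
```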
- Progressive Feature Upgrade in Semi-supervised Learning on Tabular Domain [0.0]
Recent semi-supervised and self-supervised methods have shown great success in the image and text domain.
It is not easy to adapt domain-specific transformations from image and language to tabular data due to the mix of different data types.
We propose using a conditional probability representation and an efficient progressive feature upgrading framework.
arXiv Detail & Related papers (2022-12-01T22:18:32Z)
- SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning [5.5616364225463055]
We introduce a new framework, Subsetting features of Tabular data (SubTab).
We argue that reconstructing the data from the subset of its features rather than its corrupted version in an autoencoder setting can better capture its underlying representation.
arXiv Detail & Related papers (2021-10-08T20:11:09Z)
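A toy sketch of SubTab's core idea: an autoencoder sees only a random subset of features and reconstructs the full row, rather than denoising a corrupted version. Subset size, subset sampling, and architecture are assumptions.

```python
# Toy sketch: reconstruct the full row from a subset of its features in an
# autoencoder setting. Subset size and architecture are assumed.
import torch
import torch.nn as nn

class SubsetAutoencoder(nn.Module):
    def __init__(self, n_features: int, subset_size: int, hidden: int = 64):
        super().__init__()
        self.subset_size = subset_size
        self.encoder = nn.Sequential(nn.Linear(subset_size, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, n_features)  # reconstruct ALL features

    def forward(self, x: torch.Tensor):
        idx = torch.randperm(x.shape[1])[: self.subset_size]
        recon = self.decoder(self.encoder(x[:, idx]))
        return ((recon - x) ** 2).mean()  # loss over the full row

model = SubsetAutoencoder(n_features=20, subset_size=5)
loss = model(torch.randn(32, 20))
loss.backward()
```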
- SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features.
We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
arXiv Detail & Related papers (2021-06-29T08:08:33Z)
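A condensed sketch of SCARF-style training: corrupt a random subset of each row's features by resampling them from that feature's empirical marginal, then contrast corrupted views against the originals with an InfoNCE-style loss. The corruption rate, encoder, and temperature here are assumptions.

```python
# Condensed sketch: views are formed by corrupting a random subset of
# features, each corrupted entry resampled from that feature's empirical
# marginal. Corruption rate, encoder, and temperature are assumed.
import torch
import torch.nn.functional as F

def scarf_corrupt(x: torch.Tensor, rate: float = 0.6) -> torch.Tensor:
    mask = torch.rand_like(x) < rate                 # entries to corrupt
    perm = torch.argsort(torch.rand_like(x), dim=0)  # per-column permutation
    marginal_draws = torch.gather(x, 0, perm)        # empirical marginal samples
    return torch.where(mask, marginal_draws, x)

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau                         # (i, i) pairs are positive
    return F.cross_entropy(logits, torch.arange(len(z1)))

encoder = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 32))
x = torch.randn(128, 20)
loss = info_nce(encoder(x), encoder(scarf_corrupt(x)))
loss.backward()
```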