TabDeco: A Comprehensive Contrastive Framework for Decoupled Representations in Tabular Data
- URL: http://arxiv.org/abs/2411.11148v1
- Date: Sun, 17 Nov 2024 18:42:46 GMT
- Title: TabDeco: A Comprehensive Contrastive Framework for Decoupled Representations in Tabular Data
- Authors: Suiyao Chen, Jing Wu, Yunxiao Wang, Cheng Ji, Tianpei Xie, Daniel Cociorva, Michael Sharps, Cecile Levasseur, Hakan Brunzell
- Abstract summary: We introduce TabDeco, a novel method that leverages attention-based encoding strategies across both rows and columns.
With the innovative feature decoupling hierarchies, TabDeco consistently surpasses existing deep learning methods.
- Score: 5.98480077860174
- License:
- Abstract: Representation learning is a fundamental aspect of modern artificial intelligence, driving substantial improvements across diverse applications. While self-supervised contrastive learning has led to significant advancements in fields like computer vision and natural language processing, its adaptation to tabular data presents unique challenges. Traditional approaches often prioritize optimizing model architecture and loss functions but may overlook the crucial task of constructing meaningful positive and negative sample pairs from various perspectives such as feature interactions, instance-level patterns, and batch-specific contexts. To address these challenges, we introduce TabDeco, a novel method that leverages attention-based encoding strategies across both rows and columns and employs a contrastive learning framework to effectively disentangle feature representations at multiple levels, including features, instances, and data batches. With the innovative feature decoupling hierarchies, TabDeco consistently surpasses existing deep learning methods and leading gradient boosting algorithms, including XGBoost, CatBoost, and LightGBM, across various benchmark tasks, underscoring its effectiveness in advancing tabular data representation learning.
Related papers
- TabSeq: A Framework for Deep Learning on Tabular Data via Sequential Ordering [5.946579489162407]
This work introduces TabSeq, a novel framework for the sequential ordering of features.
Finding the optimum sequence order for such features could improve the deep learning models' learning process.
arXiv Detail & Related papers (2024-10-17T04:10:36Z)
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
- Scalable Representation Learning for Multimodal Tabular Transactions [14.18267117657451]
We present an innovative and scalable solution to these challenges.
We propose a parameter efficient decoder that interleaves transaction and text modalities.
We validate the efficacy of our solution on a large-scale dataset of synthetic payments transactions.
arXiv Detail & Related papers (2024-10-10T12:18:42Z)
- Enhancing Graph Contrastive Learning with Reliable and Informative Augmentation for Recommendation [84.45144851024257]
CoGCL aims to enhance graph contrastive learning by constructing contrastive views with stronger collaborative information via discrete codes.
We introduce a multi-level vector quantizer in an end-to-end manner to quantize user and item representations into discrete codes.
For neighborhood structure, we propose virtual neighbor augmentation by treating discrete codes as virtual neighbors.
Regarding semantic relevance, we identify similar users/items based on shared discrete codes and interaction targets to generate the semantically relevant view.
arXiv Detail & Related papers (2024-09-09T14:04:17Z)
- SwitchTab: Switched Autoencoders Are Effective Tabular Learners [16.316153704284936]
We introduce SwitchTab, a novel self-supervised representation method for tabular data.
SwitchTab captures latent dependencies by decoupling mutual and salient features among data pairs.
Results show superior performance in end-to-end prediction tasks with fine-tuning.
We highlight the capability of SwitchTab to create explainable representations through visualization of decoupled mutual and salient features in the latent space.
arXiv Detail & Related papers (2024-01-04T01:05:45Z) - ReConTab: Regularized Contrastive Representation Learning for Tabular
Data [8.178223284255791]
We introduce ReConTab, a deep automatic representation learning framework with regularized contrastive learning.
Agnostic to any type of modeling task, ReConTab constructs an asymmetric autoencoder based on the same raw features from model inputs.
Experiments conducted on extensive real-world datasets substantiate the framework's capacity to yield substantial and robust performance improvements.
arXiv Detail & Related papers (2023-10-28T00:05:28Z) - Cross-view Graph Contrastive Representation Learning on Partially
Aligned Multi-view Data [52.491074276133325]
Multi-view representation learning has developed rapidly over the past decades and has been applied in many fields.
We propose a new cross-view graph contrastive learning framework, which integrates multi-view information to align data and learn latent representations.
Experiments conducted on several real datasets demonstrate the effectiveness of the proposed method on the clustering and classification tasks.
arXiv Detail & Related papers (2022-11-08T09:19:32Z) - Dual Path Structural Contrastive Embeddings for Learning Novel Objects [6.979491536753043]
Recent research shows that gaining information on a good feature space can be an effective solution to achieve favorable performance on few-shot tasks.
We propose a simple but effective paradigm that decouples the tasks of learning feature representations and classifiers.
Our method can still achieve promising results for both standard and generalized few-shot problems in either an inductive or transductive inference setting.
arXiv Detail & Related papers (2021-12-23T04:43:31Z) - Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z) - SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z) - Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.