ReConTab: Regularized Contrastive Representation Learning for Tabular
Data
- URL: http://arxiv.org/abs/2310.18541v2
- Date: Mon, 18 Dec 2023 15:41:50 GMT
- Title: ReConTab: Regularized Contrastive Representation Learning for Tabular
Data
- Authors: Suiyao Chen, Jing Wu, Naira Hovakimyan, Handong Yao
- Abstract summary: We introduce ReConTab, a deep automatic representation learning framework with regularized contrastive learning.
Agnostic to any type of modeling task, ReConTab constructs an asymmetric autoencoder based on the same raw features from model inputs.
Experiments conducted on extensive real-world datasets substantiate the framework's capacity to yield substantial and robust performance improvements.
- Score: 8.178223284255791
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Representation learning stands as one of the critical machine learning
techniques across various domains. Through the acquisition of high-quality
features, pre-trained embeddings significantly reduce input space redundancy,
benefiting downstream pattern recognition tasks such as classification,
regression, or detection. Nonetheless, in the domain of tabular data, feature
engineering and selection still heavily rely on manual intervention, leading to
time-consuming processes and necessitating domain expertise. In response to
this challenge, we introduce ReConTab, a deep automatic representation learning
framework with regularized contrastive learning. Agnostic to any type of
modeling task, ReConTab constructs an asymmetric autoencoder based on the same
raw features from model inputs, producing low-dimensional representative
embeddings. Specifically, regularization techniques are applied for raw feature
selection. Meanwhile, ReConTab leverages contrastive learning to distill the
most pertinent information for downstream tasks. Experiments conducted on
extensive real-world datasets substantiate the framework's capacity to yield
substantial and robust performance improvements. Furthermore, we empirically
demonstrate that pre-trained embeddings can seamlessly integrate as easily
adaptable features, enhancing the performance of various traditional methods
such as XGBoost and Random Forest.
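The abstract names the main components (an asymmetric autoencoder over the raw features, regularization for raw feature selection, a contrastive objective, and reuse of the embeddings by models such as XGBoost). The PyTorch sketch below shows one way such a pipeline could be wired together; the layer sizes, noise-based views, L1 selection penalty, InfoNCE loss, and loss weights are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a regularized-contrastive tabular pipeline in the spirit
# of the abstract. Architecture, corruption scheme, and loss weights are
# illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AsymmetricTabAE(nn.Module):
    def __init__(self, n_features: int, emb_dim: int = 32):
        super().__init__()
        # Soft feature-selection layer; its weights receive an L1 penalty.
        self.select = nn.Linear(n_features, n_features)
        # Deep encoder, shallow decoder -> "asymmetric" autoencoder.
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, emb_dim),
        )
        self.decoder = nn.Linear(emb_dim, n_features)

    def forward(self, x):
        z = self.encoder(self.select(x))
        return z, self.decoder(z)

def info_nce(z1, z2, tau: float = 0.1):
    """Contrastive loss treating two corrupted views of the same row as positives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def pretrain_step(model, x, noise=0.05, l1=1e-4, w_rec=1.0, w_con=1.0):
    # Two noisy views of the same batch serve as positive pairs.
    v1 = x + noise * torch.randn_like(x)
    v2 = x + noise * torch.randn_like(x)
    z1, rec1 = model(v1)
    z2, _ = model(v2)
    return (w_rec * F.mse_loss(rec1, x)
            + w_con * info_nce(z1, z2)
            + l1 * model.select.weight.abs().mean())
```

After pre-training, the low-dimensional embeddings (alone or concatenated with the raw columns) can be handed to classical learners, e.g. xgboost.XGBClassifier().fit(z.detach().numpy(), y), which matches the abstract's point that the embeddings serve as easily adaptable features.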
Related papers
- TabDeco: A Comprehensive Contrastive Framework for Decoupled Representations in Tabular Data [5.98480077860174]
We introduce TabDeco, a novel method that leverages attention-based encoding strategies across both rows and columns.
With the innovative feature decoupling hierarchies, TabDeco consistently surpasses existing deep learning methods.
arXiv Detail & Related papers (2024-11-17T18:42:46Z)
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains [0.565395466029518]
We propose a novel pretext task based on the classical binning method.
The idea is straightforward: reconstructing the bin indices (either orders or classes) rather than the original values.
Our empirical investigations ascertain several advantages of binning.
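As a concrete illustration of the bin-reconstruction idea above, the sketch below converts each numeric column into quantile-bin indices that a model would then predict instead of the raw values; the bin count and quantile scheme are assumptions for illustration, not the paper's exact recipe.

```python
# Illustrative sketch only: turn each numeric column into quantile-bin indices
# and use those indices as the pretext targets (classification or ordinal),
# instead of regressing the raw values. The bin count is an assumed hyperparameter.
import numpy as np

def binning_targets(X: np.ndarray, n_bins: int = 10) -> np.ndarray:
    """X: (n_samples, n_features) float array -> integer bin index per column."""
    targets = np.empty_like(X, dtype=np.int64)
    for j in range(X.shape[1]):
        # Interior quantile edges give roughly equal-sized bins per feature.
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        targets[:, j] = np.digitize(X[:, j], edges)
    return targets  # entries in [0, n_bins - 1]; a model is trained to predict these
```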
arXiv Detail & Related papers (2024-05-13T01:23:14Z)
- SwitchTab: Switched Autoencoders Are Effective Tabular Learners [16.316153704284936]
We introduce SwitchTab, a novel self-supervised representation method for tabular data.
SwitchTab captures latent dependencies by decoupling mutual and salient features among data pairs.
Results show superior performance in end-to-end prediction tasks with fine-tuning.
We highlight the capability of SwitchTab to create explainable representations through visualization of decoupled mutual and salient features in the latent space.
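One plausible reading of the mutual/salient decoupling described above is sketched below: each embedding is split into a "mutual" half and a "salient" half, and reconstruction with switched mutual parts is what encourages the decoupling. This is an illustrative interpretation under assumed dimensions and losses, not SwitchTab's actual architecture.

```python
# Purely illustrative sketch of a "switched" autoencoder in the spirit of the
# summary above; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchedAE(nn.Module):
    def __init__(self, n_features: int, d: int = 32):
        super().__init__()
        self.d = d
        self.enc = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, 2 * d))
        self.dec = nn.Sequential(nn.Linear(2 * d, 64), nn.ReLU(), nn.Linear(64, n_features))

    def split(self, x):
        z = self.enc(x)
        return z[:, :self.d], z[:, self.d:]   # (mutual, salient) halves

    def loss(self, x1, x2):
        m1, s1 = self.split(x1)
        m2, s2 = self.split(x2)
        recon = lambda m, s: self.dec(torch.cat([m, s], dim=1))
        # If the mutual halves carry only shared information, switching them
        # between the pair should still let each sample be reconstructed.
        return (F.mse_loss(recon(m1, s1), x1) + F.mse_loss(recon(m2, s1), x1)
                + F.mse_loss(recon(m2, s2), x2) + F.mse_loss(recon(m1, s2), x2))
```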
arXiv Detail & Related papers (2024-01-04T01:05:45Z)
- Unlocking the Transferability of Tokens in Deep Models for Tabular Data [67.11727608815636]
Fine-tuning a pre-trained deep neural network has become a successful paradigm in various machine learning tasks.
In this paper, we propose TabToken, a method that aims to enhance the quality of feature tokens.
We introduce a contrastive objective that regularizes the tokens, capturing the semantics within and across features.
arXiv Detail & Related papers (2023-10-23T17:53:09Z)
- Unsupervised 3D registration through optimization-guided cyclical self-training [71.75057371518093]
State-of-the-art deep learning-based registration methods employ three different learning strategies.
We propose a novel self-supervised learning paradigm for unsupervised registration, relying on self-training.
We evaluate the method for abdomen and lung registration, consistently surpassing metric-based supervision and outperforming diverse state-of-the-art competitors.
arXiv Detail & Related papers (2023-06-29T14:54:10Z)
- Complementary Learning Subnetworks for Parameter-Efficient Class-Incremental Learning [40.13416912075668]
We propose a rehearsal-free CIL approach that learns continually via the synergy between two Complementary Learning Subnetworks.
Our method achieves competitive results against state-of-the-art methods, especially in accuracy gain, memory cost, training efficiency, and task-order robustness.
arXiv Detail & Related papers (2023-06-21T01:43:25Z)
- Adaptive Cross Batch Normalization for Metric Learning [75.91093210956116]
Metric learning is a fundamental problem in computer vision.
We show that it is equally important to ensure that the accumulated embeddings are up to date.
In particular, it is necessary to circumvent the representational drift between the accumulated embeddings and the feature embeddings at the current training iteration.
arXiv Detail & Related papers (2023-03-30T03:22:52Z)
- Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates [52.164757178369804]
Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility to significantly reduce the necessary annotation budget.
We conduct an empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework.
We also demonstrate that to acquire instances during active learning, a full-size Transformer can be substituted with a distilled version, which yields better computational performance.
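To make the Monte Carlo dropout option above concrete, the sketch below scores unlabeled inputs by the predictive entropy of the MC-averaged class distribution; the classifier interface, number of passes, and per-example (rather than per-token) aggregation are assumptions for illustration, not the paper's exact acquisition function.

```python
# Illustrative sketch (not the paper's code): Monte Carlo dropout uncertainty
# for choosing which unlabeled examples to annotate next. Any classifier whose
# dropout layers stay active in train() mode can be scored this way.
import torch
import torch.nn.functional as F

@torch.no_grad()
def mc_dropout_entropy(model, x, n_passes: int = 10):
    """Predictive entropy of the MC-averaged distribution (higher = more uncertain)."""
    model.train()                      # keep dropout stochastic at inference time
    probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_passes)])
    mean_p = probs.mean(dim=0)         # average over the stochastic passes
    return -(mean_p * mean_p.clamp_min(1e-12).log()).sum(dim=-1)

# Acquisition step: query the examples the stochastic passes disagree on most.
# scores = mc_dropout_entropy(model, unlabeled_batch)
# query_idx = scores.topk(k=16).indices
```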
arXiv Detail & Related papers (2021-01-20T13:59:25Z)
- Uniform Priors for Data-Efficient Transfer [65.086680950871]
We show that features that are most transferable have high uniformity in the embedding space.
We evaluate the regularization on its ability to facilitate adaptation to unseen tasks and data.
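The uniformity property mentioned above can be made concrete with a standard Gaussian-potential uniformity loss over L2-normalized embeddings (the common Wang-and-Isola-style formulation); the exact form below is an assumption for illustration and not necessarily the regularizer used in the paper.

```python
# Illustrative uniformity regularizer over L2-normalized embeddings:
# lower values mean the embeddings are spread more uniformly on the hypersphere.
# The Gaussian-potential form and t=2 are assumptions, not the paper's stated choice.
import torch
import torch.nn.functional as F

def uniformity_loss(z: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    z = F.normalize(z, dim=1)
    sq_dists = torch.pdist(z, p=2).pow(2)        # pairwise squared distances
    return sq_dists.mul(-t).exp().mean().log()   # log of mean Gaussian potential
```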
arXiv Detail & Related papers (2020-06-30T04:39:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.