Related papers: APAR: Modeling Irregular Target Functions in Tabular Regression via Arithmetic-Aware Pre-Training and Adaptive-Regularized Fine-Tuning

URL: http://arxiv.org/abs/2412.10941v1
Date: Sat, 14 Dec 2024 19:33:21 GMT
Title: APAR: Modeling Irregular Target Functions in Tabular Regression via Arithmetic-Aware Pre-Training and Adaptive-Regularized Fine-Tuning
Authors: Hong-Wei Wu, Wei-Yao Wang, Kuang-Da Wang, Wen-Chih Peng
Abstract summary: We propose a novel Arithmetic-Aware Pre-training and Adaptive-Regularized Fine-tuning framework (APAR). In the pre-training phase, APAR introduces an arithmetic-aware pretext objective to capture intricate sample-wise relationships from the perspective of continuous labels. In the fine-tuning phase, a consistency-based adaptive regularization technique is proposed to self-learn appropriate data augmentation.
Score: 12.35924469567586
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Tabular data are fundamental in common machine learning applications, ranging from finance to genomics and healthcare. This paper focuses on tabular regression tasks, a field where deep learning (DL) methods are not consistently superior to machine learning (ML) models due to the challenges posed by irregular target functions inherent in tabular data, causing sensitive label changes with minor variations from features. To address these issues, we propose a novel Arithmetic-Aware Pre-training and Adaptive-Regularized Fine-tuning framework (APAR), which enables the model to fit irregular target functions in tabular data while reducing the negative impact of overfitting. In the pre-training phase, APAR introduces an arithmetic-aware pretext objective to capture intricate sample-wise relationships from the perspective of continuous labels. In the fine-tuning phase, a consistency-based adaptive regularization technique is proposed to self-learn appropriate data augmentation. Extensive experiments across 10 datasets demonstrated that APAR outperforms existing GBDT-, supervised NN-, and pretrain-finetune NN-based methods in RMSE (+9.43% $\sim$ 20.37%), and empirically validated the effects of pre-training tasks, including the study of arithmetic operations. Our code and data are publicly available at https://github.com/johnnyhwu/APAR.

Related papers

InstructDiff: Domain-Adaptive Data Selection via Differential Entropy for Efficient LLM Fine-Tuning [35.89674702985539]
InstructDiff is a unified framework that operationalizes differential entropy as a domain-adaptive selection criterion. We show that InstructDiff achieves 17% relative improvement over full data training on mathematical reasoning and 52% for general instruction-following.
arXiv Detail & Related papers (2026-01-30T14:15:44Z)
Disentangled Deep Smoothed Bootstrap for Fair Imbalanced Regression [1.2289361708127877]
Imbalanced distribution learning is a common and significant challenge in predictive modeling, often reducing the performance of standard algorithms. We propose using Variational Autoencoders (VAEs) to model and define a latent representation of data distributions. To address this, we develop an innovative data generation method that combines a disentangled VAE with a Smoothed Bootstrap applied in the latent space.
arXiv Detail & Related papers (2025-08-19T13:40:04Z)
DUPRE: Data Utility Prediction for Efficient Data Valuation [49.60564885180563]
Cooperative game theory-based data valuation, such as Data Shapley, requires evaluating the data utility and retraining the ML model for multiple data subsets. Our framework, DUPRE, takes an alternative yet complementary approach that reduces the cost per subset evaluation by predicting data utilities instead of evaluating them by model retraining. Specifically, given the evaluated data utilities of some data subsets, DUPRE fits a Gaussian process (GP) regression model to predict the utility of every other data subset.
arXiv Detail & Related papers (2025-02-22T08:53:39Z)
Zero-shot Meta-learning for Tabular Prediction Tasks with Adversarially Pre-trained Transformer [2.1677183904102257]
We present an Adversarially Pre-trained Transformer (APT) that is able to perform zero-shot meta-learning on tabular prediction tasks without pre-training on any real-world dataset. APT is pre-trained with adversarial synthetic data agents, who deliberately challenge the model with different synthetic datasets. We show that our framework matches state-of-the-art performance on small classification tasks without filtering on dataset characteristics.
arXiv Detail & Related papers (2025-02-06T23:58:11Z)
Retrieval Instead of Fine-tuning: A Retrieval-based Parameter Ensemble for Zero-shot Learning [22.748835458594744]
We introduce Retrieval-based Parameter Ensemble (RPE), a new method that creates a vectorized database of Low-Rank Adaptations (LoRAs). RPE minimizes the need for extensive training and eliminates the requirement for labeled data, making it particularly effective for zero-shot learning. RPE is well-suited for privacy-sensitive domains like healthcare, as it modifies model parameters without accessing raw data.
arXiv Detail & Related papers (2024-10-13T16:28:38Z)
Distributionally robust self-supervised learning for tabular data [2.942619386779508]
Learning robust representation in presence of error slices is challenging, due to high cardinality features and the complexity of constructing error sets. Traditional robust representation learning methods are largely focused on improving worst group performance in supervised setting in computer vision. Our approach utilizes an encoder-decoder model trained with Masked Language Modeling (MLM) loss to learn robust latent representations.
arXiv Detail & Related papers (2024-10-11T04:23:56Z)
Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning [17.38478179159257]
We create a synthetic dataset with a mild feature interaction assumption. We examine a modified transformer architecture enabling arithmetical feature interactions, referred to as AMFormer.
arXiv Detail & Related papers (2024-02-04T04:07:39Z)
Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT). We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences. We formulate each task as a sequence-to-sequence problem and perform multi-task training. We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data. We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address this challenge by adapting a model to unlabeled data at test time. We propose a simple yet effective feature alignment loss, termed as Class-Aware Feature Alignment (CAFA), which simultaneously encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z)
Diversity Enhanced Active Learning with Strictly Proper Scoring Rules [4.81450893955064]
We study acquisition functions for active learning (AL) for text classification. We convert the Expected Loss Reduction (ELR) method to estimate the increase in (strictly proper) scores like log probability or negative mean square error. We show that the use of mean square error and log probability with BEMPS yields robust acquisition functions.
arXiv Detail & Related papers (2021-10-27T05:02:11Z)
Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance guided stochastic gradient descent (IGSGD) method to train models to perform inference from inputs containing missing values without imputation. We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation. Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.