Adaptive Sample Sharing for Linear Regression
- URL: http://arxiv.org/abs/2510.16986v1
- Date: Sun, 19 Oct 2025 20:03:48 GMT
- Title: Adaptive Sample Sharing for Linear Regression
- Authors: Hamza Cherkaoui, Hélène Halconruy, Yohan Petetin
- Abstract summary: We study sample sharing in the case of ridge regression. We introduce a principled, data-driven rule that decides how many samples from an auxiliary dataset to add to the target training set. We validate the approach on synthetic and real datasets, observing consistent gains over strong baselines and single-task training.
- Score: 1.8898307337832196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many business settings, task-specific labeled data are scarce or costly to obtain, which limits supervised learning on a specific task. To address this challenge, we study sample sharing in the case of ridge regression: leveraging an auxiliary dataset while explicitly protecting against negative transfer. We introduce a principled, data-driven rule that decides how many samples from an auxiliary dataset to add to the target training set. The rule is based on an estimate of the transfer gain, i.e., the marginal reduction in the predictive error. Building on this estimator, we derive finite-sample guarantees: under standard conditions, the procedure borrows when it improves parameter estimation and abstains otherwise. In the Gaussian feature setting, we analyze which dataset properties ensure that borrowing samples reduces the predictive error. We validate the approach on synthetic and real datasets, observing consistent gains over strong baselines and single-task training while avoiding negative transfer.
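The decision rule lends itself to a compact illustration. Below is a minimal sketch that estimates the transfer gain as the drop in held-out validation error; the function names, fixed batch schedule, and validation-based gain estimate are illustrative assumptions, not the authors' estimator, which is built from finite-sample quantities rather than a validation split.

```python
# Minimal sketch of adaptive sample sharing for ridge regression.
# Assumptions (not the paper's exact procedure): the transfer gain is
# estimated as the reduction in held-out MSE, and auxiliary samples are
# added in fixed-size batches while that estimate stays positive.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def estimated_gain(X_tr, y_tr, X_aux, y_aux, X_val, y_val, alpha=1.0):
    """Marginal reduction in predictive error from borrowing (X_aux, y_aux)."""
    base = Ridge(alpha=alpha).fit(X_tr, y_tr)
    pooled = Ridge(alpha=alpha).fit(np.vstack([X_tr, X_aux]),
                                    np.concatenate([y_tr, y_aux]))
    return (mean_squared_error(y_val, base.predict(X_val))
            - mean_squared_error(y_val, pooled.predict(X_val)))

def share_samples(X_tr, y_tr, X_aux, y_aux, X_val, y_val, batch=50, alpha=1.0):
    """Borrow auxiliary samples while the estimated transfer gain is positive."""
    n_used = 0
    while n_used < len(X_aux):
        cand = slice(n_used, n_used + batch)
        if estimated_gain(X_tr, y_tr, X_aux[cand], y_aux[cand],
                          X_val, y_val, alpha) <= 0:
            break  # abstain: further borrowing risks negative transfer
        X_tr = np.vstack([X_tr, X_aux[cand]])
        y_tr = np.concatenate([y_tr, y_aux[cand]])
        n_used += batch
    return X_tr, y_tr, n_used
```

The abstain branch mirrors the stated guarantee: borrow only while the estimated gain is positive, otherwise fall back to single-task training.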
Related papers
- Rethinking Remaining Useful Life Prediction with Scarce Time Series Data: Regression under Indirect Supervision [4.335413713700667]
We introduce a unified framework called parameterized static regression, which takes single points as inputs for regression of target values. Our method demonstrates competitive performance in prediction accuracy when dealing with highly scarce time series data.
arXiv Detail & Related papers (2025-04-12T13:14:35Z)
- Task Shift: From Classification to Regression in Overparameterized Linear Models [5.030445392527011]
We investigate a phenomenon where latent knowledge is transferred to a more difficult task under a similar data distribution. We show that while minimum-norm interpolators for classification cannot transfer to regression a priori, they experience surprisingly structured attenuation, which enables successful task shift with limited additional data.
arXiv Detail & Related papers (2025-02-18T21:16:01Z)
- Adapt then Unlearn: Exploring Parameter Space Semantics for Unlearning in Generative Adversarial Networks [5.107720313575234]
This work aims to prevent the generation of outputs containing undesired features from a pre-trained Generative Adversarial Network (GAN). Our proposed two-stage method, known as 'Adapt-then-Unlearn,' excels at unlearning such undesirable features while also maintaining the quality of generated samples. To the best of our knowledge, our approach stands as the first method addressing unlearning within the realm of high-fidelity GANs.
arXiv Detail & Related papers (2023-09-25T11:36:20Z)
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
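As a rough illustration of the idea, a split-conformal sketch follows; the masked-feature pretext task and the residual-normalization score below are assumptions chosen for brevity and may differ from the paper's construction.

```python
# Sketch: widen conformal intervals where a self-supervised model errs.
# Assumptions: pretext task is predicting feature 0 from the rest (needs
# d >= 2), and the nonconformity score is a normalized absolute residual.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def ss_conformal_intervals(X_tr, y_tr, X_cal, y_cal, X_test, alpha=0.1):
    """Split-conformal intervals whose width adapts to a self-supervised error."""
    model = GradientBoostingRegressor().fit(X_tr, y_tr)
    aux = GradientBoostingRegressor().fit(X_tr[:, 1:], X_tr[:, 0])

    def ss_error(X):  # self-supervised error as a per-point difficulty signal
        return np.abs(X[:, 0] - aux.predict(X[:, 1:])) + 1e-8

    scores = np.abs(y_cal - model.predict(X_cal)) / ss_error(X_cal)
    n = len(scores)
    level = min(1.0, np.ceil((1 - alpha) * (n + 1)) / n)
    q = np.quantile(scores, level)  # calibrated quantile of scores
    center, half = model.predict(X_test), q * ss_error(X_test)
    return center - half, center + half
```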
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
- Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling [77.15766509677348]
Conditional generative models often inherit spurious correlations from the training dataset.
This can result in label-conditional distributions that are imbalanced with respect to another latent attribute.
We propose a general two-step strategy to mitigate this issue.
arXiv Detail & Related papers (2022-12-05T08:09:33Z)
- The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift [127.21287240963859]
We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data.
For a large class of linear regression instances, transfer learning with $O(N^2)$ source data is as effective as supervised learning with $N$ target data.
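One common instantiation of pretraining-finetuning for linear regression, sketched below for illustration only (the paper's exact estimator may differ), pretrains ridge on the abundant source sample and then finetunes on the scarce target sample by shrinking toward the source coefficients instead of toward zero.

```python
# Illustrative pretrain-then-finetune for linear regression (an assumption,
# not the paper's exact estimator): biased ridge regularization toward the
# source solution.
import numpy as np

def pretrain_ridge(X_src, y_src, lam=1e-2):
    """Fit ridge regression on the (large) source sample."""
    d = X_src.shape[1]
    return np.linalg.solve(X_src.T @ X_src + lam * np.eye(d), X_src.T @ y_src)

def finetune_ridge(X_tgt, y_tgt, beta_src, lam=1.0):
    """Solve argmin_b ||y - X b||^2 + lam * ||b - beta_src||^2 in closed form."""
    d = X_tgt.shape[1]
    return np.linalg.solve(X_tgt.T @ X_tgt + lam * np.eye(d),
                           X_tgt.T @ y_tgt + lam * beta_src)
```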
arXiv Detail & Related papers (2022-08-03T05:59:49Z)
- CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address distribution shift by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed as Class-Aware Feature Alignment (CAFA), which simultaneously encourages a model to learn target representations in a class-discriminative manner.
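The loss below is a rough sketch in the spirit of class-aware alignment: it pulls test-time features toward per-class source statistics via a Mahalanobis distance. The pseudo-labeling and Gaussian class statistics are assumptions for illustration; consult the paper for the actual objective.

```python
# Sketch of a class-aware alignment loss (assumed form, not the paper's code):
# align each test feature with the source statistics of its pseudo-class.
import torch

def class_aware_alignment_loss(feats, pseudo_labels, class_means, class_precisions):
    """Mean Mahalanobis distance of each feature to its (pseudo-)class centroid.

    feats: (B, D) test-time features; pseudo_labels: (B,) int class guesses;
    class_means: (C, D) and class_precisions: (C, D, D) estimated on source data.
    """
    mu = class_means[pseudo_labels]          # (B, D) centroid per sample
    prec = class_precisions[pseudo_labels]   # (B, D, D) precision per sample
    diff = (feats - mu).unsqueeze(1)         # (B, 1, D)
    d2 = diff @ prec @ diff.transpose(1, 2)  # (B, 1, 1) squared distances
    return d2.squeeze().mean()
```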
arXiv Detail & Related papers (2022-06-01T03:02:07Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
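A toy version of importance-weighted ERM is sketched below, assuming the collection probabilities (propensities) logged by the adaptive policy are available; the paper's weighting scheme and guarantees are more refined than this illustration.

```python
# Toy importance-weighted ERM for adaptively collected data. Assumption:
# each sample's collection probability under the adaptive policy was logged.
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weighted_erm(X, y, collection_probs):
    """Reweight each sample by the inverse of its collection probability,
    so the weighted empirical risk targets the fixed data distribution."""
    w = 1.0 / np.asarray(collection_probs)
    return LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)
```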
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient [62.24615324523435]
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there is a large number of candidate features, our results show that sparsity-aware methods can make batch RL more sample efficient.
arXiv Detail & Related papers (2020-11-08T16:48:02Z)
- Robust Fairness under Covariate Shift [11.151913007808927]
Making predictions that are fair with regard to protected group membership has become an important requirement for classification algorithms.
We propose an approach that obtains a predictor that is robust to worst-case target performance.
arXiv Detail & Related papers (2020-10-11T04:42:01Z)