Heterogeneous Multisource Transfer Learning via Model Averaging for Positive-Unlabeled Data
- URL: http://arxiv.org/abs/2511.10919v1
- Date: Fri, 14 Nov 2025 03:15:31 GMT
- Title: Heterogeneous Multisource Transfer Learning via Model Averaging for Positive-Unlabeled Data
- Authors: Jialei Liu, Jun Liao, Kuangnan Fang
- Abstract summary: We propose a novel transfer learning framework that integrates information from heterogeneous data sources without direct data sharing. For each source domain type, a tailored logistic regression model is fitted, and knowledge is transferred to the PU target domain through model averaging. Our method outperforms competing methods in terms of predictive accuracy and robustness, especially under limited labeled data and heterogeneous environments.
- Score: 2.030810815519794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Positive-Unlabeled (PU) learning presents unique challenges due to the lack of explicitly labeled negative samples, particularly in high-stakes domains such as fraud detection and medical diagnosis. To address data scarcity and privacy constraints, we propose a novel transfer learning with model averaging framework that integrates information from heterogeneous data sources - including fully binary labeled, semi-supervised, and PU data sets - without direct data sharing. For each source domain type, a tailored logistic regression model is fitted, and knowledge is transferred to the PU target domain through model averaging. Optimal weights for combining source models are determined via a cross-validation criterion that minimizes the Kullback-Leibler divergence. We establish theoretical guarantees for weight optimality and convergence, covering both misspecified and correctly specified target models, with further extensions to high-dimensional settings using sparsity-penalized estimators. Extensive simulations and real-world credit risk data analyses demonstrate that our method outperforms competing methods in terms of predictive accuracy and robustness, especially under limited labeled data and heterogeneous environments.
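As an illustration of the weighting step described in the abstract, the sketch below combines pre-fitted source logistic regressions by choosing simplex weights that minimize a cross-validated log loss on the target sample (the empirical analogue of a Kullback-Leibler criterion, up to an additive constant). This is a minimal sketch under simplifying assumptions, not the paper's estimator: the target's PU label indicator is treated as a plain binary outcome, and names such as cv_weights and prob_matrix are illustrative only.

```python
# Minimal sketch: model averaging of pre-fitted source logistic regressions,
# with simplex weights chosen by a cross-validated log-loss criterion on the
# target sample. Assumptions: source models are fixed (no data sharing), and
# the target's PU label indicator is treated as binary for illustration only;
# the paper's PU-adjusted likelihood and CV construction are more involved.
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold


def average_probs(weights, prob_matrix):
    """Weighted average of source-model positive-class probabilities."""
    return np.clip(prob_matrix @ weights, 1e-12, 1 - 1e-12)


def cv_weights(prob_matrix, labels, n_splits=5, seed=0):
    """Simplex weights minimizing K-fold cross-validated log loss on the target."""
    n_sources = prob_matrix.shape[1]
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)

    def cv_loss(weights):
        loss = 0.0
        for _, val_idx in kf.split(prob_matrix):
            p = average_probs(weights, prob_matrix[val_idx])
            y = labels[val_idx]
            loss += -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
        return loss / n_splits

    w0 = np.full(n_sources, 1.0 / n_sources)
    res = minimize(
        cv_loss,
        w0,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_sources,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return res.x


# Toy usage: two heterogeneous sources and one target sample.
rng = np.random.default_rng(0)
X1, X2, Xt = (rng.normal(size=(200, 3)) for _ in range(3))
y1 = (X1 @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=200) > 0).astype(int)
y2 = (X2 @ np.array([0.8, -0.3, 0.0]) + rng.normal(size=200) > 0).astype(int)
st = (Xt @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=200) > 0.5).astype(int)

sources = [LogisticRegression().fit(X1, y1), LogisticRegression().fit(X2, y2)]
prob_matrix = np.column_stack([m.predict_proba(Xt)[:, 1] for m in sources])
print("averaging weights:", cv_weights(prob_matrix, st))
```

The simplex constraint is handled here with SLSQP for convenience; the paper's criterion additionally accounts for the PU labeling mechanism when evaluating the held-out likelihood.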
Related papers
- Minimax optimal adaptive structured transfer learning through semi-parametric domain-varying coefficient model [9.091986429838117]
We study a multi-source, single-target transfer learning problem under conditional distributional drift. We develop an adaptive transfer learning estimator that selectively borrows strength from informative source domains.
arXiv Detail & Related papers (2026-02-20T03:53:06Z) - Transfer Learning Through Conditional Quantile Matching [3.86972243789112]
We introduce a transfer learning framework for regression that leverages heterogeneous source domains to improve predictive performance in a data-scarce target domain. Our approach learns a conditional generative model separately for each source domain and calibrates the generated responses to the target domain via conditional quantile matching.
arXiv Detail & Related papers (2026-02-02T17:19:55Z) - AdapDISCOM: An Adaptive Sparse Regression Method for High-Dimensional Multimodal Data With Block-Wise Missingness and Measurement Errors [0.06633699479109359]
AdapDISCOM is a novel adaptive direct sparse regression method. We show that AdapDISCOM consistently outperforms DISCOM, SCOM, and CoCoLasso. We apply AdapDISCOM to Alzheimer's Disease Neuroimaging Initiative (ADNI) data, demonstrating improved prediction of cognitive scores and reliable selection of established biomarkers.
arXiv Detail & Related papers (2025-07-31T19:16:48Z) - Distributionally Robust Optimization with Adversarial Data Contamination [49.89480853499918]
We focus on optimizing Wasserstein-1 DRO objectives for generalized linear models with convex Lipschitz loss functions. Our primary contribution lies in a novel modeling framework that integrates robustness against training data contamination with robustness against distributional shifts. This work establishes the first rigorous guarantees, supported by efficient computation, for learning under the dual challenges of data contamination and distributional shifts.
arXiv Detail & Related papers (2025-07-14T18:34:10Z) - Statistical Analysis of Conditional Group Distributionally Robust Optimization with Cross-Entropy Loss [16.1456465253627]
We study multi-source unsupervised domain adaptation, where labeled data are available from multiple source domains and only unlabeled data are observed from the target domain. We propose a novel Conditional Group Distributionally Robust Optimization (CG-DRO) framework that learns a classifier by minimizing the worst-case cross-entropy loss over convex combinations of the conditional outcome distributions from the source domains. We establish fast statistical convergence rates for the empirical CG-DRO estimator by constructing two surrogate minimax optimization problems that serve as theoretical bridges.
arXiv Detail & Related papers (2025-07-14T04:21:23Z) - Robust Molecular Property Prediction via Densifying Scarce Labeled Data [53.24886143129006]
In drug discovery, compounds most critical for advancing research often lie beyond the training set. We propose a novel bilevel optimization approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data.
arXiv Detail & Related papers (2025-06-13T15:27:40Z) - Assumption-Lean Post-Integrated Inference with Surrogate Control Outcomes [6.448728765953916]
We introduce a robust post-integrated inference (PII) method that adjusts for latent heterogeneity using control outcomes. We develop semiparametric inference on projected direct effect estimands, accounting for hidden mediators, confounders, and moderators. The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification.
arXiv Detail & Related papers (2024-10-07T12:52:38Z) - The Risk of Federated Learning to Skew Fine-Tuning Features and
Underperform Out-of-Distribution Robustness [50.52507648690234]
Federated learning risks skewing fine-tuning features and compromising the robustness of the model.
We introduce three robustness indicators and conduct experiments across diverse robust datasets.
Our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods.
arXiv Detail & Related papers (2024-01-25T09:18:51Z) - Distributionally Robust Learning for Multi-source Unsupervised Domain Adaptation [9.359714425373616]
Empirical risk minimization often performs poorly when the distribution of the target domain differs from those of the source domains. We develop an unsupervised domain adaptation approach that leverages labeled data from multiple source domains and unlabeled data from the target domain.
arXiv Detail & Related papers (2023-09-05T13:19:40Z) - On Balancing Bias and Variance in Unsupervised Multi-Source-Free Domain
Adaptation [6.2200089460762085]
Methods for multi-source-free domain adaptation (MSFDA) typically train a target model using pseudo-labeled data produced by the source models.
We develop an information-theoretic bound on the generalization error of the resulting target model.
We then provide insights on how to balance this trade-off from three perspectives, including domain aggregation, selective pseudo-labeling, and joint feature alignment.
arXiv Detail & Related papers (2022-02-01T22:34:18Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (see the sketch after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under
Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - Decomposed Adversarial Learned Inference [118.27187231452852]
We propose a novel approach, Decomposed Adversarial Learned Inference (DALI).
DALI explicitly matches prior and conditional distributions in both data and code spaces.
We validate the effectiveness of DALI on the MNIST, CIFAR-10, and CelebA datasets.
arXiv Detail & Related papers (2020-04-21T20:00:35Z)
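For the Average Thresholded Confidence (ATC) entry above, the following sketch shows the core idea under simplifying assumptions: calibrate a confidence threshold on labeled validation data so that the fraction of examples above it matches validation accuracy, then report the fraction of unlabeled target examples above that threshold as the predicted target accuracy. Names such as atc_threshold and val_confidences are illustrative, not from the referenced paper.

```python
# Minimal sketch of the ATC idea: choose t on labeled validation data so that
# mean(confidence > t) matches validation accuracy, then estimate target
# accuracy as the fraction of unlabeled target examples with confidence > t.
import numpy as np


def atc_threshold(val_confidences, val_correct):
    """Pick the threshold whose above-threshold fraction best matches validation accuracy."""
    accuracy = np.mean(val_correct)
    candidates = np.sort(val_confidences)
    fractions = np.array([np.mean(val_confidences > t) for t in candidates])
    return candidates[np.argmin(np.abs(fractions - accuracy))]


def atc_estimate(target_confidences, threshold):
    """Predicted target accuracy: fraction of unlabeled examples above the threshold."""
    return float(np.mean(target_confidences > threshold))
```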
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.