A Theoretical Framework for Data Efficient Multi-Source Transfer Learning Based on Cramér-Rao Bound
- URL: http://arxiv.org/abs/2502.04242v1
- Date: Thu, 06 Feb 2025 17:32:49 GMT
- Title: A Theoretical Framework for Data Efficient Multi-Source Transfer Learning Based on Cramér-Rao Bound
- Authors: Qingyue Zhang, Haohao Fu, Guanbo Huang, Yaoyuan Liang, Chang Chu, Tianren Peng, Yanru Wu, Qi Li, Yang Li, Shao-Lun Huang
- Abstract summary: We propose a theoretical framework that answers the question: what is the optimal quantity of source samples needed from each source task to jointly train the target model?
Specifically, we introduce a generalization error measure that aligns with cross-entropy loss, and minimize it based on the Cramér-Rao Bound to determine the optimal transfer quantity for each source task.
We develop an architecture-agnostic and data-efficient algorithm OTQMS to implement our theoretical results for training deep multi-source transfer learning models.
- Score: 16.49737340580437
- License:
- Abstract: Multi-source transfer learning provides an effective solution to data scarcity in real-world supervised learning scenarios by leveraging multiple source tasks. In this field, existing works typically use all available samples from sources in training, which constrains their training efficiency and may lead to suboptimal results. To address this, we propose a theoretical framework that answers the question: what is the optimal quantity of source samples needed from each source task to jointly train the target model? Specifically, we introduce a generalization error measure that aligns with cross-entropy loss, and minimize it based on the Cramér-Rao Bound to determine the optimal transfer quantity for each source task. Additionally, we develop an architecture-agnostic and data-efficient algorithm OTQMS to implement our theoretical results for training deep multi-source transfer learning models. Experimental studies on diverse architectures and two real-world benchmark datasets show that our proposed algorithm significantly outperforms state-of-the-art approaches in both accuracy and data efficiency. The code and supplementary materials are available at https://anonymous.4open.science/r/Materials.
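The abstract does not give the closed form of the bound, so the following is only a toy illustration of the idea: treat the target generalization error as a bias/variance surrogate in which a Cramér-Rao-style variance term shrinks with the total sample count while a bias term grows with the amount of distant source data, then search for the per-source quantities that minimize it. The surrogate form, the divergence scores, the constant, and the greedy search below are all hypothetical stand-ins, not the authors' OTQMS implementation.

```python
import numpy as np

# Hypothetical surrogate, NOT the paper's bound: a Cramér-Rao-style variance term
# that decays as 1/N_total, plus a bias term that grows with the amount of
# "distant" source data mixed into training.
def surrogate_bound(n_src, n_tgt, divergences, crb_const):
    n_total = n_tgt + n_src.sum()
    bias = np.dot(divergences, n_src) / n_total   # distribution-shift penalty
    variance = crb_const / n_total                # shrinks as more data is used
    return bias + variance

def choose_quantities(caps, n_tgt, divergences, crb_const, step=50):
    """Greedy search over per-source sample quantities, `step` samples at a time."""
    n_src = np.zeros(len(caps))
    improved = True
    while improved:
        improved = False
        best = surrogate_bound(n_src, n_tgt, divergences, crb_const)
        for i in range(len(caps)):
            trial = n_src.copy()
            trial[i] = min(trial[i] + step, caps[i])
            if surrogate_bound(trial, n_tgt, divergences, crb_const) < best - 1e-12:
                n_src, improved = trial, True
                best = surrogate_bound(n_src, n_tgt, divergences, crb_const)
    return n_src

if __name__ == "__main__":
    caps = np.array([5000, 5000, 5000])         # samples available per source task
    divergences = np.array([0.02, 0.10, 0.30])  # assumed (estimated) source-target divergences
    print(choose_quantities(caps, n_tgt=500, divergences=divergences, crb_const=50.0))
```

Under this toy bound, sources that are closer to the target (smaller divergence) receive larger sample quantities, which matches the intuition the abstract describes.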
Related papers
- Transfer Learning with Reconstruction Loss [12.906500431427716]
This paper proposes a novel approach for model training by adding into the model an additional reconstruction stage associated with a new reconstruction loss.
The proposed approach encourages the learned features to be general and transferable, and therefore can be readily used for efficient transfer learning.
For numerical simulations, three applications are studied: transfer learning on classifying MNIST handwritten digits, the device-to-device wireless network power allocation, and the multiple-input-single-output network downlink beamforming and localization.
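A minimal sketch of the mechanism described above, assuming a standard encoder/classifier setup (illustrative only, not the paper's released code): a shared encoder feeds both a task head and a reconstruction head, and the training objective adds a weighted reconstruction term to the task loss.

```python
import torch.nn as nn
import torch.nn.functional as F

class EncoderWithReconstruction(nn.Module):
    """Shared encoder with a task head and an auxiliary reconstruction head."""
    def __init__(self, in_dim=784, hidden=128, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, n_classes)
        self.decoder = nn.Linear(hidden, in_dim)   # the added reconstruction stage

    def forward(self, x):
        z = self.encoder(x)
        return self.classifier(z), self.decoder(z)

def combined_loss(logits, recon, x, y, lam=0.1):
    # task loss + lam * reconstruction loss (lam is an assumed hyperparameter)
    return F.cross_entropy(logits, y) + lam * F.mse_loss(recon, x)
```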
arXiv Detail & Related papers (2024-03-31T00:22:36Z)
- Long-Tailed Recognition on Binary Networks by Calibrating A Pre-trained Model [18.58663937035378]
We address the combined challenge of learning long-tailed distributions using highly resource-efficient binary neural networks as backbones.
We propose a calibrate-and-distill framework that uses off-the-shelf pretrained full-precision models trained on balanced datasets to use as teachers for distillation.
To better generalize to various datasets, we propose a novel adversarial balancing among the terms in the objective function and an efficient multiresolution learning scheme.
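The distillation step can be sketched with the standard soft-target formulation; the paper's adversarial balancing of objective terms and its multiresolution scheme are omitted, and the fixed weighting below is an assumption, not the authors' method.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, T=4.0):
    """Standard soft-target distillation; alpha and T are assumed hyperparameters."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```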
arXiv Detail & Related papers (2024-03-30T08:37:19Z)
- H-ensemble: An Information Theoretic Approach to Reliable Few-Shot Multi-Source-Free Transfer [4.328706834250445]
We propose a framework named H-ensemble, which learns the optimal linear combination of source models for the target task.
Compared to previous works, H-ensemble is characterized by: 1) its adaptability to a novel MSF setting for few-shot target tasks, 2) theoretical reliability, 3) a lightweight structure easy to interpret and adapt.
We show that the H-ensemble can successfully learn the optimal task ensemble, as well as outperform prior arts.
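As a rough illustration of learning a linear combination of frozen source models on a few-shot target set (the actual H-ensemble weights are derived information-theoretically in the paper; the gradient-based fit below is only a stand-in):

```python
import torch
import torch.nn as nn

def learn_ensemble_weights(source_logits, labels, steps=200, lr=0.1):
    """source_logits: (K, N, C) logits of K frozen source models on N target examples."""
    w = nn.Parameter(torch.zeros(source_logits.shape[0]))  # softmax-parameterized weights
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        mix = torch.einsum("k,knc->nc", torch.softmax(w, dim=0), source_logits)
        loss = nn.functional.cross_entropy(mix, labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.softmax(w, dim=0).detach()   # learned combination weights
```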
arXiv Detail & Related papers (2023-12-19T17:39:34Z)
- Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
Transfer learning is a powerful tool enabling model training with limited amounts of data.
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task.
We show that this protocol is often sub-optimal and the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen.
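A minimal PyTorch-style sketch of the partial-freezing protocol, i.e. keeping only the first few blocks of a pre-trained backbone frozen and fine-tuning the rest (where to place the split is dataset-dependent and is exactly what the paper studies):

```python
import torch.nn as nn

def freeze_first_blocks(backbone: nn.Module, n_frozen: int) -> nn.Module:
    """Freeze only the first `n_frozen` top-level blocks of a pre-trained backbone."""
    for block in list(backbone.children())[:n_frozen]:
        for p in block.parameters():
            p.requires_grad = False
    return backbone

# e.g. with torchvision: freeze_first_blocks(torchvision.models.resnet18(weights="IMAGENET1K_V1"), 4)
```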
arXiv Detail & Related papers (2023-03-02T17:32:11Z)
- Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP [43.7219097444333]
We introduce a testbed of six publicly available data sources to investigate how pre-training distributions induce robustness in CLIP.
We find that the performance of the pre-training data varies substantially across distribution shifts.
We find that combining multiple sources does not necessarily yield better models, but rather dilutes the robustness of the best individual data source.
arXiv Detail & Related papers (2022-08-10T18:24:23Z)
- Active Multi-Task Representation Learning [50.13453053304159]
We give the first formal study on resource task sampling by leveraging the techniques from active learning.
We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance.
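A hypothetical illustration of the allocation step in such a sampling loop (the paper defines its own relevance estimator and sampling rule; here relevance is just any non-negative score updated between rounds):

```python
import numpy as np

def allocate_budget(relevance, round_budget, rng=np.random.default_rng()):
    """Split this round's sampling budget across source tasks in proportion to relevance."""
    rel = np.maximum(np.asarray(relevance, dtype=float), 0.0)
    probs = rel / rel.sum() if rel.sum() > 0 else np.full(len(rel), 1.0 / len(rel))
    return rng.multinomial(round_budget, probs)   # number of samples to draw per source task

# e.g. allocate_budget([0.6, 0.3, 0.1], round_budget=256)
```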
arXiv Detail & Related papers (2022-02-02T08:23:24Z)
- Unsupervised Multi-source Domain Adaptation Without Access to Source Data [58.551861130011886]
Unsupervised Domain Adaptation (UDA) aims to learn a predictor model for an unlabeled domain by transferring knowledge from a separate labeled source domain.
We propose a novel and efficient algorithm which automatically combines the source models with suitable weights in such a way that it performs at least as good as the best source model.
arXiv Detail & Related papers (2021-04-05T10:45:12Z)
- Learning Centric Wireless Resource Allocation for Edge Computing: Algorithm and Experiment [15.577056429740951]
Edge intelligence is an emerging network architecture that integrates sensing, communication, and computing components, and supports various machine learning applications.
Existing methods ignore two important facts: 1) different models have heterogeneous demands on training data; 2) there is a mismatch between the simulated environment and the real-world environment.
This paper proposes a learning-centric wireless resource allocation scheme that maximizes the worst-case learning performance across multiple tasks.
arXiv Detail & Related papers (2020-10-29T06:20:40Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED)
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer when fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
- Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Diffusion-based MAML (Dif-MAML).
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
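A toy sketch of a diffusion-style cooperative update in the spirit of this approach (the real algorithm performs MAML inner/outer updates at each agent; here a generic local gradient stands in for the meta-gradient):

```python
import numpy as np

def diffusion_step(params, local_grads, A, lr=0.01):
    """Adapt-then-combine: local step at each agent, then neighbor averaging via A.

    params, local_grads: arrays of shape (num_agents, dim);
    A: (num_agents, num_agents) combination matrix whose rows sum to 1.
    """
    adapted = params - lr * local_grads   # each agent's own (meta-)gradient step
    return A @ adapted                    # mix iterates with the neighbors' results
```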
arXiv Detail & Related papers (2020-10-06T16:51:09Z)
- XMixup: Efficient Transfer Learning with Auxiliary Samples by Cross-domain Mixup [60.07531696857743]
Cross-domain Mixup (XMixup) improves the multitask paradigm for deep transfer learning.
XMixup selects the auxiliary samples from the source dataset and augments training samples via the simple mixup strategy.
Experiment results show that XMixup improves the accuracy by 1.9% on average.
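The augmentation step can be sketched as follows; XMixup's class-wise selection of auxiliary samples is assumed to have already happened, and the mixing coefficient is drawn from a Beta distribution as in standard mixup.

```python
import torch

def cross_domain_mixup(x_target, y_target, x_source, y_source, alpha=0.2):
    """Blend a target example with an auxiliary source example; alpha is an assumed hyperparameter."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x_mix = lam * x_target + (1.0 - lam) * x_source
    # both labels are returned so the two supervision terms can be weighted by lam and (1 - lam)
    return x_mix, y_target, y_source, lam
```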
arXiv Detail & Related papers (2020-07-20T16:42:29Z)