Deep Transfer Learning: Model Framework and Error Analysis
- URL: http://arxiv.org/abs/2410.09383v1
- Date: Sat, 12 Oct 2024 06:24:35 GMT
- Title: Deep Transfer Learning: Model Framework and Error Analysis
- Authors: Yuling Jiao, Huazhen Lin, Yuchen Luo, Jerry Zhijian Yang
- Abstract summary: This paper presents a framework for deep transfer learning that leverages multi-domain upstream data with a large number of samples $n$ to aid a single-domain downstream task with a much smaller number of samples $m$.
We show that the transfer under our framework can significantly improve the convergence rate for learning Lipschitz functions in downstream supervised tasks.
- Score: 4.898032902660655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a framework for deep transfer learning, which aims to leverage information from multi-domain upstream data with a large number of samples $n$ to a single-domain downstream task with a considerably smaller number of samples $m$, where $m \ll n$, in order to enhance performance on the downstream task. Our framework has several intriguing features. First, it allows the existence of both shared and specific features among multi-domain data and provides a framework for automatic identification, achieving precise transfer and utilization of information. Second, our model framework explicitly indicates the upstream features that contribute to downstream tasks, establishing a relationship between upstream domains and downstream tasks, thereby enhancing interpretability. Error analysis demonstrates that the transfer under our framework can significantly improve the convergence rate for learning Lipschitz functions in downstream supervised tasks, reducing it from $\tilde{O}(m^{-\frac{1}{2(d+2)}}+n^{-\frac{1}{2(d+2)}})$ ("no transfer") to $\tilde{O}(m^{-\frac{1}{2(d^*+3)}} + n^{-\frac{1}{2(d+2)}})$ ("partial transfer"), and even to $\tilde{O}(m^{-1/2}+n^{-\frac{1}{2(d+2)}})$ ("complete transfer"), where $d^* \ll d$ and $d$ is the dimension of the observed data. Our theoretical findings are substantiated by empirical experiments conducted on image classification datasets, along with a regression dataset.
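The following is a minimal sketch of the two-stage pipeline the abstract describes: a shared encoder trained on multi-domain upstream data alongside domain-specific heads, then frozen and reused for a downstream task with far fewer samples. The layer shapes, the synthetic data, and the $\ell_1$ penalty standing in for the automatic identification of the $d^* \ll d$ contributing features are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of upstream multi-domain pretraining + downstream transfer.
import torch
import torch.nn as nn


class UpstreamModel(nn.Module):
    """Multi-domain upstream model: one shared encoder plus per-domain heads."""

    def __init__(self, in_dim, feat_dim, num_domains):
        super().__init__()
        # Shared encoder: features that may transfer to the downstream task.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, feat_dim), nn.ReLU(),
        )
        # Domain-specific heads: capture features unique to each upstream domain.
        self.heads = nn.ModuleList([nn.Linear(feat_dim, 1) for _ in range(num_domains)])

    def forward(self, x, domain):
        return self.heads[domain](self.encoder(x))


def finetune_downstream(encoder, x_down, y_down, feat_dim, steps=200):
    """Stage 2: freeze the upstream encoder and fit a small head on the
    m (<< n) downstream samples; the l1 penalty loosely mimics selecting
    only the upstream features that actually contribute downstream."""
    for p in encoder.parameters():
        p.requires_grad_(False)
    head = nn.Linear(feat_dim, 1)
    opt = torch.optim.Adam(head.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        pred = head(encoder(x_down))
        loss = nn.functional.mse_loss(pred, y_down) + 1e-3 * head.weight.abs().sum()
        loss.backward()
        opt.step()
    return head


if __name__ == "__main__":
    d, feat_dim, num_domains = 20, 16, 3
    model = UpstreamModel(d, feat_dim, num_domains)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Stage 1: train on large multi-domain upstream data (placeholder samples).
    for _ in range(50):
        for dom in range(num_domains):
            x = torch.randn(256, d)
            y = x[:, :2].sum(dim=1, keepdim=True) + 0.1 * torch.randn(256, 1)
            opt.zero_grad()
            nn.functional.mse_loss(model(x, dom), y).backward()
            opt.step()
    # Stage 2: downstream task with only m << n samples reuses the shared encoder.
    x_down = torch.randn(64, d)
    y_down = x_down[:, :2].sum(dim=1, keepdim=True)
    head = finetune_downstream(model.encoder, x_down, y_down, feat_dim)
    mse = nn.functional.mse_loss(head(model.encoder(x_down)), y_down).item()
    print("downstream MSE:", mse)
```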
Related papers
- A Statistical Analysis of Deep Federated Learning for Intrinsically Low-dimensional Data [32.98264375121064]
Federated Learning (FL) has emerged as a groundbreaking paradigm in collaborative machine learning.
This paper investigates the generalization properties of deep federated regression within a two-stage sampling model.
arXiv Detail & Related papers (2024-10-28T01:36:25Z)
- Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [65.8915778873691]
Learning conditional distributions is a central problem in machine learning.
We propose a new learning paradigm that integrates both paired and unpaired data.
Our approach also connects intriguingly with inverse entropic optimal transport (OT).
arXiv Detail & Related papers (2024-10-03T16:12:59Z)
- Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing [38.883540444516605]
DARPA launched the ShELL program, which aims to explore how experience sharing can benefit distributed lifelong learning agents.
We conduct both theoretical and empirical research on distributed multi-task reinforcement learning (RL), where a group of $N$ agents collaboratively solves $M$ tasks.
We propose an algorithm called DistMT-LSVI, where each agent independently learns $\epsilon$-optimal policies for all $M$ tasks.
arXiv Detail & Related papers (2023-07-11T22:58:53Z)
- Data Structures for Density Estimation [66.36971978162461]
Given $k$ distributions $v_1, \dots, v_k$ over a discrete domain of size $n$ and a sublinear (in $n$) number of samples from a distribution $p$, our main result is the first data structure that identifies the $v_i$ close to $p$ in time sublinear in $k$.
We also give an improved version of the algorithm of Acharya et al. that reports $v_i$ in time linear in $k$.
arXiv Detail & Related papers (2023-06-20T06:13:56Z)
- Multi-Task Imitation Learning for Linear Dynamical Systems [50.124394757116605]
We study representation learning for efficient imitation learning over linear systems.
We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x}{H N_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}} \right)$.
arXiv Detail & Related papers (2022-12-01T00:14:35Z)
- Blessing of Class Diversity in Pre-training [54.335530406959435]
We prove that when the classes of the pre-training task are sufficiently diverse, pre-training can significantly improve the sample efficiency of downstream tasks.
Our proof relies on a vector-form Rademacher complexity chain rule for composite function classes and a modified self-concordance condition.
arXiv Detail & Related papers (2022-09-07T20:10:12Z)
- Distributed Sparse Feature Selection in Communication-Restricted Networks [6.9257380648471765]
We propose and theoretically analyze a new distributed scheme for sparse linear regression and feature selection.
In order to infer the causal dimensions from the whole dataset, we propose a simple, yet effective method for information sharing in the network.
arXiv Detail & Related papers (2021-11-02T05:02:24Z)
- An Improved Transfer Model: Randomized Transferable Machine [32.50263074872975]
We propose a new transfer model called Randomized Transferable Machine (RTM) to handle small divergence of domains.
Specifically, we work on the new source and target data learned from existing feature-based transfer methods.
In principle, the more corruptions are applied, the higher the probability that the new target data are covered by the constructed source data populations.
arXiv Detail & Related papers (2020-11-27T09:37:01Z)
- On the Theory of Transfer Learning: The Importance of Task Diversity [114.656572506859]
We consider $t+1$ tasks parameterized by functions of the form $f_j \circ h$ in a general function class $\mathcal{F} \circ \mathcal{H}$.
We show that for diverse training tasks the sample complexity needed to learn the shared representation across the first $t$ training tasks scales as $C(\mathcal{H}) + t C(\mathcal{F})$.
arXiv Detail & Related papers (2020-06-20T20:33:59Z)
- Automatic Cross-Domain Transfer Learning for Linear Regression [0.0]
This paper helps to extend the capability of transfer learning for linear regression problems.
For normal datasets, we assume that some latent domain information is available for transfer learning.
arXiv Detail & Related papers (2020-05-08T15:05:37Z)
- Few-Shot Learning via Learning the Representation, Provably [115.7367053639605]
This paper studies few-shot learning via representation learning.
One uses $T$ source tasks with $n_1$ data per task to learn a representation in order to reduce the sample complexity of a target task.
arXiv Detail & Related papers (2020-02-21T17:30:00Z)