Deep Multi-Task Learning Has Low Amortized Intrinsic Dimensionality
- URL: http://arxiv.org/abs/2501.19067v1
- Date: Fri, 31 Jan 2025 11:53:16 GMT
- Title: Deep Multi-Task Learning Has Low Amortized Intrinsic Dimensionality
- Authors: Hossein Zakerinia, Dorsa Ghobadi, Christoph H. Lampert
- Abstract summary: We introduce a method to parametrize multi-task networks directly in a low-dimensional space.
We show that high-accuracy multi-task solutions can be found with much smaller intrinsic dimensionality than what single-task learning requires.
- Score: 15.621144215664769
- License:
- Abstract: Deep learning methods are known to generalize well from training to future data, even in an overparametrized regime, where they could easily overfit. One explanation for this phenomenon is that even when their *ambient dimensionality* (i.e., the number of parameters) is large, the models' *intrinsic dimensionality* is small, i.e. their learning takes place in a small subspace of all possible weight configurations. In this work, we confirm this phenomenon in the setting of *deep multi-task learning*. We introduce a method to parametrize multi-task networks directly in a low-dimensional space, facilitated by the use of *random expansion* techniques. We then show that high-accuracy multi-task solutions can be found with much smaller intrinsic dimensionality (fewer free parameters) than what single-task learning requires. Subsequently, we show that the low-dimensional representations in combination with *weight compression* and *PAC-Bayesian* reasoning lead to the first *non-vacuous generalization bounds* for deep multi-task networks.
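To make the *random expansions* idea concrete, the sketch below shows one way to parametrize a multi-task network directly in a low-dimensional space: all weights are expressed as a frozen anchor plus fixed random expansions of a small shared vector and a small per-task vector, and only those low-dimensional coordinates are trained. This is a minimal sketch under assumed design choices (the class name, the shared/per-task split, and the dense Gaussian projections are illustrative), not a reproduction of the paper's exact construction.

```python
# Minimal sketch (not the paper's exact method): parametrize a multi-task network
# in a low-dimensional space via fixed random expansions of trainable coordinates.
import math
import torch
from torch.func import functional_call


class LowDimMultiTaskNet(torch.nn.Module):
    def __init__(self, base_model, d_shared, d_task, num_tasks):
        super().__init__()
        self.base_model = base_model
        for p in base_model.parameters():            # the ambient weights stay frozen
            p.requires_grad_(False)
        self.names = [n for n, _ in base_model.named_parameters()]
        self.shapes = [p.shape for p in base_model.parameters()]
        theta0 = torch.cat([p.detach().reshape(-1) for p in base_model.parameters()])
        D = theta0.numel()                           # ambient dimensionality
        self.register_buffer("theta0", theta0)
        # Fixed (untrained) random expansion matrices; dense Gaussians for clarity,
        # structured random projections would be used at scale.
        self.register_buffer("P_shared", torch.randn(D, d_shared) / math.sqrt(d_shared))
        self.register_buffer("P_task", torch.randn(num_tasks, D, d_task) / math.sqrt(d_task))
        # The only trainable parameters: one shared and one per-task low-dim vector.
        self.z_shared = torch.nn.Parameter(torch.zeros(d_shared))
        self.z_task = torch.nn.Parameter(torch.zeros(num_tasks, d_task))

    def forward(self, x, task_id):
        # theta(t) = theta0 + P_shared @ z_shared + P_task[t] @ z_task[t]
        theta = (self.theta0
                 + self.P_shared @ self.z_shared
                 + self.P_task[task_id] @ self.z_task[task_id])
        params, offset = {}, 0
        for name, shape in zip(self.names, self.shapes):
            n = math.prod(shape)
            params[name] = theta[offset:offset + n].view(shape)
            offset += n
        # functional_call keeps gradients flowing back into z_shared / z_task.
        return functional_call(self.base_model, params, (x,))
```

In use, only `z_shared` and `z_task` would be handed to the optimizer, e.g. `torch.optim.Adam([net.z_shared, net.z_task])`; with `d_shared` and `d_task` in the tens or hundreds, the number of free parameters stays far below the ambient dimensionality `D`.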
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks; a generic alignment term of this kind is sketched after this entry.
arXiv Detail & Related papers (2023-07-13T16:39:08Z)
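As a rough illustration of what aligning same-label points across tasks can look like, the snippet below computes a generic supervised-contrastive-style alignment term between feature batches from two tasks. It is not the pseudo-contrastive loss derived in that paper; the function name, the temperature, and the positive/negative construction are assumptions.

```python
# Illustrative only: a generic contrastive-style alignment between two tasks,
# not the pseudo-contrastive loss derived in the referenced paper.
import torch
import torch.nn.functional as F


def cross_task_alignment(feats_a, labels_a, feats_b, labels_b, temperature=0.1):
    """Treat cross-task pairs that share a label as positives, all others as negatives."""
    feats_a = F.normalize(feats_a, dim=1)
    feats_b = F.normalize(feats_b, dim=1)
    logits = feats_a @ feats_b.T / temperature                 # cosine similarities
    positives = (labels_a[:, None] == labels_b[None, :]).float()
    log_prob = logits.log_softmax(dim=1)
    # Average log-probability mass placed on same-label partners in the other task.
    loss = -(positives * log_prob).sum(dim=1) / positives.sum(dim=1).clamp(min=1)
    return loss.mean()
```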
- The Law of Parsimony in Gradient Descent for Learning Deep Linear Networks [34.85235641812005]
We reveal a surprising "law of parsimony" in the learning dynamics when the data possesses low-dimensional structures.
This simplicity in learning dynamics could have significant implications for both efficient training and a better understanding of deep networks.
arXiv Detail & Related papers (2023-06-01T21:24:53Z)
- Fine-tuning Happens in Tiny Subspaces: Exploring Intrinsic Task-specific Subspaces of Pre-trained Language Models [16.28794184086409]
Pre-trained language models (PLMs) are known to be overly parameterized and have significant redundancy.
We study the problem of re-parameterizing and fine-tuning PLMs from a new perspective: the discovery of intrinsic task-specific subspaces.
A key finding is that PLMs can be effectively fine-tuned in the subspace with a small number of free parameters.
arXiv Detail & Related papers (2023-05-27T11:16:26Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Infinite wide (finite depth) Neural Networks benefit from multi-task learning unlike shallow Gaussian Processes -- an exact quantitative macroscopic characterization [0.0]
We prove that ReLU neural networks (NNs) with at least one hidden layer, optimized with l2-regularization on the parameters, enforce multi-task learning through representation learning.
This is in contrast to multiple other idealized settings discussed in the literature, where wide (ReLU) NNs lose their ability to benefit from multi-task learning in the infinite-width limit.
arXiv Detail & Related papers (2021-12-31T18:03:46Z)
- New Tight Relaxations of Rank Minimization for Multi-Task Learning [161.23314844751556]
We propose two novel multi-task learning formulations based on two regularization terms.
We show that our methods can correctly recover the low-rank structure shared across tasks, and outperform related multi-task learning methods; a standard nuclear-norm formulation of this low-rank idea is written out after this entry for orientation.
arXiv Detail & Related papers (2021-12-09T07:29:57Z)
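For orientation, the standard convex formulation that work on rank minimization for multi-task learning starts from is the nuclear-norm regularized objective below, where the task weight vectors are stacked into a matrix $W$ and $\|W\|_*$ is the usual convex surrogate for $\mathrm{rank}(W)$. Here $\ell$, $f$, $n_t$, and $\lambda$ are generic notation, and this baseline is not the paper's new, tighter relaxations.

```latex
% Standard low-rank multi-task baseline (not the referenced paper's relaxations):
% stack the task weight vectors into W = [w_1, ..., w_T] and penalize the nuclear
% norm \|W\|_*, the usual convex surrogate for rank(W).
\min_{W = [w_1, \dots, w_T]} \;\; \sum_{t=1}^{T} \frac{1}{n_t} \sum_{i=1}^{n_t}
  \ell\!\big( f(x_i^{(t)}; w_t),\, y_i^{(t)} \big) \;+\; \lambda \, \| W \|_{*}
```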
- Exploring the Common Principal Subspace of Deep Features in Neural Networks [50.37178960258464]
We find that different Deep Neural Networks (DNNs) trained with the same dataset share a common principal subspace in latent spaces.
Specifically, we design a new metric, the $\mathcal{P}$-vector, to represent the principal subspace of deep features learned in a DNN.
Small angles (with cosines close to $1.0$) are found when comparing the $\mathcal{P}$-vectors of any two DNNs trained with different algorithms/architectures; a minimal version of this comparison is sketched after this entry.
arXiv Detail & Related papers (2021-10-06T15:48:32Z)
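A minimal sketch of this kind of comparison, assuming deep features of the same dataset from two networks with equal feature dimensionality: take the leading principal direction of each feature matrix and measure the cosine between them. The exact $\mathcal{P}$-vector construction in that paper may differ, and the function names below are placeholders.

```python
# Sketch of comparing principal feature directions of two DNNs on the same data;
# the paper's P-vector construction may differ from this simple PCA-based reading.
import numpy as np


def principal_direction(features):
    """Leading principal direction of a (num_samples, feature_dim) matrix."""
    centered = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]                                    # first right singular vector


def subspace_cosine(feats_a, feats_b):
    """|cosine| between the leading principal directions of two feature matrices."""
    u, v = principal_direction(feats_a), principal_direction(feats_b)
    return abs(float(u @ v))                        # both directions are unit-norm
```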
- Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature Learning and Lazy Training [4.318555434063275]
Deep learning algorithms are responsible for a technological revolution in a variety of tasks including image recognition or Go playing.
Yet, why they work is not understood. Ultimately, they manage to classify data lying in high dimension -- a feat generically impossible.
We argue that different learning regimes can be organized into a phase diagram.
arXiv Detail & Related papers (2020-12-30T11:00:36Z)
- ATOM3D: Tasks On Molecules in Three Dimensions [91.72138447636769]
Deep neural networks have recently gained significant attention.
In this work we present ATOM3D, a collection of both novel and existing datasets spanning several key classes of biomolecules.
We develop three-dimensional molecular learning networks for each of these tasks, finding that they consistently improve performance.
arXiv Detail & Related papers (2020-12-07T20:18:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.