ATE-SG: Alternate Through the Epochs Stochastic Gradient for Multi-Task Neural Networks
- URL: http://arxiv.org/abs/2312.16340v2
- Date: Sun, 18 May 2025 09:24:23 GMT
- Title: ATE-SG: Alternate Through the Epochs Stochastic Gradient for Multi-Task Neural Networks
- Authors: Stefania Bellavia, Francesco Della Santa, Alessandra Papini
- Abstract summary: This paper introduces novel alternate training procedures for hard-parameter sharing Multi-Task Neural Networks (MTNNs). The proposed alternate training method updates shared and task-specific weights alternately through the epochs, exploiting the multi-head architecture of the model. Empirical experiments demonstrate enhanced training regularization and reduced computational demands.
- Score: 44.99833362998488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces novel alternate training procedures for hard-parameter sharing Multi-Task Neural Networks (MTNNs). Traditional MTNN training faces challenges in managing conflicting loss gradients, often yielding sub-optimal performance. The proposed alternate training method updates shared and task-specific weights alternately through the epochs, exploiting the multi-head architecture of the model. This approach reduces computational costs per epoch and memory requirements. Convergence properties similar to those of the classical stochastic gradient method are established. Empirical experiments demonstrate enhanced training regularization and reduced computational demands. In summary, our alternate training procedures offer a promising advancement for the training of hard-parameter sharing MTNNs.
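To make the idea concrete, below is a minimal sketch, not the authors' implementation, of a hard-parameter-sharing multi-task network whose shared backbone and task-specific heads are updated in alternating epochs. The architecture, task count, optimizers, and toy data are illustrative assumptions.
```python
# Minimal sketch of alternate-through-the-epochs training for a
# hard-parameter-sharing MTNN. Not the authors' code; architecture,
# optimizers, and toy data below are illustrative assumptions.
import torch
import torch.nn as nn

class HardSharingMTNN(nn.Module):
    def __init__(self, in_dim=16, hidden=64, n_tasks=2):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_tasks)])

    def forward(self, x):
        z = self.backbone(x)                     # shared representation
        return [head(z) for head in self.heads]  # one output per task

model = HardSharingMTNN()
shared_params = list(model.backbone.parameters())
task_params = list(model.heads.parameters())
opt_shared = torch.optim.SGD(shared_params, lr=1e-2)
opt_tasks = torch.optim.SGD(task_params, lr=1e-2)
mse = nn.MSELoss()

# Toy data: one regression target per task.
loader = [(torch.randn(32, 16), [torch.randn(32, 1) for _ in range(2)])
          for _ in range(10)]

for epoch in range(20):
    update_shared = (epoch % 2 == 0)  # alternate the trainable block per epoch
    for p in shared_params:
        p.requires_grad_(update_shared)
    for p in task_params:
        p.requires_grad_(not update_shared)
    optimizer = opt_shared if update_shared else opt_tasks
    for x, targets in loader:
        model.zero_grad(set_to_none=True)
        preds = model(x)
        loss = sum(mse(p, t) for p, t in zip(preds, targets))
        loss.backward()   # gradients flow only into the active block
        optimizer.step()
```
Because only one parameter block requires gradients in a given epoch, backpropagation touches fewer parameters per step, which is consistent with the reduced per-epoch cost and memory footprint described in the abstract.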
Related papers
- Neural Network Training via Stochastic Alternating Minimization with Trainable Step Sizes [3.246129789918632]
The training of deep neural networks is inherently a non-convex optimization problem.<n>Standard approaches such as stochastic gradient descent (SGD) require simultaneous updates to all parameters.<n>We propose a novel method, Stochastic Alternating Minimization with Trainable step sizes (SAMT).<n>SAMT achieves better performance with fewer parameter updates compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-08-06T08:23:38Z) - Decoupled Relative Learning Rate Schedules [4.34286535607654]
We introduce a novel approach for optimizing LLM training by adjusting learning rates across the weights of different components in Transformer models.<n>Our Relative Learning Rate Schedules (RLRS) method accelerates the training process by up to 23%.
arXiv Detail & Related papers (2025-07-04T12:23:45Z) - ADMM-Based Training for Spiking Neural Networks [1.1249583407496218]
Spiking neural networks (SNNs) have gained momentum due to their high potential in time-series processing combined with minimal energy consumption.<n>They still lack a dedicated and efficient training algorithm.<n>We propose a novel SNN training method based on the alternating direction method of multipliers (ADMM).
arXiv Detail & Related papers (2025-05-08T10:20:33Z) - Self-Controlled Dynamic Expansion Model for Continual Learning [10.447232167638816]
This paper introduces an innovative Self-Controlled Dynamic Expansion Model (SCDEM).
SCDEM orchestrates multiple trainable pre-trained ViT backbones to furnish diverse and semantically enriched representations.
An extensive series of experiments have been conducted to evaluate the proposed methodology's efficacy.
arXiv Detail & Related papers (2025-04-14T15:22:51Z) - An Augmented Backward-Corrected Projector Splitting Integrator for Dynamical Low-Rank Training [47.69709732622765]
We introduce a novel low-rank training method that reduces the number of required QR decompositions.
Our approach integrates an augmentation step into a projector-splitting scheme, ensuring convergence to a locally optimal solution.
arXiv Detail & Related papers (2025-02-05T09:03:50Z) - Optimizing Dense Visual Predictions Through Multi-Task Coherence and Prioritization [7.776434991976473]
Multi-Task Learning (MTL) involves the concurrent training of multiple tasks.
We propose an advanced MTL model specifically designed for dense vision tasks.
arXiv Detail & Related papers (2024-12-04T10:05:47Z) - Proactive Gradient Conflict Mitigation in Multi-Task Learning: A Sparse Training Perspective [33.477681689943516]
A common issue in multi-task learning is the occurrence of gradient conflict.
We propose a strategy to reduce such conflicts through sparse training (ST); a brief sketch of how gradient conflict can be measured appears after this list.
Our experiments demonstrate that ST effectively mitigates conflicting gradients and leads to superior performance.
arXiv Detail & Related papers (2024-11-27T18:58:22Z) - LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging [80.17238673443127]
LiNeS is a post-training editing technique designed to preserve pre-trained generalization while enhancing fine-tuned task performance.
LiNeS demonstrates significant improvements in both single-task and multi-task settings across various benchmarks in vision and natural language processing.
arXiv Detail & Related papers (2024-10-22T16:26:05Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Latent State Models of Training Dynamics [51.88132043461152]
We train models with different random seeds and compute a variety of metrics throughout training.
We then fit a hidden Markov model (HMM) over the resulting sequences of metrics.
We use the HMM representation to study phase transitions and identify latent "detour" states that slow down convergence.
arXiv Detail & Related papers (2023-08-18T13:20:08Z) - Multimodal Parameter-Efficient Few-Shot Class Incremental Learning [1.9220716793379256]
Few-Shot Class Incremental Learning (FSCIL) is a challenging continual learning task, where limited training examples are available during several learning sessions.
To succeed in this task, it is necessary to avoid over-fitting new classes caused by biased distributions in the few-shot training sets.
Our method, CPE-CLIP, significantly improves FSCIL performance compared to state-of-the-art proposals while also drastically reducing the number of learnable parameters and training costs.
arXiv Detail & Related papers (2023-03-08T17:34:15Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong
Reinforcement Learning [11.076005074172516]
Reinforcement learning algorithms can easily encounter catastrophic forgetting or interference when faced with lifelong streaming information.
We propose a scalable lifelong RL method that dynamically expands the network capacity to accommodate new knowledge.
We show that our method successfully facilitates scalable lifelong RL and outperforms relevant existing methods.
arXiv Detail & Related papers (2022-05-22T09:48:41Z) - Multi-Task Learning as a Bargaining Game [63.49888996291245]
In Multi-task learning (MTL), a joint model is trained to simultaneously make predictions for several tasks.
Since the gradients of these different tasks may conflict, training a joint model for MTL often yields lower performance than its corresponding single-task counterparts.
We propose viewing the gradients combination step as a bargaining game, where tasks negotiate to reach an agreement on a joint direction of parameter update.
arXiv Detail & Related papers (2022-02-02T13:21:53Z) - An Optimization-Based Meta-Learning Model for MRI Reconstruction with
Diverse Dataset [4.9259403018534496]
We develop a generalizable MRI reconstruction model in the meta-learning framework.
The proposed network learns the regularization function within an adaptive learner model.
We evaluate quick training on unseen tasks after meta-training, saving half of the training time.
arXiv Detail & Related papers (2021-10-02T03:21:52Z) - Improving the Accuracy of Early Exits in Multi-Exit Architectures via
Curriculum Learning [88.17413955380262]
Multi-exit architectures allow deep neural networks to terminate their execution early in order to adhere to tight deadlines at the cost of accuracy.
We introduce a novel method called Multi-Exit Curriculum Learning that utilizes curriculum learning.
Our method consistently improves the accuracy of early exits compared to the standard training approach.
arXiv Detail & Related papers (2021-04-21T11:12:35Z) - Unsupervised Transfer Learning for Spatiotemporal Predictive Networks [90.67309545798224]
We study how to transfer knowledge from a zoo of unsupervisedly learned models towards another network.
Our motivation is that models are expected to understand complex dynamics from different sources.
Our approach yields significant improvements on three benchmarks for spatiotemporal prediction, and benefits the target even from less relevant sources.
arXiv Detail & Related papers (2020-09-24T15:40:55Z) - AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z) - Subset Sampling For Progressive Neural Network Learning [106.12874293597754]
Progressive Neural Network Learning is a class of algorithms that incrementally construct the network's topology and optimize its parameters based on the training data.
We propose to speed up this process by exploiting subsets of training data at each incremental training step.
Experimental results in object, scene and face recognition problems demonstrate that the proposed approach speeds up the optimization procedure considerably.
arXiv Detail & Related papers (2020-02-17T18:57:33Z)