Matching DNN Compression and Cooperative Training with Resources and
Data Availability
- URL: http://arxiv.org/abs/2212.02304v1
- Date: Fri, 2 Dec 2022 09:52:18 GMT
- Title: Matching DNN Compression and Cooperative Training with Resources and
Data Availability
- Authors: Francesco Malandrino and Giuseppe Di Giacomo and Armin Karamzade and
Marco Levorato and Carla Fabiana Chiasserini
- Abstract summary: How much and when an ML model should be compressed, and where its training should be executed, are hard decisions to make.
We model the network system focusing on the training of DNNs, formalize the multi-dimensional problem, and formulate an approximate dynamic programming problem.
We prove that PACT's solutions can get as close to the optimum as desired, at the cost of an increased time complexity.
- Score: 20.329698347331075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To make machine learning (ML) sustainable and apt to run on the diverse
devices where relevant data is, it is essential to compress ML models as
needed, while still meeting the required learning quality and time performance.
However, how much and when an ML model should be compressed, and where
its training should be executed, are hard decisions to make, as they depend on
the model itself, the resources of the available nodes, and the data such nodes
own. Existing studies focus on each of those aspects individually; however,
they do not account for how such decisions can be made jointly and adapted to
one another. In this work, we model the network system focusing on the training
of DNNs, formalize the above multi-dimensional problem, and, given its
NP-hardness, formulate an approximate dynamic programming problem that we solve
through the PACT algorithmic framework. Importantly, PACT leverages a
time-expanded graph representing the learning process, and a data-driven and
theoretical approach for the prediction of the loss evolution to be expected as
a consequence of training decisions. We prove that PACT's solutions can get as
close to the optimum as desired, at the cost of an increased time complexity,
and that, in any case, such complexity is polynomial. Numerical results also
show that, even under the most disadvantageous settings, PACT outperforms
state-of-the-art alternatives and closely matches the optimal energy cost.
Related papers
- Beyond Accuracy Optimization: Computer Vision Losses for Large Language Model Fine-Tuning [9.507070656654632]
Large Language Models (LLMs) have demonstrated impressive performance across various tasks.
Current training approaches combine standard cross-entropy loss with extensive data, human feedback, or ad hoc methods to enhance performance.
This study investigates the use of established semantic segmentation loss functions in natural language generation to create a versatile, practical, and scalable solution.
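The summary does not say which segmentation losses are adopted; as one plausible illustration, a soft Dice loss (standard in semantic segmentation) can be applied to next-token prediction by treating vocabulary entries as classes. The adaptation below is a hedged sketch, not the paper's actual training objective.

```python
# Illustrative only: a segmentation-style soft Dice loss applied to next-token
# prediction, treating vocabulary entries as classes.
import torch
import torch.nn.functional as F

def soft_dice_loss(logits, targets, eps=1e-6):
    # logits: (batch, seq_len, vocab); targets: (batch, seq_len) token ids
    probs = logits.softmax(dim=-1)
    one_hot = F.one_hot(targets, num_classes=logits.size(-1)).to(probs.dtype)
    intersection = (probs * one_hot).sum(dim=(0, 1))          # per-class overlap
    cardinality = probs.sum(dim=(0, 1)) + one_hot.sum(dim=(0, 1))
    dice = (2 * intersection + eps) / (cardinality + eps)
    return 1 - dice.mean()                                     # lower is better

logits = torch.randn(2, 5, 100)                 # toy batch, sequence length 5, vocab 100
targets = torch.randint(0, 100, (2, 5))
print(float(soft_dice_loss(logits, targets)))
```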
arXiv Detail & Related papers (2024-09-20T16:46:17Z)
- Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance.
Our method benefits from less cost during inference while keeping the same accuracy.
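A minimal sketch of per-instance dynamic computation, assuming a lightweight gate that routes each input to either a cheap or an expensive branch; the module layout and threshold are illustrative, not the paper's architecture.

```python
# Toy sketch of per-instance dynamic computation (not the paper's architecture):
# a lightweight gate routes easy inputs to a cheap branch and harder ones to a
# more expensive branch, trading compute for accuracy at inference time.
import torch
import torch.nn as nn

class SwitchableBlock(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.gate = nn.Linear(dim, 1)                       # per-instance decision
        self.cheap = nn.Linear(dim, dim)                    # low-cost path
        self.expensive = nn.Sequential(                     # high-cost path
            nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        use_expensive = torch.sigmoid(self.gate(x)) > 0.5   # hard routing at inference
        out = self.cheap(x)
        if use_expensive.any():                             # only pay for selected rows
            idx = use_expensive.squeeze(-1)
            out[idx] = self.expensive(x[idx])
        return out

x = torch.randn(8, 64)
print(SwitchableBlock()(x).shape)
```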
arXiv Detail & Related papers (2024-05-07T17:44:54Z)
- Combating Missing Modalities in Egocentric Videos at Test Time [92.38662956154256]
Real-world applications often face challenges with incomplete modalities due to privacy concerns, efficiency needs, or hardware issues.
We propose a novel approach to address this issue at test time without requiring retraining.
MiDl represents the first self-supervised, online solution for handling missing modalities exclusively at test time.
arXiv Detail & Related papers (2024-04-23T16:01:33Z)
- Dependable Distributed Training of Compressed Machine Learning Models [16.403297089086042]
We propose DepL, a framework for dependable learning orchestration.
It makes high-quality, efficient decisions on (i) the data to leverage for learning, (ii) the models to use and when to switch among them, and (iii) the clusters of nodes, and the resources thereof, to exploit.
We prove that DepL has constant competitive ratio and complexity, and show that it outperforms the state-of-the-art by over 27%.
arXiv Detail & Related papers (2024-02-22T07:24:26Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
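A minimal sketch of the shared-backbone, multi-head ensemble pattern the summary describes; layer sizes, head count, and the averaging ensemble are assumptions for illustration rather than the paper's exact MEMTL design.

```python
# Minimal sketch of a shared backbone with multiple prediction heads whose
# outputs are ensembled (hypothetical shapes; not the exact MEMTL model).
import torch
import torch.nn as nn

class MultiHeadEnsemble(nn.Module):
    def __init__(self, in_dim=32, hidden=64, out_dim=4, n_heads=3):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, out_dim) for _ in range(n_heads))

    def forward(self, x):
        z = self.backbone(x)                                   # shared features
        preds = torch.stack([head(z) for head in self.heads])  # (n_heads, batch, out)
        return preds.mean(dim=0)                               # ensemble by averaging

x = torch.randn(16, 32)                                        # toy offloading features
print(MultiHeadEnsemble()(x).shape)                            # torch.Size([16, 4])
```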
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for distributed training.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
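For readers unfamiliar with IST, the toy sketch below shows the basic pattern usually associated with it: hidden units are partitioned across workers, each worker trains only its subnetwork locally, and the pieces are reassembled without per-step gradient exchange. The partitioning scheme and the placeholder local update are illustrative assumptions.

```python
# Rough sketch of the Independent Subnetwork Training idea (illustrative only):
# hidden units of one layer are split across workers, each worker trains its
# subnetwork locally, and the pieces are reassembled with no per-step
# gradient communication or compression.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 16))                 # input -> hidden weights
W2 = rng.normal(size=(16, 1))                 # hidden -> output weights
workers = np.array_split(np.arange(16), 4)    # partition hidden units over 4 workers

def local_train(w1, w2):
    # Placeholder for a few local SGD steps on the worker's subnetwork.
    return w1 * 0.99, w2 * 0.99

for hidden_idx in workers:
    # Each worker only holds the columns/rows of W1/W2 touching its hidden units.
    sub_w1, sub_w2 = local_train(W1[:, hidden_idx], W2[hidden_idx, :])
    W1[:, hidden_idx], W2[hidden_idx, :] = sub_w1, sub_w2     # reassemble

print(W1.shape, W2.shape)
```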
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Energy-efficient Training of Distributed DNNs in the Mobile-edge-cloud Continuum [18.247181241860538]
We address distributed machine learning in multi-tier networks where a heterogeneous set of nodes cooperate to perform a learning task.
We propose a solution concept, called RightTrain, that achieves energy-efficient ML model training, while fulfilling learning time and quality requirements.
Our performance evaluation shows that RightTrain closely matches the optimum and outperforms the state of the art by over 50%.
arXiv Detail & Related papers (2022-02-23T08:35:41Z)
- Deep Learning with Multiple Data Set: A Weighted Goal Programming Approach [2.7393821783237184]
Large-scale data analysis is growing at an exponential rate as data proliferates in our societies.
Deep Learning models require substantial resources, making distributed training necessary.
This paper presents a Multicriteria approach for distributed learning.
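As a hedged illustration of the weighted-goal-programming flavor of such a multicriteria approach, the snippet below scores hypothetical distributed-training configurations by their weighted deviations from per-criterion goals; the goals, weights, and candidate numbers are invented for the example.

```python
# Illustrative weighted-goal-programming scoring (not the paper's exact model):
# each candidate distributed-training configuration is scored by the weighted
# deviation of its criteria from per-criterion goals, and the best one is kept.
GOALS = {"accuracy": 0.90, "train_hours": 4.0, "energy_kwh": 2.0}    # assumed targets
WEIGHTS = {"accuracy": 5.0, "train_hours": 1.0, "energy_kwh": 2.0}   # assumed priorities

candidates = [  # hypothetical configurations of a distributed training job
    {"name": "2 workers", "accuracy": 0.88, "train_hours": 6.0, "energy_kwh": 1.5},
    {"name": "4 workers", "accuracy": 0.91, "train_hours": 3.5, "energy_kwh": 2.4},
    {"name": "8 workers", "accuracy": 0.92, "train_hours": 2.0, "energy_kwh": 4.0},
]

def goal_deviation(cand):
    # Penalize under-achieving accuracy and over-shooting time/energy goals.
    dev = WEIGHTS["accuracy"] * max(0.0, GOALS["accuracy"] - cand["accuracy"])
    dev += WEIGHTS["train_hours"] * max(0.0, cand["train_hours"] - GOALS["train_hours"])
    dev += WEIGHTS["energy_kwh"] * max(0.0, cand["energy_kwh"] - GOALS["energy_kwh"])
    return dev

best = min(candidates, key=goal_deviation)
print(best["name"], round(goal_deviation(best), 3))
```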
arXiv Detail & Related papers (2021-11-27T07:10:25Z)
- Dynamic Network-Assisted D2D-Aided Coded Distributed Learning [59.29409589861241]
We propose a novel device-to-device (D2D)-aided coded federated learning method (D2D-CFL) for load balancing across devices.
We derive an optimal compression rate for achieving minimum processing time and establish its connection with the convergence time.
Our proposed method is beneficial for real-time collaborative applications, where the users continuously generate training data.
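A toy sketch of the load-balancing intuition, assuming local processing time scales linearly with the uncompressed share of each device's data; the cost model and device numbers are invented and do not reproduce the paper's derivation of the optimal compression rate.

```python
# Toy load-balancing sketch (illustrative; not the D2D-CFL derivation): pick a
# per-device compression rate so every device finishes its local computation in
# roughly the same time, assuming time ~ (1 - rate) * data / speed.
speeds = {"phone": 1.0, "tablet": 2.0, "laptop": 4.0}    # assumed samples/sec
data = {"phone": 800, "tablet": 1000, "laptop": 1200}    # assumed local samples

def finish_time(dev, rate):
    return (1 - rate) * data[dev] / speeds[dev]           # assumed linear cost model

target = min(finish_time(d, 0.0) for d in speeds)         # fastest uncompressed device
rates = {d: max(0.0, 1 - target * speeds[d] / data[d]) for d in speeds}
print({d: round(r, 2) for d, r in rates.items()})                  # chosen rates
print({d: round(finish_time(d, rates[d]), 1) for d in speeds})     # balanced times
```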
arXiv Detail & Related papers (2021-11-26T18:44:59Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
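A minimal federated-averaging sketch with random per-iteration participation, mirroring the setting described above on a toy quadratic objective; the participation probability, step size, and per-agent optima are illustrative assumptions.

```python
# Minimal federated-averaging sketch with a random subset of participating
# agents per iteration (toy quadratic losses, not the paper's analysis).
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, participation = 20, 5, 0.3
global_w = np.zeros(dim)
local_minima = rng.normal(size=(n_agents, dim))           # assumed per-agent optima

for _ in range(100):
    active = rng.random(n_agents) < participation         # random subset this round
    if not active.any():
        continue
    # Each active agent takes one local gradient step on its quadratic toy loss.
    local_ws = [global_w - 0.5 * (global_w - local_minima[i])
                for i in np.flatnonzero(active)]
    global_w = np.mean(local_ws, axis=0)                   # server aggregates

print(np.round(global_w, 3), "vs mean optimum", np.round(local_minima.mean(axis=0), 3))
```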
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.