Accelerating Transfer Learning with Near-Data Computation on Cloud
Object Stores
- URL: http://arxiv.org/abs/2210.08650v1
- Date: Sun, 16 Oct 2022 22:28:36 GMT
- Title: Accelerating Transfer Learning with Near-Data Computation on Cloud
Object Stores
- Authors: Arsany Guirguis, Diana Petrescu, Florin Dinu, Do Le Quoc, Javier
Picorel, Rachid Guerraoui
- Abstract summary: This paper identifies transfer learning (TL) as a natural fit for the disaggregated cloud.
We show how to leverage the unique structure of TL's fine-tuning phase to flexibly address the aforementioned constraints.
We present HAPI, a processing system for TL that spans the compute and storage tiers while remaining transparent to the user.
- Score: 5.057544107331778
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Storage disaggregation is fundamental to today's cloud due to cost and
scalability benefits. Unfortunately, this design must cope with an inherent
network bottleneck between the storage and the compute tiers. The widely
deployed mitigation strategy is to provide computational resources next to
storage to push down a part of an application and thus reduce the amount of
data transferred to the compute tier. Overall, users of disaggregated storage
need to consider two main constraints: the network may remain a bottleneck, and
the storage-side computational resources are limited. This paper identifies
transfer learning (TL) as a natural fit for the disaggregated cloud. TL,
famously described as the next driver of ML commercial success, is widely
used and has a broad range of applications. We show how to leverage the unique
structure of TL's fine-tuning phase (i.e., a combination of feature extraction
and training) to flexibly address the aforementioned constraints and improve
both user and operator-centric metrics. The key to improving user-perceived
performance is to mitigate the network bottleneck by carefully splitting the TL
deep neural network (DNN) such that feature extraction is, partially or
entirely, executed next to storage. Crucially, such splitting enables
decoupling the batch size of feature extraction from the training batch size,
facilitating efficient storage-side batch size adaptation to increase
concurrency in the storage tier while avoiding out-of-memory errors. Guided by
these insights, we present HAPI, a processing system for TL that spans the
compute and storage tiers while remaining transparent to the user. Our
evaluation with several DNNs, such as ResNet, VGG, and Transformer, shows up to
11x improvement in application runtime and up to 8.3x reduction in the data
transferred from the storage to the compute tier compared to running the
computation in the compute tier.
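As a rough illustration of this split execution (a PyTorch sketch, not HAPI's actual interface; the split index and batch sizes are arbitrary):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(num_classes=10)
layers = list(model.children())
split_at = 6                                     # hypothetical split point in the DNN
storage_side = nn.Sequential(*layers[:split_at]).eval()           # frozen feature extraction
tail = layers[split_at:]
compute_side = nn.Sequential(*tail[:-1], nn.Flatten(), tail[-1])  # trained on the compute tier

extract_bs, train_bs = 8, 32                     # decoupled batch sizes
images = torch.randn(train_bs, 3, 224, 224)

# Storage side: small batches keep per-request memory low, so the storage
# tier can raise concurrency without out-of-memory errors.
with torch.no_grad():
    feats = torch.cat([storage_side(chunk) for chunk in images.split(extract_bs)])

# Compute side: the training step consumes the full training batch of features.
labels = torch.randint(0, 10, (train_bs,))
loss = nn.functional.cross_entropy(compute_side(feats), labels)
loss.backward()
```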
Related papers
- How to Train Your LLM Web Agent: A Statistical Diagnosis [102.04125085041473]
We present the first statistically grounded study on compute allocation for LLM web-agent post-training.
Our approach uses a two-stage pipeline, training a Llama 3.1 8B student to imitate a Llama 3.3 70B teacher via supervised fine-tuning (SFT) followed by on-policy reinforcement learning.
Our results show that combining SFT with on-policy RL consistently outperforms either approach alone on both WorkArena and MiniWob++.
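A toy sketch of this two-stage recipe (tiny tensors stand in for the LLMs and web environments, and plain REINFORCE stands in for the paper's RL method):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
policy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))  # toy "student"
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stage 1: SFT -- imitate the teacher's actions (random stand-ins here).
states = torch.randn(256, 16)
teacher_actions = torch.randint(0, 4, (256,))    # would come from the 70B teacher
for _ in range(100):
    loss = nn.functional.cross_entropy(policy(states), teacher_actions)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: on-policy RL -- sample actions from the student itself and
# reinforce them in proportion to task reward.
for _ in range(100):
    s = torch.randn(64, 16)
    dist = torch.distributions.Categorical(logits=policy(s))
    a = dist.sample()
    reward = (a == 0).float()                    # stand-in task reward
    loss = -(dist.log_prob(a) * (reward - reward.mean())).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```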
arXiv Detail & Related papers (2025-07-05T17:12:33Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, using a minimal number of late pre-trained layers alleviates the peak memory overhead.
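A loose sketch of the two-route idea (the stand-in backbone, fusion weights, and simplified "anti-redundancy" operation are illustrative, not SHERL's actual design):

```python
import torch
import torch.nn as nn

blocks = nn.ModuleList([nn.Linear(32, 32) for _ in range(6)])  # stand-in pre-trained backbone
for p in blocks.parameters():
    p.requires_grad_(False)

fuse_weights = nn.Parameter(torch.zeros(5))      # tiny trainable fusion over intermediate outputs
head = nn.Linear(32, 10)

x = torch.randn(8, 32)
inters = []
with torch.no_grad():                            # early route: no activations kept for backprop
    h = x
    for blk in blocks[:-1]:
        h = torch.relu(blk(h))
        inters.append(h)

# Consolidate the intermediate outputs, then run only the last pre-trained
# block (late route) under autograd, keeping peak memory low.
fused = (torch.softmax(fuse_weights, 0)[:, None, None] * torch.stack(inters)).sum(0)
out = head(torch.relu(blocks[-1](fused)))
loss = nn.functional.cross_entropy(out, torch.randint(0, 10, (8,)))
loss.backward()                                  # gradients reach only the fusion weights and head
```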
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- Breaking MLPerf Training: A Case Study on Optimizing BERT [9.486916730173661]
We present novel approaches for fast large-scale training of BERT model.
Load balancing is imperative in distributed BERT training because training samples vary widely in length.
We propose two new ideas: (1) local presorting based on dataset stratification for load balancing and (2) bucket-wise gradient clipping before allreduce.
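A simple sketch of idea (1), with illustrative lengths and sharding:

```python
import random

random.seed(0)
lengths = [random.randint(16, 512) for _ in range(4096)]   # stand-in token counts
num_workers, batch_size = 8, 32

# Stratify: sort globally by length, deal samples round-robin so every worker
# sees a similar length distribution, then batch each worker's sorted shard.
order = sorted(range(len(lengths)), key=lambda i: lengths[i])
shards = [order[w::num_workers] for w in range(num_workers)]
batches = [[shard[i:i + batch_size] for i in range(0, len(shard), batch_size)]
           for shard in shards]

# Padded work per batch is roughly batch_size * max length in the batch;
# presorting keeps batch members similar in length, so workers stay balanced.
cost = lambda batch: batch_size * max(lengths[i] for i in batch)
print([sum(map(cost, worker_batches)) for worker_batches in batches])
```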
arXiv Detail & Related papers (2024-02-04T11:12:17Z)
- Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting growing attention as a way to collaboratively train a machine learning model without transferring raw data.
FL generally relies on a parameter server and a large number of edge devices throughout model training.
We propose TEASQ-Fed, which exploits edge devices that asynchronously participate in the training process by actively applying for tasks.
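A minimal sketch of compressing an asynchronous client update with top-k sparsification plus 8-bit quantization (the fraction, staleness rule, and helper names are invented):

```python
import torch

def compress(update: torch.Tensor, k_frac: float = 0.01):
    flat = update.flatten()
    k = max(1, int(k_frac * flat.numel()))
    _, idx = flat.abs().topk(k)                  # top-k sparsification
    kept = flat[idx]
    scale = kept.abs().max().clamp(min=1e-8) / 127.0
    q = (kept / scale).round().clamp(-127, 127).to(torch.int8)   # 8-bit quantization
    return idx, q, scale, update.shape

def decompress(idx, q, scale, shape):
    flat = torch.zeros(shape).flatten()
    flat[idx] = q.float() * scale
    return flat.reshape(shape)

delta = torch.randn(256, 128)                    # stand-in client model update
server_delta = decompress(*compress(delta))
staleness = 3                                    # asynchronous arrival: down-weight stale updates
lr = 0.1 / (1 + staleness)
# server side: global_weights += lr * server_delta
```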
arXiv Detail & Related papers (2023-12-23T07:47:07Z)
- tf.data service: A Case for Disaggregating ML Input Data Processing [4.851146762916078]
Machine learning (ML) computations commonly execute on expensive specialized hardware, such as GPUs and TPUs, which provide high FLOPs and performance-per-watt.
The host CPU and RAM required per accelerator core to avoid input data stalls vary across jobs.
We present tf.data service, an open-source disaggregated input data processing service built on top of tf.data in TensorFlow.
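Client-side usage of tf.data service looks roughly like this (the dispatcher address is a placeholder):

```python
import tensorflow as tf

def preprocess(x):
    return tf.cast(x, tf.float32) / 255.0        # CPU-heavy work, offloaded to remote workers

dataset = (tf.data.Dataset.range(10_000)
           .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(32))

# Route the pipeline through the disaggregated service: remote workers run
# map/batch, decoupling input-processing CPU/RAM from the accelerator hosts.
dataset = dataset.apply(tf.data.experimental.service.distribute(
    processing_mode="parallel_epochs",
    service="grpc://dispatcher.example.com:5000"))
```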
arXiv Detail & Related papers (2022-10-26T16:15:45Z)
- Acceleration of Federated Learning with Alleviated Forgetting in Local Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy.
We propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage.
Our experiments demonstrate that FedReg significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
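FedReg's exact mechanism is not spelled out here; as a stand-in, the sketch below shows the general recipe of regularizing local training toward the global model (a FedProx-style proximal term) so that local steps forget less of the global knowledge:

```python
import torch
import torch.nn as nn

global_model = nn.Linear(20, 2)                  # weights received from the server
local_model = nn.Linear(20, 2)
local_model.load_state_dict(global_model.state_dict())
opt = torch.optim.SGD(local_model.parameters(), lr=0.1)
mu = 0.01                                        # regularization strength (assumed)

x, y = torch.randn(64, 20), torch.randint(0, 2, (64,))
for _ in range(10):                              # local training steps on client data
    loss = nn.functional.cross_entropy(local_model(x), y)
    prox = sum((lp - gp.detach()).pow(2).sum()
               for lp, gp in zip(local_model.parameters(), global_model.parameters()))
    opt.zero_grad()
    (loss + 0.5 * mu * prox).backward()          # penalize drifting from the global model
    opt.step()
```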
arXiv Detail & Related papers (2022-03-05T02:31:32Z)
- HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments [37.55572042288321]
The training process of deep neural networks (DNNs) generally handles large-scale input data with many sparse features.
Paddle-HeterPS is composed of a distributed architecture and a Reinforcement Learning (RL)-based scheduling method.
We show that Paddle-HeterPS significantly outperforms state-of-the-art approaches in terms of throughput (14.5 times higher) and monetary cost (312.3% smaller).
arXiv Detail & Related papers (2021-11-20T17:09:15Z)
- Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
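A condensed sketch of the stored-embedding idea (toy sizes; SEER's freezing schedule and replay details differ):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                        nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(), nn.Flatten())
head = nn.Linear(32 * 20 * 20, 4)                # e.g., Q-values for 4 actions

for p in encoder.parameters():                   # freeze the encoder early in training
    p.requires_grad_(False)

replay = []                                      # buffer stores small embeddings, not raw frames
frame = torch.randn(1, 3, 84, 84)
with torch.no_grad():
    replay.append(encoder(frame))                # compute each embedding once, reuse thereafter

batch = torch.cat(replay)
q_values = head(batch)                           # later training touches only the head
```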
arXiv Detail & Related papers (2021-03-04T08:14:10Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain, which integrates progressive fractional quantization that gradually increases the precision of activations, weights, and gradients.
FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy.
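A bare-bones sketch of a progressive-precision schedule with fake quantization (the bit-widths and schedule are invented):

```python
import torch

def fake_quant(t: torch.Tensor, bits: int) -> torch.Tensor:
    qmax = 2 ** (bits - 1) - 1
    scale = t.abs().max().clamp(min=1e-8) / qmax
    return (t / scale).round().clamp(-qmax, qmax) * scale   # quantize-dequantize

schedule = {0: 3, 30: 4, 60: 6, 90: 8}           # epoch -> bit-width (illustrative)

bits = 3
for epoch in range(100):
    bits = schedule.get(epoch, bits)             # precision rises as training progresses
    w = torch.randn(128, 128)                    # stand-in weights
    w_q = fake_quant(w, bits)                    # low-precision values used in the forward pass
```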
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
- Weight Update Skipping: Reducing Training Time for Artificial Neural Networks [0.30458514384586394]
We propose a new training methodology for ANNs that exploits the observation that improvements in accuracy show temporal variations, which allows weight updates to be skipped during time windows of little change.
During such time windows, we keep updating the bias, which ensures the network still trains and avoids overfitting.
Such a training approach achieves virtually the same accuracy with considerably less computational cost, and thus lower training time.
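A schematic sketch of the skipping rule (the plateau threshold and window length are invented):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
x, y = torch.randn(256, 10), torch.randint(0, 2, (256,))

prev_acc, skip_window = 0.0, 0
for epoch in range(50):
    out = model(x)
    loss = nn.functional.cross_entropy(out, y)
    opt.zero_grad()
    loss.backward()
    if skip_window > 0:                          # inside a skip window:
        for name, p in model.named_parameters():
            if not name.endswith("bias"):
                p.grad = None                    # drop weight gradients, keep bias updates
        skip_window -= 1
    opt.step()
    acc = (out.argmax(1) == y).float().mean().item()
    if acc - prev_acc < 1e-3:                    # accuracy plateaued -> start skipping
        skip_window = 5
    prev_acc = acc
```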
arXiv Detail & Related papers (2020-12-05T15:12:10Z)
- Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)