Accelerating Transfer Learning with Near-Data Computation on Cloud
Object Stores
- URL: http://arxiv.org/abs/2210.08650v1
- Date: Sun, 16 Oct 2022 22:28:36 GMT
- Title: Accelerating Transfer Learning with Near-Data Computation on Cloud
Object Stores
- Authors: Arsany Guirguis, Diana Petrescu, Florin Dinu, Do Le Quoc, Javier
Picorel, Rachid Guerraoui
- Abstract summary: This paper identifies transfer learning (TL) as a natural fit for the disaggregated cloud.
We show how to leverage the unique structure of TL's fine-tuning phase to flexibly address the aforementioned constraints.
We present HAPI, a processing system for TL that spans the compute and storage tiers while remaining transparent to the user.
- Score: 5.057544107331778
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Storage disaggregation is fundamental to today's cloud due to cost and
scalability benefits. Unfortunately, this design must cope with an inherent
network bottleneck between the storage and the compute tiers. The widely
deployed mitigation strategy is to provide computational resources next to
storage to push down a part of an application and thus reduce the amount of
data transferred to the compute tier. Overall, users of disaggregated storage
need to consider two main constraints: the network may remain a bottleneck, and
the storage-side computational resources are limited. This paper identifies
transfer learning (TL) as a natural fit for the disaggregated cloud. TL,
famously described as the next driver of ML commercial success, is widely
popular and has broad-range applications. We show how to leverage the unique
structure of TL's fine-tuning phase (i.e., a combination of feature extraction
and training) to flexibly address the aforementioned constraints and improve
both user and operator-centric metrics. The key to improving user-perceived
performance is to mitigate the network bottleneck by carefully splitting the TL
deep neural network (DNN) such that feature extraction is, partially or
entirely, executed next to storage. Crucially, such splitting enables
decoupling the batch size of feature extraction from the training batch size,
facilitating efficient storage-side batch size adaptation to increase
concurrency in the storage tier while avoiding out-of-memory errors. Guided by
these insights, we present HAPI, a processing system for TL that spans the
compute and storage tiers while remaining transparent to the user. Our
evaluation with several DNNs, such as ResNet, VGG, and Transformer, shows up to
11x improvement in application runtime and up to 8.3x reduction in the data
transferred from the storage to the compute tier compared to running the
computation in the compute tier.
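The split-and-rebatch idea in the abstract can be sketched in plain Python. This is a minimal, hedged illustration, not HAPI's actual API: a frozen linear-plus-ReLU stage stands in for the lower DNN layers, and all names and sizes are hypothetical.

```python
# Hedged sketch: split fine-tuning so the frozen feature-extraction layers
# run storage-side with their own batch size, decoupled from the training
# batch size used on the compute tier.

IN_DIM, FEAT_DIM = 8, 4
# Frozen "weights" of the storage-side stage (stand-in for real DNN layers).
W = [[((i + j) % 3) - 1 for j in range(FEAT_DIM)] for i in range(IN_DIM)]

def feature_extract(batch):
    # One linear layer + ReLU as a stand-in for the frozen lower layers.
    out = []
    for x in batch:
        row = [max(sum(x[i] * W[i][j] for i in range(IN_DIM)), 0.0)
               for j in range(FEAT_DIM)]
        out.append(row)
    return out

def storage_side(dataset, fe_batch_size):
    # Feature extraction uses a batch size chosen to fit the limited
    # storage-side memory -- independent of the training batch size.
    for start in range(0, len(dataset), fe_batch_size):
        yield feature_extract(dataset[start:start + fe_batch_size])

def compute_side(feature_stream, train_batch_size):
    # Re-batch the smaller feature chunks into training-sized batches.
    buf = []
    for chunk in feature_stream:
        buf.extend(chunk)
        while len(buf) >= train_batch_size:
            yield buf[:train_batch_size]
            buf = buf[train_batch_size:]
    if buf:
        yield buf

data = [[float((k * 7 + i) % 5) for i in range(IN_DIM)] for k in range(100)]
batches = list(compute_side(storage_side(data, fe_batch_size=8),
                            train_batch_size=32))
# 100 samples -> three training batches of 32 plus a final batch of 4
```

Note that only the (smaller) FEAT_DIM-wide features cross the network, which is the mechanism behind the reported reduction in storage-to-compute data transfer.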
Related papers
- High-Dimensional Distributed Sparse Classification with Scalable Communication-Efficient Global Updates [50.406127962933915]
We develop methods that enable learning a communication-efficient distributed logistic regression model.
In our experiments we demonstrate a large improvement in accuracy over distributed algorithms with only a few distributed update steps needed.
arXiv Detail & Related papers (2024-07-08T19:34:39Z)
- Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [61.40047491337793]
We present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome the context-length limitations of large language models.
HOMER uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks.
A token reduction technique precedes each merging, ensuring memory usage efficiency.
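The divide-and-conquer shape described above can be sketched as follows. This is a hedged stand-in, not HOMER's implementation: token reduction is modeled as simple truncation and merging as concatenation, purely to show how pruning before each merge keeps the working set bounded.

```python
# Hedged sketch of hierarchical context merging (illustrative only):
# split a long input into chunks, prune tokens before each merge, then
# merge pairwise until a single chunk remains.

def reduce_tokens(chunk, keep):
    # Stand-in token reduction: keep the first `keep` tokens.
    return chunk[:keep]

def merge_pair(left, right):
    # Stand-in for merging two processed chunks inside the model.
    return left + right

def hierarchical_merge(tokens, chunk_size, keep):
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens), chunk_size)]
    while len(chunks) > 1:
        nxt = []
        for i in range(0, len(chunks), 2):
            if i + 1 < len(chunks):
                a = reduce_tokens(chunks[i], keep)
                b = reduce_tokens(chunks[i + 1], keep)
                nxt.append(merge_pair(a, b))
            else:
                nxt.append(chunks[i])  # odd chunk carried forward
        chunks = nxt
    return chunks[0]

# 64 tokens, chunks of 8, keep 4 per side: every merge result is again
# 8 tokens long, so memory per merge never grows with input length.
result = hierarchical_merge(list(range(64)), chunk_size=8, keep=4)
```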
arXiv Detail & Related papers (2024-04-16T06:34:08Z)
- LR-CNN: Lightweight Row-centric Convolutional Neural Network Training for Memory Reduction [21.388549904063538]
Convolutional Neural Networks with multi-layer architectures have advanced rapidly.
Current efforts mitigate the resulting memory bottleneck either with external auxiliary solutions that add hardware cost, or with internal modifications that risk an accuracy penalty.
We break the traditional layer-by-layer (column) dataflow rule and re-organize operations into rows spanning all convolution layers.
This lightweight design allows a majority of intermediate data to be removed without any loss of accuracy.
arXiv Detail & Related papers (2024-01-21T12:19:13Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying the coupling of simulation and ML, enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- Rediscovering Hashed Random Projections for Efficient Quantization of Contextualized Sentence Embeddings [113.38884267189871]
Training and inference on edge devices often requires an efficient setup due to computational limitations.
Pre-computing data representations and caching them on a server can mitigate extensive edge device computation.
We propose a simple yet effective approach that uses random hyperplane projections.
We show that the quantized embeddings remain effective for training models across various English and German sentence classification tasks, retaining 94%--99% of the performance of their floating-point counterparts.
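The core random-hyperplane idea can be shown in a few lines. This is a hedged SimHash-style sketch under assumed dimensions, not the paper's exact scheme: each output bit is the sign of the embedding's projection onto one random hyperplane, yielding a compact binary code that can be precomputed and cached.

```python
# Hedged sketch: quantize a dense embedding to sign bits of random
# hyperplane projections (dimensions and seed are illustrative).
import random

random.seed(0)
DIM, BITS = 16, 8
# One random hyperplane per output bit (Gaussian entries).
PLANES = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def hash_embedding(vec):
    bits = []
    for plane in PLANES:
        proj = sum(v * p for v, p in zip(vec, plane))
        bits.append(1 if proj >= 0 else 0)
    return bits

vec = [0.1 * i - 0.5 for i in range(DIM)]
code = hash_embedding(vec)  # BITS binary values instead of DIM floats
```

A useful property of the sign encoding is scale invariance: scaling the embedding leaves its code unchanged, so nearby embeddings tend to share most bits.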
arXiv Detail & Related papers (2023-03-13T10:53:00Z)
- A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for implementation of DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight during both training and inference, yet highly effective, achieving excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z)
- NumS: Scalable Array Programming for the Cloud [82.827921577004]
We present NumS, an array programming library which optimizes NumPy-like expressions on task-based distributed systems.
This is achieved through a novel scheduler called Load Simulated Hierarchical Scheduling (LSHS).
We show that LSHS enhances performance on Ray by decreasing network load by a factor of 2x, requiring 4x less memory, and reducing execution time by 10x on the logistic regression problem.
arXiv Detail & Related papers (2022-06-28T20:13:40Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
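The split-selection logic described above can be sketched with a simple cost model. All per-layer costs, sizes, and the load model below are invented for illustration, not taken from the paper: the point is only that the best split index shifts with the channel's data rate and the server's load.

```python
# Hedged sketch: pick the DNN split point minimizing estimated end-to-end
# latency, re-evaluated as channel state changes (all numbers illustrative).

LAYER_MS  = [5, 8, 12, 12, 6]        # per-layer device compute cost (ms)
OUT_KB    = [400, 200, 80, 20, 1]    # per-layer output size (KB)
SERVER_MS = [1, 2, 3, 3, 1]          # same layers on the faster server

def best_split(rate_kb_per_ms, server_load):
    """Return (k, cost): layers [0, k) run on-device, the intermediate
    output is sent, and layers [k, n) run on the server."""
    n = len(LAYER_MS)
    input_kb = 600                   # raw input size if everything is sent
    best_k, best_cost = None, float("inf")
    for k in range(n + 1):
        device = sum(LAYER_MS[:k])
        sent = input_kb if k == 0 else OUT_KB[k - 1]
        server = sum(SERVER_MS[k:]) * server_load
        cost = device + sent / rate_kb_per_ms + server
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost
```

With these numbers, a fast channel favors offloading everything (k = 0), a slow channel pushes the split deeper so less data crosses the link, and a heavily loaded server pulls computation back onto the device.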
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- Efficient Data-Plane Memory Scheduling for In-Network Aggregation [14.52822604368543]
We propose ESA, an Efficient Switch memory Scheduler for in-network Aggregation.
At its core, ESA enforces the aggregator allocation primitive and introduces priority scheduling at the data plane.
Experiments show that ESA can improve the average JCT by up to 1.35x.
arXiv Detail & Related papers (2022-01-17T13:29:18Z)
- MAFAT: Memory-Aware Fusing and Tiling of Neural Networks for Accelerated Edge Inference [1.7894377200944507]
Machine learning networks can easily exceed available memory, increasing latency due to excessive OS swapping.
We propose a memory usage predictor coupled with a search algorithm to provide optimized fusing and tiling configurations.
Results show that our approach can run in less than half the memory, with a speedup of up to 2.78x under severe memory constraints.
arXiv Detail & Related papers (2021-07-14T19:45:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.