BOLT: An Automated Deep Learning Framework for Training and Deploying
Large-Scale Search and Recommendation Models on Commodity CPU Hardware
- URL: http://arxiv.org/abs/2303.17727v4
- Date: Tue, 12 Sep 2023 14:17:53 GMT
- Authors: Nicholas Meisburger, Vihan Lakshman, Benito Geordie, Joshua Engels,
David Torres Ramos, Pratik Pranav, Benjamin Coleman, Benjamin Meisburger,
Shubh Gupta, Yashwanth Adunukota, Tharun Medini, Anshumali Shrivastava
- Abstract summary: BOLT is a sparse deep learning library for training large-scale search and recommendation models on standard CPU hardware.
We evaluate BOLT on a number of information retrieval tasks including product recommendations, text classification, graph neural networks, and personalization.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Efficient large-scale neural network training and inference on commodity CPU
hardware is of immense practical significance in democratizing deep learning
(DL) capabilities. Presently, the process of training massive models consisting
of hundreds of millions to billions of parameters requires the extensive use of
specialized hardware accelerators, such as GPUs, which are only accessible to a
limited number of institutions with considerable financial resources. Moreover,
there is often an alarming carbon footprint associated with training and
deploying these models. In this paper, we take a step towards addressing these
challenges by introducing BOLT, a sparse deep learning library for training
large-scale search and recommendation models on standard CPU hardware. BOLT
provides a flexible, high-level API for constructing models that will be
familiar to users of existing popular DL frameworks. By automatically tuning
specialized hyperparameters, BOLT also abstracts away the algorithmic details
of sparse network training. We evaluate BOLT on a number of information
retrieval tasks including product recommendations, text classification, graph
neural networks, and personalization. We find that our proposed system achieves
competitive performance with state-of-the-art techniques at a fraction of the
cost and energy consumption and an order-of-magnitude faster inference time.
BOLT has also been successfully deployed by multiple businesses to address
critical problems, and we highlight one customer case study in the field of
e-commerce.
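The sparse network training that the abstract alludes to can be illustrated with a small sketch. This is not BOLT's actual API; the class name, dimensions, and top-k selection below are hypothetical stand-ins. Systems in this line of work select the active neurons with locality-sensitive hashing so the cost stays sub-linear in the layer width, whereas this sketch scores every neuron for clarity:

```python
import numpy as np

# Illustrative sketch (not BOLT's API) of sparse network training:
# for each input, only a small subset of a wide layer's neurons is
# activated and receives gradients, instead of computing all of them.

class SparseLayer:
    def __init__(self, in_dim, out_dim, sparsity=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.01, size=(out_dim, in_dim))
        self.b = np.zeros(out_dim)
        self.k = max(1, int(out_dim * sparsity))  # active neurons per input

    def forward(self, x):
        # Hypothetical selection rule: score all neurons, keep the top-k.
        # (A real sparse-training system avoids this full dense pass by
        # retrieving likely-active neurons from hash tables.)
        scores = self.W @ x + self.b
        active = np.argpartition(scores, -self.k)[-self.k:]
        out = np.zeros_like(scores)
        out[active] = np.maximum(scores[active], 0.0)  # ReLU on active set only
        return out, active

layer = SparseLayer(in_dim=128, out_dim=10_000, sparsity=0.01)
x = np.random.default_rng(1).normal(size=128)
y, active = layer.forward(x)
# Only ~1% of the 10,000 neurons are nonzero for this input, so the
# backward pass touches only that active subset of the weight matrix.
```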
Related papers
- On Efficient Training of Large-Scale Deep Learning Models: A Literature Review [90.87691246153612]
The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech.
The use of large-scale models trained on vast amounts of data holds immense promise for practical applications.
Given the increasing demands on computational capacity, a comprehensive summary of acceleration techniques for training deep learning models is still much anticipated.
arXiv Detail & Related papers (2023-04-07T11:13:23Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between the SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and a gated sub-network from scratch in an SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- On-device Training: A First Overview on Existing Systems [8.0653715405809]
Efforts have also been made to deploy some models on resource-constrained devices.
This work aims to summarize and analyze the state-of-the-art systems research that enables such on-device model training capabilities.
arXiv Detail & Related papers (2022-12-01T19:22:29Z)
- FreeREA: Training-Free Evolution-based Architecture Search [17.202375422110553]
FreeREA is a custom cell-based evolutionary NAS algorithm that exploits an optimised combination of training-free metrics to rank architectures.
Our experiments, carried out on the common benchmarks NAS-Bench-101 and NATS-Bench, demonstrate that FreeREA is a fast, efficient, and effective method for automatic model design.
arXiv Detail & Related papers (2022-06-17T11:16:28Z)
- Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications [46.97774949613859]
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI).
However, their superior performance comes at the considerable cost of computational complexity.
This paper provides an overview of efficient deep learning methods, systems, and applications.
arXiv Detail & Related papers (2022-04-25T16:52:48Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor frameworks and scripting-language engines.
On their own, however, these basic engines do not supply the procedures and pipelines needed for the actual deployment of machine learning capabilities in real production-grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- M6-10T: A Sharing-Delinking Paradigm for Efficient Multi-Trillion Parameter Pretraining [55.16088793437898]
Training extreme-scale models requires enormous amounts of compute and memory.
We propose a simple training strategy called "Pseudo-to-Real" for large models with high memory footprints.
arXiv Detail & Related papers (2021-10-08T04:24:51Z)
- Distributed Training of Deep Learning Models: A Taxonomic Perspective [11.924058430461216]
Distributed deep learning systems (DDLS) train deep neural network models by utilizing the distributed resources of a cluster.
We aim to shed some light on the fundamental principles at work when training deep neural networks across a cluster of independent machines.
arXiv Detail & Related papers (2020-07-08T08:56:58Z)
- Knowledge Distillation: A Survey [87.51063304509067]
Deep neural networks have been successful in both industry and academia, especially for computer vision tasks.
It remains a challenge to deploy these cumbersome deep models on devices with limited resources.
Knowledge distillation effectively learns a small student model from a large teacher model.
arXiv Detail & Related papers (2020-06-09T21:47:17Z)
- Resource-Efficient Neural Networks for Embedded Systems [23.532396005466627]
We provide an overview of the current state of the art of machine learning techniques.
We focus on resource-efficient inference based on deep neural networks (DNNs), the predominant machine learning models of the past decade.
We substantiate our discussion with experiments on well-known benchmark datasets using compression techniques.
arXiv Detail & Related papers (2020-01-07T14:17:09Z)
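Knowledge distillation, surveyed in one of the related papers above, can be sketched in a few lines: a small student is trained to match the softened output distribution of a large teacher in addition to the true labels. The temperature, mixing weight, and toy logits below are illustrative assumptions, not values from the survey:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T "softens" the distribution.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: KL divergence between softened teacher and student
    # outputs, scaled by T^2 to keep gradient magnitudes comparable
    # across temperatures.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean() * T * T
    # Hard targets: standard cross-entropy against the true labels.
    q = softmax(student_logits)
    hard = -np.log(q[np.arange(len(labels)), labels]).mean()
    return alpha * soft + (1 - alpha) * hard

teacher = np.array([[4.0, 1.0, 0.0]])   # toy teacher logits
student = np.array([[2.0, 1.5, 0.5]])   # toy student logits
loss = distillation_loss(student, teacher, labels=np.array([0]))
```

The soft term vanishes when the student reproduces the teacher's logits exactly, so minimizing this loss pulls the student toward the teacher's full output distribution rather than only the argmax label.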
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.