Improving Inference Performance of Machine Learning with the
Divide-and-Conquer Principle
- URL: http://arxiv.org/abs/2301.05099v1
- Date: Thu, 12 Jan 2023 15:55:12 GMT
- Authors: Alex Kogan
- Abstract summary: Many popular machine learning models scale poorly when deployed on CPUs.
We propose a simple, yet effective approach based on the Divide-and-Conquer Principle to tackle this problem.
We implement this idea in the popular OnnxRuntime framework and evaluate its effectiveness with several use cases.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many popular machine learning models scale poorly when deployed on CPUs. In
this paper we explore the reasons why and propose a simple, yet effective
approach based on the well-known Divide-and-Conquer Principle to tackle this
problem of great practical importance. Given an inference job, instead of using
all available computing resources (i.e., CPU cores) for running it, the idea is
to break the job into independent parts that can be executed in parallel, each
with the number of cores according to its expected computational cost. We
implement this idea in the popular OnnxRuntime framework and evaluate its
effectiveness with several use cases, including the well-known models for
optical character recognition (PaddleOCR) and natural language processing
(BERT).
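The allocation step described in the abstract can be sketched as a small proportional-split helper. This is a minimal illustration, not the paper's implementation: the function name `allocate_cores` and the cost values are assumptions, and it presumes at least as many cores as parts.

```python
from math import floor

def allocate_cores(costs, total_cores):
    """Split total_cores among independent inference parts in
    proportion to each part's expected computational cost.
    Hypothetical helper, not the paper's code; assumes
    total_cores >= number of parts."""
    total_cost = sum(costs)
    # proportional share, rounded down, with at least one core per part
    shares = [max(1, floor(total_cores * c / total_cost)) for c in costs]
    # hand any leftover cores to the most expensive parts first
    leftover = total_cores - sum(shares)
    for i in sorted(range(len(costs)), key=costs.__getitem__, reverse=True):
        if leftover <= 0:
            break
        shares[i] += 1
        leftover -= 1
    return shares

# e.g. two parts with a 3:1 expected cost ratio on an 8-core CPU
print(allocate_cores([3.0, 1.0], 8))  # -> [6, 2]
```

In an ONNX Runtime deployment, each part would then run in its own session with `SessionOptions.intra_op_num_threads` set to its share, so the parts execute in parallel without contending for the same cores.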
Related papers
- Benchmarking Predictive Coding Networks -- Made Simple [48.652114040426625]
We first propose a library called PCX, whose focus lies on performance and simplicity.
We use PCX to implement a large set of benchmarks for the community to use for their experiments.
arXiv Detail & Related papers (2024-07-01T10:33:44Z)
- Scalable Federated Unlearning via Isolated and Coded Sharding [76.12847512410767]
Federated unlearning has emerged as a promising paradigm to erase the client-level data effect.
This paper proposes a scalable federated unlearning framework based on isolated sharding and coded computing.
arXiv Detail & Related papers (2024-01-29T08:41:45Z)
- Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z)
- Efficient Prompting via Dynamic In-Context Learning [76.83516913735072]
We propose DynaICL, a recipe for efficient prompting with black-box generalist models.
DynaICL dynamically allocates in-context examples according to the input complexity and the computational budget.
We find that DynaICL saves up to 46% token budget compared to the common practice that allocates the same number of in-context examples to each input.
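A toy version of such budget-aware allocation is sketched below. It is illustrative only: the function name, the complexity scores, and the greedy policy are assumptions for this sketch, not DynaICL's actual algorithm.

```python
def allocate_examples(complexities, total_budget, min_k=1, max_k=8):
    """Give harder inputs more in-context examples under a shared
    example budget (toy policy in the spirit of DynaICL)."""
    n = len(complexities)
    ks = [min_k] * n  # every input gets at least min_k examples
    remaining = total_budget - min_k * n
    # spend the leftover budget on inputs in order of decreasing complexity
    for i in sorted(range(n), key=complexities.__getitem__, reverse=True):
        if remaining <= 0:
            break
        take = min(max_k - min_k, remaining)
        ks[i] += take
        remaining -= take
    return ks

# a hard input (0.9) vs. an easy one (0.1), budget of 6 examples total
print(allocate_examples([0.9, 0.1], 6, min_k=1, max_k=4))  # -> [4, 2]
```

The point of the sketch: under a fixed budget, a uniform policy would give each input 3 examples, while a complexity-aware policy shifts examples toward the inputs that need them.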
arXiv Detail & Related papers (2023-05-18T17:58:31Z)
- Matched Machine Learning: A Generalized Framework for Treatment Effect Inference With Learned Metrics [87.05961347040237]
We introduce Matched Machine Learning, a framework that combines the flexibility of machine learning black boxes with the interpretability of matching.
Our framework uses machine learning to learn an optimal metric for matching units and estimating outcomes.
We show empirically that instances of Matched Machine Learning perform on par with black-box machine learning methods and better than existing matching methods for similar problems.
arXiv Detail & Related papers (2023-04-03T19:32:30Z)
- Towards a learning-based performance modeling for accelerating Deep Neural Networks [1.1549572298362785]
We start an investigation of predictive models based on machine learning techniques in order to optimize Convolutional Neural Networks (CNNs).
Preliminary experiments on Midgard-based ARM Mali GPU show that our predictive model outperforms all the convolution operators manually selected by the library.
arXiv Detail & Related papers (2022-12-09T18:28:07Z)
- Efficient Sub-structured Knowledge Distillation [52.5931565465661]
We propose an approach that is much simpler in its formulation and far more efficient for training than existing approaches.
We transfer the knowledge from a teacher model to its student model by locally matching their predictions on all sub-structures, instead of the whole output space.
arXiv Detail & Related papers (2022-03-09T15:56:49Z)
- Efficient Inference via Universal LSH Kernel [35.22983601434134]
We propose mathematically provable Representer Sketch, a concise set of count arrays that can approximate the inference procedure with simple hashing computations and aggregations.
Representer Sketch builds upon the popular Representer Theorem from kernel literature, hence the name.
We show that Representer Sketch achieves up to 114x reduction in storage requirement and 59x reduction in complexity without any drop in accuracy.
arXiv Detail & Related papers (2021-06-21T22:06:32Z)
- Fast Object Segmentation Learning with Kernel-based Methods for Robotics [21.48920421574167]
Object segmentation is a key component in the visual system of a robot that performs tasks like grasping and object manipulation.
We propose a novel architecture for object segmentation, that overcomes this problem and provides comparable performance in a fraction of the time required by the state-of-the-art methods.
Our approach is validated on the YCB-Video dataset which is widely adopted in the computer vision and robotics community.
arXiv Detail & Related papers (2020-11-25T15:07:39Z)
- Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures: A Machine Learning Based Approach [16.702537371391053]
This article presents an automatic approach to derive a good solution for hardware resource partition and task granularity for task-based parallel applications on heterogeneous many-core architectures.
Our approach employs a performance model to estimate the resulting performance of the target application under a given resource partition and task granularity configuration.
Compared to the single-stream version, our approach achieves a 1.6x and 1.1x speedup on the XeonPhi and the GPU platform, respectively.
arXiv Detail & Related papers (2020-03-05T21:18:21Z) - An Advance on Variable Elimination with Applications to Tensor-Based
Computation [11.358487655918676]
We present new results on the classical algorithm of variable elimination, which underlies many algorithms including for probabilistic inference.
The results relate to exploiting functional dependencies, allowing one to perform inference and learning efficiently on models that have very large treewidth.
arXiv Detail & Related papers (2020-02-21T14:17:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.