Successive Refinement in Large-Scale Computation: Advancing Model
Inference Applications
- URL: http://arxiv.org/abs/2402.07229v1
- Date: Sun, 11 Feb 2024 15:36:33 GMT
- Title: Successive Refinement in Large-Scale Computation: Advancing Model
Inference Applications
- Authors: Homa Esfahanizadeh, Alejandro Cohen, Shlomo Shamai (Shitz), Muriel
Medard
- Abstract summary: We introduce solutions for layered-resolution computation.
These solutions allow lower-resolution results to be obtained at an earlier stage than the final result.
- Score: 67.76749044675721
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Modern computationally-intensive applications often operate under time
constraints, necessitating acceleration methods and distribution of
computational workloads across multiple entities. However, the outcome is
either achieved within the desired timeline or not, and in the latter case,
valuable resources are wasted. In this paper, we introduce solutions for
layered-resolution computation. These solutions allow lower-resolution results
to be obtained at an earlier stage than the final result. This innovation
notably enhances deadline-based systems: if a computational job is terminated
due to time constraints, an approximate version of the final result
can still be generated. Moreover, in certain operational regimes, a
high-resolution result might be unnecessary, because the low-resolution result
may already deviate significantly from the decision threshold, for example in
AI-based decision-making systems. Therefore, operators can decide whether
higher resolution is needed or not based on intermediate results, enabling
computations with adaptive resolution. We present our framework for two
critical and computationally demanding jobs: distributed matrix multiplication
(linear) and model inference in machine learning (nonlinear). Our theoretical
and empirical results demonstrate that the execution delay for the first
resolution is significantly shorter than that for the final resolution, while
maintaining overall complexity comparable to the conventional one-shot
approach. Our experiments further illustrate how the layering feature increases
the likelihood of meeting deadlines and enables adaptability and transparency
in massive, large-scale computations.
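To make the layered-resolution idea concrete for the linear job (distributed matrix multiplication), the following minimal sketch realizes successive refinement of a matrix product through a digit decomposition of one operand. This is an illustrative reading of the abstract, not the paper's exact construction; the function name `layered_matmul`, the base-16 decomposition, and the nonnegative-integer operands are all assumptions.

```python
import numpy as np

def layered_matmul(A, B, num_layers=3, base=16):
    """Yield successively refined approximations of A @ B.

    Hypothetical sketch: the (nonnegative integer) entries of A are split
    into num_layers base-`base` digits, most significant first. Each layer
    adds one digit's contribution, so an early partial sum already
    approximates the exact product.
    """
    A = A.astype(np.int64)
    partial = np.zeros((A.shape[0], B.shape[1]), dtype=np.int64)
    residual = A.copy()
    for layer, k in enumerate(reversed(range(num_layers)), start=1):
        scale = base ** k
        digit = residual // scale            # most significant remaining digit
        residual -= digit * scale
        partial = partial + (digit @ B) * scale
        yield layer, partial.copy()          # intermediate-resolution result

# Usage: stop consuming the generator when a deadline fires or when the
# low-resolution estimate is already far from the decision threshold.
rng = np.random.default_rng(0)
A = rng.integers(0, 16**3, size=(4, 5))     # entries fit in 3 base-16 digits
B = rng.integers(-3, 4, size=(5, 2))
exact = A @ B
for layer, approx in layered_matmul(A, B):
    print(f"layer {layer}: max abs error = {np.abs(approx - exact).max()}")
```

Stopping after the first layer yields the low-resolution result the abstract describes; an adaptive-resolution operator would compare that early estimate against its decision threshold and request further layers only when the margin is too small to act on. In this sketch each layer costs one narrow-digit product of the same shape, so it trades a modest constant-factor overhead for early intermediate results.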
Related papers
- Solving Sparse & High-Dimensional-Output Regression via Compression [2.7596444457918263]
Multi-Output Regression (MOR) has been widely used in scientific data analysis for decision-making.
The increasing dimensionality of the outputs poses significant challenges regarding interpretability and computational scalability for modern MOR applications.
This paper proposes a Sparse & High-dimensional-Output REgression model that incorporates additional sparsity requirements to address output interpretability.
We show that the proposed framework is computationally scalable while maintaining the same order of training loss and prediction loss before and after compression, under arbitrary or relatively weak sample set conditions.
arXiv Detail & Related papers (2024-10-21T08:21:25Z)
- Predicting Probabilities of Error to Combine Quantization and Early Exiting: QuEE [68.6018458996143]
We propose QuEE, a more general dynamic network that combines both quantization and early exiting.
Our algorithm can be seen as a form of soft early exiting or input-dependent compression.
The crucial factor of our approach is accurate prediction of the potential accuracy improvement achievable through further computation.
arXiv Detail & Related papers (2024-06-20T15:25:13Z)
- An Operator Learning Framework for Spatiotemporal Super-resolution of Scientific Simulations [3.921076451326108]
The Super Resolution Operator Network (SRNet) frames super-resolution as an operator learning problem.
It draws inspiration from existing operator learning problems to learn continuous representations of parametric differential equations from low-resolution approximations.
No restrictions are imposed on the locations of sensors at which the low-resolution approximations are provided.
arXiv Detail & Related papers (2023-11-04T05:33:23Z)
- Taking the human out of decomposition-based optimization via artificial intelligence: Part II. Learning to initialize [0.0]
Active learning and supervised learning are used to learn a surrogate model that predicts computational performance.
The results show that the proposed approach can lead to a significant reduction in solution time.
arXiv Detail & Related papers (2023-10-10T23:49:26Z)
- The Statistical Complexity of Interactive Decision Making [126.04974881555094]
We provide a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning.
A unified algorithm design principle, Estimation-to-Decisions (E2D), transforms any algorithm for supervised estimation into an online algorithm for decision making.
arXiv Detail & Related papers (2021-12-27T02:53:44Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- Coded Distributed Computing with Partial Recovery [56.08535873173518]
We introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR).
CCPR reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation.
We then extend this approach to distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery (see the toy sketch after this list).
arXiv Detail & Related papers (2020-07-04T21:34:49Z)
- Polynomial-Time Exact MAP Inference on Discrete Models with Global Dependencies [83.05591911173332]
The junction tree algorithm is the most general solution for exact MAP inference with run-time guarantees.
We propose a new graph transformation technique via node cloning that ensures a polynomial run-time for solving our target problem, independent of the form of the corresponding clique tree.
arXiv Detail & Related papers (2019-12-27T13:30:29Z)
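As a companion to the Coded Distributed Computing with Partial Recovery entry above, here is a toy sketch of the partial-recovery idea only: a simple systematic parity code over row blocks, not the CCPR construction itself. Any two completed tasks decode y = A @ x exactly, while a single completed systematic task still returns a usable part of y. The helper names `encode_tasks` and `decode` are hypothetical.

```python
import numpy as np

def encode_tasks(A):
    """Split A into two row blocks plus one parity block (A1 + A2).

    Toy (2,3)-style code: any two completed tasks recover y = A @ x
    exactly; one completed systematic task still yields half of y.
    """
    A1, A2 = np.vsplit(A, 2)           # requires an even number of rows
    return {"A1": A1, "A2": A2, "P": A1 + A2}

def decode(done, x):
    """Recover as much of y = A @ x as the completed tasks allow."""
    y = {name: block @ x for name, block in done.items()}
    if "A1" in y and "A2" in y:
        return np.concatenate([y["A1"], y["A2"]]), "exact"
    if "P" in y and "A1" in y:
        return np.concatenate([y["A1"], y["P"] - y["A1"]]), "exact"
    if "P" in y and "A2" in y:
        return np.concatenate([y["P"] - y["A2"], y["A2"]]), "exact"
    name, val = next(iter(y.items()))
    # A lone systematic block gives half of y; a lone parity gives only a sum.
    return val, ("partial" if name != "P" else "parity-only")

rng = np.random.default_rng(1)
A, x = rng.standard_normal((6, 4)), rng.standard_normal(4)
tasks = encode_tasks(A)
y_hat, status = decode({"A2": tasks["A2"]}, x)   # only one worker met the deadline
print(status, y_hat.shape)                       # -> partial (3,)
```

The design choice mirrors the trade-off the CCPR summary names: stragglers cost accuracy (fewer recovered rows) rather than forcing the job to wait for every worker.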