Distributed Double Machine Learning with a Serverless Architecture
- URL: http://arxiv.org/abs/2101.04025v2
- Date: Wed, 24 Feb 2021 12:13:03 GMT
- Title: Distributed Double Machine Learning with a Serverless Architecture
- Authors: Malte S. Kurz
- Abstract summary: This paper explores serverless cloud computing for double machine learning.
Double machine learning is particularly well suited to exploit the high level of parallelism achievable with serverless computing.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper explores serverless cloud computing for double machine learning.
Being based on repeated cross-fitting, double machine learning is particularly
well suited to exploit the high level of parallelism achievable with serverless
computing. It makes it possible to obtain fast on-demand estimates without additional cloud
maintenance effort. We provide a prototype Python implementation
\texttt{DoubleML-Serverless} for the estimation of double machine learning
models with the serverless computing platform AWS Lambda and demonstrate its
utility with a case study analyzing estimation times and costs.
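A minimal sketch of the parallelization pattern the abstract describes (the Lambda function name and payload schema below are illustrative assumptions, not the DoubleML-Serverless API): each cross-fitting task is dispatched as its own AWS Lambda invocation, so total wall-clock time is roughly that of a single fold.

```python
# Sketch only: parallel cross-fitting via AWS Lambda.
# "dml-cross-fitting" and the payload schema are hypothetical, not the
# DoubleML-Serverless API; boto3's invoke() call itself is standard.
import json
from concurrent.futures import ThreadPoolExecutor

import boto3

lambda_client = boto3.client("lambda")

def invoke_fold(repetition: int, fold: int) -> dict:
    """Estimate the nuisance models for one cross-fitting fold on Lambda."""
    response = lambda_client.invoke(
        FunctionName="dml-cross-fitting",   # hypothetical function name
        InvocationType="RequestResponse",   # synchronous invocation
        Payload=json.dumps({"repetition": repetition, "fold": fold}),
    )
    return json.loads(response["Payload"].read())

# Dispatch every (repetition, fold) pair concurrently; serverless scaling
# means the estimation time is close to the time of a single fold.
n_rep, n_folds = 10, 5
tasks = [(r, f) for r in range(n_rep) for f in range(n_folds)]
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    fold_results = list(pool.map(lambda rf: invoke_fold(*rf), tasks))
```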
Related papers
- Hyperdimensional Vector Tsetlin Machines with Applications to Sequence Learning and Generation [0.0]
We construct a two-layered model for learning and generating sequential data that is both computationally fast and competitive with vanilla Tsetlin machines.
We apply the approach to forecasting, generating new sequences, and classification.
For the latter, we derive results for the entire UCR Time Series Archive and compare with the standard benchmarks to see how well the method competes in time series classification.
arXiv Detail & Related papers (2024-08-29T15:28:01Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- Architecting Peer-to-Peer Serverless Distributed Machine Learning Training for Improved Fault Tolerance [1.495380389108477]
Serverless computing is a new paradigm for cloud computing that uses functions as a computational unit.
By distributing the workload, distributed machine learning can speed up the training process and allow more complex models to be trained.
We propose exploring the use of serverless computing in distributed machine learning training and comparing the performance of P2P architecture with the parameter server architecture.
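As a rough illustration of the two architectures being compared (a simplified sketch under an assumed ring topology, not the paper's implementation), the difference lies in where parameter aggregation happens:

```python
# Simplified sketch of the two aggregation patterns; names and the ring
# topology are assumptions for illustration, not the paper's code.
import numpy as np

def parameter_server_step(worker_params: list) -> list:
    """A central server averages all workers' parameters (single point of failure)."""
    global_params = np.mean(worker_params, axis=0)
    return [global_params.copy() for _ in worker_params]

def p2p_gossip_step(worker_params: list) -> list:
    """Each worker averages with its ring neighbors; no central coordinator,
    so the loss of one peer degrades rather than halts training."""
    n = len(worker_params)
    return [(worker_params[i - 1] + worker_params[i] + worker_params[(i + 1) % n]) / 3.0
            for i in range(n)]
```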
arXiv Detail & Related papers (2023-02-27T17:38:47Z)
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving [53.01646445659089]
We show that model parallelism can be used for the statistical multiplexing of multiple devices when serving multiple models.
We present a novel serving system, AlpaServe, that determines an efficient strategy for placing and parallelizing collections of large deep learning models.
arXiv Detail & Related papers (2023-02-22T21:41:34Z)
- PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks [68.96484488899901]
We present PARTIME, a library designed to speed up neural networks whenever data is continuously streamed over time.
PARTIME starts processing each data sample as soon as it becomes available from the stream.
Experiments are performed in order to empirically compare PARTIME with classic non-parallel neural computations in online learning.
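The core idea can be sketched in plain Python (the stream and per-sample computation below are placeholders, not the PARTIME API): each incoming sample is handed to a worker the moment it arrives instead of being buffered into batches.

```python
# Sketch of sample-wise parallelism over a stream; process_sample is a
# placeholder for a per-sample forward/backward pass, not PARTIME's API.
from concurrent.futures import ThreadPoolExecutor

def process_sample(sample):
    return sample * 2  # stand-in for the real per-sample computation

def run_over_stream(stream, n_workers: int = 4):
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # submit each sample as soon as the stream yields it
        futures = [pool.submit(process_sample, s) for s in stream]
        return [f.result() for f in futures]

print(run_over_stream(range(8)))
```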
arXiv Detail & Related papers (2022-10-17T14:49:14Z)
- Recipe for Fast Large-scale SVM Training: Polishing, Parallelism, and more RAM! [0.0]
Support vector machines (SVMs) are a standard method in the machine learning toolbox.
Non-linear kernel SVMs often deliver highly accurate predictors, however, at the cost of long training times.
In this work, we combine both approaches to design an extremely fast dual SVM solver.
arXiv Detail & Related papers (2022-07-03T11:51:41Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Existing approaches do not supply the procedures and pipelines needed for the actual deployment of machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- ESAI: Efficient Split Artificial Intelligence via Early Exiting Using Neural Architecture Search [6.316693022958222]
Deep neural networks have been outperforming conventional machine learning algorithms in many computer vision-related tasks.
Most devices rely on a cloud computing approach in which powerful deep learning models analyze the data on the server.
In this paper, a new framework for deploying on IoT devices has been proposed which can take advantage of both the cloud and the on-device models.
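A hedged sketch of the split-inference pattern this describes (model names and the confidence threshold are illustrative, not the ESAI framework's API): a small on-device model answers when it is confident, and only uncertain inputs are sent to the larger cloud model.

```python
# Illustrative early-exit split between device and cloud; all names and the
# 0.9 threshold are assumptions, not the ESAI framework's API.
def classify(x, device_model, cloud_model, confidence_threshold: float = 0.9):
    probs = device_model(x)                  # cheap on-device forward pass
    if max(probs) >= confidence_threshold:   # confident: exit early on device
        return probs.index(max(probs))
    return cloud_model(x)                    # otherwise defer to the cloud model
```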
arXiv Detail & Related papers (2021-06-21T04:47:53Z)
- Reservoir Stack Machines [77.12475691708838]
Memory-augmented neural networks equip a recurrent neural network with an explicit memory to support tasks that require information storage.
We introduce the reservoir stack machine, a model which can provably recognize all deterministic context-free languages.
Our results show that the reservoir stack machine achieves zero error, even on test sequences longer than the training data.
arXiv Detail & Related papers (2021-05-04T16:50:40Z)
- Network Support for High-performance Distributed Machine Learning [17.919773898228716]
We propose a system model that captures both learning nodes (that perform computations) and information nodes (that provide data)
We then formulate the problem of selecting (i) which learning and information nodes should cooperate to complete the learning task, and (ii) the number of iterations to perform.
We devise an algorithm, named DoubleClimb, that can find a 1+1/|I|-competitive solution with cubic worst-case complexity.
arXiv Detail & Related papers (2021-02-05T19:38:57Z)
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is demonstrated in simulations predicting Boston house prices and training a two-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.