P$^2$U: Progressive Precision Update For Efficient Model Distribution
- URL: http://arxiv.org/abs/2506.22871v1
- Date: Sat, 28 Jun 2025 12:47:04 GMT
- Title: P$^2$U: Progressive Precision Update For Efficient Model Distribution
- Authors: Homayun Afrabandpey, Hamed Rezazadegan Tavakoli
- Abstract summary: We propose Progressive Precision Update (P$^2$U) to address this problem. Instead of transmitting the original high-precision model, P$^2$U transmits a lower-bit precision model. P$^2$U consistently achieves a better tradeoff between accuracy, bandwidth usage, and latency.
- Score: 2.3349787245442966
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Efficient model distribution is becoming increasingly critical in bandwidth-constrained environments. In this paper, we propose a simple yet effective approach called Progressive Precision Update (P$^2$U) to address this problem. Instead of transmitting the original high-precision model, P$^2$U transmits a lower-bit precision model, coupled with a model update representing the difference between the original high-precision model and the transmitted low-precision version. With extensive experiments on various model architectures, ranging from small models ($1 - 6$ million parameters) to a large model (more than $100$ million parameters), and using three different data sets, namely chest X-Ray, PASCAL-VOC, and CIFAR-100, we demonstrate that P$^2$U consistently achieves a better tradeoff between accuracy, bandwidth usage, and latency. Moreover, we show that when bandwidth or startup time is the priority, aggressive quantization (e.g., 4-bit) can be used without severely compromising performance. These results establish P$^2$U as an effective and practical solution for scalable and efficient model distribution in low-resource settings, including federated learning, edge computing, and IoT deployments. Given that P$^2$U complements existing compression techniques and can be implemented alongside any compression method, such as sparsification, quantization, or pruning, the potential for improvement is even greater.
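To make the two-stage idea concrete, here is a minimal NumPy sketch of splitting a single weight tensor into a low-bit payload plus a precision update that restores the original values. The function names, the symmetric per-tensor quantizer, the 4-bit setting, and the choice to leave the residual uncompressed are illustrative assumptions for this sketch, not the paper's implementation (which would also compress the update before transmission).

```python
import numpy as np

def quantize_uniform(w, bits=4):
    """Symmetric per-tensor uniform quantization to `bits` bits.
    Returns integer codes and the scale needed to dequantize them."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    return codes.astype(np.float32) * scale

# Sender: split one high-precision tensor into (1) a low-bit payload and
# (2) a precision update, i.e. the residual left after dequantization.
rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in for one model tensor

codes, scale = quantize_uniform(weights, bits=4)           # stage 1 payload (4-bit codes)
low_precision = dequantize(codes, scale)
residual = weights - low_precision                         # stage 2 payload (kept in float here)

# Receiver: stage 1 alone already yields a usable approximate model from a
# small payload; applying the update afterwards restores the original precision.
recovered = dequantize(codes, scale) + residual
assert np.allclose(recovered, weights, atol=1e-6)
print("max error after stage 1:", float(np.max(np.abs(weights - low_precision))))
print("max error after update :", float(np.max(np.abs(weights - recovered))))
```

In this sketch the stage-1 payload is roughly 8x smaller than the float32 tensor, which is what allows the receiver to start from the low-precision model while the update is still in flight.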
Related papers
- Intention-Conditioned Flow Occupancy Models [69.79049994662591]
Large-scale pre-training has fundamentally changed how machine learning research is done today. Applying this same framework to reinforcement learning is appealing because it offers compelling avenues for addressing core challenges in RL. Recent advances in generative AI have provided new tools for modeling highly complex distributions.
arXiv Detail & Related papers (2025-06-10T15:27:46Z) - u-$μ$P: The Unit-Scaled Maximal Update Parametrization [4.275373946090221]
We present a new scheme, u-$μ$P, which improves upon $μ$P by combining it with Unit Scaling. The two techniques have a natural affinity: $μ$P ensures that the scale of activations is independent of model size, and Unit Scaling ensures that activations, weights and gradients begin training with a scale of one.
arXiv Detail & Related papers (2024-07-24T17:58:42Z) - Diffusion Model Patching via Mixture-of-Prompts [17.04227271007777]
Diffusion Model Patching (DMP) is a simple method to boost the performance of pre-trained diffusion models. DMP inserts a small, learnable set of prompts into the model's input space while keeping the original model frozen. DMP significantly enhances the FID of converged DiT-L/2 by 10.38% on FFHQ.
arXiv Detail & Related papers (2024-05-28T04:47:54Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - On Optimal Caching and Model Multiplexing for Large Model Inference [66.50550915522551]
Large Language Models (LLMs) and other large foundation models have achieved noteworthy success, but their size exacerbates existing resource consumption and latency challenges.
We study two approaches for mitigating these challenges: employing a cache to store previous queries and learning a model multiplexer to choose from an ensemble of models for query processing.
arXiv Detail & Related papers (2023-06-03T05:01:51Z) - Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the PR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z) - DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning [51.151805100550625]
This paper proposes DiffFit, a parameter-efficient strategy to fine-tune large pre-trained diffusion models.
Compared with full fine-tuning, DiffFit achieves 2$\times$ training speed-up and only needs to store approximately 0.12% of the total model parameters.
Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to a high-resolution one by adding minimal cost.
arXiv Detail & Related papers (2023-04-13T16:17:50Z) - Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time [69.7693300927423]
We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations improves accuracy and robustness (see the weight-averaging sketch at the end of this list).
We show that the model soup approach extends to multiple image classification and natural language processing tasks.
arXiv Detail & Related papers (2022-03-10T17:03:49Z) - Overfitting for Fun and Profit: Instance-Adaptive Data Compression [20.764189960709164]
Neural data compression has been shown to outperform classical methods in terms of rate-distortion (RD) performance.
In this paper we take this concept to the extreme, adapting the full model to a single video, and sending model updates along with the latent representation.
We demonstrate that full-model adaptation improves RD performance by 1 dB, with respect to encoder-only finetuning.
arXiv Detail & Related papers (2021-01-21T15:58:58Z)
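For the "Model soups" entry above, the averaging step is easy to make concrete. The sketch below is a hedged illustration assuming each checkpoint is a plain dictionary of NumPy arrays sharing one architecture; the function name model_soup and the toy data are invented for this example and are not the authors' released code.

```python
import numpy as np

def model_soup(checkpoints, coeffs=None):
    """Average a list of parameter dictionaries from the same architecture.
    `coeffs` optionally gives mixing weights; uniform averaging if omitted."""
    if coeffs is None:
        coeffs = [1.0 / len(checkpoints)] * len(checkpoints)
    return {
        name: sum(c * ckpt[name] for c, ckpt in zip(coeffs, checkpoints))
        for name in checkpoints[0]
    }

# Toy example: three "fine-tuned" variants of the same two-parameter model.
rng = np.random.default_rng(1)
base = {"linear.weight": rng.normal(size=(4, 4)), "linear.bias": rng.normal(size=4)}
finetuned = [{k: v + 0.01 * rng.normal(size=v.shape) for k, v in base.items()}
             for _ in range(3)]

soup = model_soup(finetuned)  # the averaged "soup" checkpoint, same shapes as the inputs
print({k: v.shape for k, v in soup.items()})
```

Because only the weights are averaged, the resulting model has the same size and inference cost as any single fine-tuned checkpoint, which is the point the paper's title makes.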