OSOA: One-Shot Online Adaptation of Deep Generative Models for Lossless
Compression
- URL: http://arxiv.org/abs/2111.01662v1
- Date: Tue, 2 Nov 2021 15:18:25 GMT
- Title: OSOA: One-Shot Online Adaptation of Deep Generative Models for Lossless
Compression
- Authors: Chen Zhang, Shifeng Zhang, Fabio Maria Carlucci, Zhenguo Li
- Abstract summary: We propose a novel setting that starts from a pretrained deep generative model and compresses the data batches while adapting the model with a dynamical system for only one epoch.
Experimental results show that vanilla OSOA can save significant time versus training bespoke models and space versus using one model for all targets.
- Score: 49.10945855716001
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explicit deep generative models (DGMs), e.g., VAEs and Normalizing Flows,
have been shown to offer an effective data modelling alternative for lossless
compression. However, DGMs themselves normally require large storage space and
thus undermine the advantage brought by accurate data density estimation. To
eliminate the requirement of saving separate models for different target
datasets, we propose a novel setting that starts from a pretrained deep
generative model and compresses the data batches while adapting the model with
a dynamical system for only one epoch. We formalise this setting as that of
One-Shot Online Adaptation (OSOA) of DGMs for lossless compression and propose
a vanilla algorithm under this setting. Experimental results show that vanilla
OSOA can save significant time versus training bespoke models and space versus
using one model for all targets. With the same number of adaptation steps or
the same adaptation time, vanilla OSOA exhibits better space efficiency,
e.g., $47\%$ less space, than fine-tuning the pretrained model and saving the
fine-tuned model. Moreover, we showcase the potential of OSOA and motivate more
sophisticated OSOA algorithms by showing further space or time efficiency with
multiple updates per batch and early stopping.
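The vanilla OSOA loop described in the abstract can be summarised in a short sketch. This is a minimal illustration under assumptions, not the paper's implementation: it assumes a PyTorch model exposing a per-example negative log-likelihood via a hypothetical `model.nll(batch)` method, and it abstracts the entropy coder away by tracking only the ideal code length; a real codec (e.g., bits-back ANS for VAEs) would encode each batch at the marked point.

```python
import math
import torch

def osoa_one_epoch(model, data_loader, lr=1e-4):
    # Vanilla OSOA sketch: encode each batch under the CURRENT model state,
    # then apply exactly one deterministic update, so a decoder that starts
    # from the same pretrained weights can replay the updates and stay in
    # sync without the adapted model ever being stored.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    total_bits = 0.0
    for batch in data_loader:            # a single pass over the data: one epoch
        with torch.no_grad():
            nll_nats = model.nll(batch)  # hypothetical per-example NLL in nats
            total_bits += nll_nats.sum().item() / math.log(2)
            # <-- a real entropy coder would compress `batch` here
        loss = model.nll(batch).mean()   # one gradient step per batch
        opt.zero_grad()
        loss.backward()
        opt.step()
    return total_bits                    # ideal code length for the whole dataset
```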
Related papers
- Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution [1.8029479474051309]
We design a hybrid edge-cloud solution that leverages the efficiency of smaller models for local processing while deferring to larger, more accurate cloud-based models when necessary.
Specifically, we propose a novel unsupervised data generation method, Dual-Model Distillation (DMD), to train a lightweight switcher model that can predict when the edge model's output is uncertain.
Experimental results on the action classification task show that our framework not only incurs lower computational overhead but also improves accuracy compared to using a large model alone.
arXiv Detail & Related papers (2024-10-16T02:06:27Z)
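A hedged sketch of the kind of edge-cloud deferral the Dual-Model Distillation summary describes, for a single input: `edge_model`, `cloud_model`, and `switcher` are hypothetical stand-ins, and the 0.5 threshold is an arbitrary illustrative value, not one from the paper.

```python
import torch

def classify_with_deferral(x, edge_model, cloud_model, switcher, threshold=0.5):
    # Run the cheap edge model first; a lightweight switcher estimates how
    # likely the edge prediction is to be wrong, and only uncertain inputs
    # are sent to the larger cloud model (x is a batch of one here).
    with torch.no_grad():
        edge_logits = edge_model(x)
        p_uncertain = torch.sigmoid(switcher(edge_logits)).item()
        if p_uncertain < threshold:
            return edge_logits.argmax(dim=-1)    # keep the local prediction
        return cloud_model(x).argmax(dim=-1)     # defer to the cloud model
```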
- Single Parent Family: A Spectrum of Family Members from a Single Pre-Trained Foundation Model [20.054342930450055]
This paper introduces a novel method of Progressive Low Rank Decomposition (PLRD) tailored for the compression of large language models.
PLRD allows for significant reductions in computational overhead and energy consumption.
Our findings suggest that PLRD could set a new standard for the efficient scaling of LLMs.
arXiv Detail & Related papers (2024-06-28T15:27:57Z)
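The PLRD entry above refers to compressing model weights via low-rank decomposition. Below is a minimal sketch of truncated-SVD weight factorisation, not the paper's progressive schedule; the matrix sizes and ranks are illustrative only.

```python
import torch

def low_rank_factorise(weight, rank):
    # Replace an (out, in) weight matrix with two factors of total size
    # rank * (out + in); for rank << min(out, in) this compresses the layer.
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]          # (out, rank)
    B = Vh[:rank, :]                    # (rank, in)
    return A, B                         # weight ≈ A @ B

# usage sketch: progressively lower the rank and track approximation error
W = torch.randn(1024, 1024)
for r in (512, 256, 128):
    A, B = low_rank_factorise(W, r)
    err = torch.linalg.norm(W - A @ B) / torch.linalg.norm(W)
    print(r, err.item())
```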
- DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models [55.608981341747246]
We introduce Data Adaptive Self-Supervised Early Exit (DAISY), an approach that decides when to exit based on the self-supervised loss.
Our analysis of the adaptivity of DAISY shows that the model exits early (using fewer layers) on clean data and late (using more layers) on noisy data.
arXiv Detail & Related papers (2024-06-08T12:58:13Z)
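A hedged sketch of the early-exit decision the DAISY entry describes: run the encoder layer by layer and stop once a per-layer self-supervised loss estimate falls below a threshold. Here `layers`, `loss_heads`, and the threshold are assumptions for illustration, not the paper's components.

```python
import torch

def early_exit_forward(x, layers, loss_heads, threshold=0.1):
    # Exit as soon as the per-layer self-supervised loss estimate is low
    # enough: clean inputs tend to exit early (fewer layers), noisy inputs
    # propagate deeper before the estimate drops below the threshold.
    h = x
    for depth, (layer, loss_head) in enumerate(zip(layers, loss_heads), start=1):
        h = layer(h)
        with torch.no_grad():
            est_loss = loss_head(h).mean().item()   # assumed scalar loss proxy
        if est_loss < threshold:
            return h, depth                         # early exit
    return h, len(layers)                           # used all layers
```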
- Direct Preference Optimization With Unobserved Preference Heterogeneity [16.91835461818937]
This paper presents a new method to align generative models with varied human preferences.
We propose an Expectation-Maximization adaptation to DPO, generating a mixture of models based on latent preference types of the annotators.
Our algorithms leverage the simplicity of DPO while accommodating diverse preferences.
arXiv Detail & Related papers (2024-05-23T21:25:20Z)
- Low-rank Adaptation for Spatio-Temporal Forecasting [13.595533573828734]
We present ST-LoRA, a novel low-rank adaptation framework that serves as an off-the-shelf plugin for existing spatio-temporal prediction models.
Our approach increases the parameters and training time of the original models by less than 4%, while still achieving consistent and sustained performance improvements.
arXiv Detail & Related papers (2024-04-11T17:04:55Z)
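The low-rank adaptation plugin summarised above follows the general LoRA pattern of adding a small trainable low-rank branch next to a frozen layer. The sketch below is a generic LoRA linear layer, not the ST-LoRA architecture itself; the rank and scaling values are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen base linear layer plus a trainable low-rank update A @ B,
    # so only rank * (in_features + out_features) extra parameters are learned.
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False
        self.A = nn.Parameter(torch.zeros(base.out_features, rank))
        self.B = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.scale = alpha / rank

    def forward(self, x):
        # A starts at zero, so the adapted layer initially matches the base layer.
        return self.base(x) + self.scale * (x @ self.B.T @ self.A.T)
```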
- Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation [56.79064699832383]
We establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation.
In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud.
arXiv Detail & Related papers (2024-02-27T08:47:19Z)
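A hedged sketch of the kind of entropy-based sample filtering the CEMA entry refers to: before uploading, the edge drops samples whose predictions are either too confident (unlikely to be informative) or too uncertain (likely unreliable). The thresholds below are illustrative, not values from the paper.

```python
import torch
import torch.nn.functional as F

def select_samples_for_upload(logits, low=0.1, high=0.8):
    # Per-sample prediction entropy, normalised to [0, 1] by log(num_classes).
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1)
    entropy = entropy / torch.log(torch.tensor(float(logits.shape[-1])))
    # Keep only moderately uncertain samples: very confident ones add little
    # information, very uncertain ones are likely unreliable.
    keep = (entropy > low) & (entropy < high)
    return keep                          # boolean mask of samples to upload
```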
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
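A minimal illustration of merging models in their parameter space, using plain uniform weight averaging of checkpoints that share one architecture; the paper's fusion method may weight parameters differently, so this is only the simplest instance of the idea.

```python
import torch

def merge_state_dicts(state_dicts):
    # Uniform parameter-space average of several fine-tuned checkpoints;
    # no training data is needed for the merge itself.
    merged = {}
    for name in state_dicts[0]:
        merged[name] = torch.stack(
            [sd[name].float() for sd in state_dicts]
        ).mean(dim=0)
    return merged

# usage sketch (checkpoint_paths is hypothetical):
# model.load_state_dict(merge_state_dicts([torch.load(p) for p in checkpoint_paths]))
```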
- NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models [90.6485663020735]
Fine-tuning from pre-trained ImageNet models has been a simple, effective, and popular approach for various computer vision tasks.
We propose a joint Neural Architecture Search and Online Adaptation framework named NASOA towards faster task-oriented fine-tuning.
arXiv Detail & Related papers (2021-08-07T12:03:14Z)
- When Ensembling Smaller Models is More Efficient than Single Large Models [52.38997176317532]
We show that ensembles can outperform single models, achieving both higher accuracy and fewer total FLOPs.
This is an interesting observation: exploiting output diversity through ensembling can often be more efficient than training larger models.
arXiv Detail & Related papers (2020-05-01T18:56:18Z)
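A short sketch of the ensembling idea in the last entry: average the predictive distributions of several smaller models instead of running one large model. Whether this wins on accuracy per FLOP is the empirical question the paper studies; the code below only shows the mechanics.

```python
import torch
import torch.nn.functional as F

def ensemble_predict(models, x):
    # Average the predictive distributions of several smaller models; output
    # diversity across members is what lets the ensemble rival a larger model.
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(x), dim=-1) for m in models]).mean(dim=0)
    return probs.argmax(dim=-1)
```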
This list is automatically generated from the titles and abstracts of the papers on this site.