Related papers: Accelerating Learned Image Compression Through Modeling Neural Training Dynamics

Accelerating Learned Image Compression Through Modeling Neural Training Dynamics

URL: http://arxiv.org/abs/2505.18107v1
Date: Fri, 23 May 2025 17:03:13 GMT
Title: Accelerating Learned Image Compression Through Modeling Neural Training Dynamics
Authors: Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu,
Abstract summary: This paper takes a step forward in accelerating the training of LIC methods by modeling the neural training dynamics.<n>We first propose a Sensitivity-aware True and Dummy Embedding Training mechanism (STDET) that clusters LIC model parameters into few separate modes.<n>By further utilizing the stable intra-mode correlations throughout training and parameter sensitivities, we gradually embed non-reference parameters, reducing the number of trainable parameters.
Score: 11.729071258457138
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As learned image compression (LIC) methods become increasingly computationally demanding, enhancing their training efficiency is crucial. This paper takes a step forward in accelerating the training of LIC methods by modeling the neural training dynamics. We first propose a Sensitivity-aware True and Dummy Embedding Training mechanism (STDET) that clusters LIC model parameters into few separate modes where parameters are expressed as affine transformations of reference parameters within the same mode. By further utilizing the stable intra-mode correlations throughout training and parameter sensitivities, we gradually embed non-reference parameters, reducing the number of trainable parameters. Additionally, we incorporate a Sampling-then-Moving Average (SMA) technique, interpolating sampled weights from stochastic gradient descent (SGD) training to obtain the moving average weights, ensuring smooth temporal behavior and minimizing training state variances. Overall, our method significantly reduces training space dimensions and the number of trainable parameters without sacrificing model performance, thus accelerating model convergence. We also provide a theoretical analysis on the Noisy quadratic model, showing that the proposed method achieves a lower training variance than standard SGD. Our approach offers valuable insights for further developing efficient training methods for LICs.

Related papers

Training-free score-based diffusion for parameter-dependent stochastic dynamical systems [2.4755898204110642]
We present a training-free conditional diffusion model framework for learning flow maps of parameter-dependent SDEs.<n>A joint kernel-weighted Monte Carlo estimator approximates the conditional score function using trajectory data sampled at discrete parameter values.<n>The resulting generative model produces sample trajectories for any parameter value within the training range without retraining.
arXiv Detail & Related papers (2026-02-02T13:54:36Z)
Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs.<n>We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
arXiv Detail & Related papers (2025-10-16T03:37:16Z)
On Information Geometry and Iterative Optimization in Model Compression: Operator Factorization [5.952537659103525]
We argue that many successful model compression approaches can be understood as implicitly approximating information divergences for this projection.<n>We prove convergence of iterative singular value thresholding for training neural networks subject to a soft rank constraint.
arXiv Detail & Related papers (2025-07-12T23:39:14Z)
AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning [9.51289606759621]
Training and fine-tuning large language models (LLMs) come with challenges related to memory and computational requirements.<n>Various techniques have been developed to tackle these challenges, such as low-rank adaptation (LoRA)<n>We introduce a new method inspired by a phenomenon we formally prove: as training progresses, the rank of the estimated gradient gradually decreases.
arXiv Detail & Related papers (2024-10-23T13:53:26Z)
A parametric framework for kernel-based dynamic mode decomposition using deep learning [0.0]
The proposed framework consists of two stages, offline and online. The online stage leverages those LANDO models to generate new data at a desired time instant. dimensionality reduction technique is applied to high-dimensional dynamical systems to reduce the computational cost of training.
arXiv Detail & Related papers (2024-09-25T11:13:50Z)
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.<n>We propose a novel model fine-tuning method to make full use of these ineffective parameters.<n>Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z)
Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models [109.06052781040916]
We introduce a technique to enhance the inference efficiency of parameter-shared language models. We also propose a simple pre-training technique that leads to fully or partially shared models. Results demonstrate the effectiveness of our methods on both autoregressive and autoencoding PLMs.
arXiv Detail & Related papers (2023-10-19T15:13:58Z)
Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem) AdaRem adjusts the parameter-wise learning rate according to whether the direction of one parameter changes in the past is aligned with the direction of the current gradient. Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose. We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate scale variation challenge in object detection. Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling. It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z)
Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs [71.26657499537366]
We propose a simple literature-based method for the efficient approximation of gradients in neural ODE models. We compare it with the reverse dynamic method to train neural ODEs on classification, density estimation, and inference approximation tasks.
arXiv Detail & Related papers (2020-03-11T13:15:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.