Decoupled Multi-Predictor Optimization for Inference-Efficient Model Tuning
- URL: http://arxiv.org/abs/2511.03245v1
- Date: Wed, 05 Nov 2025 07:16:49 GMT
- Title: Decoupled Multi-Predictor Optimization for Inference-Efficient Model Tuning
- Authors: Liwei Luo, Shuaitengyuan Li, Dongwei Ren, Qilong Wang, Pengfei Zhu, Qinghua Hu
- Abstract summary: Early exiting in conjunction with multi-stage predictors offers a straightforward way to achieve an inference-efficient model. How can early stages provide low-level fundamental features to deep stages while simultaneously supplying high-level discriminative features to early-stage predictors? We propose a Decoupled Multi-Predictor Optimization (DMPO) method to effectively decouple the low-level representative ability and high-level discriminative ability in early stages.
- Score: 59.27124079347153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, remarkable progress has been made in tuning large-scale pre-trained models, and inference efficiency is becoming increasingly crucial for practical deployment. Early exiting with multi-stage predictors, when combined with a parameter-efficient fine-tuning strategy, offers a straightforward way to achieve an inference-efficient model. However, a key challenge remains unresolved: how can early stages provide low-level fundamental features to deep stages while simultaneously supplying high-level discriminative features to early-stage predictors? To address this problem, we propose a Decoupled Multi-Predictor Optimization (DMPO) method that effectively decouples the low-level representative ability and high-level discriminative ability of early stages. Architecturally, we introduce a lightweight bypass module into the multi-stage predictors for functional decomposition of shallow features from early stages, and we develop a high-order statistics-based predictor for early stages to enhance their discriminative ability. To train this multi-predictor architecture effectively, we propose a decoupled optimization that allocates two-phase loss weights to the multi-stage predictors during model tuning: the initial training phase lets the model prioritize the discriminative ability of deep stages by emphasizing the representative ability of early stages, while the later training phase drives discriminative ability toward earlier stages as much as possible. As such, DMPO decouples representative and discriminative abilities in early stages through both architecture design and model optimization. Experiments across various datasets and pre-trained backbones demonstrate that DMPO clearly outperforms its counterparts while reducing computational cost.
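To make the two-phase decoupled optimization concrete, here is a minimal sketch of how per-stage loss weights might be scheduled and combined. This is an illustrative reconstruction from the abstract only, not the authors' implementation: the linear schedules, the `switch_epoch` parameter, and all function names are assumptions.

```python
import torch
import torch.nn.functional as F

def stage_loss_weights(epoch: int, num_stages: int, switch_epoch: int) -> torch.Tensor:
    """Two-phase loss weights for multi-stage predictors (illustrative).

    Phase 1 (epoch < switch_epoch): weight deep predictors more, so early
    stages focus on supplying low-level representative features.
    Phase 2: shift weight toward early predictors, driving discriminative
    ability to earlier stages.
    """
    idx = torch.arange(num_stages, dtype=torch.float32)
    w = idx + 1.0 if epoch < switch_epoch else float(num_stages) - idx
    return w / w.sum()

def multi_predictor_loss(stage_logits, target, weights):
    """Weighted sum of per-stage cross-entropy losses."""
    return sum(w * F.cross_entropy(logits, target)
               for w, logits in zip(weights, stage_logits))
```

Under this reading, the weights are recomputed each epoch and the combined loss is backpropagated through all stage predictors; at inference time, early exiting then picks the shallowest predictor whose output is deemed reliable.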
Related papers
- Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms [35.74919627230777]
We argue that an inference-first perspective can inspire novel generative pre-training algorithms. We show how addressing limitations in diffusion models' inference process through targeted modifications yields a stable, single-stage algorithm.
arXiv Detail & Related papers (2025-03-10T10:27:30Z) - A First-order Generative Bilevel Optimization Framework for Diffusion Models [57.40597004445473]
Diffusion models iteratively denoise data samples to synthesize high-quality outputs. Traditional bilevel methods fail due to the infinite-dimensional probability space and prohibitive sampling costs. We formalize this challenge as a generative bilevel optimization problem. Our first-order bilevel framework overcomes the incompatibility of conventional bilevel methods with diffusion processes.
arXiv Detail & Related papers (2025-02-12T21:44:06Z) - Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain stability in terms of the zero-shot generalization of VLMs; the overall method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - A Trajectory-Based Bayesian Approach to Multi-Objective Hyperparameter Optimization with Epoch-Aware Trade-Offs [8.598456741786801]
Training machine learning models inherently involves a resource-intensive and noisy iterative learning procedure. We present a trajectory-based multi-objective Bayesian optimization algorithm characterized by two features. Experiments show that our algorithm can effectively identify desirable trade-offs while improving tuning efficiency.
arXiv Detail & Related papers (2024-05-24T07:43:45Z) - CoopInit: Initializing Generative Adversarial Networks via Cooperative Learning [50.90384817689249]
CoopInit is a cooperative learning-based strategy that can quickly learn a good starting point for GANs.
We demonstrate the effectiveness of the proposed approach on image generation and one-sided unpaired image-to-image translation tasks.
arXiv Detail & Related papers (2023-03-21T07:49:32Z) - Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information [77.80071279597665]
We propose an all-in-one single-stage pre-training approach, named Maximizing Multi-modal Mutual Information Pre-training (M3I Pre-training).
Our approach achieves better performance than previous pre-training methods on various vision benchmarks, including ImageNet classification, object detection, LVIS long-tailed object detection, and ADE20k semantic segmentation.
arXiv Detail & Related papers (2022-11-17T18:59:49Z) - COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models [16.586312156966635]
Transformer-based pre-trained language models (PLMs) mostly suffer from excessive overhead despite their advanced capacity.
Existing statically compressed models are unaware of the varying complexity across input instances.
We propose a collaborative optimization for PLMs that integrates static model compression and dynamic inference acceleration.
arXiv Detail & Related papers (2022-10-27T15:06:40Z) - BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
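Since early exiting underpins the paper above, the patience-based mechanism summarized in the last entry is worth sketching. This is a minimal reconstruction from the summary alone, not the released PABEE code: the `patience` default and the exact agreement rule are assumptions.

```python
import torch

def patience_based_early_exit(stage_logits, patience: int = 3):
    """Exit once `patience` consecutive internal classifiers agree (illustrative).

    stage_logits: iterable of per-stage logit tensors, shallow to deep;
    in practice a lazy generator, so exiting actually skips later layers.
    """
    prev_pred, streak, depth = None, 0, 0
    for depth, logits in enumerate(stage_logits):
        pred = logits.argmax(dim=-1)
        streak = streak + 1 if prev_pred is not None and torch.equal(pred, prev_pred) else 1
        prev_pred = pred
        if streak >= patience:
            return pred, depth  # early exit: deeper stages never run
    return prev_pred, depth     # fell through to the final classifier
```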