MaRI: Accelerating Ranking Model Inference via Structural Re-parameterization in Large Scale Recommendation System
- URL: http://arxiv.org/abs/2602.23105v1
- Date: Thu, 26 Feb 2026 15:19:43 GMT
- Title: MaRI: Accelerating Ranking Model Inference via Structural Re-parameterization in Large Scale Recommendation System
- Authors: Yusheng Huang, Pengbo Xu, Shen Wang, Changxin Lao, Jiangxia Cao, Shuang Wen, Shuang Yang, Zhaojie Liu, Han Li, Kun Gai
- Abstract summary: We propose MaRI, a novel Matrix Re-parameterized Inference framework. It serves as a complementary approach to existing techniques while accelerating ranking model inference without any accuracy loss. MaRI is motivated by the observation that user-side computation is redundant in feature fusion matrix multiplication.
- Score: 24.4139949756995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ranking models, i.e., coarse-ranking and fine-ranking models, serve as core components in large-scale recommendation systems, responsible for scoring massive item candidates based on user preferences. To meet the stringent latency requirements of online serving, structural lightweighting or knowledge distillation techniques are commonly employed for ranking model acceleration. However, these approaches typically lead to a non-negligible drop in accuracy. Notably, lossless acceleration by optimizing feature fusion matrix multiplication, particularly through structural re-parameterization, remains an underexplored angle. In this paper, we propose MaRI, a novel Matrix Re-parameterized Inference framework, which serves as a complementary approach to existing techniques while accelerating ranking model inference without any accuracy loss. MaRI is motivated by the observation that user-side computation is redundant in feature fusion matrix multiplication, and we therefore adopt the philosophy of structural re-parameterization to alleviate this redundancy.
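The abstract does not spell out the exact re-parameterization, but the redundancy it targets is easy to illustrate: when a fused projection concat(user, item) @ W is scored over N candidate items, the user half of the multiplication is repeated N times per request. Below is a minimal NumPy sketch of the general idea; all names and shapes are illustrative, not taken from the paper.

```python
# Minimal sketch of the redundancy the MaRI abstract describes: in a fused
# matmul concat([user, item]) @ W, the user half is recomputed for every
# candidate item. Splitting W row-wise lets the user-side product be
# computed once per request. The exact re-parameterization in MaRI is not
# specified in the abstract; names and shapes here are illustrative.
import numpy as np

d_user, d_item, d_out, n_items = 64, 32, 128, 10_000
rng = np.random.default_rng(0)

W = rng.standard_normal((d_user + d_item, d_out))    # fused projection
user = rng.standard_normal(d_user)                   # one user vector
items = rng.standard_normal((n_items, d_item))       # candidate items

# Naive: tile the user vector and run one big matmul per request.
fused = np.concatenate([np.tile(user, (n_items, 1)), items], axis=1)
naive = fused @ W                                    # (n_items, d_out)

# Re-parameterized: split W; the user-side product is a single gemv.
W_user, W_item = W[:d_user], W[d_user:]
cached = user @ W_user                               # computed once
fast = items @ W_item + cached                       # broadcast add

assert np.allclose(naive, fast)                      # lossless by construction
```

Because the split is algebraically exact, the speedup is lossless by construction, consistent with the paper's no-accuracy-loss claim.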
Related papers
- Can Recommender Systems Teach Themselves? A Recursive Self-Improving Framework with Fidelity Control [82.30868101940068]
We propose a paradigm in which a model bootstraps its own performance without reliance on external data or teacher models. Our theoretical analysis shows that RSIR acts as a data-driven implicit regularizer, smoothing the optimization landscape. We show that even smaller models benefit, and weak models can generate effective training curricula for stronger ones.
arXiv Detail & Related papers (2026-02-17T15:31:32Z) - ODELoRA: Training Low-Rank Adaptation by Solving Ordinary Differential Equations [54.886931928255564]
Low-rank adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning method in deep transfer learning. We propose a novel continuous-time optimization dynamic for LoRA factor matrices in the form of an ordinary differential equation (ODE). We show that ODELoRA achieves stable feature learning, a property that is crucial for training deep neural networks at different scales of problem dimensionality.
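The summary only states that the LoRA factors follow an ODE. One natural reading, sketched below, is plain gradient flow dA/dt = -∂L/∂A, dB/dt = -∂L/∂B integrated with forward Euler on a least-squares toy problem; the paper's actual dynamic, loss, and solver are assumptions here.

```python
# A gradient-flow reading of the ODELoRA abstract: treat the LoRA factors
# as a continuous-time system and integrate it numerically (forward Euler).
# The paper's actual ODE is not given in the abstract; this only
# illustrates the general idea on a toy regression problem.
import numpy as np

d, r = 16, 4
rng = np.random.default_rng(1)
W0 = rng.standard_normal((d, d))            # frozen pretrained weight
X = rng.standard_normal((32, d))            # a batch of inputs
Y = X @ rng.standard_normal((d, d))         # regression targets

A = rng.standard_normal((d, r)) * 0.01      # LoRA down-projection
B = np.zeros((r, d))                        # LoRA up-projection

def loss_grads(A, B):
    E = X @ (W0 + A @ B) - Y                # residual of adapted model
    gA = X.T @ E @ B.T / len(X)             # dL/dA
    gB = A.T @ X.T @ E / len(X)             # dL/dB
    return gA, gB

dt = 1e-2                                   # ODE integration step size
for _ in range(500):                        # Euler steps along the flow
    gA, gB = loss_grads(A, B)
    A, B = A - dt * gA, B - dt * gB
```

A smaller dt tracks the continuous flow more faithfully; a higher-order solver (e.g., RK4) would be the natural upgrade in this reading.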
arXiv Detail & Related papers (2026-02-07T10:19:36Z) - SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models [53.19726629537694]
Post-training alignment of video generation models with human preferences is a critical goal. Current data collection paradigms, reliant on in-prompt pairwise annotations, suffer from labeling noise. We propose SoliReward, a systematic framework for video RM training.
arXiv Detail & Related papers (2025-12-17T14:28:23Z) - Revealing Low-Dimensional Structure in 2D Richtmyer-Meshkov Instabilities via Parametric Reduced-Order Modeling [0.6999740786886536]
Richtmyer-Meshkov instability (RMI) is central to many engineering applications. In inertial confinement fusion, for example, RMI causes the ablator and fuel to mix, introducing cold spots into the fuel and lowering performance. We introduce a reduced-order model for two-dimensional RMI based on the Latent Space Dynamics Identification (LaSDI) algorithm.
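The two-stage structure of LaSDI is worth making concrete: reduce snapshot data to a small latent space, then identify the latent dynamics by regression. The sketch below is a deliberately minimal linear variant; LaSDI itself uses an autoencoder and a richer dynamics library, so POD and a linear model dz/dt = M z are simplifying assumptions.

```python
# Minimal linear variant of the LaSDI idea: POD for dimension reduction,
# then least-squares identification of linear latent dynamics dz/dt = M z.
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 128)
t = np.linspace(0.0, 2.0, 200)
dt = t[1] - t[0]
snapshots = np.sin(5.0 * (x[:, None] - t[None, :]))    # traveling-wave data

# Stage 1: dimension reduction via POD (SVD of the snapshot matrix).
U, S, Vt = np.linalg.svd(snapshots, full_matrices=False)
r = 2                                                   # latent dimension
z = U[:, :r].T @ snapshots                              # latent trajectories

# Stage 2: identify latent dynamics dz/dt = M z by least squares.
dz = (z[:, 1:] - z[:, :-1]) / dt                        # finite differences
M, *_ = np.linalg.lstsq(z[:, :-1].T, dz.T, rcond=None)
M = M.T                                                 # (r, r) latent operator
```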
arXiv Detail & Related papers (2025-10-17T20:19:00Z) - Weight Spectra Induced Efficient Model Adaptation [54.8615621415845]
Fine-tuning large-scale foundation models incurs prohibitive computational costs. We show that fine-tuning predominantly amplifies the top singular values while leaving the remainder largely intact. We propose a novel method that leverages learnable rescaling of top singular directions.
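The mechanism in the summary is concrete enough to sketch: decompose the pretrained weight once, then train only per-direction gains on the top-k singular values, leaving the spectral tail frozen. The parameterization below is an illustrative reading, not necessarily the paper's exact method.

```python
# Hedged sketch of the abstract's idea: fine-tuning mostly rescales the top
# singular values of a weight matrix, so adapt only learnable gains on the
# top-k singular directions. The exact parameterization is an assumption.
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((256, 256))          # pretrained weight

U, S, Vt = np.linalg.svd(W, full_matrices=False)
k = 8                                        # number of adapted directions
g = np.ones(k)                               # learnable per-direction gains

def adapted_weight(g):
    # Rescale only the top-k singular values; the spectral tail is frozen.
    S_new = S.copy()
    S_new[:k] = S[:k] * g
    return (U * S_new) @ Vt

W_ft = adapted_weight(g * 1.1)               # amplify top directions by 10%
```

Only the k gains are trainable, so the adapter adds k parameters per weight matrix rather than the O(d*r) of a low-rank update.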
arXiv Detail & Related papers (2025-05-29T05:03:29Z) - LSR-Adapt: Ultra-Efficient Parameter Tuning with Matrix Low Separation Rank Kernel Adaptation [3.9426000822656224]
Low-rank adaptation has become increasingly challenging due to the sheer scale of modern large language models. We propose an effective kernelization to further reduce the number of parameters required for adaptation tasks. We achieve state-of-the-art performance with higher accuracy at almost half the parameter count of conventional low-rank methods.
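"Low separation rank" usually refers to approximating a large matrix by a short sum of Kronecker products. Whether LSR-Adapt uses exactly this form is an assumption on my part, but the sketch shows where the parameter savings come from.

```python
# One common reading of "low separation rank": a weight update written as a
# short sum of Kronecker products, which needs far fewer parameters than a
# dense update. Whether LSR-Adapt uses exactly this form is an assumption.
import numpy as np

rng = np.random.default_rng(4)
d1, d2, sep_rank = 32, 32, 2                 # full update is (d1*d2, d1*d2)

A = [rng.standard_normal((d1, d1)) * 0.01 for _ in range(sep_rank)]
B = [rng.standard_normal((d2, d2)) * 0.01 for _ in range(sep_rank)]

# The dense update has (d1*d2)^2 = 1,048,576 entries, but only
# sep_rank * (d1^2 + d2^2) = 4,096 parameters are trainable here.
delta_W = sum(np.kron(a, b) for a, b in zip(A, B))
```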
arXiv Detail & Related papers (2025-02-19T09:20:47Z) - Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy [104.48511402784763]
The Performance Law for sequential recommendation (SR) models aims to theoretically investigate and model the relationship between model performance and data quality. We propose Approximate Entropy (ApEn) to assess data quality, offering a more nuanced approach than traditional data-quantity metrics.
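Approximate Entropy itself is the standard Pincus regularity statistic; the NumPy implementation below follows its textbook definition. How the paper maps interaction sequences to the scalar series u, and its choices of m and r, are not given in this summary.

```python
# Textbook Approximate Entropy (ApEn): compare how often length-m windows
# of a series stay close (within r, Chebyshev distance) versus length-m+1
# windows. Lower ApEn means a more regular, predictable series.
import numpy as np

def approximate_entropy(u, m=2, r=0.2):
    """ApEn(m, r) of a 1-D series u; lower values mean more regularity."""
    u = np.asarray(u, dtype=float)

    def phi(m):
        # All overlapping length-m windows of the series.
        x = np.lib.stride_tricks.sliding_window_view(u, m)
        # Chebyshev distances between every pair of windows.
        d = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=-1)
        C = np.mean(d <= r, axis=1)          # match fraction per window
        return np.mean(np.log(C))            # self-matches keep C > 0

    return phi(m) - phi(m + 1)

rng = np.random.default_rng(3)
print(approximate_entropy(np.sin(np.arange(200) * 0.1)))   # regular: small
print(approximate_entropy(rng.standard_normal(200)))       # noisy: larger
```

In practice r is often set relative to the series scale (e.g., 0.2 times its standard deviation); the fixed default here is only for the demo.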
arXiv Detail & Related papers (2024-11-30T10:56:30Z) - Model order reduction of deep structured state-space models: A system-theoretic approach [0.0]
Deep structured state-space models offer high predictive performance.
The learned representations often suffer from excessively large model orders, which render them unsuitable for control design purposes.
We introduce two regularization terms which can be incorporated into the training loss for improved model order reduction.
The presented regularizers lead to advantages in terms of parsimonious representations and faster inference resulting from the reduced order models.
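The summary does not name the two regularization terms. As a hedged stand-in, the sketch below uses a nuclear-norm penalty on the learned state matrix, a common convex surrogate that pushes weak modes toward zero so that modal truncation afterwards loses little accuracy.

```python
# Hypothetical stand-in for the paper's (unnamed) regularizers: penalize
# the nuclear norm of the learned state matrix during training, then apply
# modal truncation to obtain the reduced-order model.
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((16, 16)) * 0.2        # learned SSM state matrix

task_loss = 0.0                                # stands in for prediction loss
nuclear_norm = np.linalg.svd(A, compute_uv=False).sum()
loss = task_loss + 1e-3 * nuclear_norm         # regularized training loss

# After training: modal truncation keeps only the dominant eigen-modes.
eigvals, V = np.linalg.eig(A)
keep = np.argsort(-np.abs(eigvals))[:8]        # retain 8 of 16 modes
T = V[:, keep]                                 # modal projection basis
A_red = np.linalg.pinv(T) @ A @ T              # reduced-order state matrix
```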
arXiv Detail & Related papers (2024-03-21T21:05:59Z) - Symplectic Autoencoders for Model Reduction of Hamiltonian Systems [0.0]
It is crucial to preserve the symplectic structure associated with the system in order to ensure long-term numerical stability.
We propose a new neural network architecture in the spirit of autoencoders, which are established tools for dimension reduction.
In order to train the network, a non-standard gradient descent approach is applied.
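For intuition on the symplectic constraint, the linear special case is easy to verify: the cotangent lift used in proper symplectic decomposition builds a reduction map A from a single column-orthonormal factor and automatically satisfies A^T J A = J. The paper's autoencoder is nonlinear, so this sketch only demonstrates the structure being preserved, not the method itself.

```python
# Linear warm-up for symplectic model reduction: the cotangent lift
# A = blkdiag(Phi, Phi), with Phi column-orthonormal, satisfies the
# symplecticity condition A^T J A = J by construction.
import numpy as np

N, n = 50, 4                                  # full / reduced half-dimensions
rng = np.random.default_rng(6)
Phi, _ = np.linalg.qr(rng.standard_normal((N, n)))   # orthonormal columns

def J(k):                                     # canonical symplectic form
    return np.block([[np.zeros((k, k)), np.eye(k)],
                     [-np.eye(k), np.zeros((k, k))]])

A = np.block([[Phi, np.zeros((N, n))],        # cotangent-lift decoder
              [np.zeros((N, n)), Phi]])

assert np.allclose(A.T @ J(N) @ A, J(n))      # symplecticity check
```

The nonlinear autoencoder in the paper must enforce the analogous condition on its Jacobians, which is why a non-standard (manifold-aware) gradient descent is needed for training.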
arXiv Detail & Related papers (2023-12-15T18:20:25Z) - Large-Scale OD Matrix Estimation with A Deep Learning Method [70.78575952309023]
The proposed method integrates deep learning and numerical optimization algorithms to infer matrix structure and guide numerical optimization.
We conducted tests to demonstrate the good generalization performance of our method on a large-scale synthetic dataset.
arXiv Detail & Related papers (2023-10-09T14:30:06Z)