SORT: A Systematically Optimized Ranking Transformer for Industrial-scale Recommenders
- URL: http://arxiv.org/abs/2603.03988v1
- Date: Wed, 04 Mar 2026 12:32:43 GMT
- Title: SORT: A Systematically Optimized Ranking Transformer for Industrial-scale Recommenders
- Authors: Chunqi Wang, Bingchao Wu, Taotian Pang, Jiahao Wang, Jie Yang, Jia Liu, Hao Zhang, Hai Zhu, Lei Shen, Shizhun Wang, Bing Wang, Xiaoyi Zeng
- Abstract summary: SORT (Systematically Optimized Ranking Transformer) is a scalable model designed to bridge the gap between Transformers and industrial-scale ranking models. We address the high feature sparsity and low label density challenges through a series of optimizations. SORT exhibits excellent scalability across data size, model size, and sequence length, while remaining flexible in integrating diverse features.
- Score: 21.80413275965637
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While Transformers have achieved remarkable success in LLMs through superior scalability, their application in industrial-scale ranking models remains nascent, hindered by the challenges of high feature sparsity and low label density. In this paper, we propose SORT (Systematically Optimized Ranking Transformer), a scalable model designed to bridge the gap between Transformers and industrial-scale ranking models. We address the high feature sparsity and low label density challenges through a series of optimizations, including request-centric sample organization, local attention, query pruning and generative pre-training. Furthermore, we introduce a suite of refinements to the tokenization, multi-head attention (MHA), and feed-forward network (FFN) modules, which collectively stabilize the training process and enlarge the model capacity. To maximize hardware efficiency, we optimize our training system to elevate the model FLOPs utilization (MFU) to 22%. Extensive experiments demonstrate that SORT outperforms strong baselines and exhibits excellent scalability across data size, model size and sequence length, while remaining flexible in integrating diverse features. Finally, online A/B testing in large-scale e-commerce scenarios confirms that SORT achieves significant gains in key business metrics, including orders (+6.35%), buyers (+5.97%) and GMV (+5.47%), while simultaneously halving latency (-44.67%) and doubling throughput (+121.33%).
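The abstract names several concrete optimizations without detailing their formulation. As a rough illustration of two of them, the sketch below implements a windowed (local) attention mask and a simple top-k query-pruning step in PyTorch; the function names, the attention-mass pruning criterion, and all shapes are assumptions, not SORT's actual design.

```python
# Illustrative sketch only: the paper's exact formulation of local attention
# and query pruning is not given in the abstract; names and semantics here
# are assumptions.
import torch

def local_attention_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask allowing each position to attend only within a local window."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window  # (seq_len, seq_len)

def prune_queries(q: torch.Tensor, k: torch.Tensor, keep: int) -> torch.Tensor:
    """Keep only the `keep` queries with the largest peak attention mass."""
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5   # (B, Lq, Lk)
    mass = scores.softmax(dim=-1).amax(dim=-1)                # (B, Lq)
    top = mass.topk(keep, dim=-1).indices                     # (B, keep)
    return torch.gather(q, 1, top.unsqueeze(-1).expand(-1, -1, q.shape[-1]))

B, L, D = 2, 16, 32
q = k = torch.randn(B, L, D)
mask = local_attention_mask(L, window=4)
q_kept = prune_queries(q, k, keep=8)
print(mask.shape, q_kept.shape)  # torch.Size([16, 16]) torch.Size([2, 8, 32])
```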
Related papers
- Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models [97.55009021098554]
This work aims to identify the key determinants of SLMs' real-device latency and offer generalizable principles and methodologies for SLM design and training. We introduce a new family of hybrid SLMs, called Nemotron-Flash, which significantly advances the accuracy-efficiency frontier of state-of-the-art SLMs.
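The abstract does not say which layer types Nemotron-Flash mixes, so the sketch below is only a generic picture of a hybrid stack: attention layers interleaved with a linear-time depthwise-convolution mixer. The class name and layer choices are illustrative assumptions.

```python
# Generic illustration of a "hybrid" layer stack (attention interleaved with a
# linear-time mixer). The abstract does not specify Nemotron-Flash's actual
# layer types, so this is an assumption-based sketch.
import torch
import torch.nn as nn

class HybridStack(nn.Module):
    def __init__(self, dim: int, depth: int, heads: int = 4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(depth):
            if i % 2 == 0:
                # Cheap O(L) mixer: depthwise causal conv over the sequence.
                self.layers.append(nn.Conv1d(dim, dim, kernel_size=3,
                                             padding=2, groups=dim))
            else:
                self.layers.append(nn.MultiheadAttention(dim, heads,
                                                         batch_first=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # (B, L, D)
        for layer in self.layers:
            if isinstance(layer, nn.Conv1d):
                y = layer(x.transpose(1, 2))[..., : x.shape[1]].transpose(1, 2)
            else:
                y, _ = layer(x, x, x, need_weights=False)
            x = x + y  # residual connection
        return x

x = torch.randn(2, 32, 64)
print(HybridStack(64, depth=4)(x).shape)  # torch.Size([2, 32, 64])
```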
arXiv Detail & Related papers (2025-11-24T08:46:36Z)
- In-the-Flow Agentic System Optimization for Effective Planning and Tool Use [73.72524040856052]
AgentFlow is a trainable, in-the-flow agentic framework that coordinates four modules (planner, executor, verifier, generator) through an evolving memory. Flow-GRPO tackles long-horizon, sparse-reward credit assignment by converting multi-turn optimization into a sequence of tractable single-turn policy updates. AgentFlow with a 7B-scale backbone outperforms top-performing baselines with average accuracy gains of 14.9% on search, 14.0% on agentic, 14.5% on mathematical, and 4.1% on scientific tasks.
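The conversion of multi-turn optimization into single-turn updates suggests a GRPO-style signal: one trajectory-level reward, group-normalized and broadcast to every turn. A minimal sketch of that broadcast, with Flow-GRPO's actual loss left out since it is not given here:

```python
# Sketch of a GRPO-style update signal: a single trajectory-level reward is
# group-normalized and broadcast to every turn, turning a multi-turn problem
# into per-turn policy updates. Flow-GRPO's exact objective is not stated in
# the abstract; this only illustrates the broadcast idea.
import numpy as np

def broadcast_advantages(rewards, turns_per_traj):
    """rewards: one scalar outcome per trajectory in a sampled group."""
    r = np.asarray(rewards, dtype=float)
    adv = (r - r.mean()) / (r.std() + 1e-8)      # group-relative advantage
    # Every turn of trajectory i receives the same advantage adv[i].
    return [np.full(n, a) for a, n in zip(adv, turns_per_traj)]

per_turn = broadcast_advantages(rewards=[1.0, 0.0, 1.0, 0.0],
                                turns_per_traj=[3, 5, 2, 4])
print([a.tolist() for a in per_turn])
```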
arXiv Detail & Related papers (2025-10-07T05:32:44Z)
- Synergistic Enhancement of Requirement-to-Code Traceability: A Framework Combining Large Language Model based Data Augmentation and an Advanced Encoder [5.241456612683375]
This paper proposes and validates a framework that integrates large language model (LLM)-driven data augmentation with an advanced encoder. We first demonstrate that data augmentation, optimized through a systematic evaluation of bi-directional and zero/few-shot prompting strategies, is highly effective. We further enhance an established, state-of-the-art pre-trained language model based method by incorporating an encoder distinguished by a broader pre-training corpus and an extended context window.
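The core task, recovering requirement-to-code links with an encoder, can be pictured with a toy similarity ranking. The hashing "encoder" below is a stand-in so the sketch runs without model downloads; the paper uses a pre-trained encoder with a long context window.

```python
# Toy illustration of encoder-based trace-link recovery: embed requirements
# and code, rank pairs by cosine similarity. The hashing embed() is a
# placeholder for the paper's pre-trained encoder.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

requirements = ["user can reset password via email"]
code_units = ["def send_password_reset_email(user): ...",
              "def render_dashboard(session): ..."]

for req in requirements:
    sims = [float(embed(req) @ embed(c)) for c in code_units]
    best = int(np.argmax(sims))
    print(req, "->", code_units[best], f"(sim={sims[best]:.2f})")
```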
arXiv Detail & Related papers (2025-09-24T14:14:21Z)
- Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services [9.687789919349523]
We propose Fremer, an efficient and effective deep forecasting model. Fremer fulfills critical requirements: it demonstrates superior efficiency, outperforming most Transformer-based forecasting models, and achieves exceptional accuracy, surpassing all state-of-the-art (SOTA) models in workload forecasting.
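A frequency-first forecaster can be pictured as FFT, learned spectral reweighting, then inverse FFT. Fremer's actual architecture is not detailed in the abstract; the block below only illustrates that idea.

```python
# Sketch of frequency-domain modeling for workload forecasting: project the
# series to the frequency domain, reweight components with learned complex
# weights, and map back. This is an illustration of the frequency-first idea,
# not Fremer's actual design.
import torch
import torch.nn as nn

class FrequencyBlock(nn.Module):
    def __init__(self, seq_len: int):
        super().__init__()
        n_freq = seq_len // 2 + 1                      # rfft bins
        self.weight = nn.Parameter(torch.randn(n_freq, dtype=torch.cfloat))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # (B, L)
        spec = torch.fft.rfft(x, dim=-1)               # complex spectrum
        spec = spec * self.weight                      # learned filtering
        return torch.fft.irfft(spec, n=x.shape[-1], dim=-1)

x = torch.sin(torch.linspace(0, 12.56, 96)).repeat(2, 1)  # (2, 96)
print(FrequencyBlock(96)(x).shape)  # torch.Size([2, 96])
```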
arXiv Detail & Related papers (2025-07-17T08:51:28Z)
- Divide, Optimize, Merge: Fine-Grained LLM Agent Optimization at Scale [19.60416591361918]
Fine-Grained Optimization (FGO) is a scalable framework that divides large optimization tasks into manageable subsets, performs targeted optimizations, and systematically combines optimized components through progressive merging. Evaluation across the ALFWorld, LogisticsQA, and GAIA benchmarks demonstrates that FGO outperforms existing approaches by 1.6-8.6% while reducing average prompt token consumption by 56.3%.
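The divide-optimize-merge loop itself is easy to sketch generically; the `optimize` and `merge` callables below are placeholders for the paper's prompt-level operations, which the abstract does not specify.

```python
# Schematic of a divide-optimize-merge loop. The optimize/merge steps are
# placeholders; FGO applies them to agent prompts/components, and their
# concrete implementations are not given in the abstract.
def fgo(components, optimize, merge, chunk=2):
    # Divide: split the large optimization task into manageable subsets.
    subsets = [components[i:i + chunk] for i in range(0, len(components), chunk)]
    # Optimize each subset independently.
    optimized = [optimize(s) for s in subsets]
    # Progressive merging: fold optimized pieces together one at a time.
    result = optimized[0]
    for piece in optimized[1:]:
        result = merge(result, piece)
    return result

print(fgo(["a", "b", "c", "d", "e"],
          optimize=lambda s: [x.upper() for x in s],
          merge=lambda l, r: l + r))  # ['A', 'B', 'C', 'D', 'E']
```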
arXiv Detail & Related papers (2025-05-06T20:50:27Z)
- Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors [104.5401871607713]
This paper proposes Weak-for-Strong Harnessing (W4S), a novel framework that customizes smaller, cost-efficient language models to design and optimize workflows for harnessing stronger models. W4S formulates workflow design as a multi-turn Markov decision process and introduces reinforcement learning for agentic workflow optimization. Empirical results demonstrate the superiority of W4S: our 7B meta-agent, trained with just one GPU hour, outperforms the strongest baseline by 2.9% to 24.6% across eleven benchmarks.
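The multi-turn MDP framing can be sketched as a propose-execute-evaluate loop; all callables below are placeholders, and the actual reinforcement-learning update is not specified in the abstract.

```python
# Skeleton of the W4S loop as described: a weak meta-agent proposes a
# workflow, a strong executor runs it, and the observed reward informs the
# next proposal (a multi-turn MDP). All callables are stand-ins.
def w4s_episode(meta_agent, executor, evaluate, task, turns=3):
    history = []
    for _ in range(turns):
        workflow = meta_agent(task, history)     # state -> action (workflow)
        output = executor(workflow, task)        # strong model executes
        reward = evaluate(output)                # environment feedback
        history.append((workflow, reward))       # becomes the next state
    return max(history, key=lambda h: h[1])      # best workflow found

best = w4s_episode(
    meta_agent=lambda t, h: f"plan-v{len(h)}",
    executor=lambda w, t: f"{w}:{t}",
    evaluate=lambda out: len(out),
    task="answer question")
print(best)
```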
arXiv Detail & Related papers (2025-04-07T07:27:31Z)
- Meta-Computing Enhanced Federated Learning in IIoT: Satisfaction-Aware Incentive Scheme via DRL-Based Stackelberg Game [50.6166553799783]
Efficient IIoT operations require a trade-off between model quality and training latency. This paper designs a satisfaction function that accounts for data size, Age of Information (AoI), and training latency for meta-computing. We employ a deep reinforcement learning approach to learn the Stackelberg equilibrium.
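A satisfaction function over data size, AoI, and latency might look like the toy form below; the log/linear shape and the weights are assumptions chosen only to show the trade-off directions, not the paper's definition.

```python
# Illustrative satisfaction function over data size, Age of Information (AoI),
# and training latency: increasing in data, decreasing in staleness and delay.
# The paper defines its own form; this shape is an assumption.
import math

def satisfaction(data_size, aoi, latency, w_d=1.0, w_a=0.5, w_l=0.5):
    return w_d * math.log(1 + data_size) - w_a * aoi - w_l * latency

print(satisfaction(data_size=100, aoi=2.0, latency=1.5))  # higher is better
```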
arXiv Detail & Related papers (2025-02-10T03:33:36Z)
- Dynamic Noise Preference Optimization for LLM Self-Improvement via Synthetic Data [51.62162460809116]
We introduce Dynamic Noise Preference Optimization (DNPO) to ensure consistent improvements across iterations. In experiments with Zephyr-7B, DNPO consistently outperforms existing methods, showing an average performance boost of 2.6%. DNPO shows a significant improvement in model-generated data quality, with a 29.4% win-loss rate gap compared to the baseline in GPT-4 evaluations.
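One way to picture "dynamic noise" in preference optimization is a DPO-style loss with a tunable noise scale on the preference margin; where DNPO actually injects noise and how it schedules the scale are not stated in the abstract, so this is an assumption.

```python
# DPO-style preference loss with an injected, tunable noise scale, to
# illustrate the idea of dynamic noise in preference optimization. DNPO's
# actual noise placement and schedule are not given in the abstract.
import torch
import torch.nn.functional as F

def noisy_dpo_loss(logp_chosen, logp_rejected,
                   ref_chosen, ref_rejected, beta=0.1, noise_scale=0.05):
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    margin = margin + noise_scale * torch.randn_like(margin)  # injected noise
    return -F.logsigmoid(beta * margin).mean()

lc, lr = torch.tensor([-1.0, -2.0]), torch.tensor([-1.5, -1.8])
rc, rr = torch.tensor([-1.2, -2.1]), torch.tensor([-1.4, -1.9])
print(noisy_dpo_loss(lc, lr, rc, rr))
```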
arXiv Detail & Related papers (2025-02-08T01:20:09Z)
- Adaptive Rank Allocation for Federated Parameter-Efficient Fine-Tuning of Language Models [40.69348434971122]
We propose FedARA, a novel Adaptive Rank Allocation framework for federated parameter-efficient fine-tuning of language models. FedARA consistently outperforms baselines by an average of 6.95% to 8.49% across various datasets and models under heterogeneous data. Experiments on various edge devices demonstrate substantial decreases in total training time and energy consumption, by up to 48.90% and 46.95%, respectively.
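Adaptive rank allocation needs a per-round importance signal; a plausible stand-in is the spectral energy of the weight update, sketched below. FedARA's actual allocation rule is not given in the abstract.

```python
# Sketch of adaptive rank allocation: pick a LoRA rank that captures a target
# fraction of the update's spectral energy. SVD energy is one plausible
# importance signal; FedARA's actual rule is not stated in the abstract.
import torch

def choose_rank(delta_w: torch.Tensor, energy: float = 0.9, max_rank: int = 16):
    s = torch.linalg.svdvals(delta_w)
    cum = torch.cumsum(s ** 2, dim=0) / (s ** 2).sum()
    rank = int((cum < energy).sum().item()) + 1   # smallest rank covering energy
    return min(rank, max_rank)

delta = torch.randn(64, 64) @ torch.diag(torch.logspace(0, -3, 64)) \
        @ torch.randn(64, 64)
print(choose_rank(delta))  # typically small: energy concentrates in top modes
```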
arXiv Detail & Related papers (2025-01-24T11:19:07Z)
- Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose Read-ME, a novel framework that transforms pre-trained dense LLMs into smaller MoE models. Our approach employs activation sparsity to extract experts. Read-ME outperforms other popular open-source dense models of similar scales.
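Extracting experts via activation sparsity can be pictured as assigning each FFN neuron to the data group that activates it most; the toy grouping below is an illustration, not Read-ME's actual procedure.

```python
# Toy sketch of carving "experts" out of a dense FFN via activation sparsity:
# each neuron is assigned to the data group that activates it most, and each
# group's neuron subset becomes one expert. Read-ME's actual procedure is not
# detailed in the abstract.
import numpy as np

rng = np.random.default_rng(0)
acts = rng.random((3, 100, 256))  # (n_groups, samples_per_group, ffn_neurons)

mean_act = acts.mean(axis=1)            # (n_groups, neurons) avg activation
owner = mean_act.argmax(axis=0)         # dominant group per neuron
experts = [np.flatnonzero(owner == g) for g in range(acts.shape[0])]
for g, idx in enumerate(experts):
    print(f"expert {g}: {len(idx)} neurons")
```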
arXiv Detail & Related papers (2024-10-24T19:48:51Z)
- Efficient Federated Intrusion Detection in 5G ecosystem using optimized BERT-based model [0.7100520098029439]
5G offers advanced services, supporting applications such as intelligent transportation, connected healthcare, and smart cities within the Internet of Things (IoT). These advancements introduce significant security challenges, with increasingly sophisticated cyber-attacks. This paper proposes a robust intrusion detection system (IDS) using federated learning and large language models (LLMs).
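The federated side typically reduces to FedAvg-style aggregation: average client weights in proportion to local data size. A minimal sketch follows (the paper's BERT-specific optimizations are not shown):

```python
# Minimal FedAvg aggregation, the standard baseline in federated setups:
# average client model weights proportionally to local data size.
import numpy as np

def fedavg(client_weights, client_sizes):
    total = sum(client_sizes)
    keys = client_weights[0].keys()
    return {k: sum(w[k] * (n / total)
                   for w, n in zip(client_weights, client_sizes))
            for k in keys}

w1 = {"layer": np.ones((2, 2))}
w2 = {"layer": np.zeros((2, 2))}
print(fedavg([w1, w2], client_sizes=[300, 100])["layer"])  # all entries 0.75
```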
arXiv Detail & Related papers (2024-09-28T15:56:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.