Related papers: Equip Pre-ranking with Target Attention by Residual Quantization

URL: http://arxiv.org/abs/2509.16931v2
Date: Wed, 24 Sep 2025 09:26:28 GMT
Title: Equip Pre-ranking with Target Attention by Residual Quantization
Authors: Yutong Li, Yu Zhu, Yichen Qiao, Ziyu Guan, Lv Shao, Tong Liu, Bo Zheng
Abstract summary: TARQ is a novel pre-ranking framework for industrial recommendation systems. Our model has been fully deployed in production, serving tens of millions of daily active users.
Score: 28.523618960969472
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The pre-ranking stage in industrial recommendation systems faces a fundamental conflict between efficiency and effectiveness. While powerful models like Target Attention (TA) excel at capturing complex feature interactions in the ranking stage, their high computational cost makes them infeasible for pre-ranking, which often relies on simplistic vector-product models. This disparity creates a significant performance bottleneck for the entire system. To bridge this gap, we propose TARQ, a novel pre-ranking framework. Inspired by generative models, TARQ's key innovation is to equip pre-ranking with an architecture that approximates TA by means of Residual Quantization. This allows us to bring the modeling power of TA into the latency-critical pre-ranking stage for the first time, establishing a new state-of-the-art trade-off between accuracy and efficiency. Extensive offline experiments and large-scale online A/B tests at Taobao demonstrate TARQ's significant improvements in ranking performance. Consequently, our model has been fully deployed in production, serving tens of millions of daily active users and yielding substantial business improvements.
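
For readers unfamiliar with the term, the following is a minimal sketch of generic residual quantization, the building block the abstract names. It is not TARQ's architecture, which the abstract does not describe in detail. The function names, the toy codebooks, and the example vector are all hypothetical and chosen only for illustration; in practice the codebooks would be learned.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(
    vector: Sequence[float], codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], Vector]:
    """Encode `vector` as one codeword index per codebook level.

    At each level the nearest codeword to the current residual is chosen,
    then subtracted, so later levels only encode what earlier levels missed.
    Returns the list of indices and the final (unencoded) residual.
    """
    residual = list(vector)
    codes: List[int] = []
    for codebook in codebooks:
        index = min(
            range(len(codebook)),
            key=lambda i: squared_distance(residual, codebook[i]),
        )
        codes.append(index)
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return codes, residual


def reconstruct(
    codes: Sequence[int], codebooks: Sequence[Sequence[Vector]]
) -> Vector:
    """Approximate the original vector by summing the selected codewords."""
    approx = [0.0] * len(codebooks[0][0])
    for index, codebook in zip(codes, codebooks):
        approx = [a + c for a, c in zip(approx, codebook[index])]
    return approx


# Hypothetical toy codebooks: a coarse level followed by a finer level.
TOY_CODEBOOKS = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]

codes, residual = residual_quantize([0.9, 1.2], TOY_CODEBOOKS)
approx = reconstruct(codes, TOY_CODEBOOKS)
print(codes, approx, residual)
```

The appeal for a latency-critical stage is that a vector is reduced to a short tuple of integer codes, so anything that depends only on the codes can be precomputed or looked up. How TARQ uses this to approximate Target Attention is specific to the paper.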

Related papers

DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models [51.76664843721462]
DeepThinkVLA is a new architecture for Vision-Language-Action models. It generates sequential CoT with causal attention and switches to bidirectional attention for fast decoding of action vectors. It achieves a 97.0% success rate on the LIBERO benchmark.
arXiv Detail & Related papers (2025-10-31T05:26:16Z)
GENIAL: Generative Design Space Exploration via Network Inversion for Low Power Algorithmic Logic Units [4.148469311862123]
We introduce a machine learning-based framework for the automatic generation and optimization of arithmetic units. At the core of GENIAL is a Transformer-based surrogate model trained in two stages. Experiments on large datasets demonstrate that GENIAL is consistently more sample efficient than other methods.
arXiv Detail & Related papers (2025-07-25T06:34:59Z)
KAT-V1: Kwai-AutoThink Technical Report [50.84483585850113]
We present Kwaipilot-AutoThink (KAT), an open-source 40B large language model developed to address the overthinking problem in reasoning-intensive tasks. KAT dynamically switches between reasoning and non-reasoning modes based on task complexity. We also propose Step-SRPO, a reinforcement learning algorithm that incorporates intermediate supervision into the GRPO framework.
arXiv Detail & Related papers (2025-07-11T04:07:10Z)
Sliding Window Attention Training for Efficient Large Language Models [55.56483740523027]
We introduce SWAT, which enables efficient long-context handling via Sliding Window Attention Training. This paper first attributes the inefficiency of Transformers to the attention sink phenomenon. We replace softmax with the sigmoid function and utilize a balanced ALiBi and Rotary Position Embedding for efficient information compression and retention.
arXiv Detail & Related papers (2025-02-26T05:31:44Z)
From Features to Transformers: Redefining Ranking for Scalable Impact [6.656478617123893]
LiGR is a large-scale ranking framework developed at LinkedIn. We introduce a modified transformer architecture that incorporates learned normalization and simultaneous set-wise attention to user history and ranked items.
arXiv Detail & Related papers (2025-02-05T18:02:01Z)
RankTower: A Synergistic Framework for Enhancing Two-Tower Pre-Ranking Model [0.0]
In large-scale ranking systems, cascading architectures have been widely adopted to achieve a balance between efficiency and effectiveness. It is crucial for the pre-ranking model to maintain a balance between efficiency and accuracy to adhere to online latency constraints. We propose a novel neural network architecture called RankTower, which is designed to efficiently capture user-item interactions.
arXiv Detail & Related papers (2024-07-17T08:07:37Z)
Optimizing E-commerce Search: Toward a Generalizable and Rank-Consistent Pre-Ranking Model [13.573766789458118]
In large e-commerce platforms, the pre-ranking phase is crucial for filtering out the bulk of products in advance for the downstream ranking module. We propose a novel method: a Generalizable and RAnk-ConsistEnt Pre-Ranking Model (GRACE), which achieves: 1) Ranking consistency by introducing multiple binary classification tasks that predict whether a product is within the top-k results as estimated by the ranking model, which facilitates the addition of learning objectives on common point-wise ranking models; 2) Generalizability through contrastive learning of representation for all products by pre-training on a subset of ranking product embeddings.
arXiv Detail & Related papers (2024-05-09T07:55:52Z)
Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models. AT methods, relying on direct iterative updates for the target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting. We present a general proxy guided defense framework, LAST (Learn from the Past).
arXiv Detail & Related papers (2023-10-19T13:13:41Z)
COPR: Consistency-Oriented Pre-Ranking for Online Advertising [27.28920707332434]
We introduce a consistency-oriented pre-ranking framework for online advertising. It employs a chunk-based sampling module and a plug-and-play rank alignment module to explicitly optimize consistency of ECPM-ranked results. When deployed in the Taobao display advertising system, it achieves an improvement of up to +12.3% CTR and +5.6% RPM.
arXiv Detail & Related papers (2023-06-06T09:08:40Z)
Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows. We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the PR-divergences. Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z)
Efficient pre-training objectives for Transformers [84.64393460397471]
We study several efficient pre-training objectives for Transformer-based models. We prove that eliminating the MASK token and considering the whole output during the loss are essential choices to improve performance.
arXiv Detail & Related papers (2021-04-20T00:09:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.