An Efficient Embedding Based Ad Retrieval with GPU-Powered Feature Interaction
- URL: http://arxiv.org/abs/2511.22460v1
- Date: Thu, 27 Nov 2025 13:48:37 GMT
- Title: An Efficient Embedding Based Ad Retrieval with GPU-Powered Feature Interaction
- Authors: Yifan Lei, Jiahua Luo, Tingyu Jiang, Bo Zhang, Lifeng Wang, Dapeng Liu, Zhaoren Wu, Haijie Gu, Huan Yu, Jie Jiang,
- Abstract summary: In large-scale advertising recommendation systems, retrieval serves as a critical component, aiming to efficiently select a subset of candidate ads relevant to user behaviors from a massive ad inventory for subsequent ranking and recommendation. The Embedding-Based Retrieval (EBR) methods modeled by the dual-tower network are widely used in the industry to maintain both retrieval efficiency and accuracy. This paper proposes an efficient GPU-based feature interaction for the dual-tower network to significantly improve retrieval accuracy while substantially reducing computational costs.
- Score: 11.328559370304134
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In large-scale advertising recommendation systems, retrieval serves as a critical component, aiming to efficiently select a subset of candidate ads relevant to user behaviors from a massive ad inventory for subsequent ranking and recommendation. Embedding-Based Retrieval (EBR) methods built on the dual-tower network are widely used in industry to maintain both retrieval efficiency and accuracy. However, the dual-tower model has a significant limitation: user and ad embeddings interact only in the final inner product, which limits its feature interaction capability. DNN-based models that take both user and ad features as input allow these features to interact early and are used in the ranking stage to mitigate this issue, but they are computationally infeasible for the retrieval stage. To bridge this gap, this paper proposes an efficient GPU-based feature interaction for the dual-tower network that significantly improves retrieval accuracy while substantially reducing computational costs. Specifically, we introduce a novel compressed inverted list designed for GPU acceleration, enabling efficient feature interaction computation at scale. To the best of our knowledge, this is the first framework in the industry to successfully implement Wide and Deep in a retrieval system. We apply this model to real-world business scenarios in Tencent Advertising. Experimental results demonstrate that our method outperforms existing approaches in offline evaluation, and it has been successfully deployed in Tencent's advertising recommendation system, delivering significant online performance gains. This improvement not only validates the effectiveness of the proposed method but also provides new practical guidance for optimizing large-scale ad retrieval systems.
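The abstract does not specify how the compressed inverted list is laid out or how the wide part is scored, so the following is only a toy CPU sketch in Python under our own assumptions, not the paper's GPU implementation. It assumes that the deep part is a plain dual-tower inner product, that the wide part adds one learned weight per crossed (user feature, ad feature) pair, and that posting lists of ad IDs are compressed with delta + varint encoding (a standard inverted-index scheme swapped in here for illustration). All names (`encode_postings`, `decode_postings`, `retrieve`, `wide_index`) and the toy data are hypothetical.

```python
from heapq import nlargest

# Toy sketch under stated assumptions; not the paper's GPU implementation.
# All names and data below are hypothetical.


def dot(u, v):
    """Inner product: the only point where dual-tower embeddings interact."""
    return sum(a * b for a, b in zip(u, v))


def encode_postings(ad_ids):
    """Delta + varint encode a posting list of ad IDs (assumed compression scheme)."""
    out = bytearray()
    prev = 0
    for ad in sorted(ad_ids):
        gap = ad - prev  # store gaps between sorted IDs, not the IDs themselves
        prev = ad
        while gap >= 0x80:  # emit 7 bits at a time; high bit means "more bytes follow"
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_postings(data):
    """Invert encode_postings, returning the sorted ad IDs."""
    ids, prev, cur, shift = [], 0, 0, 0
    for byte in data:
        cur |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += cur  # undo the delta encoding
            ids.append(prev)
            cur, shift = 0, 0
    return ids


def retrieve(user_emb, user_feats, ad_embs, wide_index, k):
    """Score = deep inner product + sum of wide cross-feature weights (assumed)."""
    scores = {ad: dot(user_emb, emb) for ad, emb in ad_embs.items()}
    for feat in user_feats:
        # Each user feature keys (weight, compressed postings) pairs: one pair per
        # crossed ad feature, listing the ads that carry that ad feature.
        for weight, blob in wide_index.get(feat, []):
            for ad in decode_postings(blob):
                if ad in scores:
                    scores[ad] += weight
    return nlargest(k, scores.items(), key=lambda kv: kv[1])


# Hypothetical toy inventory: three ads with 2-d embeddings.
ad_embs = {3: [0.1, 0.2], 7: [0.4, 0.0], 300: [0.0, 0.5]}
wide_index = {
    "age=25-34": [(0.4, encode_postings([3, 300]))],
    "city=SZ": [(0.25, encode_postings([7]))],
}
top = retrieve([1.0, 1.0], ["age=25-34", "city=SZ"], ad_embs, wide_index, k=2)
print([ad for ad, _ in top])  # [300, 3]
```

With the inner product alone, ad 3 would rank last (0.3 versus 0.4 for ad 7); the assumed wide cross-feature weight lifts it into the top 2. Posting lists are decoded one at a time here for readability; a GPU design would presumably decode and accumulate many lists in parallel, which this sketch does not attempt.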
Related papers
- WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking [60.35109192765302]
Information seeking is a core capability that enables autonomous reasoning and decision-making. We propose WebLeaper, a framework for constructing high-coverage IS tasks and generating efficient solution trajectories. Our method consistently achieves improvements in both effectiveness and efficiency over strong baselines.
(2025-10-28T17:51:42Z)
- A Learnable Fully Interacted Two-Tower Model for Pre-Ranking System [15.03225449071182]
The two-tower model is widely used in pre-ranking systems because it strikes a good balance between efficiency and effectiveness. A novel architecture named learnable Fully Interacted Two-tower Model (FIT) is proposed, which enables rich information interactions.
(2025-09-16T10:52:03Z)
- OneSearch: A Preliminary Exploration of the Unified End-to-End Generative Framework for E-commerce Search [43.94443394870866]
OneSearch is the first industrially deployed end-to-end generative framework for e-commerce search. OneSearch reduces operational expenditure by 75.40% and improves Model FLOPs Utilization from 3.26% to 27.32%. The system has been successfully deployed across multiple search scenarios in Kuaishou, serving millions of users.
(2025-09-03T11:50:04Z)
- EGA-V1: Unifying Online Advertising with End-to-End Learning [17.943921299281207]
We present EGA-V1, an end-to-end generative architecture that unifies online advertising ranking in a single model. EGA-V1 replaces cascaded stages with a single model to directly generate optimal ad sequences from the full candidate ad corpus.
(2025-05-26T09:33:54Z)
- On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows [71.92083784393418]
Agentic AI systems (which autonomously plan and act) are becoming widespread, yet their success rate on complex tasks remains low. Inference-time alignment relies on three components: sampling, evaluation, and feedback. We introduce Iterative Agent Decoding (IAD), a procedure that repeatedly inserts feedback extracted from different forms of critiques.
(2025-04-02T17:40:47Z)
- Hierarchical Structured Neural Network: Efficient Retrieval Scaling for Large Scale Recommendation [16.21377996349377]
We introduce the Hierarchical Structured Neural Network (HSNN), an efficient deep neural network model that learns intricate user and item interactions. HSNN achieves substantial improvement in offline evaluation compared to prevailing methods.
(2024-08-13T05:53:46Z)
- Retrieval Augmentation via User Interest Clustering [57.63883506013693]
Industrial recommender systems are sensitive to the patterns of user-item engagement.
We propose a novel approach that efficiently constructs user interest and facilitates low computational cost inference.
Our approach has been deployed in multiple products at Meta, facilitating short-form video related recommendation.
(2024-08-07T16:35:10Z)
- Learning Fair Ranking Policies via Differentiable Optimization of Ordered Weighted Averages [55.04219793298687]
This paper shows how efficiently-solvable fair ranking models can be integrated into the training loop of Learning to Rank.
In particular, this paper is the first to show how to backpropagate through constrained optimizations of OWA objectives, enabling their use in integrated prediction and decision models.
(2024-02-07T20:53:53Z)
- Beyond Two-Tower Matching: Learning Sparse Retrievable Cross-Interactions for Recommendation [80.19762472699814]
Two-tower models are a prevalent matching framework for recommendation, which have been widely deployed in industrial applications.
It faces two main challenges: limited feature interaction capability and reduced accuracy in online serving.
We propose a new matching paradigm named SparCode, which supports not only sophisticated feature interactions but also efficient retrieval.
(2023-11-30T03:13:36Z)
- Exploring validation metrics for offline model-based optimisation with diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground-truth oracle can be trained and used in its place during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples.
This is captured by our proposed evaluation framework, which is also designed to measure extrapolation.
(2022-11-19T16:57:37Z)
- Building an Efficient and Effective Retrieval-based Dialogue System via Mutual Learning [27.04857039060308]
We propose to combine the best of both worlds to build a retrieval system.
We employ a fast bi-encoder to replace the traditional feature-based pre-retrieval model.
We train the pre-retrieval model and the re-ranking model at the same time via mutual learning.
(2021-10-01T01:32:33Z)
- Towards a Better Tradeoff between Effectiveness and Efficiency in Pre-Ranking: A Learnable Feature Selection based Approach [12.468550800027808]
In real-world search, recommendation, and advertising systems, the multi-stage ranking architecture is commonly adopted.
In this paper, a novel pre-ranking approach is proposed which supports complicated models with interaction-focused architecture.
It achieves a better tradeoff between effectiveness and efficiency by utilizing the proposed learnable Feature Selection method.
(2021-05-17T09:48:15Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
(2021-03-22T20:36:34Z)