A Learnable Fully Interacted Two-Tower Model for Pre-Ranking System
- URL: http://arxiv.org/abs/2509.12948v1
- Date: Tue, 16 Sep 2025 10:52:03 GMT
- Title: A Learnable Fully Interacted Two-Tower Model for Pre-Ranking System
- Authors: Chao Xiong, Xianwen Yu, Wei Xu, Lei Cheng, Chuan Yuan, Linjian Mo
- Abstract summary: The two-tower model is widely used in pre-ranking systems due to a good balance between efficiency and effectiveness. A novel architecture named learnable Fully Interacted Two-tower Model (FIT) is proposed, which enables rich information interactions.
- Score: 15.03225449071182
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-ranking plays a crucial role in large-scale recommender systems by significantly improving efficiency and scalability while still providing high-quality candidate sets in real time. The two-tower model is widely used in pre-ranking systems due to its good balance between efficiency and effectiveness: its decoupled architecture processes user and item inputs independently before computing their interaction (e.g., a dot product or similarity measure). However, this independence also leads to a lack of information interaction between the two towers, reducing effectiveness. In this paper, a novel architecture named learnable Fully Interacted Two-tower Model (FIT) is proposed, which enables rich information interactions while preserving inference efficiency. FIT mainly consists of two parts: the Meta Query Module (MQM) and the Lightweight Similarity Scorer (LSS). Specifically, MQM introduces a learnable item meta matrix to achieve expressive early interaction between user and item features, while LSS is designed to obtain effective late interaction between the user and item towers. Finally, experimental results on several public datasets show that the proposed FIT significantly outperforms state-of-the-art baseline pre-ranking models.
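To make the decoupling the abstract describes concrete, the following is a minimal sketch of the *generic* two-tower scoring pattern (not the paper's FIT architecture): each tower is an independent MLP, item embeddings can be precomputed offline, and the only user-item interaction is a final dot product. All layer sizes and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    """Apply a small MLP tower: linear layers with ReLU on hidden layers."""
    for i, w in enumerate(weights):
        x = x @ w
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)  # ReLU
    return x

# Illustrative dimensions (not from the paper).
d_user, d_item, d_emb = 16, 12, 8
user_tower = [rng.normal(size=(d_user, 32)), rng.normal(size=(32, d_emb))]
item_tower = [rng.normal(size=(d_item, 32)), rng.normal(size=(32, d_emb))]

def score(user_feats, item_feats):
    """Decoupled scoring: each tower runs independently, then a dot product.
    Item embeddings can be cached offline, which is what makes the two-tower
    design efficient -- and also why the towers never interact early, the
    limitation that motivates FIT's MQM and LSS components."""
    u = mlp(user_feats, user_tower)   # (d_emb,) user embedding
    v = mlp(item_feats, item_tower)   # (n_items, d_emb) item embeddings
    return v @ u                      # one dot product per candidate

user = rng.normal(size=d_user)
items = rng.normal(size=(100, d_item))
scores = score(user, items)
top5 = np.argsort(-scores)[:5]        # candidate set passed to full ranking
```

Note that nothing in `score` lets a user feature modulate an item representation before the final dot product; richer early interaction is exactly what FIT's learnable item meta matrix is designed to add.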
Related papers
- Model Merging in the Essential Subspace [78.5390284258307]
Model merging aims to integrate multiple task-specific fine-tuned models into a single multi-task model without additional training. Despite extensive research, task interference remains a major obstacle that often undermines the performance of merged models. We propose ESM (Essential Subspace Merging), a robust framework for effective model merging.
arXiv Detail & Related papers (2026-02-23T00:33:38Z) - Compress, Cross and Scale: Multi-Level Compression Cross Networks for Efficient Scaling in Recommender Systems [5.897678894426804]
MLCC is a structured feature interaction architecture that organizes feature crosses through hierarchical compression and dynamic composition. MC-MLCC is a Multi-Channel extension that decomposes feature interactions into parallel subspaces. Our proposed models consistently outperform strong DLRM-style baselines by up to 0.52 AUC, while reducing model parameters and FLOPs by up to 26$\times$ under comparable performance.
arXiv Detail & Related papers (2026-02-12T15:06:46Z) - Explicit Multi-head Attention for Inter-head Interaction in Large Language Models [70.96854312026319]
Multi-head Explicit Attention (MEA) is a simple yet effective attention variant that explicitly models cross-head interaction. MEA shows strong robustness in pretraining, which allows the use of larger learning rates that lead to faster convergence. This enables a practical key-value cache compression strategy that reduces KV-cache memory usage by 50% with negligible performance loss.
arXiv Detail & Related papers (2026-01-27T13:45:03Z) - Cross-Modal Attention Network with Dual Graph Learning in Multimodal Recommendation [12.802844514133255]
We propose the Cross-modal Recursive Attention Network with dual graph Embedding (CRANE). We design a core Recursive Cross-Modal Attention (RCA) mechanism that iteratively refines modality features based on cross-correlations in a joint latent space. For symmetric multimodal learning, we explicitly construct users' multimodal profiles by aggregating features of their interacted items.
arXiv Detail & Related papers (2026-01-16T10:09:39Z) - Generative Early Stage Ranking [14.15517442047903]
We propose a Generative Early Stage Ranking (GESR) paradigm to balance effectiveness and efficiency. The GESR paradigm has shown substantial improvements in topline metrics, engagement, and consumption tasks. To the best of our knowledge, this marks the first successful deployment of full target-aware attention sequence modeling within an ESR stage at such a scale.
arXiv Detail & Related papers (2025-11-26T06:29:18Z) - MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models [52.876185634349575]
We propose to incorporate a Mixture of Intra- and Inter-Modality Experts (MoIIE) into Large Vision-Language Models (LVLMs). For each token, expert routing is guided by its modality, directing tokens to their respective intra-modality experts as well as a shared pool of inter-modality experts. Our MoIIE models with 5.5B and 11.3B activated parameters match or even surpass the performance of existing advanced open-source MoE-LLM-based multi-modal models.
arXiv Detail & Related papers (2025-08-13T13:00:05Z) - Optimizing Recall or Relevance? A Multi-Task Multi-Head Approach for Item-to-Item Retrieval in Recommendation [23.61568268070558]
We propose a Multi-Task and Multi-Head I2I retrieval model that achieves both high recall and semantic relevance. We evaluate MTMH using proprietary data from a commercial platform serving billions of users and demonstrate that it can improve recall by up to 14.4% and semantic relevance by up to 56.6%.
arXiv Detail & Related papers (2025-06-06T17:00:20Z) - HIT Model: A Hierarchical Interaction-Enhanced Two-Tower Model for Pre-Ranking Systems [9.100242205591224]
We propose the Hierarchical Interaction-Enhanced Two-Tower (HIT) model. This architecture augments the prevailing two-tower paradigm with two key components. The HIT model has been successfully deployed in Tencent's online display advertising system.
arXiv Detail & Related papers (2025-05-26T11:35:04Z) - Unleashing the Potential of Two-Tower Models: Diffusion-Based Cross-Interaction for Large-Scale Matching [25.672699790866726]
Two-tower models are widely adopted in the industrial-scale matching stage across a broad range of application domains. We propose a "cross-interaction decoupling architecture" within our matching paradigm.
arXiv Detail & Related papers (2025-02-28T03:40:37Z) - iLOCO: Distribution-Free Inference for Feature Interactions [4.56754610152086]
We develop a new model-agnostic metric for measuring the importance of pairwise feature interactions. We also introduce an ensemble learning method for calculating the iLOCO metric and confidence intervals. We validate our iLOCO metric and our confidence intervals on both synthetic and real data sets.
arXiv Detail & Related papers (2025-02-10T16:49:46Z) - Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search [59.75749613951193]
We propose Data Influence-oriented Tree Search (DITS) to guide both tree search and data selection. By leveraging influence scores, we effectively identify the most impactful data for system improvement. We derive influence score estimation methods tailored for non-differentiable metrics.
arXiv Detail & Related papers (2025-02-02T23:20:16Z) - LLM-based Bi-level Multi-interest Learning Framework for Sequential Recommendation [54.396000434574454]
We propose a novel multi-interest SR framework combining implicit behavioral and explicit semantic perspectives. It includes two modules: the Implicit Behavioral Interest Module and the Explicit Semantic Interest Module. Experiments on four real-world datasets validate the framework's effectiveness and practicality.
arXiv Detail & Related papers (2024-11-14T13:00:23Z) - T-REX: Mixture-of-Rank-One-Experts with Semantic-aware Intuition for Multi-task Large Language Model Finetuning [31.276142111455847]
Large language models (LLMs) encounter significant adaptation challenges in diverse multitask finetuning. We design a novel framework, mixTure-of-Rank-onE-eXperts (T-REX). Rank-1 experts enable a mix-and-match mechanism to quadratically expand the vector subspace of experts with linear parameter overheads, achieving approximate error reduction with optimal
arXiv Detail & Related papers (2024-04-13T12:14:58Z) - Beyond Two-Tower Matching: Learning Sparse Retrievable Cross-Interactions for Recommendation [80.19762472699814]
Two-tower models are a prevalent matching framework for recommendation, which have been widely deployed in industrial applications.
This framework suffers from two main challenges: limited feature interaction capability and reduced accuracy in online serving.
We propose a new matching paradigm named SparCode, which supports not only sophisticated feature interactions but also efficient retrieval.
arXiv Detail & Related papers (2023-11-30T03:13:36Z) - A Co-Interactive Transformer for Joint Slot Filling and Intent Detection [61.109486326954205]
Intent detection and slot filling are two main tasks for building a spoken language understanding (SLU) system.
Previous studies either model the two tasks separately or only consider the single information flow from intent to slot.
We propose a Co-Interactive Transformer to consider the cross-impact between the two tasks simultaneously.
arXiv Detail & Related papers (2020-10-08T10:16:52Z)