Related papers: HIT Model: A Hierarchical Interaction-Enhanced Two-Tower Model for Pre-Ranking Systems

URL: http://arxiv.org/abs/2505.19849v2
Date: Sat, 02 Aug 2025 00:31:18 GMT
Title: HIT Model: A Hierarchical Interaction-Enhanced Two-Tower Model for Pre-Ranking Systems
Authors: Haoqiang Yang, Congde Yuan, Kun Bai, Mengzhuo Guo, Wei Yang, Chao Zhou,
Abstract summary: We propose the Hierarchical Interaction-Enhanced Two-Tower (HIT) model. This architecture augments the prevailing two-tower paradigm with two key components. The HIT model has been successfully deployed in Tencent's online display advertising system.
Score: 9.100242205591224
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Online display advertising platforms rely on pre-ranking systems to efficiently filter and prioritize candidate ads from large corpora, balancing relevance to users with strict computational constraints. The prevailing two-tower architecture, though highly efficient due to its decoupled design and pre-caching, suffers from limited cross-domain interaction and coarse similarity metrics, undermining its capacity to model complex user-ad relationships. In this study, we propose the Hierarchical Interaction-Enhanced Two-Tower (HIT) model, a new architecture that augments the two-tower paradigm with two key components: generators that pre-generate holistic vectors incorporating coarse-grained user-ad interactions through a dual-generator framework with a cosine-similarity-based generation loss as the training objective, and multi-head representers that project embeddings into multiple latent subspaces to capture fine-grained, multi-faceted user interests and multi-dimensional ad attributes. This design enhances modeling effectiveness without compromising inference efficiency. Extensive experiments on public datasets and large-scale online A/B testing on Tencent's advertising platform demonstrate that HIT significantly outperforms several baselines in relevance metrics, yielding a 1.66% increase in Gross Merchandise Volume and a 1.55% improvement in Return on Investment, while keeping serving latency similar to that of vanilla two-tower models. The HIT model has been successfully deployed in Tencent's online display advertising system, serving billions of impressions daily. The code is available at https://github.com/HarveyYang123/HIT_model.
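The abstract names the two HIT components but does not give their equations. The sketch below is one plausible reading, not the authors' implementation, under these assumptions: each generator is a single bias-free linear map (the hypothetical `swap` weights) that estimates the other tower's embedding and is trained with a loss of 1 - cos(generated, target); a "holistic" vector is a tower's own embedding concatenated with its generated estimate; and the multi-head representer scores a user-ad pair by the largest dot product over all user-head/ad-head pairs. All function names, weights, and the max aggregation are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per output dimension


def matvec(m: Matrix, v: Vector) -> Vector:
    """Apply a bias-free linear map to a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def generation_loss(generated: Vector, target: Vector) -> float:
    """Assumed cosine-based generation loss: 1 - cos(generated, target).
    In training, the target (the other tower's embedding) would be treated
    as a constant so that only the generator is updated."""
    return 1.0 - cosine(generated, target)


def holistic(emb: Vector, generator: Matrix) -> Vector:
    """Assumed holistic vector: own embedding concatenated with the
    generator's estimate of the other side's embedding."""
    return emb + matvec(generator, emb)


def heads(vec: Vector, projections: List[Matrix]) -> List[Vector]:
    """Multi-head representer: project one vector into several subspaces."""
    return [matvec(p, vec) for p in projections]


def hit_score(user_heads: List[Vector], ad_heads: List[Vector]) -> float:
    """Assumed aggregation: the best-matching head pair (max over all pairs)."""
    return max(dot(u, a) for u in user_heads for a in ad_heads)


# Toy example with 2-d embeddings and two 2-d heads.
user_emb, ad_emb = [1.0, 0.0], [0.0, 1.0]
swap = [[0.0, 1.0], [1.0, 0.0]]  # hypothetical generator weights
first_half = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
second_half = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
projections = [first_half, second_half]  # hypothetical head projections

gen_loss = generation_loss(matvec(swap, user_emb), ad_emb)
# Ad-side heads depend only on ad features, so they can be computed offline and cached.
ad_cache = heads(holistic(ad_emb, swap), projections)
user_heads = heads(holistic(user_emb, swap), projections)
score = hit_score(user_heads, ad_cache)
print(gen_loss, score)  # 0.0 1.0
```

Because the ad-side heads never see user features, they can be pre-computed and cached as in a vanilla two-tower model, which is consistent with the abstract's claim of similar serving latency. Summing or averaging over matched heads is an equally plausible aggregation; the linked repository is the place to check the actual formulation.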

Related papers

GenAgent: Scaling Text-to-Image Generation via Agentic Multimodal Reasoning [54.42973725693]
We introduce GenAgent, unifying visual understanding and generation through an agentic multimodal model. GenAgent significantly boosts base generator (FLUX.1-dev) performance on GenEval++ and WISE. Our framework demonstrates three key properties: 1) cross-tool generalization to generators with varying capabilities, 2) test-time scaling with consistent improvements across interaction rounds, and 3) task-adaptive reasoning that automatically adjusts to different tasks.
arXiv Detail & Related papers (2026-01-26T14:49:04Z)
Repulsor: Accelerating Generative Modeling with a Contrastive Memory Bank [65.00301565190824]
Repulsor is a plug-and-play training framework that requires no external encoders. Repulsor achieves a state-of-the-art FID of 2.40 within 400k steps, significantly outperforming comparable methods.
arXiv Detail & Related papers (2025-12-09T14:39:26Z)
DUET: Dual Model Co-Training for Entire Space CTR Prediction [34.35929309131385]
DUET (DUal Model Co-Training for Entire Space CTR Prediction) is a set-wise pre-ranking framework that achieves expressive modeling under tight computational budgets. It consistently outperforms state-of-the-art baselines and achieves improvements across multiple core business metrics.
arXiv Detail & Related papers (2025-10-28T12:46:33Z)
A Learnable Fully Interacted Two-Tower Model for Pre-Ranking System [15.03225449071182]
The two-tower model is widely used in pre-ranking systems due to a good balance between efficiency and effectiveness. A novel architecture named learnable Fully Interacted Two-tower Model (FIT) is proposed, which enables rich information interactions.
arXiv Detail & Related papers (2025-09-16T10:52:03Z)
Intention-Conditioned Flow Occupancy Models [69.79049994662591]
Large-scale pre-training has fundamentally changed how machine learning research is done today. Applying this same framework to reinforcement learning is appealing because it offers compelling avenues for addressing core challenges in RL. Recent advances in generative AI have provided new tools for modeling highly complex distributions.
arXiv Detail & Related papers (2025-06-10T15:27:46Z)
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition [6.168286187549952]
We propose a hybrid end-to-end framework that factorizes learning across three key concepts to reduce inference cost by 330× versus prior art. Experiments show that our method results in a lightweight architecture achieving state-of-the-art video recognition performance.
arXiv Detail & Related papers (2025-03-17T21:13:48Z)
Unleashing the Potential of Two-Tower Models: Diffusion-Based Cross-Interaction for Large-Scale Matching [25.672699790866726]
Two-tower models are widely adopted in the industrial-scale matching stage across a broad range of application domains. We propose a "cross-interaction decoupling architecture" within our matching paradigm.
arXiv Detail & Related papers (2025-02-28T03:40:37Z)
FuXi-α: Scaling Recommendation Model with Feature Interaction Enhanced Transformer [81.12174905444229]
Recent advancements have shown that expanding sequential recommendation models to large-scale recommendation models can be an effective strategy. We propose a new model called FuXi-α to address these issues. Our model outperforms existing models, with its performance continuously improving as the model size increases.
arXiv Detail & Related papers (2025-02-05T09:46:54Z)
A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models. Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning. We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z)
Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval [1.6874375111244329]
State-of-the-art image retrieval systems train specific neural networks for each dataset. Off-the-shelf foundation models fall short in achieving performance comparable to dataset-specific models. We introduce Autoencoders with Strong Variance Constraints (AE-SVC), which significantly improves the performance of foundation models.
arXiv Detail & Related papers (2024-10-09T16:05:16Z)
A Lightweight Feature Fusion Architecture For Resource-Constrained Crowd Counting [3.5066463427087777]
We introduce two lightweight models to enhance the versatility of crowd-counting models. These models maintain the same downstream architecture while incorporating two distinct backbones: MobileNet and MobileViT. We leverage Adjacent Feature Fusion to extract diverse scale features from a Pre-Trained Model (PTM) and subsequently combine these features seamlessly.
arXiv Detail & Related papers (2024-01-11T15:13:31Z)
Beyond Two-Tower Matching: Learning Sparse Retrievable Cross-Interactions for Recommendation [80.19762472699814]
Two-tower models are a prevalent matching framework for recommendation, which have been widely deployed in industrial applications. They suffer from two main challenges: limited feature interaction capability and reduced accuracy in online serving. We propose a new matching paradigm named SparCode, which supports not only sophisticated feature interactions but also efficient retrieval.
arXiv Detail & Related papers (2023-11-30T03:13:36Z)
UniMatch: A Unified User-Item Matching Framework for the Multi-purpose Merchant Marketing [27.459774494479227]
We present a unified user-item matching framework to simultaneously conduct item recommendation and user targeting with just one model. Our framework results in significant performance gains in comparison with the state-of-the-art methods, with greatly reduced cost on computing resources and daily maintenance.
arXiv Detail & Related papers (2023-07-19T13:49:35Z)
Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion [54.33764537135906]
VideoQA Transformer models demonstrate competitive performance on standard benchmarks. Do these models capture the rich multimodal structures and dynamics from video and text jointly? Are they achieving high scores by exploiting biases and spurious features?
arXiv Detail & Related papers (2023-06-15T06:45:46Z)
VFed-SSD: Towards Practical Vertical Federated Advertising [53.08038962443853]
We propose a semi-supervised split distillation framework VFed-SSD to alleviate the two limitations. Specifically, we develop a self-supervised task MatchedPair Detection (MPD) to exploit the vertically partitioned unlabeled data. Our framework provides an efficient federation-enhanced solution for real-time display advertising with minimal deploying cost and significant performance lift.
arXiv Detail & Related papers (2022-05-31T17:45:30Z)
Efficient Person Search: An Anchor-Free Approach [86.45858994806471]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images. To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN. In this work, we present an anchor-free approach to efficiently tackling this challenging task, by introducing the following dedicated designs.
arXiv Detail & Related papers (2021-09-01T07:01:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.