Related papers: Unleashing the Potential of Two-Tower Models: Diffusion-Based Cross-Interaction for Large-Scale Matching

URL: http://arxiv.org/abs/2502.20687v1
Date: Fri, 28 Feb 2025 03:40:37 GMT
Title: Unleashing the Potential of Two-Tower Models: Diffusion-Based Cross-Interaction for Large-Scale Matching
Authors: Yihan Wang, Fei Xiong, Zhexin Han, Qi Song, Kaiqiao Zhan, Ben Wang
Abstract summary: Two-tower models are widely adopted in the industrial-scale matching stage across a broad range of application domains. We propose a "cross-interaction decoupling architecture" within our matching paradigm.
Score: 25.672699790866726
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Two-tower models are widely adopted in the industrial-scale matching stage across a broad range of application domains, such as content recommendation, advertising systems, and search engines. The model screens large candidate sets efficiently by separating user and item representations. However, this decoupling also neglects potential information interaction between the user and item representations. Current state-of-the-art (SOTA) approaches include adding a shallow fully connected layer (i.e., COLD), which is limited by performance and can only be used in the ranking stage. For performance reasons, another approach attempts to capture historical positive interaction information from the other tower by treating it as input features (i.e., DAT). Later research showed that the gains achieved by this method are still limited because it lacks guidance on the user's next intent. To address these challenges, we propose a "cross-interaction decoupling architecture" within our matching paradigm. This user-tower architecture leverages a diffusion module to reconstruct the next positive intention representation and employs a mixed-attention module to facilitate comprehensive cross-interaction. During next positive intention generation, we further improve reconstruction accuracy by explicitly extracting the temporal drift within user behavior sequences. Experiments on two real-world datasets and one industrial dataset demonstrate that our method significantly outperforms SOTA two-tower models, and that our diffusion approach outperforms other generative models in reconstructing item representations.
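
The decoupling that the abstract describes, with user and item representations computed separately and combined only at scoring time, can be illustrated with a minimal sketch. It is not the paper's method: the towers here are random linear projections standing in for trained networks, and every name (make_tower, encode, top_k, item_index) is hypothetical.

```python
import math
import random


def make_tower(in_dim, out_dim, rng):
    """Return a random linear projection standing in for a trained tower (assumption)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def encode(tower, features):
    """Project features through the tower and L2-normalise the result."""
    vec = [sum(w * x for w, x in zip(row, features)) for row in tower]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def top_k(user_vec, item_vecs, k):
    """Score every item by dot product with the user vector; return the k best ids."""
    scores = {
        item_id: sum(u * v for u, v in zip(user_vec, vec))
        for item_id, vec in item_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


rng = random.Random(0)
user_tower = make_tower(4, 3, rng)  # 4 user features -> 3-dim embedding
item_tower = make_tower(5, 3, rng)  # 5 item features -> 3-dim embedding

# Item embeddings are computed once, independently of any user.
items = {f"item_{i}": [rng.random() for _ in range(5)] for i in range(10)}
item_index = {item_id: encode(item_tower, feats) for item_id, feats in items.items()}

# At request time only the user tower runs; matching is one dot product per item.
user_vec = encode(user_tower, [rng.random() for _ in range(4)])
candidates = top_k(user_vec, item_index, k=3)
print(candidates)
```

Because the two towers never see each other's inputs, the only user-item interaction in this sketch is the final dot product, which is the limitation the abstract sets out to address.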

Related papers

Future Link Prediction Without Memory or Aggregation [25.066464612400768]
Future link prediction on temporal graphs is a fundamental task with wide applicability in real-world dynamic systems. Existing methods typically rely on complex memory and aggregation modules, yet struggle to handle unseen edges. We propose Cross-Attention based Future Link Predictor on Temporal Graphs (CRAFT), a simple yet effective architecture that discards memory and aggregation modules.
arXiv Detail & Related papers (2025-05-26T01:53:27Z)
MixRec: Heterogeneous Graph Collaborative Filtering [21.96510707666373]
We present a graph collaborative filtering model, MixRec, to disentangle users' multi-behavior interaction patterns. Our model achieves this by incorporating intent disentanglement and multi-behavior modeling. We also introduce a novel contrastive learning paradigm that adaptively explores the advantages of self-supervised data augmentation.
arXiv Detail & Related papers (2024-12-18T13:12:36Z)
A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models. Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning. We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z)
Generative Diffusion Models for Sequential Recommendations [7.948486055890262]
Generative models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) have shown promise in sequential recommendation tasks. This research introduces enhancements to the DiffuRec architecture to improve robustness and incorporates a cross-attention mechanism in the Approximator to better capture relevant user-item interactions.
arXiv Detail & Related papers (2024-10-25T09:39:05Z)
A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA). CEFA consists of a feature alignment module and a context enhancement module. Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z)
Collaborative Filtering Based on Diffusion Models: Unveiling the Potential of High-Order Connectivity [10.683635786183894]
CF-Diff is a new diffusion model-based collaborative filtering method. It is capable of making full use of collaborative signals along with multi-hop neighbors. It achieves remarkable gains up to 7.29% compared to the best competitor.
arXiv Detail & Related papers (2024-04-22T14:49:46Z)
Beyond Two-Tower Matching: Learning Sparse Retrievable Cross-Interactions for Recommendation [80.19762472699814]
Two-tower models are a prevalent matching framework for recommendation and have been widely deployed in industrial applications. They suffer from two main challenges: limited feature interaction capability and reduced accuracy in online serving. We propose a new matching paradigm named SparCode, which supports not only sophisticated feature interactions but also efficient retrieval.
arXiv Detail & Related papers (2023-11-30T03:13:36Z)
Feature Decoupling-Recycling Network for Fast Interactive Segmentation [79.22497777645806]
Recent interactive segmentation methods iteratively take source image, user guidance and previously predicted mask as the input. We propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies.
arXiv Detail & Related papers (2023-08-07T12:26:34Z)
Local Consensus Enhanced Siamese Network with Reciprocal Loss for Two-view Correspondence Learning [35.5851523517487]
Two-view correspondence learning usually establishes an end-to-end network to jointly predict correspondence reliability and relative pose. We propose a Local Feature Consensus (LFC) plugin block to augment the features of existing models. We extend existing models to a Siamese network with a reciprocal loss that exploits the supervision of mutual projection.
arXiv Detail & Related papers (2023-08-06T22:20:09Z)
Masked Transformer for Neighhourhood-aware Click-Through Rate Prediction [74.52904110197004]
We propose Neighbor-Interaction based CTR prediction, which puts this task into a Heterogeneous Information Network (HIN) setting. To enhance the representation of the local neighbourhood, we consider four types of topological interaction among the nodes. We conduct comprehensive experiments on two real-world datasets, and the experimental results show that our proposed method significantly outperforms state-of-the-art CTR models.
arXiv Detail & Related papers (2022-01-25T12:44:23Z)
DCAP: Deep Cross Attentional Product Network for User Response Prediction [20.17934000984361]
We propose a novel architecture, Deep Cross Attentional Product Network (DCAP). DCAP keeps the cross network's benefits in modeling high-order feature interactions explicitly at the vector-wise level. Our proposed model can be easily implemented and trained in parallel.
arXiv Detail & Related papers (2021-05-18T16:27:20Z)
Hierarchical Modeling for Out-of-Scope Domain and Intent Classification [55.23920796595698]
This paper focuses on out-of-scope intent classification in dialog systems. We propose a hierarchical multi-task learning approach based on a joint model to classify domain and intent simultaneously. Experiments show that the model outperforms existing methods in terms of accuracy, out-of-scope recall and F1.
arXiv Detail & Related papers (2021-04-30T06:38:23Z)
Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for a multi-stage, coarse-to-fine HOI understanding. At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network. With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.