DELTA: Dynamic Embedding Learning with Truncated Conscious Attention for
CTR Prediction
- URL: http://arxiv.org/abs/2305.04891v3
- Date: Tue, 5 Sep 2023 07:24:00 GMT
- Title: DELTA: Dynamic Embedding Learning with Truncated Conscious Attention for
CTR Prediction
- Authors: Chen Zhu, Liang Du, Hong Chen, Shuang Zhao, Zixun Sun, Xin Wang, Wenwu
Zhu
- Abstract summary: Click-Through Rate (CTR) prediction is a pivotal task in product and content recommendation.
We propose a model that enables Dynamic Embedding Learning with Truncated Conscious Attention for CTR prediction.
- Score: 61.68415731896613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Click-Through Rate (CTR) prediction is a pivotal task in product and content
recommendation, where learning effective feature embeddings is of great
significance. However, traditional methods typically learn fixed feature
representations without dynamically refining feature representations according
to the context information, leading to suboptimal performance. Some recent
approaches attempt to address this issue by learning bit-wise weights or
augmented embeddings for feature representations, but suffer from uninformative
or redundant features in the context. To tackle this problem, inspired by the
Global Workspace Theory in conscious processing, which posits that only a
specific subset of the product features is pertinent while the rest can be
noisy and even detrimental to human-click behaviors, we propose a CTR model
that enables Dynamic Embedding Learning with Truncated Conscious Attention for
CTR prediction, termed DELTA. DELTA contains two key components: (I) conscious
truncation module (CTM), which utilizes curriculum learning to apply adaptive
truncation on attention weights to select the most critical feature in the
context; (II) explicit embedding optimization (EEO), which applies an auxiliary
task during training that directly and independently propagates the gradient
from the loss layer to the embedding layer, thereby optimizing the embedding
explicitly via linear feature crossing. Extensive experiments on five
challenging CTR datasets demonstrate that DELTA achieves new state-of-the-art
performance among current CTR methods.
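To make the truncation idea concrete, below is a minimal, illustrative sketch of how field-level attention weights could be adaptively truncated so that only the most critical context features keep non-zero weight. The module and parameter names (ConsciousTruncation, keep_ratio) are assumptions introduced for illustration; the exact attention form, truncation rule, and curriculum schedule used by DELTA may differ.

```python
# Illustrative sketch only (assumed names and hyper-parameters; not the official DELTA code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConsciousTruncation(nn.Module):
    """Field-level self-attention whose smallest weights are truncated to zero."""

    def __init__(self, embed_dim: int, keep_ratio: float = 0.5):
        super().__init__()
        self.query = nn.Linear(embed_dim, embed_dim)
        self.key = nn.Linear(embed_dim, embed_dim)
        # Fraction of context fields kept per query; a curriculum could anneal
        # this from 1.0 (no truncation) toward a smaller value during training.
        self.keep_ratio = keep_ratio

    def forward(self, field_emb: torch.Tensor) -> torch.Tensor:
        # field_emb: (batch, num_fields, embed_dim)
        q, k = self.query(field_emb), self.key(field_emb)
        scores = torch.matmul(q, k.transpose(1, 2)) / field_emb.size(-1) ** 0.5
        weights = F.softmax(scores, dim=-1)  # (batch, num_fields, num_fields)

        # Keep only the top-k attention weights per query field and renormalize,
        # so uninformative or noisy context features receive exactly zero weight.
        num_keep = max(1, int(self.keep_ratio * weights.size(-1)))
        topk_vals, _ = weights.topk(num_keep, dim=-1)
        threshold = topk_vals[..., -1:]  # smallest surviving weight per query
        truncated = torch.where(weights >= threshold, weights,
                                torch.zeros_like(weights))
        truncated = truncated / truncated.sum(dim=-1, keepdim=True).clamp_min(1e-12)

        # Dynamic, context-refined embeddings: re-weight the original fields.
        return torch.matmul(truncated, field_emb)
```

In this reading, the curriculum-learning aspect would amount to scheduling keep_ratio over training, starting with little or no truncation and gradually tightening it; that schedule is an assumption, not a detail taken from the paper.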
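Likewise, the explicit embedding optimization component can be pictured as an auxiliary head whose loss depends on the raw embeddings only, so its gradient reaches the embedding table directly, independent of the deep prediction path. The sketch below is an assumption-laden illustration: it uses an FM-style pairwise interaction as the "linear feature crossing", and names such as LinearCrossHead and aux_weight are invented for this example.

```python
# Illustrative sketch only; the crossing form, loss weighting, and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearCrossHead(nn.Module):
    """Parameter-free FM-style pairwise crossing applied directly to the embeddings."""

    def forward(self, field_emb: torch.Tensor) -> torch.Tensor:
        # field_emb: (batch, num_fields, embed_dim)
        sum_sq = field_emb.sum(dim=1).pow(2)   # square of the sum over fields
        sq_sum = field_emb.pow(2).sum(dim=1)   # sum of squares over fields
        return 0.5 * (sum_sq - sq_sum).sum(dim=1)  # (batch,) pairwise-cross logit


def training_step(embedding, deep_model, aux_head, fields, labels, aux_weight=0.1):
    """One training step combining the main CTR loss with the auxiliary embedding loss."""
    emb = embedding(fields)            # (batch, num_fields, embed_dim)
    main_logit = deep_model(emb)       # main prediction path (any deep CTR tower)
    aux_logit = aux_head(emb)          # auxiliary path sees the embeddings only
    main_loss = F.binary_cross_entropy_with_logits(main_logit, labels)
    # The auxiliary term bypasses deep_model entirely, so its gradient flows
    # directly and independently from the loss into the embedding table.
    aux_loss = F.binary_cross_entropy_with_logits(aux_logit, labels)
    return main_loss + aux_weight * aux_loss
```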
Related papers
- CTR-Sink: Attention Sink for Language Models in Click-Through Rate Prediction [42.92011330807996]
CTR-Sink is a novel framework introducing behavior-level attention sinks tailored for recommendation scenarios.
Inspired by attention sink theory, it constructs attention focus sinks and dynamically regulates attention aggregation via external information.
arXiv Detail & Related papers (2025-08-05T17:30:34Z) - CKAA: Cross-subspace Knowledge Alignment and Aggregation for Robust Continual Learning [80.18781219542016]
Continual Learning (CL) empowers AI models to continuously learn from sequential task streams.
Recent parameter-efficient fine-tuning (PEFT)-based CL methods have garnered increasing attention due to their superior performance.
We propose Cross-subspace Knowledge Alignment and Aggregation (CKAA) to enhance robustness against misleading task-ids.
arXiv Detail & Related papers (2025-07-13T03:11:35Z) - EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning [53.88000987041739]
Class-Incremental Learning (CIL) aims to enable AI models to continuously learn from sequentially arriving data of different classes over time.
We propose the Elastic Knowledge Preservation and Compensation (EKPC) method, integrating Importance-aware Parameter Regularization (IPR) and Trainable Semantic Drift Compensation (TSDC) for CIL.
arXiv Detail & Related papers (2025-06-14T05:19:58Z) - Enhancing Training Data Attribution with Representational Optimization [57.61977909113113]
Training data attribution (TDA) methods aim to measure how training data impacts a model's predictions.
We propose AirRep, a representation-based approach that closes this gap by learning task-specific and model-aligned representations explicitly for TDA.
AirRep introduces two key innovations: a trainable encoder tuned for attribution quality, and an attention-based pooling mechanism that enables accurate estimation of group-wise influence.
arXiv Detail & Related papers (2025-05-24T05:17:53Z) - GRU: Mitigating the Trade-off between Unlearning and Retention for Large Language Models [34.90826139012299]
Large language model (LLM) unlearning has demonstrated its essential role in removing privacy and copyright-related responses.
The pursuit of complete unlearning often comes with substantial costs, as it compromises the model's general functionality.
We propose Gradient Rectified Unlearning (GRU), an enhanced unlearning framework controlling the updating gradients in a geometry-focused manner.
arXiv Detail & Related papers (2025-03-12T07:08:54Z) - Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z) - CTR-KAN: KAN for Adaptive High-Order Feature Interaction Modeling [37.80127625183842]
CTR-KAN is an adaptive framework for efficient high-order feature interaction modeling.
It builds upon the Kolmogorov-Arnold Network (KAN) paradigm, addressing its limitations in CTR prediction tasks.
CTR-KAN achieves state-of-the-art predictive accuracy with significantly lower computational costs.
arXiv Detail & Related papers (2024-08-16T12:51:52Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain stability in terms of the zero-shot generalization of VLMs; the resulting method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in few-shot image classification scenarios.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - Learning to Detour: Shortcut Mitigating Augmentation for Weakly Supervised Semantic Segmentation [7.5856806269316825]
Weakly supervised semantic segmentation (WSSS) employing weak forms of labels has been actively studied to alleviate the annotation cost of acquiring pixel-level labels.
We propose shortcut mitigating augmentation (SMA) for WSSS, which generates synthetic representations of object-background combinations not seen in the training data to reduce the use of shortcut features.
arXiv Detail & Related papers (2024-05-28T13:07:35Z) - Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC)
ARC achieves average performance increases of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets, respectively.
arXiv Detail & Related papers (2024-05-23T08:43:09Z) - CELA: Cost-Efficient Language Model Alignment for CTR Prediction [71.85120354973073]
Click-Through Rate (CTR) prediction holds a paramount position in recommender systems.
Recent efforts have sought to mitigate these challenges by integrating Pre-trained Language Models (PLMs)
We propose Cost-Efficient Language Model Alignment (CELA) for CTR prediction.
arXiv Detail & Related papers (2024-05-17T07:43:25Z) - Sequential Action-Induced Invariant Representation for Reinforcement
Learning [1.2046159151610263]
How to accurately learn task-relevant state representations from high-dimensional observations with visual distractions is a challenging problem in visual reinforcement learning.
We propose a Sequential Action-induced invariant Representation (SAR) method, in which the encoder is optimized by an auxiliary learner to only preserve the components that follow the control signals of sequential actions.
arXiv Detail & Related papers (2023-09-22T05:31:55Z) - TBIN: Modeling Long Textual Behavior Data for CTR Prediction [15.056265935931377]
Click-through rate (CTR) prediction plays a pivotal role in the success of recommendations.
Inspired by the recent success of language models (LMs), a surge of works improves prediction by organizing user behavior data in a textual format.
While promising, these works have to truncate the textual data to reduce the quadratic computational overhead of self-attention in LMs.
In this paper, we propose a Textual Behavior-based Interest Chunking Network (TBIN).
arXiv Detail & Related papers (2023-08-09T03:48:41Z) - MAP: A Model-agnostic Pretraining Framework for Click-through Rate
Prediction [39.48740397029264]
We propose a Model-agnostic pretraining (MAP) framework that applies feature corruption and recovery on multi-field categorical data.
We derive two practical algorithms: masked feature prediction (MFP) and replaced feature detection (RFD).
arXiv Detail & Related papers (2023-08-03T12:55:55Z) - ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP)
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z) - Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD)
We state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature.
We propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models.
arXiv Detail & Related papers (2023-05-25T04:49:34Z) - CL4CTR: A Contrastive Learning Framework for CTR Prediction [14.968714571151509]
We introduce self-supervised learning to produce high-quality feature representations directly.
We propose a model-agnostic Contrastive Learning for CTR (CL4CTR) framework consisting of three self-supervised learning signals.
CL4CTR achieves the best performance on four datasets.
arXiv Detail & Related papers (2022-12-01T14:18:02Z) - Learning Deep Representations via Contrastive Learning for Instance
Retrieval [11.736450745549792]
This paper makes the first attempt to tackle the problem using instance-discrimination-based contrastive learning (CL).
In this work, we approach this problem by exploring the capability of deriving discriminative representations from pre-trained and fine-tuned CL models.
arXiv Detail & Related papers (2022-09-28T04:36:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.