BERT4CTR: An Efficient Framework to Combine Pre-trained Language Model
with Non-textual Features for CTR Prediction
- URL: http://arxiv.org/abs/2308.11527v1
- Date: Thu, 17 Aug 2023 08:25:54 GMT
- Title: BERT4CTR: An Efficient Framework to Combine Pre-trained Language Model
with Non-textual Features for CTR Prediction
- Authors: Dong Wang, Kavé Salamatian, Yunqing Xia, Weiwei Deng, Qi Zhang
- Abstract summary: We propose BERT4CTR, a novel framework whose Uni-Attention mechanism benefits from the interactions between non-textual and textual features.
BERT4CTR significantly outperforms state-of-the-art frameworks for handling multi-modal inputs and is applicable to Click-Through-Rate (CTR) prediction.
- Score: 12.850529317775198
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although deep pre-trained language models have shown promising
benefits in a large set of industrial scenarios, including Click-Through-Rate
(CTR) prediction, it is challenging to integrate pre-trained language models,
which handle only textual signals, into a prediction pipeline that also
involves non-textual features.
Up to now, two directions have been explored to integrate multi-modal inputs
in the fine-tuning of pre-trained language models. The first fuses the output
of the language model with the non-textual features through an aggregation
layer, resulting in an ensemble framework in which the cross-information
between textual and non-textual inputs is learned only in the aggregation
layer. The second splits the non-textual features into fine-grained fragments
and transforms the fragments into new tokens combined with the textual ones,
so that they can be fed directly to the transformer layers of the language
model. However, this approach increases the complexity of learning and
inference because of the numerous additional tokens.
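As an illustration of the first, aggregation-based direction, the PyTorch sketch below late-fuses a [CLS] text representation with non-textual features through a small MLP; the encoder interface, layer sizes, and the name LateFusionCTR are illustrative assumptions, not the exact baselines evaluated in the paper.

    import torch
    import torch.nn as nn

    class LateFusionCTR(nn.Module):
        """Illustrative "ensemble" baseline: text and non-textual features are
        encoded separately, and their cross-information is learned only in the
        aggregation MLP."""

        def __init__(self, text_encoder, num_nontext_features, hidden_dim=768):
            super().__init__()
            self.text_encoder = text_encoder                  # e.g. a pre-trained BERT
            self.nontext_proj = nn.Linear(num_nontext_features, hidden_dim)
            self.aggregator = nn.Sequential(                  # the only cross-modal component
                nn.Linear(2 * hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, 1),
            )

        def forward(self, input_ids, attention_mask, nontext_features):
            # [CLS] representation of the textual input (HuggingFace-style encoder assumed)
            text_repr = self.text_encoder(
                input_ids=input_ids, attention_mask=attention_mask
            ).last_hidden_state[:, 0]
            nontext_repr = self.nontext_proj(nontext_features.float())
            fused = torch.cat([text_repr, nontext_repr], dim=-1)
            return torch.sigmoid(self.aggregator(fused)).squeeze(-1)  # predicted CTR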
To address these limitations, we propose in this work BERT4CTR, a novel
framework with a Uni-Attention mechanism that benefits from the interactions
between non-textual and textual features while keeping training and inference
time-costs low through dimensionality reduction. Comprehensive experiments on
both public and commercial data demonstrate that BERT4CTR significantly
outperforms state-of-the-art frameworks for handling multi-modal inputs and is
applicable to CTR prediction.
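The abstract does not spell out the Uni-Attention design, so the sketch below is only one hypothetical reading, under our own assumptions: embedded non-textual features attend one-directionally to a compressed summary of the textual token states, which keeps the attention cost low.

    import torch
    import torch.nn as nn

    class UniAttentionFusion(nn.Module):
        """Hypothetical uni-directional attention block (not the paper's exact
        architecture): non-textual feature embeddings query the textual token
        states, never the reverse, and the token states are first compressed
        to a short latent sequence as a form of dimensionality reduction."""

        def __init__(self, hidden_dim=768, num_latents=16, num_heads=8):
            super().__init__()
            self.latents = nn.Parameter(torch.randn(num_latents, hidden_dim))
            self.compress = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
            self.uni_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
            self.classifier = nn.Linear(hidden_dim, 1)

        def forward(self, token_states, nontext_emb):
            # token_states: (B, L, H) hidden states from the language model
            # nontext_emb:  (B, F, H) embedded non-textual features
            latents = self.latents.unsqueeze(0).expand(token_states.size(0), -1, -1)
            # dimensionality reduction: num_latents vectors summarize the L token states
            summary, _ = self.compress(latents, token_states, token_states)
            # uni-directional attention: non-textual queries read from the text summary only
            fused, _ = self.uni_attn(nontext_emb, summary, summary)
            logit = self.classifier(fused.mean(dim=1)).squeeze(-1)
            return torch.sigmoid(logit)                       # predicted click probability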
Related papers
- FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
Click-through rate (CTR) prediction serves as a core function module in personalized online services.
Traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of tabular modality.
Pretrained Language Models (PLMs) have given rise to another paradigm, which takes as inputs the sentences of textual modality.
We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction.
arXiv Detail & Related papers (2023-10-30T11:25:03Z) - ClickPrompt: CTR Models are Strong Prompt Generators for Adapting Language Models to CTR Prediction [45.15127775876369]
Click-through rate (CTR) prediction has become increasingly indispensable for various Internet applications.
Traditional CTR models convert the multi-field categorical data into ID features via one-hot encoding, and extract the collaborative signals among features.
We propose a novel model-agnostic framework (i.e., ClickPrompt) where we incorporate CTR models to generate interaction-aware soft prompts (see the sketch after this list).
arXiv Detail & Related papers (2023-10-13T16:37:53Z) - Progressive Class Semantic Matching for Semi-supervised Text
Classification [26.794533973357403]
We investigate the marriage between semi-supervised learning and a pre-trained language model.
By means of extensive experiments, we show that our method brings remarkable improvements over the baselines.
arXiv Detail & Related papers (2022-05-20T13:59:03Z) - Leveraging Advantages of Interactive and Non-Interactive Models for
Vector-Based Cross-Lingual Information Retrieval [12.514666775853598]
We propose a novel framework to leverage the advantages of interactive and non-interactive models.
We introduce a semi-interactive mechanism, which builds our model upon a non-interactive architecture but encodes each document together with its associated multilingual queries.
Our methods significantly boost retrieval accuracy while maintaining computational efficiency.
arXiv Detail & Related papers (2021-11-03T03:03:19Z) - Cross-lingual Transferring of Pre-trained Contextualized Language Models [73.97131976850424]
We propose a novel cross-lingual model transferring framework for PrLMs: TreLM.
To handle the symbol order and sequence length differences between languages, we propose an intermediate "TRILayer" structure.
We show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency.
arXiv Detail & Related papers (2021-07-27T06:51:13Z) - Syntax-Enhanced Pre-trained Model [49.1659635460369]
We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa.
Existing methods utilize the syntax of text either in the pre-training stage or in the fine-tuning stage, so they suffer from a discrepancy between the two stages.
We present a model that utilizes the syntax of text in both pre-training and fine-tuning stages.
arXiv Detail & Related papers (2020-12-28T06:48:04Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - POINTER: Constrained Progressive Text Generation via Insertion-based
Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z) - Multilingual Denoising Pre-training for Neural Machine Translation [132.66750663226287]
mBART is a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora.
mBART is one of the first methods for pre-training a complete sequence-to-sequence model.
arXiv Detail & Related papers (2020-01-22T18:59:17Z)
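For the ClickPrompt entry above, the sketch below illustrates the generic soft-prompt idea it builds on: a CTR model's hidden representation is projected into a few pseudo-token embeddings that are prepended to the language model's input embeddings. The projection, prompt length, and class name are our assumptions, not ClickPrompt's exact design.

    import torch
    import torch.nn as nn

    class SoftPromptGenerator(nn.Module):
        """Generic soft-prompt sketch (assumed, not ClickPrompt's exact design):
        a CTR model's representation is mapped to prompt_len pseudo-token
        embeddings that are prepended to the text token embeddings."""

        def __init__(self, ctr_dim, prompt_len=4, lm_dim=768):
            super().__init__()
            self.prompt_len, self.lm_dim = prompt_len, lm_dim
            self.proj = nn.Linear(ctr_dim, prompt_len * lm_dim)

        def forward(self, ctr_repr, token_embeddings):
            # ctr_repr:         (B, ctr_dim) hidden state from the CTR model
            # token_embeddings: (B, L, lm_dim) word embeddings of the textual input
            prompts = self.proj(ctr_repr).view(-1, self.prompt_len, self.lm_dim)
            return torch.cat([prompts, token_embeddings], dim=1)  # fed into the PLM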