Related papers: Breaking the Length Barrier: LLM-Enhanced CTR Prediction in Long Textual User Behaviors

Breaking the Length Barrier: LLM-Enhanced CTR Prediction in Long Textual User Behaviors

URL: http://arxiv.org/abs/2403.19347v1
Date: Thu, 28 Mar 2024 12:05:15 GMT
Title: Breaking the Length Barrier: LLM-Enhanced CTR Prediction in Long Textual User Behaviors
Authors: Binzong Geng, Zhaoxin Huan, Xiaolu Zhang, Yong He, Liang Zhang, Fajie Yuan, Jun Zhou, Linjian Mo,
Abstract summary: Large language models (LLMs) are used to improve the performance of click-through rate (CTR) prediction. As user sequences grow longer, the current efficiency of LLMs is inadequate for training on billions of users and items. We propose Behavior Aggregated Hierarchical (BAHE) to enhance the efficiency of LLM-based CTR modeling.
Score: 25.086118164540974
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rise of large language models (LLMs), recent works have leveraged LLMs to improve the performance of click-through rate (CTR) prediction. However, we argue that a critical obstacle remains in deploying LLMs for practical use: the efficiency of LLMs when processing long textual user behaviors. As user sequences grow longer, the current efficiency of LLMs is inadequate for training on billions of users and items. To break through the efficiency barrier of LLMs, we propose Behavior Aggregated Hierarchical Encoding (BAHE) to enhance the efficiency of LLM-based CTR modeling. Specifically, BAHE proposes a novel hierarchical architecture that decouples the encoding of user behaviors from inter-behavior interactions. Firstly, to prevent computational redundancy from repeated encoding of identical user behaviors, BAHE employs the LLM's pre-trained shallow layers to extract embeddings of the most granular, atomic user behaviors from extensive user sequences and stores them in the offline database. Subsequently, the deeper, trainable layers of the LLM facilitate intricate inter-behavior interactions, thereby generating comprehensive user embeddings. This separation allows the learning of high-level user representations to be independent of low-level behavior encoding, significantly reducing computational complexity. Finally, these refined user embeddings, in conjunction with correspondingly processed item embeddings, are incorporated into the CTR model to compute the CTR scores. Extensive experimental results show that BAHE reduces training time and memory by five times for CTR models using LLMs, especially with longer user sequences. BAHE has been deployed in a real-world system, allowing for daily updates of 50 million CTR data on 8 A100 GPUs, making LLMs practical for industrial CTR prediction.

Related papers

LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression. LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model. Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
arXiv Detail & Related papers (2025-02-15T02:55:22Z)
Full-Stack Optimized Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation [44.685176786857284]
We propose ReLLaX (Retrieval-enhanced Large Language models Plus), a framework offering optimization across data, prompt, and parameter levels. At the data level, we introduce Semantic User Behavior Retrieval (SUBR) to reduce sequence heterogeneity, making it easier for LLMs to extract key information. For prompt-level enhancement, we employ Soft Prompt Augmentation (SPA) to inject collaborative knowledge, aligning item representations with recommendation tasks. At the parameter level, we propose Component Fully-interactive LoRA (CFLoRA), which enhances LoRA's expressiveness by enabling interactions between its components
arXiv Detail & Related papers (2025-01-23T03:05:13Z)
LIBER: Lifelong User Behavior Modeling Based on Large Language Models [42.045535303737694]
We propose Lifelong User Behavior Modeling (LIBER) based on large language models. LIBER has been deployed on Huawei's music recommendation service and achieved substantial improvements in users' play count and play time by 3.01% and 7.69%.
arXiv Detail & Related papers (2024-11-22T03:43:41Z)
Beyond Inter-Item Relations: Dynamic Adaption for Enhancing LLM-Based Sequential Recommendation [83.87767101732351]
Sequential recommender systems (SRS) predict the next items that users may prefer based on user historical interaction sequences. Inspired by the rise of large language models (LLMs) in various AI applications, there is a surge of work on LLM-based SRS. We propose DARec, a sequential recommendation model built on top of coarse-grained adaption for capturing inter-item relations.
arXiv Detail & Related papers (2024-08-14T10:03:40Z)
Efficiency Unleashed: Inference Acceleration for LLM-based Recommender Systems with Speculative Decoding [61.45448947483328]
We introduce Lossless Acceleration via Speculative Decoding for LLM-based Recommender Systems (LASER) LASER features a Customized Retrieval Pool to enhance retrieval efficiency and Relaxed Verification to improve the acceptance rate of draft tokens. LASER achieves a 3-5x speedup on public datasets and saves about 67% of computational resources during the online A/B test.
arXiv Detail & Related papers (2024-08-11T02:31:13Z)
Applying RLAIF for Code Generation with API-usage in Lightweight LLMs [15.366324461797582]
Reinforcement Learning from AI Feedback (RLAIF) has demonstrated significant potential across various domains. This paper introduces an RLAIF framework for improving the code generation abilities of lightweight (1B parameters) LLMs.
arXiv Detail & Related papers (2024-06-28T17:16:03Z)
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models [66.24055500785657]
Traditional turn-based chat systems prevent users from verbally interacting with system while it is generating responses. To overcome these limitations, we adapt existing LLMs to listen users while generating output and provide users with instant feedback. We build a dataset consisting of alternating time slices of queries and responses as well as covering typical feedback types in instantaneous interactions.
arXiv Detail & Related papers (2024-06-22T03:20:10Z)
Automated Commit Message Generation with Large Language Models: An Empirical Study and Beyond [24.151927600694066]
Commit Message Generation (CMG) approaches aim to automatically generate commit messages based on given code diffs. This paper conducts the first comprehensive experiment to investigate how far we have been in applying Large Language Models (LLMs) to generate high-quality commit messages.
arXiv Detail & Related papers (2024-04-23T08:24:43Z)
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [61.40047491337793]
We present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome the limitations of large language models. HomeR uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks. A token reduction technique precedes each merging, ensuring memory usage efficiency.
arXiv Detail & Related papers (2024-04-16T06:34:08Z)
CodecLM: Aligning Language Models with Tailored Synthetic Data [51.59223474427153]
We introduce CodecLM, a framework for adaptively generating high-quality synthetic data for instruction-following abilities. We first encode seed instructions into metadata, which are concise keywords generated on-the-fly to capture the target instruction distribution. We also introduce Self-Rubrics and Contrastive Filtering during decoding to tailor data-efficient samples.
arXiv Detail & Related papers (2024-04-08T21:15:36Z)
User-LLM: Efficient LLM Contextualization with User Embeddings [23.226164112909643]
User-LLM is a novel framework that leverages user embeddings to directly contextualize large language models with user history interactions. Our approach achieves significant efficiency gains by representing user timelines directly as embeddings, leading to substantial inference speedups of up to 78.1X.
arXiv Detail & Related papers (2024-02-21T08:03:27Z)
InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory [93.20588235940453]
In this paper, we introduce a training-free memory-based method, InfLLM. InfLLM stores distant contexts into additional memory units and employs an efficient mechanism to lookup token-relevant units for attention. Even when the sequence length is scaled to $1,024$K, InfLLM still effectively captures long-distance dependencies.
arXiv Detail & Related papers (2024-02-07T06:50:42Z)
Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline [22.08897444328099]
Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks. In this paper, we propose an efficient LLM inference pipeline that harnesses the power of LLMs.
arXiv Detail & Related papers (2023-05-22T15:36:06Z)
LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset. Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.