Related papers: LUMOS: Large User MOdels for User Behavior Prediction

URL: http://arxiv.org/abs/2512.08957v1
Date: Fri, 28 Nov 2025 10:56:08 GMT
Title: LUMOS: Large User MOdels for User Behavior Prediction
Authors: Dhruv Nigam,
Abstract summary: We present LUMOS, a transformer-based architecture that eliminates task-specific models and manual feature engineering. LUMOS introduces a novel cross-attention mechanism that conditions predictions on future known events. We demonstrate that LUMOS achieves superior performance compared to traditional task-specific models.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: User behavior prediction at scale remains a critical challenge for online B2C platforms. Traditional approaches rely heavily on task-specific models and domain-specific feature engineering. This is time-consuming, computationally expensive, and dependent on domain expertise, and therefore does not scale. We present LUMOS (Large User MOdel Series), a transformer-based architecture that eliminates task-specific models and manual feature engineering by learning multiple tasks jointly using only raw user activity data. LUMOS introduces a novel cross-attention mechanism that conditions predictions on future known events (e.g., holidays and sales), enabling the model to predict complex behavior patterns like "how will upcoming holidays affect user engagement?" The architecture also employs multi-modal tokenization, combining user transactions, event context, and static user demographic attributes into rich representations processed through specialized embedding pathways. Through extensive experiments on a production dataset spanning 275 billion user activity tokens from 250 million users, we demonstrate that LUMOS achieves superior performance compared to traditional task-specific models. Across 5 tasks with established baselines, we achieve an average improvement of 0.025 in ROC-AUC for binary classification tasks and a 4.6% reduction in MAPE for regression tasks. Online A/B testing validates that these improvements translate to measurable business impact, with a 3.15% increase in Daily Active Users.
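
The abstract does not give the details of the cross-attention mechanism, so the following is only a minimal sketch of generic single-head scaled dot-product cross-attention, in which user-history vectors act as queries and embeddings of future known events act as keys and values. The function name `cross_attention`, the toy embeddings, and the omission of learned projection matrices are all assumptions made for illustration; this is not the LUMOS implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (illustrative only).

    queries: one vector per past user-activity token
    keys, values: one vector per future known event (e.g. a holiday)
    Returns (context vectors, attention weights), one row per query.
    Learned query/key/value projections are omitted for brevity.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this history token to every future event.
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted average of the event value vectors.
        ctx = [
            sum(w_j * v[i] for w_j, v in zip(w, values))
            for i in range(len(values[0]))
        ]
        outputs.append(ctx)
        weights.append(w)
    return outputs, weights


# Hypothetical 2-dimensional embeddings, chosen by hand for the example.
history = [[1.0, 0.0], [0.0, 1.0]]                     # past activity tokens
future_events = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # e.g. sale, holiday, both

context, attn = cross_attention(history, future_events, future_events)
```

Each row of `attn` sums to 1 and indicates how strongly a history token attends to each future event; `context` holds the resulting event-conditioned vectors that a downstream prediction head could consume.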

Related papers

Scalable Offline Model-Based RL with Action Chunks [60.80151356018376]
We study whether model-based reinforcement learning can provide a scalable recipe for tackling complex, long-horizon tasks in offline RL. We call this recipe Model-Based RL with Action Chunks (MAC). We show that MAC achieves the best performance among offline model-based RL algorithms, especially on challenging long-horizon tasks.
arXiv Detail & Related papers (2025-12-08T23:26:29Z)
CTR Prediction on Alibaba's Taobao Advertising Dataset Using Traditional and Deep Learning Models [14.51041016589099]
We explore how to model click-through rates more effectively using a large-scale Taobao dataset released by Alibaba. To better model user intent, we combined behavioral data from hundreds of millions of interactions over a 22-day period. Our research provides a roadmap for advancing click-through rate predictions and extending their value beyond e-commerce.
arXiv Detail & Related papers (2025-11-26T22:51:02Z)
Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning [56.129822832095726]
AdaMoE is a Mixture-of-Experts (MoE) architecture that inherits pretrained weights from dense VLA models. A substantial 21.5% improvement in real-world experiments validates its practical effectiveness for robotic manipulation tasks.
arXiv Detail & Related papers (2025-10-16T04:52:57Z)
SessionIntentBench: A Multi-task Inter-session Intention-shift Modeling Benchmark for E-commerce Customer Behavior Understanding [64.45047674586671]
We introduce the concept of an intention tree and propose a dataset curation pipeline. We construct a sibling multimodal benchmark, SessionIntentBench, that evaluates L(V)LMs' capability on understanding inter-session intention shift. With 1,952,177 intention entries, 1,132,145 session intention trajectories, and 13,003,664 available tasks mined using 10,905 sessions, we provide a scalable way to exploit the existing session data.
arXiv Detail & Related papers (2025-07-27T09:04:17Z)
Intention-Conditioned Flow Occupancy Models [80.42634994902858]
Large-scale pre-training has fundamentally changed how machine learning research is done today. Applying this same framework to reinforcement learning is appealing because it offers compelling avenues for addressing core challenges in RL. Recent advances in generative AI have provided new tools for modeling highly complex distributions.
arXiv Detail & Related papers (2025-06-10T15:27:46Z)
BehaveGPT: A Foundation Model for Large-scale User Behavior Modeling [14.342911841456663]
We propose BehaveGPT, a foundational model designed specifically for large-scale user behavior prediction. BehaveGPT is trained on vast user behavior datasets, allowing it to learn complex behavior patterns. Our approach introduces the DRO-based pretraining paradigm tailored for user behavior data, which improves model generalization and transferability.
arXiv Detail & Related papers (2025-05-23T08:43:46Z)
Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric [99.56567010306807]
Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications. One core challenge of evaluation in the large language model (LLM) era is the generalization issue. We propose Model Utilization Index (MUI), a mechanism interpretability enhanced metric that complements traditional performance scores.
arXiv Detail & Related papers (2025-04-10T04:09:47Z)
Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training [68.94373533768501]
We model knowledge retention, the capacity of a pre-trained language model to memorize factual information from its corpus, and introduce a principled method to estimate it prior to training. We propose Size-dependent Mutual Information (SMI), an information-theoretic predictor that integrates knowledge frequency, knowledge specificity, and model size to forecast closed-book question answering (QA) accuracy.
arXiv Detail & Related papers (2025-02-06T13:23:53Z)
New User Event Prediction Through the Lens of Causal Inference [20.676353189313737]
We propose a novel discrete event prediction framework for new users with limited history. We treat the user event history as the "treatment" for future events and the user category as the key confounder. We demonstrate the improved performance of the proposed framework with a numerical simulation study and two real-world applications.
arXiv Detail & Related papers (2024-07-08T05:35:54Z)
F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data [65.6499834212641]
We formulate the demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm. By considering domain similarities through task-specific metadata, our model improved generalization, where the excess risk decreases as the number of training tasks increases. Compared to existing state-of-the-art models, our method demonstrates a notable improvement in demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
arXiv Detail & Related papers (2024-06-23T21:28:50Z)
Learning Large-scale Universal User Representation with Sparse Mixture of Experts [1.2722697496405464]
We propose SUPERMOE, a generic framework to obtain high-quality user representations from multiple tasks. Specifically, the user behaviour sequences are encoded by an MoE transformer, and we can thus increase the model capacity to billions of parameters. In order to deal with the seesaw phenomenon when learning across multiple tasks, we design a new loss function with task indicators.
arXiv Detail & Related papers (2022-07-11T06:19:03Z)
Search-based User Interest Modeling with Lifelong Sequential Behavior Data for Click-Through Rate Prediction [23.460147230576855]
We propose a new modeling paradigm, which we name Search-based Interest Model (SIM). SIM extracts user interests with two cascaded search units. Since 2019, SIM has been deployed in the display advertising system in Alibaba, bringing 7.1% CTR and 4.4% RPM lift.
arXiv Detail & Related papers (2020-06-10T03:41:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.