Sliding Window Training -- Utilizing Historical Recommender Systems Data for Foundation Models
- URL: http://arxiv.org/abs/2409.14517v1
- Date: Wed, 21 Aug 2024 18:59:52 GMT
- Title: Sliding Window Training -- Utilizing Historical Recommender Systems Data for Foundation Models
- Authors: Swanand Joshi, Yesu Feng, Ko-Jen Hsiao, Zhe Zhang, Sudarshan Lamkhede
- Abstract summary: Long-lived recommender systems (RecSys) often encounter lengthy user-item interaction histories that span many years.
To effectively learn long-term user preferences, large RecSys foundation models (FMs) need to encode this information during pretraining.
We introduce a sliding window training technique to incorporate long user history sequences at training time without increasing the model input dimension.
- Score: 8.298236989162213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-lived recommender systems (RecSys) often encounter lengthy user-item interaction histories that span many years. To effectively learn long-term user preferences, large RecSys foundation models (FMs) need to encode this information during pretraining. Usually, this is done either by using a sequence length long enough to take the entire history as input, at the cost of a large model input dimension, or by dropping parts of the user history to meet model size and latency requirements on the production serving side. In this paper, we introduce a sliding window training technique to incorporate long user history sequences at training time without increasing the model input dimension. We show the quantitative and qualitative improvements this technique brings to the RecSys FM in learning long-term user preferences. We additionally show that the average quality of the catalog items learnt during pretraining also improves.
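The core mechanism is easiest to see in a small sketch. The Python snippet below is a minimal, illustrative interpretation of sliding-window example generation, not the authors' implementation; the function name `sliding_windows` and the `window_size`/`stride` parameters are assumptions introduced here. It demonstrates the idea stated in the abstract: a long interaction history is cut into fixed-length windows, each of which becomes one pretraining example, so the model input dimension never grows with the length of the history.

```python
from typing import Iterator, List, Sequence


def sliding_windows(
    history: Sequence[int],
    window_size: int,
    stride: int,
) -> Iterator[List[int]]:
    """Yield fixed-length windows over a (possibly very long) interaction history.

    Every window fits the model's fixed input dimension, so no history needs to
    be dropped; histories shorter than `window_size` come back as a single
    example to be padded elsewhere.
    """
    if len(history) <= window_size:
        yield list(history)
        return
    last_start = len(history) - window_size
    for start in range(0, last_start + 1, stride):
        yield list(history[start:start + window_size])
    # Ensure the most recent interactions are always covered by a final window.
    if last_start % stride != 0:
        yield list(history[last_start:])


# Example: a long history of 12 item IDs for a model that accepts only 5 inputs.
user_history = [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112]
for window in sliding_windows(user_history, window_size=5, stride=3):
    print(window)  # each fixed-length window becomes one pretraining example
```

With a stride smaller than the window size, every interaction appears in context with both older and newer items, so the full multi-year history can inform pretraining even though each individual training example stays within the serving-time input length.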
Related papers
- Document Reconstruction Unlocks Scalable Long-Context RLVR [60.74632963522131]
Reinforcement Learning with Verifiable Rewards (RLVR) has become a prominent paradigm for enhancing the long-context capabilities of Large Language Models (LLMs). We investigate unsupervised approaches to enhancing the long-context capabilities of LLMs, eliminating the need for heavy human annotation or teacher-model supervision. We validate the effectiveness of our method on two widely used benchmarks, RULER and LongBench v2.
arXiv Detail & Related papers (2026-02-09T03:23:23Z) - Make It Long, Keep It Fast: End-to-End 10k-Sequence Modeling at Billion Scale on Douyin [21.0248704845397]
Short-video recommenders such as Douyin must exploit extremely long user histories without breaking latency or cost budgets. We present an end-to-end system that scales long-sequence modeling to 10k-length user histories in production.
arXiv Detail & Related papers (2025-11-08T17:22:54Z) - Discrete-event Tensor Factorization: Learning a Smooth Embedding for Continuous Domains [0.0]
This paper analyzes how time can be encoded in factorization-style recommendation models. By including absolute time as a feature, our models can learn varying user preferences and changing item perception over time.
arXiv Detail & Related papers (2025-08-06T08:54:57Z) - Intention-Conditioned Flow Occupancy Models [69.79049994662591]
Large-scale pre-training has fundamentally changed how machine learning research is done today. Applying this same framework to reinforcement learning is appealing because it offers compelling avenues for addressing core challenges in RL. Recent advances in generative AI have provided new tools for modeling highly complex distributions.
arXiv Detail & Related papers (2025-06-10T15:27:46Z) - Scaling Sequential Recommendation Models with Transformers [0.0]
We take inspiration from the scaling laws observed in training large language models, and explore similar principles for sequential recommendation.
Compute-optimal training is possible but requires a careful analysis of the compute-performance trade-offs specific to the application.
We also show that performance scaling translates to downstream tasks by fine-tuning larger pre-trained models on smaller task-specific domains.
arXiv Detail & Related papers (2024-12-10T15:20:56Z) - How to Train Long-Context Language Models (Effectively) [75.5418485597276]
We study continued training and supervised fine-tuning (SFT) of a language model (LM) to make effective use of long-context information.
ProLong-8B, which is initialized from Llama-3 and trained on 40B tokens, demonstrates state-of-the-art long-context performance among similarly sized models at a length of 128K.
arXiv Detail & Related papers (2024-10-03T16:46:52Z) - Adaptive Memory Replay for Continual Learning [29.333341368722653]
Updating Foundation Models as new data becomes available can lead to catastrophic forgetting.
We introduce a framework of adaptive memory replay for continual learning, where sampling of past data is phrased as a multi-armed bandit problem.
We demonstrate the effectiveness of our approach, which maintains high performance while reducing forgetting by up to 10% at no training efficiency cost.
arXiv Detail & Related papers (2024-04-18T22:01:56Z) - User Embedding Model for Personalized Language Prompting [9.472634942498859]
We introduce a new User Embedding Module (UEM) that efficiently processes free-form text user histories by compressing and representing them as embeddings.
Our experiments demonstrate the superior capability of this approach in handling significantly longer histories.
The main contribution of this research is to demonstrate the ability to bias language models with user signals represented as embeddings.
arXiv Detail & Related papers (2024-01-10T00:35:52Z) - Long-range Multimodal Pretraining for Movie Understanding [79.63187251571391]
We introduce Long-range Multimodal Pretraining, a strategy and a model that leverages movie data to train transferable multimodal and cross-modal encoders.
Our key idea is to learn from all modalities in a movie by observing and extracting relationships over a long range.
Our model achieves state-of-the-art results on several LVU tasks while being much more data-efficient than previous works.
arXiv Detail & Related papers (2023-08-18T18:52:59Z) - LPT: Long-tailed Prompt Tuning for Image Classification [178.52948452353834]
We introduce several trainable prompts into a frozen pretrained model to adapt it to long-tailed data.
In phase 1, we train the shared prompt via supervised prompt tuning to adapt a pretrained model to the desired long-tailed domain.
In phase 2, we use the learnt shared prompt as a query to select a small, best-matched set for a group of similar samples.
arXiv Detail & Related papers (2022-10-03T15:47:02Z) - Beyond Learning from Next Item: Sequential Recommendation via Personalized Interest Sustainability [22.120680831015783]
Sequential recommender systems have provided effective suggestions by capturing users' interest drift.
The user-centric models capture personalized interest drift based on each user's sequential consumption history.
The item-centric models consider whether users' general interest is sustained beyond the training time, but this is not personalized.
arXiv Detail & Related papers (2022-09-14T13:47:58Z) - Teacher Guided Training: An Efficient Framework for Knowledge Transfer [86.6784627427194]
We propose the teacher-guided training (TGT) framework for training a high-quality compact model.
TGT exploits the fact that the teacher has acquired a good representation of the underlying data domain.
We find that TGT can improve accuracy on several image classification benchmarks and a range of text classification and retrieval tasks.
arXiv Detail & Related papers (2022-08-14T10:33:58Z) - Effective and Efficient Training for Sequential Recommendation using Recency Sampling [91.02268704681124]
We propose a novel Recency-based Sampling of Sequences training objective.
We show that the models enhanced with our method can achieve performance exceeding or very close to that of the state-of-the-art BERT4Rec.
arXiv Detail & Related papers (2022-07-06T13:06:31Z) - Incremental Learning for Personalized Recommender Systems [8.020546404087922]
We present an incremental learning solution that provides both training efficiency and model quality.
The solution is deployed at LinkedIn and is directly applicable to industrial-scale recommender systems.
arXiv Detail & Related papers (2021-08-13T04:21:21Z) - Learning Transferrable Parameters for Long-tailed Sequential User Behavior Modeling [70.64257515361972]
We argue that focusing on tail users could bring more benefits and address the long-tail issue.
Specifically, we propose a gradient alignment technique and adopt an adversarial training scheme to facilitate knowledge transfer from the head to the tail.
arXiv Detail & Related papers (2020-10-22T03:12:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.