eSASRec: Enhancing Transformer-based Recommendations in a Modular Fashion
- URL: http://arxiv.org/abs/2508.06450v1
- Date: Fri, 08 Aug 2025 16:49:03 GMT
- Title: eSASRec: Enhancing Transformer-based Recommendations in a Modular Fashion
- Authors: Daria Tikhonovich, Nikita Zelinskiy, Aleksandr V. Petrov, Mayya Spirina, Andrei Semenov, Andrey V. Savchenko, Sergei Kuliev
- Abstract summary: Transformer-based models, such as SASRec and BERT4Rec, have become common baselines for sequential recommendations. We identify a very strong model that uses SASRec's training objective, LiGR Transformer layers, and Sampled Softmax Loss. We find that common academic benchmarks show eSASRec to be 23% more effective compared to the most recent state-of-the-art models.
- Score: 45.793127165612745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since their introduction, Transformer-based models, such as SASRec and BERT4Rec, have become common baselines for sequential recommendations, surpassing earlier neural and non-neural methods. A number of subsequent publications have shown that the effectiveness of these models can be improved by, for example, slightly updating the architecture of the Transformer layers, using better training objectives, and employing improved loss functions. However, the additivity of these modular improvements has not been systematically benchmarked - this is the gap we aim to close in this paper. Through our experiments, we identify a very strong model that uses SASRec's training objective, LiGR Transformer layers, and Sampled Softmax Loss. We call this combination eSASRec (Enhanced SASRec). While we primarily focus on realistic, production-like evaluation, in our preliminary study we find that common academic benchmarks show eSASRec to be 23% more effective compared to the most recent state-of-the-art models, such as ActionPiece. In our main production-like benchmark, eSASRec resides on the Pareto frontier in terms of the accuracy-coverage tradeoff (alongside the recent industrial models HSTU and FuXi). As the modifications compared to the original SASRec are relatively straightforward and no extra features are needed (such as timestamps in HSTU), we believe that eSASRec can be easily integrated into existing recommendation pipelines and can serve as a strong yet very simple baseline for more complicated emerging algorithms. To facilitate this, we provide open-source implementations of our models and benchmarks in the repository https://github.com/blondered/transformer_benchmark
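The core of the combination described in the abstract is SASRec's shifted-sequence training objective paired with a sampled softmax loss over the item catalogue. The snippet below is a minimal sketch of such a loss in PyTorch; the function name, tensor shapes, and the uniform negative sampler are illustrative assumptions rather than the authors' reference implementation (see the linked repository for that).

```python
# A minimal sketch of a sampled softmax loss for eSASRec-style training.
# Function name, shapes, and the uniform negative sampler are illustrative
# assumptions; see the repository linked above for the reference code.
import torch
import torch.nn.functional as F


def sampled_softmax_loss(seq_emb: torch.Tensor, item_emb: torch.Tensor,
                         pos_ids: torch.Tensor, num_negatives: int = 256) -> torch.Tensor:
    """Cross-entropy over the true next item and uniformly sampled negatives.

    seq_emb:  (batch, dim)      Transformer hidden state at each target position
    item_emb: (num_items, dim)  item embedding table (typically tied to the input embeddings)
    pos_ids:  (batch,)          ground-truth next-item ids
    """
    batch_size = seq_emb.shape[0]
    num_items = item_emb.shape[0]

    # Uniform negative sampling; production systems often add a logQ correction
    # to compensate for the sampling distribution.
    neg_ids = torch.randint(0, num_items, (batch_size, num_negatives), device=seq_emb.device)

    pos_logits = (seq_emb * item_emb[pos_ids]).sum(-1, keepdim=True)       # (batch, 1)
    neg_logits = torch.einsum("bd,bnd->bn", seq_emb, item_emb[neg_ids])    # (batch, num_negatives)

    logits = torch.cat([pos_logits, neg_logits], dim=-1)                   # positive item at index 0
    targets = torch.zeros(batch_size, dtype=torch.long, device=seq_emb.device)
    return F.cross_entropy(logits, targets)
```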
Related papers
- MiniOneRec: An Open-Source Framework for Scaling Generative Recommendation [44.05859062614669]
MiniOneRec is the first fully open-source generative recommendation framework. It provides an end-to-end workflow spanning SID construction, supervised fine-tuning, and recommendation-oriented reinforcement learning. Our experiments reveal a consistent downward trend in both training and evaluation losses with increasing model size.
arXiv Detail & Related papers (2025-10-28T13:58:36Z) - Round-trip Reinforcement Learning: Self-Consistent Training for Better Chemical LLMs [51.29260537017623]
Large Language Models (LLMs) are emerging as versatile foundation models for computational chemistry. These models often lack round-trip consistency. We introduce Round-Trip Reinforcement Learning (RTRL), a novel framework that trains a model to improve its consistency.
arXiv Detail & Related papers (2025-10-01T23:58:58Z) - DenseRec: Revisiting Dense Content Embeddings for Sequential Transformer-based Recommendation [0.24999074238880484]
Transformer-based sequential recommenders typically rely solely on learned item ID embeddings. DenseRec is a simple yet effective method that introduces a dual-path embedding approach. In experiments on three real-world datasets, we find DenseRec to consistently outperform an ID-only SASRec baseline.
arXiv Detail & Related papers (2025-08-25T19:47:20Z) - Aligning Frozen LLMs by Reinforcement Learning: An Iterative Reweight-then-Optimize Approach [65.6966065843227]
Iterative Reweight-then-Optimize (IRO) is a framework that performs RL-style alignment of a frozen base model without touching its parameters. At test time, the value functions are used to guide the base model generation via a search-based optimization process. Notably, users can apply IRO to align a model on their own dataset, similar to OpenAI's reinforcement fine-tuning (RFT).
arXiv Detail & Related papers (2025-06-21T21:49:02Z) - gSASRec: Reducing Overconfidence in Sequential Recommendation Trained with Negative Sampling [67.71952251641545]
We show that models trained with negative sampling tend to overestimate the probabilities of positive interactions.
We propose a novel Generalised Binary Cross-Entropy Loss function (gBCE) and theoretically prove that it can mitigate overconfidence; a rough sketch of this loss is given after this list.
We show through detailed experiments on three datasets that gSASRec does not exhibit the overconfidence problem.
arXiv Detail & Related papers (2023-08-14T14:56:40Z) - Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z) - Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z) - Improving Sequential Recommendation Models with an Enhanced Loss Function [9.573139673704766]
We develop an improved loss function for sequential recommendation models.
We conduct experiments on two influential open-source libraries.
We reproduce the results of the BERT4Rec model on the Beauty dataset.
arXiv Detail & Related papers (2023-01-03T07:18:54Z) - Simple Recurrence Improves Masked Language Models [20.80840931168549]
Our results confirm that recurrence can indeed improve Transformer models by a consistent margin, without requiring low-level performance optimizations.
arXiv Detail & Related papers (2022-05-23T19:38:23Z) - Measuring and Reducing Model Update Regression in Structured Prediction for NLP [31.86240946966003]
Backward compatibility requires that the new model does not regress on cases that were correctly handled by its predecessor.
This work studies model update regression in structured prediction tasks.
We propose a simple and effective method, Backward-Congruent Re-ranking (BCR), by taking into account the characteristics of structured output.
arXiv Detail & Related papers (2022-02-07T07:04:54Z) - SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval [11.38022203865326]
The SPLADE model provides highly sparse representations and competitive results with respect to state-of-the-art dense and sparse approaches.
We modify the pooling mechanism, benchmark a model solely based on document expansion, and introduce models trained with distillation.
Overall, SPLADE is considerably improved with more than 9% gains on NDCG@10 on TREC DL 2019, leading to state-of-the-art results on the BEIR benchmark.
arXiv Detail & Related papers (2021-09-21T10:43:42Z)
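As referenced in the gSASRec entry above, a rough sketch of a gBCE-style loss is shown below. It assumes the positive term is tempered by an exponent beta, following the paper's description; the function name, shapes, and default beta value are illustrative assumptions, not the authors' reference implementation.

```python
# A rough sketch of a gBCE-style loss (see the gSASRec entry above).
# The exponent `beta`, its default value, and the mean reduction are assumptions
# based on the paper's description, not the authors' reference implementation.
import torch


def gbce_loss(pos_logits: torch.Tensor, neg_logits: torch.Tensor,
              beta: float = 0.9, eps: float = 1e-10) -> torch.Tensor:
    """pos_logits: (batch,) positive-item scores; neg_logits: (batch, k) sampled negatives."""
    pos_prob = torch.sigmoid(pos_logits).clamp(min=eps)
    neg_prob = torch.sigmoid(neg_logits).clamp(max=1.0 - eps)
    # beta < 1 tempers the positive term, counteracting the overconfidence that
    # arises when training against only a handful of sampled negatives.
    positive_term = -beta * torch.log(pos_prob)            # equals -log(sigmoid(s+)^beta)
    negative_term = -torch.log(1.0 - neg_prob).sum(dim=-1)
    return (positive_term + negative_term).mean()
```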
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.