Related papers: STARS: Semantic Tokens with Augmented Representations for Recommendation at Scale

URL: http://arxiv.org/abs/2512.10149v2
Date: Sat, 13 Dec 2025 04:05:26 GMT
Title: STARS: Semantic Tokens with Augmented Representations for Recommendation at Scale
Authors: Han Chen, Steven Zhu, Yingrui Li,
Abstract summary: We introduce STARS, a transformer-based sequential recommendation framework for large-scale, low-latency settings. In offline evaluations, STARS improves Hit@5 by more than 75 percent relative to our existing LambdaMART system. A large-scale A/B test on 6 million visits shows statistically significant lifts, including Total Orders +0.8%, Add-to-Cart on Home +2.0%, and Visits per User +0.5%.
Score: 9.860255576130214
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Real-world ecommerce recommender systems must deliver relevant items under strict tens-of-milliseconds latency constraints despite challenges such as cold-start products, rapidly shifting user intent, and dynamic context including seasonality, holidays, and promotions. We introduce STARS, a transformer-based sequential recommendation framework built for large-scale, low-latency ecommerce settings. STARS combines several innovations: dual-memory user embeddings that separate long-term preferences from short-term session intent; semantic item tokens that fuse pretrained text embeddings, learnable deltas, and LLM-derived attribute tags, strengthening content-based matching, long-tail coverage, and cold-start performance; context-aware scoring with learned calendar and event offsets; and a latency-conscious two-stage retrieval pipeline that performs offline embedding generation and online maximum inner-product search with filtering, enabling tens-of-milliseconds response times. In offline evaluations on production-scale data, STARS improves Hit@5 by more than 75 percent relative to our existing LambdaMART system. A large-scale A/B test on 6 million visits shows statistically significant lifts, including Total Orders +0.8%, Add-to-Cart on Home +2.0%, and Visits per User +0.5%. These results demonstrate that combining semantic enrichment, multi-intent modeling, and deployment-oriented design can yield state-of-the-art recommendation quality in real-world environments without sacrificing serving efficiency.
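The online stage described in the abstract (maximum inner-product search with filtering, plus context offsets) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `retrieve_top_k`, the brute-force search, the boolean `allowed_mask` filter, and the treatment of context as a per-item additive offset are all assumptions made for illustration. A production system would more likely use an approximate nearest-neighbor index over precomputed item embeddings.

```python
import numpy as np

def retrieve_top_k(user_vec, item_matrix, allowed_mask, context_offsets, k=5):
    """Brute-force maximum inner-product search with filtering (illustrative).

    user_vec:        (d,) user embedding, assumed to be computed upstream
    item_matrix:     (n, d) item embeddings, assumed to be precomputed offline
    allowed_mask:    (n,) bool, False for items that must be filtered out
    context_offsets: (n,) additive score offsets (hypothetical stand-in for
                     learned calendar and event offsets)
    """
    # Inner-product relevance plus the additive context term.
    scores = item_matrix @ user_vec + context_offsets
    # Filtered items get -inf so they can never be selected.
    scores = np.where(allowed_mask, scores, -np.inf)
    # Never return more items than survive the filter.
    k = min(k, int(allowed_mask.sum()))
    if k == 0:
        return []
    # Partial selection of the k best, then sort those k by descending score.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return top.tolist()

# Toy data: four items in a 2-d embedding space.
items = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, 0.0]])
user = np.array([1.0, 0.2])
mask = np.array([True, True, True, False])   # item 3 is filtered out
offsets = np.array([0.0, 0.0, 0.5, 0.0])     # item 2 gets a context boost
print(retrieve_top_k(user, items, mask, offsets, k=2))  # [2, 0]
```

In the toy data, item 3 has the highest raw inner product but is removed by the filter, and the offset lifts item 2 above item 0.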

Related papers

GEMs: Breaking the Long-Sequence Barrier in Generative Recommendation with a Multi-Stream Decoder [54.64137490632567]
We propose a novel and unified framework designed to capture users' sequences from long-term history. Generative Multi-streamers (GEMs) break user sequences into three streams. Extensive experiments on large-scale industrial datasets demonstrate that GEMs significantly outperforms state-of-the-art methods in recommendation accuracy.
arXiv Detail & Related papers (2026-02-14T06:42:56Z)
SARM: LLM-Augmented Semantic Anchor for End-to-End Live-Streaming Ranking [49.109782956280064]
Large-scale live-streaming recommendation requires precise modeling of non-stationary content semantics under real-time serving constraints. We propose SARM, an end-to-end ranking architecture that integrates natural-language semantic anchors directly into ranking optimization. SARM is fully deployed and serves over 400 million users daily.
arXiv Detail & Related papers (2026-02-10T04:15:53Z)
OneMall: One Architecture, More Scenarios -- End-to-End Generative Recommender Family at Kuaishou E-Commerce [68.7552227901176]
OneMall is an end-to-end generative recommendation framework tailored for e-commerce services at Kuaishou. It unifies multiple e-commerce item distribution scenarios, such as product cards, short videos, and live streaming. OneMall has been deployed, serving over 400 million daily active users at Kuaishou.
arXiv Detail & Related papers (2026-01-29T14:22:39Z)
PI2I: A Personalized Item-Based Collaborative Filtering Retrieval Framework [15.34118278015945]
We propose a novel two-stage retrieval framework that enhances the personalization capabilities of item-to-item collaborative filtering (CF). In the first Indexer Building Stage (IBS), we optimize the retrieval pool by relaxing truncation thresholds to maximize Hit Rate. In the second Personalized Retrieval Stage (PRS), we introduce an interactive scoring model to overcome the limitations of inner product calculations. Offline experiments on large-scale real-world datasets demonstrate that PI2I outperforms traditional CF methods and rivals Two-Tower models.
arXiv Detail & Related papers (2026-01-23T15:10:39Z)
GRank: Towards Target-Aware and Streamlined Industrial Retrieval with a Generate-Rank Framework [47.25361445845229]
Industrial-scale recommender systems rely on a cascade pipeline in which the retrieval stage must return a high-recall candidate set from billions of items under tight latency. We present GRank, a novel structured-index-free retrieval paradigm that seamlessly unifies target-aware learning with user-centric retrieval.
arXiv Detail & Related papers (2025-10-17T04:15:09Z)
Improving E-commerce Search with Category-Aligned Retrieval [0.0]
Category-Aligned Retrieval System (CARS) improves search relevance by first predicting the product category from a user's query and then boosting products within that category. We introduce a novel method for creating "Trainable Category Prototypes" from query embeddings.
arXiv Detail & Related papers (2025-09-03T20:43:52Z)
OneSearch: A Preliminary Exploration of the Unified End-to-End Generative Framework for E-commerce Search [43.94443394870866]
OneSearch is the first industrial-deployed end-to-end generative framework for e-commerce search. OneSearch reduces operational expenditure by 75.40% and improves Model FLOPs Utilization from 3.26% to 27.32%. The system has been successfully deployed across multiple search scenarios in Kuaishou, serving millions of users.
arXiv Detail & Related papers (2025-09-03T11:50:04Z)
C-TLSAN: Content-Enhanced Time-Aware Long- and Short-Term Attention Network for Personalized Recommendation [5.867032311769198]
We propose C-TLSAN (Content-Enhanced Time-Aware Long- and Short-Term Attention Network), an extension of the TLSAN architecture. C-TLSAN enriches the recommendation pipeline by embedding textual content linked to users' historical interactions directly into both long-term and short-term attention layers. We conduct extensive experiments on large-scale Amazon datasets, benchmarking C-TLSAN against state-of-the-art baselines.
arXiv Detail & Related papers (2025-06-16T01:16:26Z)
Multi-agents based User Values Mining for Recommendation [52.26100802380767]
We propose a zero-shot multi-LLM collaborative framework for effective and accurate user value extraction. We apply text summarization techniques to condense item content while preserving essential meaning. To mitigate hallucinations, we introduce two specialized agent roles: evaluators and supervisors.
arXiv Detail & Related papers (2025-05-02T04:01:31Z)
LLM-based Bi-level Multi-interest Learning Framework for Sequential Recommendation [54.396000434574454]
We propose a novel multi-interest SR framework combining implicit behavioral and explicit semantic perspectives. It includes two modules: the Implicit Behavioral Interest Module and the Explicit Semantic Interest Module. Experiments on four real-world datasets validate the framework's effectiveness and practicality.
arXiv Detail & Related papers (2024-11-14T13:00:23Z)
Long-Sequence Recommendation Models Need Decoupled Embeddings [49.410906935283585]
We identify and characterize a neglected deficiency in existing long-sequence recommendation models. A single set of embeddings struggles with learning both attention and representation, leading to interference between these two processes. We propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are learned separately to fully decouple attention and representation.
arXiv Detail & Related papers (2024-10-03T15:45:15Z)
Unleash LLMs Potential for Recommendation by Coordinating Twin-Tower Dynamic Semantic Token Generator [60.07198935747619]
We propose Twin-Tower Dynamic Semantic Recommender (TTDS), the first generative RS which adopts a dynamic semantic index paradigm. More specifically, we contrive, for the first time, a dynamic knowledge fusion framework which integrates a twin-tower semantic token generator into the LLM-based recommender. The proposed TTDS recommender achieves an average improvement of 19.41% in Hit-Rate and 20.84% in NDCG, compared with the leading baseline methods.
arXiv Detail & Related papers (2024-09-14T01:45:04Z)
Extreme Multi-label Learning for Semantic Matching in Product Search [41.66238191444171]
Given a customer query, retrieve all semantically related products from a huge catalog of size 100 million, or more. We consider hierarchical linear models with n-gram features for fast real-time inference. Our method maintains a low latency of 1.25 milliseconds per query and achieves a 65% improvement of Recall@100.
arXiv Detail & Related papers (2021-06-23T21:16:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.