STARS: Semantic Tokens with Augmented Representations for Recommendation at Scale
- URL: http://arxiv.org/abs/2512.10149v2
- Date: Sat, 13 Dec 2025 04:05:26 GMT
- Title: STARS: Semantic Tokens with Augmented Representations for Recommendation at Scale
- Authors: Han Chen, Steven Zhu, Yingrui Li,
- Abstract summary: We introduce STARS, a transformer-based sequential recommendation framework for large-scale, low-latency settings.<n>In offline evaluations, STARS improves Hit@5 by more than 75 percent relative to our existing Lambda system.<n>A large-scale A/B test on 6 million visits shows statistically significant lifts, including Total Orders +0.8%, Add-to-Cart on Home +2.0%, and Visits per User +0.5%.
- Score: 9.860255576130214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world ecommerce recommender systems must deliver relevant items under strict tens-of-milliseconds latency constraints despite challenges such as cold-start products, rapidly shifting user intent, and dynamic context including seasonality, holidays, and promotions. We introduce STARS, a transformer-based sequential recommendation framework built for large-scale, low-latency ecommerce settings. STARS combines several innovations: dual-memory user embeddings that separate long-term preferences from short-term session intent; semantic item tokens that fuse pretrained text embeddings, learnable deltas, and LLM-derived attribute tags, strengthening content-based matching, long-tail coverage, and cold-start performance; context-aware scoring with learned calendar and event offsets; and a latency-conscious two-stage retrieval pipeline that performs offline embedding generation and online maximum inner-product search with filtering, enabling tens-of-milliseconds response times. In offline evaluations on production-scale data, STARS improves Hit@5 by more than 75 percent relative to our existing LambdaMART system. A large-scale A/B test on 6 million visits shows statistically significant lifts, including Total Orders +0.8%, Add-to-Cart on Home +2.0%, and Visits per User +0.5%. These results demonstrate that combining semantic enrichment, multi-intent modeling, and deployment-oriented design can yield state-of-the-art recommendation quality in real-world environments without sacrificing serving efficiency.
Related papers
- GEMs: Breaking the Long-Sequence Barrier in Generative Recommendation with a Multi-Stream Decoder [54.64137490632567]
We propose a novel and unified framework designed to capture users' sequences from long-term history.<n>Generative Multi-streamers ( GEMs) break user sequences into three streams.<n>Extensive experiments on large-scale industrial datasets demonstrate that GEMs significantly outperforms state-the-art methods in recommendation accuracy.
arXiv Detail & Related papers (2026-02-14T06:42:56Z) - SARM: LLM-Augmented Semantic Anchor for End-to-End Live-Streaming Ranking [49.109782956280064]
Large-scale live-streaming recommendation requires precise modeling of non-stationary content semantics under real-time serving constraints.<n>We propose textbfSARM, an end-to-end ranking architecture that integrates natural-language semantic anchors directly into ranking optimization.<n>SARM is fully deployed and serves over 400 million users daily.
arXiv Detail & Related papers (2026-02-10T04:15:53Z) - OneMall: One Architecture, More Scenarios -- End-to-End Generative Recommender Family at Kuaishou E-Commerce [68.7552227901176]
OneMall is an end-to-end generative recommendation framework tailored for e-commerce services at Kuaishou.<n>It unifies the e-commerce's multiple item distribution scenarios, such as Product-card, short-video and live-streaming.<n>OneMall has been deployed, serving over 400 million daily active users at Kuaishou.
arXiv Detail & Related papers (2026-01-29T14:22:39Z) - PI2I: A Personalized Item-Based Collaborative Filtering Retrieval Framework [15.34118278015945]
We propose a novel two-stage retrieval framework that enhances the personalization capabilities of item-to-item collaborative filtering (CF)<n>In the first Indexer Building Stage (IBS), we optimize the retrieval pool by relaxing truncation thresholds to maximize Hit Rate.<n>In the second Personalized Retrieval Stage (PRS), we introduce an interactive scoring model to overcome the limitations of inner product calculations.<n> offline experiments on large-scale real-world datasets demonstrate that PI2I outperforms traditional CF methods and rivals Two-Tower models.
arXiv Detail & Related papers (2026-01-23T15:10:39Z) - GRank: Towards Target-Aware and Streamlined Industrial Retrieval with a Generate-Rank Framework [47.25361445845229]
Industrial-scale recommender systems rely on a cascade pipeline in which the retrieval stage must return a high-recall candidate set from billions of items under tight latency.<n>We present GRank, a novel structured-index-free retrieval paradigm that seamlessly unifies target-aware learning with user-centric retrieval.
arXiv Detail & Related papers (2025-10-17T04:15:09Z) - Improving E-commerce Search with Category-Aligned Retrieval [0.0]
Category-Aligned Retrieval System (CARS) improves search relevance by first predicting the product category from a user's query and then boosting products within that category.<n>We introduce a novel method for creating "Trainable Category Prototypes" from query embeddings.
arXiv Detail & Related papers (2025-09-03T20:43:52Z) - OneSearch: A Preliminary Exploration of the Unified End-to-End Generative Framework for E-commerce Search [43.94443394870866]
OneSearch is the first industrial-deployed end-to-end generative framework for e-commerce search.<n>OneSearch reduces operational expenditure by 75.40% and improves Model FLOPs Utilization from 3.26% to 27.32%.<n>The system has been successfully deployed across multiple search scenarios in Kuaishou, serving millions of users.
arXiv Detail & Related papers (2025-09-03T11:50:04Z) - C-TLSAN: Content-Enhanced Time-Aware Long- and Short-Term Attention Network for Personalized Recommendation [5.867032311769198]
We propose C-TLSAN (Content-Enhanced Time-Aware Long- and Short-Term Attention Network), an extension of the TLSAN architecture.<n>C-TLSAN enriches the recommendation pipeline by embedding textual content linked to users' historical interactions directly into both long-term and short-term attention layers.<n>We conduct extensive experiments on large-scale Amazon datasets, benchmarking C-TLSAN against state-of-the-art baselines.
arXiv Detail & Related papers (2025-06-16T01:16:26Z) - Multi-agents based User Values Mining for Recommendation [52.26100802380767]
We propose a zero-shot multi-LLM collaborative framework for effective and accurate user value extraction.<n>We apply text summarization techniques to condense item content while preserving essential meaning.<n>To mitigate hallucinations, we introduce two specialized agent roles: evaluators and supervisors.
arXiv Detail & Related papers (2025-05-02T04:01:31Z) - LLM-based Bi-level Multi-interest Learning Framework for Sequential Recommendation [54.396000434574454]
We propose a novel multi-interest SR framework combining implicit behavioral and explicit semantic perspectives.<n>It includes two modules: the Implicit Behavioral Interest Module and the Explicit Semantic Interest Module.<n>Experiments on four real-world datasets validate the framework's effectiveness and practicality.
arXiv Detail & Related papers (2024-11-14T13:00:23Z) - Long-Sequence Recommendation Models Need Decoupled Embeddings [49.410906935283585]
We identify and characterize a neglected deficiency in existing long-sequence recommendation models.<n>A single set of embeddings struggles with learning both attention and representation, leading to interference between these two processes.<n>We propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are learned separately to fully decouple attention and representation.
arXiv Detail & Related papers (2024-10-03T15:45:15Z) - Unleash LLMs Potential for Recommendation by Coordinating Twin-Tower Dynamic Semantic Token Generator [60.07198935747619]
We propose Twin-Tower Dynamic Semantic Recommender (T TDS), the first generative RS which adopts dynamic semantic index paradigm.
To be more specific, we for the first time contrive a dynamic knowledge fusion framework which integrates a twin-tower semantic token generator into the LLM-based recommender.
The proposed T TDS recommender achieves an average improvement of 19.41% in Hit-Rate and 20.84% in NDCG metric, compared with the leading baseline methods.
arXiv Detail & Related papers (2024-09-14T01:45:04Z) - Extreme Multi-label Learning for Semantic Matching in Product Search [41.66238191444171]
Given a customer query, retrieve all semantically related products from a huge catalog of size 100 million, or more.
We consider hierarchical linear models with n-gram features for fast real-time inference.
Our method maintains a low latency of 1.25 milliseconds per query and achieves a 65% improvement of Recall@100.
arXiv Detail & Related papers (2021-06-23T21:16:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.