From Features to Transformers: Redefining Ranking for Scalable Impact
- URL: http://arxiv.org/abs/2502.03417v2
- Date: Wed, 21 May 2025 00:15:59 GMT
- Title: From Features to Transformers: Redefining Ranking for Scalable Impact
- Authors: Fedor Borisyuk, Lars Hertel, Ganesh Parameswaran, Gaurav Srivastava, Sudarshan Srinivasa Ramanujam, Borja Ocejo, Peng Du, Andrei Akterskii, Neil Daftary, Shao Tang, Daqi Sun, Qiang Charles Xiao, Deepesh Nathani, Mohit Kothari, Yun Dai, Guoyao Li, Aman Gupta,
- Abstract summary: LiGR is a large-scale ranking framework developed at LinkedIn.<n>We introduce a modified transformer architecture that incorporates learned normalization and simultaneous set-wise attention to user history and ranked items.
- Score: 6.656478617123893
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present LiGR, a large-scale ranking framework developed at LinkedIn that brings state-of-the-art transformer-based modeling architectures into production. We introduce a modified transformer architecture that incorporates learned normalization and simultaneous set-wise attention to user history and ranked items. This architecture enables several breakthrough achievements, including: (1) the deprecation of most manually designed feature engineering, outperforming the prior state-of-the-art system using only few features (compared to hundreds in the baseline), (2) validation of the scaling law for ranking systems, showing improved performance with larger models, more training data, and longer context sequences, and (3) simultaneous joint scoring of items in a set-wise manner, leading to automated improvements in diversity. To enable efficient serving of large ranking models, we describe techniques to scale inference effectively using single-pass processing of user history and set-wise attention. We also summarize key insights from various ablation studies and A/B tests, highlighting the most impactful technical approaches.
Related papers
- Revisiting the Generic Transformer: Deconstructing a Strong Baseline for Time Series Foundation Models [18.841505010078112]
We investigate the potential of a standard patch Transformer, demonstrating that it achieves state-of-the-art zero-shot forecasting performance.<n>We conduct a comprehensive ablation study that covers model scaling, data composition, and training techniques to isolate the essential ingredients for high performance.
arXiv Detail & Related papers (2026-02-06T18:01:44Z) - OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System [61.12400636463362]
OnePiece is a unified framework that seamlessly integrates LLM-style context engineering and reasoning into both retrieval and ranking models.<n>OnePiece has been deployed in the main personalized search scenario of Shopee and achieves consistent online gains across different key business metrics.
arXiv Detail & Related papers (2025-09-22T17:59:07Z) - Equip Pre-ranking with Target Attention by Residual Quantization [28.523618960969472]
TARQ is a novel pre-ranking framework for industrial recommendation systems.<n>Our model has been fully deployed in production, serving tens of millions of daily active users.
arXiv Detail & Related papers (2025-09-21T05:33:28Z) - KARE-RAG: Knowledge-Aware Refinement and Enhancement for RAG [63.82127103851471]
Retrieval-Augmented Generation (RAG) enables large language models to access broader knowledge sources.<n>We demonstrate that enhancing generative models' capacity to process noisy content is equally critical for robust performance.<n>We present KARE-RAG, which improves knowledge utilization through three key innovations.
arXiv Detail & Related papers (2025-06-03T06:31:17Z) - Simplifying Graph Transformers [64.50059165186701]
We propose three simple modifications to the plain Transformer to render it applicable to graphs without introducing major architectural distortions.
Specifically, we advocate for the use of (1) simplified $L$ attention to measure the magnitude of closeness tokens; (2) adaptive root-mean-square normalization to preserve token magnitude information; and (3) a relative positional encoding bias with a shared encoder.
arXiv Detail & Related papers (2025-04-17T02:06:50Z) - TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba [66.80624029365448]
We propose a cross-architecture knowledge transfer paradigm, TransMamba, that facilitates the reuse of Transformer pre-trained knowledge.<n>We propose a two-stage framework to accelerate the training of Mamba-based models, ensuring their effectiveness across both uni-modal and multi-modal tasks.
arXiv Detail & Related papers (2025-02-21T01:22:01Z) - Scaling Sequential Recommendation Models with Transformers [0.0]
We take inspiration from the scaling laws observed in training large language models, and explore similar principles for sequential recommendation.<n> Compute-optimal training is possible but requires a careful analysis of the compute-performance trade-offs specific to the application.<n>We also show that performance scaling translates to downstream tasks by fine-tuning larger pre-trained models on smaller task-specific domains.
arXiv Detail & Related papers (2024-12-10T15:20:56Z) - STAR: Synthesis of Tailored Architectures [61.080157488857516]
We propose a new approach for the synthesis of tailored architectures (STAR)<n>Our approach combines a novel search space based on the theory of linear input-varying systems, supporting a hierarchical numerical encoding into architecture genomes. STAR genomes are automatically refined and recombined with gradient-free, evolutionary algorithms to optimize for multiple model quality and efficiency metrics.<n>Using STAR, we optimize large populations of new architectures, leveraging diverse computational units and interconnection patterns, improving over highly-optimized Transformers and striped hybrid models on the frontier of quality, parameter size, and inference cache for autoregressive language modeling.
arXiv Detail & Related papers (2024-11-26T18:42:42Z) - AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation [48.82264764771652]
We introduce AsCAN -- a hybrid architecture, combining both convolutional and transformer blocks.
AsCAN supports a variety of tasks: recognition, segmentation, class-conditional image generation.
We then scale the same architecture to solve a large-scale text-to-image task and show state-of-the-art performance.
arXiv Detail & Related papers (2024-11-07T18:43:17Z) - Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches to hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z) - Hiformer: Heterogeneous Feature Interactions Learning with Transformers
for Recommender Systems [27.781785405875084]
We propose to leverage a Transformer-based architecture with attention layers to automatically capture feature interactions.
We identify two key challenges for applying the vanilla Transformer architecture to web-scale recommender systems.
arXiv Detail & Related papers (2023-11-10T05:57:57Z) - ProFormer: Learning Data-efficient Representations of Body Movement with
Prototype-based Feature Augmentation and Visual Transformers [31.908276711898548]
Methods for data-efficient recognition from body poses increasingly leverage skeleton sequences structured as image-like arrays.
We look at this paradigm from the perspective of transformer networks, for the first time exploring visual transformers as data-efficient encoders of skeleton movement.
In our pipeline, body pose sequences cast as image-like representations are converted into patch embeddings and then passed to a visual transformer backbone optimized with deep metric learning.
arXiv Detail & Related papers (2022-02-23T11:11:54Z) - Improving Sample Efficiency of Value Based Models Using Attention and
Vision Transformers [52.30336730712544]
We introduce a deep reinforcement learning architecture whose purpose is to increase sample efficiency without sacrificing performance.
We propose a visually attentive model that uses transformers to learn a self-attention mechanism on the feature maps of the state representation.
We demonstrate empirically that this architecture improves sample complexity for several Atari environments, while also achieving better performance in some of the games.
arXiv Detail & Related papers (2022-02-01T19:03:03Z) - Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual
Representations [9.6221436745451]
We describe how we generate a dataset with over a billion images via large weakly-supervised pretraining.
We leverage Transformers to replace the traditional convolutional backbone.
We show that large-scale Transformer-based pretraining provides significant benefits to industry computer vision applications.
arXiv Detail & Related papers (2021-08-12T17:58:56Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.