From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction
- URL: http://arxiv.org/abs/2511.12081v1
- Date: Sat, 15 Nov 2025 07:55:50 GMT
- Title: From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction
- Authors: Bencheng Yan, Yuejie Lei, Zhiyuan Zeng, Di Wang, Kaiyi Lin, Pengjie Wang, Jian Xu, Bo Zheng,
- Abstract summary: Deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns. We identify the root cause as a structural misalignment. We introduce the Field-Aware Transformer (FAT), which embeds field-based interaction priors into attention.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite massive investments in scale, deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns - a stark contrast to the smooth, predictable gains seen in large language models. We identify the root cause as a structural misalignment: Transformers assume sequential compositionality, while CTR data demand combinatorial reasoning over high-cardinality semantic fields. Unstructured attention spreads capacity indiscriminately, amplifying noise under extreme sparsity and breaking scalable learning. To restore alignment, we introduce the Field-Aware Transformer (FAT), which embeds field-based interaction priors into attention through decomposed content alignment and cross-field modulation. This design ensures model complexity scales with the number of fields F, not the total vocabulary size n >> F, leading to tighter generalization and, critically, observed power-law scaling in AUC as model width increases. We present the first formal scaling law for CTR models, grounded in Rademacher complexity, that explains and predicts this behavior. On large-scale benchmarks, FAT improves AUC by up to +0.51% over state-of-the-art methods. Deployed online, it delivers +2.33% CTR and +0.66% RPM. Our work establishes that effective scaling in recommendation arises not from size, but from structured expressivity: architectural coherence with data semantics.
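The abstract describes attention built from two parts: content alignment between field embeddings and a cross-field modulation term whose size depends on the number of fields F rather than the vocabulary size n. A minimal sketch of that idea, under assumptions (the function name `fat_attention` and the additive `field_bias` term are illustrative, not the paper's actual FAT layer):

```python
# Sketch: scaled dot-product "content alignment" plus a learned F x F
# "cross-field modulation" bias added to the attention scores.
import numpy as np

def fat_attention(x, w_q, w_k, w_v, field_bias):
    """x: (F, d), one embedding per semantic field; field_bias: (F, F)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d = q.shape[-1]
    # Content alignment: scaled dot-product between field embeddings.
    scores = (q @ k.T) / np.sqrt(d)
    # Cross-field modulation: a prior over field pairs. Its parameter count
    # grows with F^2, independent of the total vocabulary size n >> F.
    scores = scores + field_bias
    # Row-wise softmax over fields.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
F, d = 8, 16  # 8 semantic fields, model width 16
x = rng.normal(size=(F, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
field_bias = rng.normal(size=(F, F))
out = fat_attention(x, w_q, w_k, w_v, field_bias)
print(out.shape)  # (8, 16)
```

The point of the structure: unstructured attention would have to learn pairwise behavior over all n vocabulary items, while a field-pair prior like `field_bias` commits only O(F^2) extra parameters, which is the complexity argument the abstract makes.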
Related papers
- StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models [98.72926158261937]
We propose a training-free token pruning framework for Visual AutoRegressive models. We employ a lightweight high-pass filter to capture local texture details, while leveraging Principal Component Analysis (PCA) to preserve global structural information. To maintain valid next-scale prediction under sparse tokens, we introduce a nearest neighbor feature propagation strategy.
arXiv Detail & Related papers (2026-03-02T11:35:05Z) - EST: Towards Efficient Scaling Laws in Click-Through Rate Prediction via Unified Modeling [13.693397814262681]
Efficiently scaling industrial Click-Through Rate (CTR) prediction has recently attracted significant research attention. We propose the Efficiently Scalable Transformer (EST), which achieves fully unified modeling by processing all raw inputs in a single sequence without lossy aggregation. EST significantly outperforms production baselines, delivering a 3.27% RPM (Revenue Per Mille) increase and a 1.22% CTR lift.
arXiv Detail & Related papers (2026-02-11T12:51:54Z) - Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders [74.72147962028265]
Representation Autoencoders (RAEs) have shown distinct advantages in diffusion modeling on ImageNet. We investigate whether this framework can scale to large-scale, freeform text-to-image (T2I) generation.
arXiv Detail & Related papers (2026-01-22T18:58:16Z) - Knowledge-Informed Neural Network for Complex-Valued SAR Image Recognition [51.03674130115878]
We introduce the Knowledge-Informed Neural Network (KINN), a lightweight framework built upon a novel "compression-aggregation-compression" architecture. KINN establishes a state-of-the-art in parameter-efficient recognition, offering exceptional generalization in data-scarce and out-of-distribution scenarios.
arXiv Detail & Related papers (2025-10-23T07:12:26Z) - FaCTR: Factorized Channel-Temporal Representation Transformers for Efficient Time Series Forecasting [0.0]
Time series data is characterized by low per-timestep information density and complex dependencies across channels. We propose a lightweight Transformer with an explicitly structural design, FaCTR. Despite its compact design, FaCTR achieves state-of-the-art performance on eleven public forecasting benchmarks.
arXiv Detail & Related papers (2025-06-05T21:17:53Z) - DLF: Enhancing Explicit-Implicit Interaction via Dynamic Low-Order-Aware Fusion for CTR Prediction [71.41414150295702]
We propose a novel framework, Dynamic Low-Order-Aware Fusion (DLF), for click-through rate (CTR) prediction. RLI preserves low-order signals while mitigating redundancy from residual connections, and NAF dynamically integrates explicit and implicit representations at each layer, enhancing information sharing. Experiments on public datasets demonstrate that DLF achieves state-of-the-art performance in CTR prediction, addressing key limitations of existing models.
arXiv Detail & Related papers (2025-05-25T15:05:00Z) - Scaled Supervision is an Implicit Lipschitz Regularizer [32.41225209639384]
In social media, recommender systems rely on the click-through rate (CTR) as the standard metric to evaluate user engagement. We show that scaling supervision bandwidth can act as an implicit Lipschitz regularizer, stably optimizing existing CTR models to achieve better generalizability.
arXiv Detail & Related papers (2025-03-19T01:01:28Z) - Tensor Product Attention Is All You Need [61.3442269053374]
Tensor Product Attention (TPA) is a novel attention mechanism that uses tensor decompositions to represent queries, keys, and values compactly. TPA achieves improved model quality alongside memory efficiency. Based on TPA, we introduce the Tensor ProducT ATTenTion Transformer (T6), a new model architecture for sequence modeling.
arXiv Detail & Related papers (2025-01-11T03:37:10Z) - Beyond KAN: Introducing KarSein for Adaptive High-Order Feature Interaction Modeling in CTR Prediction [14.65809648476631]
This study introduces the Kolmogorov-Arnold Represented Sparse Interaction Network (KarSein). KarSein adaptively transforms low-order basic features into high-order feature interactions. It achieves exceptional predictive accuracy while maintaining a highly compact parameter size and minimal computational overhead.
arXiv Detail & Related papers (2024-08-16T12:51:52Z) - Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting [46.63798583414426]
Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis.
Our study demonstrates, through both analytical and empirical evidence, that decomposition is key to containing excessive model inflation.
Remarkably, by tailoring decomposition to the intrinsic dynamics of time series data, our proposed model outperforms existing benchmarks.
arXiv Detail & Related papers (2024-01-22T13:15:40Z) - TSRFormer: Table Structure Recognition with Transformers [15.708108572696064]
We present a new table structure recognition (TSR) approach, called TSRFormer, to robustly recognize the structures of complex tables with geometrical distortions from various table images.
We propose a new two-stage DETR-based separator prediction approach, dubbed Separator REgression TRansformer (SepRETR).
We achieve state-of-the-art performance on several benchmark datasets, including SciTSR, PubTabNet and WTW.
arXiv Detail & Related papers (2022-08-09T17:36:13Z) - CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z) - TCCT: Tightly-Coupled Convolutional Transformer on Time Series Forecasting [6.393659160890665]
We propose the concept of tightly-coupled convolutional Transformer (TCCT) and three TCCT architectures.
Our experiments on real-world datasets show that our TCCT architectures greatly improve the performance of existing state-of-the-art Transformer models.
arXiv Detail & Related papers (2021-08-29T08:49:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.