MixFormer: Co-Scaling Up Dense and Sequence in Industrial Recommenders
- URL: http://arxiv.org/abs/2602.14110v1
- Date: Sun, 15 Feb 2026 11:53:30 GMT
- Title: MixFormer: Co-Scaling Up Dense and Sequence in Industrial Recommenders
- Authors: Xu Huang, Hao Zhang, Zhifang Fan, Yunwen Huang, Zhuoxing Wei, Zheng Chai, Jinan Ni, Yuchao Zheng, Qiwei Chen, et al.
- Abstract summary: MixFormer is a unified Transformer-style architecture tailored for recommender systems. It jointly models sequential behaviors and feature interactions within a single backbone. Experiments on large-scale industrial datasets demonstrate that MixFormer consistently exhibits superior accuracy and efficiency.
- Score: 11.566232697512879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As industrial recommender systems enter a scaling-driven regime, Transformer architectures have become increasingly attractive for scaling models towards larger capacity and longer sequences. However, existing Transformer-based recommendation models remain structurally fragmented: sequence modeling and feature interaction are implemented as separate modules with independent parameterization. Such designs introduce a fundamental co-scaling challenge, as model capacity must be suboptimally allocated between dense feature interaction and sequence modeling under a limited computational budget. In this work, we propose MixFormer, a unified Transformer-style architecture tailored for recommender systems, which jointly models sequential behaviors and feature interactions within a single backbone. Through a unified parameterization, MixFormer enables effective co-scaling across both dense capacity and sequence length, mitigating the trade-off observed in decoupled designs. Moreover, the integrated architecture facilitates deep interaction between sequential and non-sequential representations, allowing high-order feature semantics to directly inform sequence aggregation and enhancing overall expressiveness. To ensure industrial practicality, we further introduce a user-item decoupling strategy for efficiency optimization that significantly reduces redundant computation and inference latency. Extensive experiments on large-scale industrial datasets demonstrate that MixFormer consistently achieves superior accuracy and efficiency. Furthermore, large-scale online A/B tests on two production recommender systems, Douyin and Douyin Lite, show consistent improvements in user engagement metrics, including active days and in-app usage duration.
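The abstract describes the unified backbone only at a high level. As a rough illustration, here is a minimal PyTorch sketch of the central idea, assuming dense feature embeddings and behavior-sequence embeddings are projected into a shared token space and processed by a single, jointly parameterized Transformer stack. All names (`UnifiedBlock`, `MixFormerSketch`), dimensions, and the mean-pooled CTR head are hypothetical illustration choices, not the authors' implementation; the paper's user-item decoupling optimization is only noted in a comment.

```python
# Hypothetical sketch of a MixFormer-style unified backbone. The paper does not
# publish code here; the token layout, dimensions, and pooling head below are
# illustrative assumptions, not the authors' exact design.
import torch
import torch.nn as nn


class UnifiedBlock(nn.Module):
    """One Transformer block shared by dense feature tokens and behavior tokens,
    so capacity is allocated jointly rather than split across two modules."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.ffn(self.norm2(x))
        return x


class MixFormerSketch(nn.Module):
    def __init__(self, d_model: int = 64, n_layers: int = 2):
        super().__init__()
        # Dense (non-sequential) feature tokens and behavior-sequence tokens go
        # through the same stack, so one knob (d_model / n_layers) grows
        # feature-interaction and sequence-modeling capacity together.
        self.blocks = nn.ModuleList(UnifiedBlock(d_model) for _ in range(n_layers))
        self.head = nn.Linear(d_model, 1)

    def forward(self, feature_tokens: torch.Tensor, seq_tokens: torch.Tensor) -> torch.Tensor:
        # feature_tokens: [B, F, D] embeddings of non-sequential fields
        # seq_tokens:     [B, L, D] embeddings of the user's behavior sequence
        x = torch.cat([feature_tokens, seq_tokens], dim=1)  # [B, F+L, D]
        for blk in self.blocks:
            # Full attention over both token types lets high-order feature
            # semantics steer sequence aggregation, per the abstract. The
            # paper's user-item decoupling (caching user-side computation to
            # cut per-item inference cost) is omitted here for brevity.
            x = blk(x)
        return self.head(x.mean(dim=1)).squeeze(-1)  # e.g. a CTR logit


if __name__ == "__main__":
    model = MixFormerSketch()
    feats = torch.randn(2, 8, 64)   # 8 dense feature tokens per example
    seq = torch.randn(2, 50, 64)    # 50 behavior tokens per example
    print(model(feats, seq).shape)  # torch.Size([2])
```

Because both token types share one set of weights, growing `d_model` or `n_layers` scales feature-interaction and sequence-modeling capacity together, which is the co-scaling property the abstract emphasizes.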
Related papers
- Compress, Cross and Scale: Multi-Level Compression Cross Networks for Efficient Scaling in Recommender Systems [5.897678894426804]
MLCC is a structured feature interaction architecture that organizes feature crosses through hierarchical compression and dynamic composition. MC-MLCC is a Multi-Channel extension that decomposes feature interactions into parallel subspaces. Our proposed models consistently outperform strong DLRM-style baselines by up to 0.52 AUC, while reducing model parameters and FLOPs by up to 26× under comparable performance.
arXiv Detail & Related papers (2026-02-12T15:06:46Z)
- HyFormer: Revisiting the Roles of Sequence Modeling and Feature Interaction in CTR Prediction [8.97787361529607]
This paper presents HyFormer, a unified hybrid transformer architecture that tightly integrates long-sequence modeling and feature interaction into a single backbone. Experiments on billion-scale industrial datasets demonstrate that HyFormer consistently outperforms strong LONGER and RankMixer baselines.
arXiv Detail & Related papers (2026-01-19T02:55:05Z)
- NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching [64.10695425442164]
We introduce NExT-OMNI, an open-source omnimodal foundation model that achieves unified modeling through discrete flow paradigms. Trained on large-scale interleaved text, image, video, and audio data, NExT-OMNI delivers competitive performance on multimodal generation and understanding benchmarks. To advance further research, we release training details and data protocols, and open-source both the code and model checkpoints.
arXiv Detail & Related papers (2025-10-15T16:25:18Z)
- HoMer: Addressing Heterogeneities by Modeling Sequential and Set-wise Contexts for CTR Prediction [10.779868699316829]
We propose HoMer, a Homogeneous-Oriented TransforMer for modeling sequential and set-wise contexts. HoMer scales up and outperforms our industrial baseline by 0.0099 in the AUC metric, and enhances online business metrics like CTR/RPM by 1.99%/2.46%.
arXiv Detail & Related papers (2025-10-13T07:47:03Z)
- Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts [10.204413386807564]
We propose a Dynamic Adaptive Shared Expert and Grouped Multi-Head Attention Hybrid Model (DASG-MoE) to enhance long-sequence modeling capabilities. First, we employ the Grouped Multi-Head Attention (GMHA) mechanism to effectively reduce the computational complexity of long sequences. Second, we design a Dual-Scale Shared Expert Structure (DSSE), where shallow experts use lightweight computations to quickly respond to low-dimensional features. Third, we propose a hierarchical Adaptive Dynamic Routing (ADR) mechanism that dynamically selects expert levels based on feature complexity and task requirements.
arXiv Detail & Related papers (2025-09-05T02:49:15Z)
- FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [57.577843653775]
We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation). A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams. A cross-modal expert routing mechanism adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z)
- Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution [88.20464308588889]
We propose a Structural Similarity-Inspired Unfolding (SSIU) method for efficient image SR. This method is designed by unfolding an SR optimization function constrained by structural similarity. Our model outperforms current state-of-the-art models, with lower parameter counts and reduced memory consumption.
arXiv Detail & Related papers (2025-06-13T14:29:40Z)
- LARES: Latent Reasoning for Sequential Recommendation [96.26996622771593]
We present LARES, a novel and scalable LAtent REasoning framework for Sequential recommendation. Our proposed approach employs a recurrent architecture that allows flexible expansion of reasoning depth without increasing parameter complexity. Our framework exhibits seamless compatibility with existing advanced models, further improving their recommendation performance.
arXiv Detail & Related papers (2025-05-22T16:22:54Z)
- A Novel Mamba-based Sequential Recommendation Method [4.941272356564765]
Sequential recommendation (SR) encodes user activity to predict the next action. Transformer-based models have proven effective for sequential recommendation, but the complexity of the self-attention module in Transformers scales quadratically with the sequence length. We propose a novel multi-head latent Mamba architecture, which employs multiple low-dimensional Mamba layers and fully connected layers.
arXiv Detail & Related papers (2025-04-10T02:43:19Z)
- Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging is an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains. Current model merging techniques focus on merging all available models simultaneously, with weight-matrix-based methods being the predominant approach. We propose a training-free, projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
- SIGMA: Selective Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction. We introduce a new framework named Selective Gated Mamba (SIGMA) for Sequential Recommendation. Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z)