Related papers: MaRCA: Multi-Agent Reinforcement Learning for Dynamic Computation Allocation in Large-Scale Recommender Systems

URL: http://arxiv.org/abs/2512.24325v1
Date: Tue, 30 Dec 2025 16:27:41 GMT
Title: MaRCA: Multi-Agent Reinforcement Learning for Dynamic Computation Allocation in Large-Scale Recommender Systems
Authors: Wan Jiang, Xinyi Zang, Yudong Zhao, Yusi Zou, Yunfei Lu, Junbo Tong, Yang Liu, Ming Li, Jiani Shi, Xin Yang,
Abstract summary: We propose MaRCA, a reinforcement learning framework for end-to-end computation resource allocation in recommender systems. MaRCA has consistently handled hundreds of billions of ad requests per day and has delivered a 16.67% revenue uplift using existing computation resources.
Score: 11.011695215804629
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern recommender systems face significant computational challenges due to growing model complexity and traffic scale, making efficient computation allocation critical for maximizing business revenue. Existing approaches typically simplify multi-stage computation resource allocation, neglecting inter-stage dependencies, thus limiting global optimality. In this paper, we propose MaRCA, a multi-agent reinforcement learning framework for end-to-end computation resource allocation in large-scale recommender systems. MaRCA models the stages of a recommender system as cooperative agents, using Centralized Training with Decentralized Execution (CTDE) to optimize revenue under computation resource constraints. We introduce an AutoBucket TestBench for accurate computation cost estimation, and a Model Predictive Control (MPC)-based Revenue-Cost Balancer to proactively forecast traffic loads and adjust the revenue-cost trade-off accordingly. Since its end-to-end deployment in the advertising pipeline of a leading global e-commerce platform in November 2024, MaRCA has consistently handled hundreds of billions of ad requests per day and has delivered a 16.67% revenue uplift using existing computation resources.
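The revenue-cost trade-off mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's method: it swaps in a simple penalty-weight rule (Lagrangian-style) with a proportional update in place of the MPC-based balancer, and every name here (`Action`, `pick_action`, `update_lambda`, the `step` parameter, and the numbers) is a hypothetical chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """A hypothetical computation option for one request (illustrative only)."""
    name: str
    expected_revenue: float  # assumed per-request value estimate
    cost: float              # assumed computation cost, arbitrary units


def pick_action(actions, lam):
    """Pick the action that maximizes expected revenue minus lam * cost."""
    return max(actions, key=lambda a: a.expected_revenue - lam * a.cost)


def update_lambda(lam, forecast_load, capacity, step=0.1):
    """Raise the cost weight when forecast load exceeds capacity and lower it
    otherwise, never letting it drop below zero. This proportional rule is a
    stand-in assumption, not Model Predictive Control."""
    return max(0.0, lam + step * (forecast_load - capacity) / capacity)


if __name__ == "__main__":
    options = [
        Action("light", expected_revenue=1.0, cost=1.0),
        Action("heavy", expected_revenue=1.5, cost=3.0),
    ]
    lam = 0.0
    # Feed in a few hypothetical traffic forecasts against a fixed capacity.
    for load in (80, 120, 150, 200):
        lam = update_lambda(lam, forecast_load=load, capacity=100)
        print(load, round(lam, 3), pick_action(options, lam).name)
```

With a cost weight of zero the sketch always prefers the higher-revenue option; as the weight grows under forecast overload, cheaper options win. Where the weight comes from and how the per-stage agents share it are left open here, since the abstract does not specify them.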

Related papers

ODAR: Principled Adaptive Routing for LLM Reasoning via Active Inference [60.958331943869126]
ODAR-Expert is an adaptive routing framework that optimizes the accuracy-efficiency trade-off via principled resource allocation. We show strong and consistent gains, including 98.2% accuracy on MATH and 54.8% on Humanity's Last Exam.
arXiv Detail & Related papers (2026-02-27T05:22:01Z)
MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era [74.42509044145417]
MegaFlow is a large-scale distributed orchestration system that enables efficient scheduling, resource allocation, and fine-grained task management for agent-environment workloads. In our agent training deployments, MegaFlow successfully orchestrates tens of thousands of concurrent agent tasks while maintaining high system stability and achieving efficient resource utilization.
arXiv Detail & Related papers (2026-01-12T13:25:33Z)
Structured Cooperative Multi-Agent Reinforcement Learning: a Bayesian Network Perspective [1.2515675707300356]
We propose a systematic approach to leverage structures in the inter-agent couplings for efficient model-free reinforcement learning. We derive a multi-agent policy gradient theorem based on the P-DTDE scheme and develop a scalable actor-critic algorithm.
arXiv Detail & Related papers (2025-10-11T00:29:55Z)
Trade in Minutes! Rationality-Driven Agentic System for Quantitative Financial Trading [57.28635022507172]
TiMi is a rationality-driven multi-agent system that architecturally decouples strategy development from minute-level deployment. We propose a two-tier analytical paradigm from macro patterns to micro customization, layered programming design for trading bot implementation, and closed-loop optimization driven by mathematical reflection.
arXiv Detail & Related papers (2025-10-06T13:08:55Z)
Fair Resource Allocation for Fleet Intelligence [6.70517744733229]
We open-sourced Fair-Synergy, an algorithmic framework to ensure fair resource allocation across fleet intelligence. We evaluate Fair-Synergy with advanced vision and language models such as BERT, VGG16, MobileNet, and ResNets on datasets including MNIST, CIFAR-10, CIFAR-100, BDD, and GLUE. We demonstrate that Fair-Synergy outperforms standard benchmarks by up to 25% in multi-agent inference and 11% in multi-agent learning settings.
arXiv Detail & Related papers (2025-09-02T03:20:41Z)
Optimizing Pretraining Data Mixtures with LLM-Estimated Utility [52.08428597962423]
Large Language Models improve with increasing amounts of high-quality training data. We find token-counts outperform manual and learned mixes, indicating that simple approaches for dataset size and diversity are surprisingly effective. We propose two complementary approaches: UtiliMax, which extends token-based heuristics by incorporating utility estimates from reduced-scale ablations, achieving up to a 10.6x speedup over manual baselines; and Model Estimated Data Utility (MEDU), which leverages LLMs to estimate data utility from small samples, matching ablation-based performance while reducing computational requirements by ~200x.
arXiv Detail & Related papers (2025-01-20T21:10:22Z)
Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems. Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC. We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
Adaptive Stochastic ADMM for Decentralized Reinforcement Learning in Edge Industrial IoT [106.83952081124195]
Reinforcement learning (RL) has been widely investigated and shown to be a promising solution for decision-making and optimal control processes. We propose an adaptive ADMM (asI-ADMM) algorithm and apply it to decentralized RL with edge-computing-empowered IIoT networks. Experiment results show that our proposed algorithms outperform the state of the art in terms of communication costs and scalability, and can well adapt to complex IoT environments.
arXiv Detail & Related papers (2021-06-30T16:49:07Z)
Computation Resource Allocation Solution in Recommender Systems [19.456109814747048]
We propose a computation resource allocation solution (CRAS) that maximizes the business goal with limited computation resources and response time. The effectiveness of our method is verified by extensive experiments based on the real dataset from Taobao.com.
arXiv Detail & Related papers (2021-03-03T08:41:43Z)
Multi-Agent Deep Reinforcement Learning enabled Computation Resource Allocation in a Vehicular Cloud Network [30.736512922808362]
We investigate the computational resource allocation problem in a distributed Ad-Hoc vehicular network with no centralized infrastructure support. To overcome the dilemma of lacking a real central control unit in VCN, the allocation is completed on the vehicles in a distributed manner.
arXiv Detail & Related papers (2020-08-14T17:02:24Z)
Information Freshness-Aware Task Offloading in Air-Ground Integrated Edge Computing Systems [49.80033982995667]
This paper studies the problem of information freshness-aware task offloading in an air-ground integrated multi-access edge computing system. A third-party real-time application service provider provides computing services to the subscribed mobile users (MUs) with the limited communication and computation resources from the InP. We derive a novel deep reinforcement learning (RL) scheme that adopts two separate double deep Q-networks for each MU to approximate the Q-factor and the post-decision Q-factor.
arXiv Detail & Related papers (2020-07-15T21:32:43Z)
Distributed Resource Scheduling for Large-Scale MEC Systems: A Multi-Agent Ensemble Deep Reinforcement Learning with Imitation Acceleration [44.40722828581203]
We propose a distributed intelligent resource scheduling (DIRS) framework, which includes centralized training relying on the global information and distributed decision making by each agent deployed in each MEC server. We first introduce a novel multi-agent ensemble-assisted distributed deep reinforcement learning (DRL) architecture, which can simplify the overall neural network structure of each agent. Secondly, we apply action refinement to enhance the exploration ability of the proposed DIRS framework, where the near-optimal state-action pairs are obtained by a novel Lévy flight search.
arXiv Detail & Related papers (2020-05-21T20:04:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.