The Bidding Games: Reinforcement Learning for MEV Extraction on Polygon Blockchain
- URL: http://arxiv.org/abs/2510.14642v1
- Date: Thu, 16 Oct 2025 12:54:53 GMT
- Title: The Bidding Games: Reinforcement Learning for MEV Extraction on Polygon Blockchain
- Authors: Andrei Seoev, Leonid Gremyachikh, Anastasiia Smirnova, Yash Madhwal, Alisa Kalacheva, Dmitry Belousov, Ilia Zubov, Aleksei Smirnov, Denis Fedyanin, Vladimir Gorgadze, Yury Yanovich
- Abstract summary: We present a reinforcement learning framework for MEV extraction on Polygon Atlas. Our work establishes that reinforcement learning provides a critical advantage in high-frequency MEV environments.
- Score: 0.11880231424287215
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In blockchain networks, the strategic ordering of transactions within blocks has emerged as a significant source of profit extraction, known as Maximal Extractable Value (MEV). The transition from spam-based Priority Gas Auctions to structured auction mechanisms like Polygon Atlas has transformed MEV extraction from public bidding wars into sealed-bid competitions under extreme time constraints. While this shift reduces network congestion, it introduces complex strategic challenges where searchers must make optimal bidding decisions within a sub-second window without knowledge of competitor behavior or presence. Traditional game-theoretic approaches struggle in this high-frequency, partially observable environment due to their reliance on complete information and static equilibrium assumptions. We present a reinforcement learning framework for MEV extraction on Polygon Atlas and make three contributions: (1) A novel simulation environment that accurately models the stochastic arrival of arbitrage opportunities and probabilistic competition in Atlas auctions; (2) A PPO-based bidding agent optimized for real-time constraints, capable of adaptive strategy formulation in continuous action spaces while maintaining production-ready inference speeds; (3) Empirical validation demonstrating our history-conditioned agent captures 49\% of available profits when deployed alongside existing searchers and 81\% when replacing the market leader, significantly outperforming static bidding strategies. Our work establishes that reinforcement learning provides a critical advantage in high-frequency MEV environments where traditional optimization methods fail, offering immediate value for industrial participants and protocol designers alike.
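The sealed-bid setting the abstract describes can be made concrete with a toy simulation. The sketch below is purely illustrative and is not the paper's Polygon Atlas simulator: opportunity values, competitor presence probability, and rival bid distributions are all hypothetical assumptions chosen for readability. It shows why a static bid fraction trades off win rate against margin, which is the gap an adaptive RL agent exploits.

```python
import random


class SealedBidAuctionEnv:
    """Toy sealed-bid, first-price auction over arbitrage opportunities.

    Hypothetical sketch: opportunity values are uniform on [1, 10], a
    rival appears with fixed probability and bids a random share of the
    opportunity's value. None of these distributions come from the paper.
    """

    def __init__(self, competitor_prob=0.6, seed=0):
        self.rng = random.Random(seed)
        self.competitor_prob = competitor_prob  # chance a rival bids at all

    def step(self, bid_fraction):
        """Bid a fraction of the opportunity's value; return realized profit."""
        value = self.rng.uniform(1.0, 10.0)  # arbitrage profit if won
        bid = bid_fraction * value
        # The rival may be absent; if present, it bids a random share of value.
        rival_bid = (self.rng.uniform(0.2, 0.9) * value
                     if self.rng.random() < self.competitor_prob else 0.0)
        won = bid > rival_bid
        return (value - bid) if won else 0.0


def evaluate(bid_fraction, episodes=10_000):
    """Average profit of a static bidding strategy over many auctions."""
    env = SealedBidAuctionEnv(seed=42)
    return sum(env.step(bid_fraction) for _ in range(episodes)) / episodes


if __name__ == "__main__":
    # Low fractions keep a large margin but lose often; high fractions win
    # often with thin margins. An RL agent would instead condition its bid
    # on observed auction history rather than fix one fraction.
    for f in (0.3, 0.6, 0.9):
        print(f"bid fraction {f:.1f}: avg profit {evaluate(f):.3f}")
```

Sweeping the bid fraction makes the tension visible: no single static fraction dominates across competitor regimes, which motivates the history-conditioned policy the abstract reports.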
Related papers
- Competitive Multi-Operator Reinforcement Learning for Joint Pricing and Fleet Rebalancing in AMoD Systems [6.547090882667874]
We investigate the impact of competition on policy learning by introducing a multi-operator reinforcement learning framework. Experiments using real-world data from multiple cities demonstrate that competition fundamentally alters learned behaviors, leading to lower prices and distinct fleet positioning patterns.
arXiv Detail & Related papers (2026-03-05T09:44:24Z) - Large Language Models as Bidding Agents in Repeated HetNet Auction [4.305340565419997]
This paper investigates the integration of large language models (LLMs) as reasoning agents in repeated spectrum auctions within heterogeneous networks (HetNets). We propose a distributed auction-based framework in which each base station (BS) independently conducts its own multi-channel auction, and user equipments (UEs) strategically decide both their association and bid values. Simulation results reveal that the LLM-empowered UE achieves consistently higher channel access frequency and improved budget efficiency compared to benchmarks.
arXiv Detail & Related papers (2026-03-02T07:30:01Z) - MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization [56.074760766965085]
Group-Relative Policy Optimization has emerged as an efficient paradigm for aligning Large Language Models (LLMs). We propose MAESTRO, which treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck. We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal.
arXiv Detail & Related papers (2026-01-12T05:02:48Z) - HOB: A Holistically Optimized Bidding Strategy under Heterogeneous Auction Mechanisms with Organic Traffic [23.230940625345372]
E-commerce advertising platforms typically sell commercial traffic through either second-price auction (SPA) or first-price auction (FPA). For automated bidding systems, such a trend poses a critical challenge: determining optimal strategies across heterogeneous auction channels to fulfill diverse advertiser objectives. We derive an efficient solution for optimal bidding under FPA channels, which takes into account the presence of organic traffic (traffic that can be won for free).
arXiv Detail & Related papers (2025-10-17T02:00:09Z) - Trade in Minutes! Rationality-Driven Agentic System for Quantitative Financial Trading [57.28635022507172]
TiMi is a rationality-driven multi-agent system that architecturally decouples strategy development from minute-level deployment. We propose a two-tier analytical paradigm from macro patterns to micro customization, layered programming design for trading bot implementation, and closed-loop optimization driven by mathematical reflection.
arXiv Detail & Related papers (2025-10-06T13:08:55Z) - Agentic Reinforcement Learning with Implicit Step Rewards [92.26560379363492]
Large language models (LLMs) are increasingly developed as autonomous agents using reinforcement learning (agentic RL). We introduce implicit step rewards for agentic RL (iStar), a general credit-assignment strategy that integrates seamlessly with standard RL algorithms. We evaluate our method on three challenging agent benchmarks, including WebShop and VisualSokoban, as well as open-ended social interactions with unverifiable rewards in SOTOPIA.
arXiv Detail & Related papers (2025-09-23T16:15:42Z) - Incentive-Aware Dynamic Resource Allocation under Long-Term Cost Constraints [24.842944692980495]
We study the dynamic allocation of a reusable resource to strategic agents with private valuations. We develop an incentive-aware framework that makes primal-dual methods robust to strategic behavior.
arXiv Detail & Related papers (2025-07-13T03:18:02Z) - Dynamic Reinsurance Treaty Bidding via Multi-Agent Reinforcement Learning [0.0]
This paper develops a novel multi-agent reinforcement learning (MARL) framework for reinsurance treaty bidding. MARL agents achieve up to 15% higher underwriting profit, 20% lower tail risk, and over 25% improvement in Sharpe ratios. These findings suggest that MARL offers a viable path toward more transparent, adaptive, and risk-sensitive reinsurance markets.
arXiv Detail & Related papers (2025-06-16T05:43:22Z) - Deep Reinforcement Learning-Based Bidding Strategies for Prosumers Trading in Double Auction-Based Transactive Energy Market [10.071307102216371]
A community-based double auction market is considered a promising transactive energy market (TEM) that can encourage prosumers to participate and maximize social welfare. In this study, we propose a double auction-based TEM with multiple DERs-equipped prosumers to transparently and efficiently manage energy transactions. We also propose a deep reinforcement learning (DRL) model with distributed learning and execution to ensure the scalability and privacy of the market environment.
arXiv Detail & Related papers (2025-02-16T21:38:21Z) - Learning to Bid in Non-Stationary Repeated First-Price Auctions [27.743710782882136]
First-price auctions have gained significant traction in digital advertising markets. Determining an optimal bidding strategy in first-price auctions is more complex. We provide a minimax-optimal characterization of the dynamic regret for the class of sequences of opponents' highest bids.
arXiv Detail & Related papers (2025-01-23T03:53:27Z) - CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition [52.2034494666179]
Sparse mixture of experts (SMoE) offers an appealing solution to scale up the model complexity beyond the means of increasing the network's depth or width.
We propose a competition mechanism to address this fundamental challenge of representation collapse.
By routing inputs only to experts with the highest neural response, we show that, under mild assumptions, competition enjoys the same convergence rate as the optimal estimator.
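The competition mechanism described above, routing inputs only to the experts with the highest neural response, can be sketched in a few lines. This is a hypothetical illustration, not the CompeteSMoE implementation: experts are plain linear maps, and the output norm stands in for "neural response".

```python
import numpy as np


def competition_route(x, experts, k=1):
    """Route input x only to the k experts with the highest response.

    Hypothetical sketch of competition-based routing: each expert is a
    (W, b) linear map, its response is the norm of its output, and only
    the winning experts contribute, with softmax-normalized weights.
    """
    outputs = [W @ x + b for W, b in experts]
    responses = np.array([np.linalg.norm(o) for o in outputs])
    winners = np.argsort(responses)[-k:]  # indices of the k strongest experts
    # Softmax over the winners' responses (shifted for numerical stability).
    weights = np.exp(responses[winners] - responses[winners].max())
    weights /= weights.sum()
    return sum(w * outputs[i] for w, i in zip(weights, winners))
```

With k=1 the output is exactly the single strongest expert's output, which is the winner-take-all case the abstract's convergence claim concerns.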
arXiv Detail & Related papers (2024-02-04T15:17:09Z) - Mimicking Better by Matching the Approximate Action Distribution [48.95048003354255]
We introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations.
We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods.
arXiv Detail & Related papers (2023-06-16T12:43:47Z) - Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
arXiv Detail & Related papers (2023-02-27T14:47:52Z) - VFed-SSD: Towards Practical Vertical Federated Advertising [53.08038962443853]
We propose a semi-supervised split distillation framework VFed-SSD to alleviate the two limitations.
Specifically, we develop a self-supervised task MatchedPair Detection (MPD) to exploit the vertically partitioned unlabeled data.
Our framework provides an efficient federation-enhanced solution for real-time display advertising with minimal deploying cost and significant performance lift.
arXiv Detail & Related papers (2022-05-31T17:45:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.