Related papers: Maximizing Success Rate of Payment Routing using Non-stationary Bandits

Maximizing Success Rate of Payment Routing using Non-stationary Bandits

URL: http://arxiv.org/abs/2308.01028v2
Date: Fri, 6 Oct 2023 07:53:37 GMT
Title: Maximizing Success Rate of Payment Routing using Non-stationary Bandits
Authors: Aayush Chaudhary, Abhinav Rai, Abhishek Gupta
Abstract summary: We propose a Ray-based implementation for optimally scaling bandit-based payment routing to over 10,000 transactions per second. We conducted live experiments on the payment transaction system on a fantasy sports platform Dream11. Our non-stationary bandit-based algorithm consistently improves the success rate of transactions by 0.92% compared to the traditional rule-based methods over one month.
Score: 5.781861264333114
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper discusses the system architecture design and deployment of non-stationary multi-armed bandit approaches to determine a near-optimal payment routing policy based on the recent history of transactions. We propose a Routing Service architecture using a novel Ray-based implementation for optimally scaling bandit-based payment routing to over 10,000 transactions per second, adhering to the system design requirements and ecosystem constraints with Payment Card Industry Data Security Standard (PCI DSS). We first evaluate the effectiveness of multiple bandit-based payment routing algorithms on a custom simulator to benchmark multiple non-stationary bandit approaches and identify the best hyperparameters. We then conducted live experiments on the payment transaction system on a fantasy sports platform Dream11. In the live experiments, we demonstrated that our non-stationary bandit-based algorithm consistently improves the success rate of transactions by 0.92% compared to the traditional rule-based methods over one month.

Related papers

A Control-Theoretic Approach to Dynamic Payment Routing for Success Rate Optimization [0.34376560669160394]
This paper introduces a control-theoretic framework for dynamic payment routing.<n>It is implemented within JUSPAY's Payment Orchestrator to maximize transaction success rate.<n>Results show an improvement of up to 1.15% in success rate over traditional rule-based routing.
arXiv Detail & Related papers (2025-10-19T07:27:37Z)
Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models [56.92178753201331]
We tackle average-reward infinite-horizon POMDPs with an unknown transition model. We present a novel and simple estimator that overcomes this barrier.
arXiv Detail & Related papers (2025-01-30T22:29:41Z)
Partially Observable Contextual Bandits with Linear Payoffs [18.593061465167363]
We consider a new bandit setting with partially observable, correlated contexts and linear payoffs. We propose an algorithmic pipeline named EMKF-Bandit, which integrates system identification, filtering, and classic contextual bandit algorithms.
arXiv Detail & Related papers (2024-09-17T19:47:04Z)
Towards Evaluating Transfer-based Attacks Systematically, Practically, and Fairly [79.07074710460012]
adversarial vulnerability of deep neural networks (DNNs) has drawn great attention. An increasing number of transfer-based methods have been developed to fool black-box DNN models. We establish a transfer-based attack benchmark (TA-Bench) which implements 30+ methods.
arXiv Detail & Related papers (2023-11-02T15:35:58Z)
Rate-Optimal Policy Optimization for Linear Markov Decision Processes [65.5958446762678]
We obtain rate-optimal $widetilde O (sqrt K)$ regret where $K$ denotes the number of episodes. Our work is the first to establish the optimal (w.r.t.$K$) rate of convergence in the setting with bandit feedback. No algorithm with an optimal rate guarantee is currently known.
arXiv Detail & Related papers (2023-08-28T15:16:09Z)
Congested Bandits: Optimal Routing via Short-term Resets [30.892724364965]
We study the problem of Congested Bandits where each arm's reward is allowed to depend on the number of times it was played in the past. We propose a UCB style algorithm and show that its policy regret scales as $tildeO(sqrtK Delta T)$. For the linear contextual bandit setup, our algorithm, based on an iterative least squares planner, achieves policy regret $tildeO(sqrtdT + Delta)$.
arXiv Detail & Related papers (2023-01-23T03:11:06Z)
Incentive-Aware Recommender Systems in Two-Sided Markets [49.692453629365204]
We propose a novel recommender system that aligns with agents' incentives while achieving myopically optimal performance. Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets. Both algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation.
arXiv Detail & Related papers (2022-11-23T22:20:12Z)
Matching Pursuit Based Scheduling for Over-the-Air Federated Learning [67.59503935237676]
This paper develops a class of low-complexity device scheduling algorithms for over-the-air learning via the method of federated learning. Compared to the state-of-the-art proposed scheme, the proposed scheme poses a drastically lower efficiency system. The efficiency of the proposed scheme is confirmed via experiments on the CIFAR dataset.
arXiv Detail & Related papers (2022-06-14T08:14:14Z)
TTRS: Tinkoff Transactions Recommender System benchmark [62.997667081978825]
We present the TTRS - Tinkoff Transactions Recommender System benchmark. This financial transaction benchmark contains over 2 million interactions between almost 10,000 users and more than 1,000 merchant brands over 14 months. We also present a comprehensive comparison of the current popular RecSys methods on the next-period recommendation task and conduct a detailed analysis of their performance against various metrics and recommendation goals.
arXiv Detail & Related papers (2021-10-11T20:04:07Z)
Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits [16.042075861624056]
We develop novel index policies that we prove achieve order-optimality, and show that they achieve empirical performance competitive with the state-of-the-art benchmark methods. The new policies achieve this with low time per pull for linear bandits, and thereby resulting in both favorable regret as well as computational efficiency.
arXiv Detail & Related papers (2020-10-08T16:17:53Z)
Learning from eXtreme Bandit Feedback [105.0383130431503]
We study the problem of batch learning from bandit feedback in the setting of extremely large action spaces. In this paper, we introduce a selective importance sampling estimator (sIS) that operates in a significantly more favorable bias-variance regime. We employ this estimator in a novel algorithmic procedure -- named Policy Optimization for eXtreme Models (POXM) -- for learning from bandit feedback on XMC tasks.
arXiv Detail & Related papers (2020-09-27T20:47:25Z)
Optimal Bidding Strategy without Exploration in Real-time Bidding [14.035270361462576]
maximizing utility with a budget constraint is the primary goal for advertisers in real-time bidding (RTB) systems. Previous works ignore the losing auctions to alleviate the difficulty with censored states. We propose a novel practical framework using the maximum entropy principle to imitate the behavior of the true distribution observed in real-time traffic.
arXiv Detail & Related papers (2020-03-31T20:43:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.