Benchmarking Offline Reinforcement Learning Algorithms for E-Commerce Order Fraud Evaluation
- URL: http://arxiv.org/abs/2212.02620v1
- Date: Mon, 5 Dec 2022 22:10:13 GMT
- Title: Benchmarking Offline Reinforcement Learning Algorithms for E-Commerce Order Fraud Evaluation
- Authors: Soysal Degirmenci, Chris Jones
- Abstract summary: We propose a system that considers both financial losses of fraud and long-term customer satisfaction.
We show that offline RL methods outperform traditional binary classification solutions in SimStore.
- Score: 0.571097144710995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Amazon and other e-commerce sites must employ mechanisms to protect their
millions of customers from fraud, such as unauthorized use of credit cards. One
such mechanism is order fraud evaluation, where systems evaluate orders for
fraud risk, and either "pass" the order, or take an action to mitigate high
risk. Order fraud evaluation systems typically use binary classification models
that distinguish fraudulent and legitimate orders, to assess risk and take
action. We seek to devise a system that considers both financial losses of
fraud and long-term customer satisfaction, which may be impaired when incorrect
actions are applied to legitimate customers. We propose that taking actions to
optimize long-term impact can be formulated as a Reinforcement Learning (RL)
problem. Standard RL methods require online interaction with an environment to
learn, but this is not desirable in high-stakes applications like order fraud
evaluation. Offline RL algorithms learn from logged data collected from the
environment, without the need for online interaction, making them suitable for
our use case. We show that offline RL methods outperform traditional binary
classification solutions in SimStore, a simplified e-commerce simulation that
incorporates order fraud risk. We also propose a novel approach to training
offline RL policies that adds a new loss term during training, to better align
policy exploration with taking correct actions.
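The abstract does not spell out the form of the added loss term, so the following is only a rough sketch of one plausible reading, assuming a discrete pass/mitigate action space and a flag marking logged actions that were correct in hindsight; all names, weights, and the auxiliary cross-entropy term are illustrative assumptions, not the paper's actual method.

```python
# Illustrative sketch only: an offline Q-learning update plus an auxiliary
# cross-entropy term that pushes the policy toward logged actions whose
# outcomes were "correct" (e.g., passing a legitimate order).
import torch
import torch.nn as nn
import torch.nn.functional as F

N_ACTIONS = 2          # 0 = pass the order, 1 = mitigate (assumed action space)
STATE_DIM = 16         # assumed order-feature dimensionality
GAMMA, ALIGN_WEIGHT = 0.99, 0.5

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def offline_update(batch):
    """One gradient step on a batch of logged transitions."""
    s, a, r, s_next, done, correct = batch  # `correct` flags logged actions judged right in hindsight
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * (1 - done) * q_net(s_next).max(dim=1).values
    td_loss = F.smooth_l1_loss(q_sa, target)

    # Auxiliary alignment term: cross-entropy toward logged actions that were
    # correct, steering policy exploration toward decisions known to be good.
    logits = q_net(s)
    align_loss = (F.cross_entropy(logits, a, reduction="none") * correct).mean()

    loss = td_loss + ALIGN_WEIGHT * align_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with a synthetic batch of 32 logged transitions:
batch = (
    torch.randn(32, STATE_DIM),                # states (order features)
    torch.randint(0, N_ACTIONS, (32,)),        # logged actions
    torch.randn(32),                           # rewards (fraud loss vs. satisfaction proxy)
    torch.randn(32, STATE_DIM),                # next states
    torch.zeros(32),                           # done flags
    torch.randint(0, 2, (32,)).float(),        # "correct action" flags
)
print(offline_update(batch))
```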
Related papers
- Large Language Model driven Policy Exploration for Recommender Systems [50.70228564385797]
Offline RL policies trained on static user data are vulnerable to distribution shift when deployed in dynamic online environments.
Online RL-based recommender systems also face challenges in production deployment due to the risk of exposing users to untrained or unstable policies.
Large Language Models (LLMs) offer a promising solution to mimic user objectives and preferences for pre-training policies offline.
We propose an Interaction-Augmented Learned Policy (iALP) that utilizes user preferences distilled from an LLM.
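A toy illustration of the core idea, under heavy simplification: treat an LLM-derived preference score as the reward and act greedily on it. The llm_preference_score stub below is hypothetical and stands in for a real LLM call; iALP itself trains an RL policy against such distilled preferences rather than picking items greedily.

```python
import random

def llm_preference_score(user_profile: str, item: str) -> float:
    """Placeholder for an LLM prompt like 'How much would this user like this item?'."""
    random.seed(hash((user_profile, item)) % (2**32))
    return random.random()  # pretend preference score in [0, 1]

def pretrain_policy(logged_interactions, items):
    """Score candidate items with the LLM-derived reward and keep, per user,
    the highest-scoring item as the policy's greedy choice."""
    policy = {}
    for user, _clicked_item in logged_interactions:
        policy[user] = max(items, key=lambda it: llm_preference_score(user, it))
    return policy

logs = [("sports fan", "running shoes"), ("home cook", "cast-iron pan")]
catalog = ["running shoes", "cast-iron pan", "noise-cancelling headphones"]
print(pretrain_policy(logs, catalog))
```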
arXiv Detail & Related papers (2025-01-23T16:37:44Z)
- Bayesian Design Principles for Offline-to-Online Reinforcement Learning [50.97583504192167]
Offline-to-online fine-tuning is crucial for real-world applications where exploration can be costly or unsafe.
In this paper, we tackle the dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop.
We show that Bayesian design principles are crucial in solving such a dilemma.
arXiv Detail & Related papers (2024-05-31T16:31:07Z)
- Transaction Fraud Detection via an Adaptive Graph Neural Network [64.9428588496749]
We propose an Adaptive Sampling and Aggregation-based Graph Neural Network (ASA-GNN) that learns discriminative representations to improve the performance of transaction fraud detection.
A neighbor sampling strategy filters noisy nodes and supplements information for fraudulent nodes.
Experiments on three real financial datasets demonstrate that the proposed ASA-GNN outperforms state-of-the-art methods.
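As a minimal sketch of this kind of neighbor filtering (not the actual ASA-GNN sampler), one can drop neighbors whose embeddings are dissimilar to the target node before aggregating; the cosine criterion and threshold below are assumptions.

```python
import numpy as np

def filtered_mean_aggregate(node_emb, neighbor_embs, sim_threshold=0.2):
    """Aggregate only neighbours whose cosine similarity to the node exceeds a threshold."""
    kept = []
    for emb in neighbor_embs:
        cos = float(node_emb @ emb / (np.linalg.norm(node_emb) * np.linalg.norm(emb) + 1e-8))
        if cos >= sim_threshold:
            kept.append(emb)
    if not kept:                      # fall back to the node itself if everything was filtered
        return node_emb
    return np.mean(kept, axis=0)

rng = np.random.default_rng(0)
node = rng.normal(size=8)
neighbors = [rng.normal(size=8) for _ in range(5)]
print(filtered_mean_aggregate(node, neighbors).shape)  # (8,)
```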
arXiv Detail & Related papers (2023-07-11T07:48:39Z)
- Towards Generalizable Reinforcement Learning for Trade Execution [25.199192981742744]
Reinforcement learning (RL) has been applied to optimized trade execution to learn smarter policies from market data.
We find that many existing RL methods exhibit considerable overfitting, which prevents them from being deployed in practice.
We propose to learn compact representations for context to address the overfitting problem, either by leveraging prior knowledge or in an end-to-end manner.
arXiv Detail & Related papers (2023-05-12T02:41:11Z)
- Application of Deep Reinforcement Learning to Payment Fraud [0.0]
A typical fraud detection system employs standard supervised learning methods where the focus is on maximizing the fraud recall rate.
We argue that such a formulation can lead to suboptimal solutions.
We formulate fraud detection as a sequential decision-making problem by including the utility within the model in the form of the reward function.
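A toy reward function makes the point concrete: encode the monetary utility of each decision so that an agent trades off fraud losses against the cost of declining legitimate customers. The dollar figures and the flat friction penalty below are invented for illustration.

```python
def fraud_reward(action: str, is_fraud: bool, order_value: float) -> float:
    """action is 'pass' or 'block'; reward is the (signed) utility of that decision."""
    if action == "pass":
        # Passing fraud costs the full order value (chargeback); passing a good
        # order earns a small margin on it.
        return -order_value if is_fraud else 0.1 * order_value
    # Blocking fraud avoids the loss; blocking a good customer costs future
    # business, modelled here as a flat long-term penalty.
    return 0.0 if is_fraud else -25.0

print(fraud_reward("pass", is_fraud=True, order_value=300.0))    # -300.0
print(fraud_reward("block", is_fraud=False, order_value=300.0))  # -25.0
```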
arXiv Detail & Related papers (2021-12-08T11:30:53Z)
- Curriculum Offline Imitation Learning [72.1015201041391]
Offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment.
We propose Curriculum Offline Imitation Learning (COIL), which utilizes an experience picking strategy for imitating from adaptive neighboring policies with a higher return.
On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids merely learning mediocre behavior on mixed datasets but is also competitive with state-of-the-art offline RL methods.
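A much-simplified sketch of the experience-picking idea: from a mixed dataset, imitate only trajectories that look close to the current policy's behavior and achieve a higher return than it does. The agreement proxy and thresholds below are assumptions, not COIL's actual criteria.

```python
def pick_trajectories(dataset, current_policy, current_return, agreement_min=0.7):
    """dataset: list of (trajectory, return); trajectory: list of (state, action) pairs."""
    picked = []
    for trajectory, traj_return in dataset:
        agree = sum(current_policy(s) == a for s, a in trajectory) / len(trajectory)
        if agree >= agreement_min and traj_return > current_return:
            picked.append(trajectory)
    return picked  # imitate these, re-estimate the policy's return, and repeat

policy = lambda state: int(state > 0)            # toy deterministic policy on scalar states
data = [
    ([(1, 1), (2, 1), (-1, 0)], 10.0),           # close to policy, higher return -> picked
    ([(1, 0), (2, 0), (-1, 1)], 50.0),           # high return but far from policy -> skipped
]
print(len(pick_trajectories(data, policy, current_return=5.0)))  # 1
```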
arXiv Detail & Related papers (2021-11-03T08:02:48Z)
- Offline Meta-Reinforcement Learning with Online Self-Supervision [66.42016534065276]
We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy.
Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data.
We find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
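A minimal sketch of the self-supervision step: fit a reward model on offline transitions that carry reward labels, then use it to label reward-free data gathered online. The linear least-squares model below stands in for the paper's learned distribution over reward functions; that simplification is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
offline_sa = rng.normal(size=(200, 6))                          # offline (state, action) features
true_w = rng.normal(size=6)
offline_r = offline_sa @ true_w + 0.01 * rng.normal(size=200)   # logged rewards

w_hat, *_ = np.linalg.lstsq(offline_sa, offline_r, rcond=None)  # fit reward model offline

online_sa = rng.normal(size=(50, 6))                            # online transitions, no reward labels
self_labels = online_sa @ w_hat                                 # self-generated reward labels
print(self_labels[:3])
```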
arXiv Detail & Related papers (2021-07-08T17:01:32Z)
- Adaptive Stress Testing for Adversarial Learning in a Financial Environment [0.0]
We develop a model for credit card fraud detection based on historical payment transaction data.
We apply the reinforcement learning model known as Adaptive Stress Testing to train an agent to find the most likely path to system failure.
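A small illustration of the Adaptive Stress Testing idea: an adversarial search chooses disturbances, and sequences that drive the system to failure are ranked by how likely the disturbances are, so the most probable failure path surfaces. The detector, disturbance model, and random search below (standing in for an RL agent) are toy assumptions.

```python
import math
import random

def detector_fails(perturbations):
    """Toy stand-in for 'system failure': fraud slips through once the total
    perturbation applied to the transaction exceeds a limit."""
    return sum(perturbations) > 2.5

def log_likelihood(perturbations, scale=1.0):
    """Log-probability of the disturbance sequence under a simple exponential
    model: larger perturbations are less likely."""
    return sum(-p / scale for p in perturbations)

def most_likely_failure(n_rollouts=5000, horizon=5, seed=0):
    random.seed(seed)
    best, best_ll = None, -math.inf
    for _ in range(n_rollouts):                  # random search in place of an RL agent
        rollout = [random.expovariate(1.0) for _ in range(horizon)]
        if detector_fails(rollout):
            ll = log_likelihood(rollout)
            if ll > best_ll:
                best, best_ll = rollout, ll
    return best, best_ll

path, ll = most_likely_failure()
print(ll, path)
```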
arXiv Detail & Related papers (2021-07-08T03:19:40Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- Deep Q-Network-based Adaptive Alert Threshold Selection Policy for Payment Fraud Systems in Retail Banking [9.13755431537592]
We propose an enhanced threshold selection policy for fraud alert systems.
The proposed approach formulates the threshold selection as a sequential decision making problem and uses Deep Q-Network based reinforcement learning.
Experimental results show that this adaptive approach outperforms the current static solutions by reducing the fraud losses as well as improving the operational efficiency of the alert system.
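One way to picture the formulation: a sketch environment whose state tracks the current threshold and recent alert rate, whose actions nudge the threshold, and whose reward is the negative of missed-fraud losses plus review costs. The cost model below is invented; a DQN agent would be trained against this (state, action, reward) interface in place of a static threshold.

```python
import random

ACTIONS = {0: -0.05, 1: 0.0, 2: +0.05}   # lower / keep / raise the alert threshold

class ThresholdEnv:
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def step(self, action: int):
        self.threshold = min(0.95, max(0.05, self.threshold + ACTIONS[action]))
        scores = [random.random() for _ in range(100)]        # model risk scores this period
        frauds = [random.random() < 0.2 * s for s in scores]  # toy ground truth, correlated with score
        alerts = [s >= self.threshold for s in scores]
        missed_loss = sum(100.0 for f, a in zip(frauds, alerts) if f and not a)
        review_cost = sum(2.0 for a in alerts if a)
        reward = -(missed_loss + review_cost)                 # agent minimises total cost
        state = (self.threshold, sum(alerts) / 100.0)         # threshold + recent alert rate
        return state, reward

env = ThresholdEnv()
print(env.step(2))   # raise the threshold one notch and observe the reward
```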
arXiv Detail & Related papers (2020-10-21T15:10:57Z)
- ARMS: Automated rules management system for fraud detection [1.7499351967216341]
We address online fraud detection, which consists of classifying incoming transactions as either legitimate or fraudulent in real-time.
Modern fraud detection systems consist of a machine learning model and rules defined by human experts.
We propose ARMS, an automated rules management system that evaluates the contribution of individual rules and optimizes the set of active rules using search and a user-defined loss function.
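A toy sketch of the rule-management idea: score candidate rule subsets with a user-defined loss on historical transactions and keep only rules that reduce it. ARMS uses more sophisticated search; the greedy loop, costs, and example rules below are illustrative assumptions.

```python
def loss(active_rules, transactions, fn_cost=10.0, fp_cost=1.0):
    """A transaction is flagged if any active rule fires on it."""
    total = 0.0
    for tx in transactions:
        flagged = any(rule(tx) for rule in active_rules)
        if tx["fraud"] and not flagged:
            total += fn_cost        # missed fraud
        elif not tx["fraud"] and flagged:
            total += fp_cost        # blocked legitimate customer
    return total

def greedy_select(rules, transactions):
    active = []
    for rule in rules:              # keep a rule only if it lowers the loss
        if loss(active + [rule], transactions) < loss(active, transactions):
            active.append(rule)
    return active

rules = [lambda tx: tx["amount"] > 500, lambda tx: tx["country"] != tx["card_country"]]
txs = [
    {"amount": 800, "country": "US", "card_country": "US", "fraud": True},
    {"amount": 40, "country": "BR", "card_country": "US", "fraud": False},
]
print(len(greedy_select(rules, txs)))  # 1: only the high-amount rule pays off here
```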
arXiv Detail & Related papers (2020-02-14T15:29:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.