Benchmarking Offline Reinforcement Learning Algorithms for E-Commerce Order Fraud Evaluation
- URL: http://arxiv.org/abs/2212.02620v1
- Date: Mon, 5 Dec 2022 22:10:13 GMT
- Title: Benchmarking Offline Reinforcement Learning Algorithms for E-Commerce Order Fraud Evaluation
- Authors: Soysal Degirmenci, Chris Jones
- Abstract summary: We propose a system that considers both financial losses of fraud and long-term customer satisfaction.
We show that offline RL methods outperform traditional binary classification solutions in SimStore.
- Score: 0.571097144710995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Amazon and other e-commerce sites must employ mechanisms to protect their
millions of customers from fraud, such as unauthorized use of credit cards. One
such mechanism is order fraud evaluation, where systems evaluate orders for
fraud risk, and either "pass" the order, or take an action to mitigate high
risk. Order fraud evaluation systems typically use binary classification models
that distinguish fraudulent and legitimate orders, to assess risk and take
action. We seek to devise a system that considers both financial losses of
fraud and long-term customer satisfaction, which may be impaired when incorrect
actions are applied to legitimate customers. We propose that taking actions to
optimize long-term impact can be formulated as a Reinforcement Learning (RL)
problem. Standard RL methods require online interaction with an environment to
learn, but this is not desirable in high-stakes applications like order fraud
evaluation. Offline RL algorithms learn from logged data collected from the
environment, without the need for online interaction, making them suitable for
our use case. We show that offline RL methods outperform traditional binary
classification solutions in SimStore, a simplified e-commerce simulation that
incorporates order fraud risk. We also propose a novel approach to training
offline RL policies that adds a new loss term during training, to better align
policy exploration with taking correct actions.
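The abstract does not spell out the form of the added loss term, so the following is only a rough sketch of one plausible reading, assuming a discrete pass/mitigate action space and a flag marking logged actions that were correct in hindsight; all names, weights, and the auxiliary cross-entropy term are illustrative assumptions, not the paper's actual method.

```python
# Illustrative sketch only: an offline Q-learning update plus an auxiliary
# cross-entropy term that pushes the policy toward logged actions whose
# outcomes were "correct" (e.g., passing a legitimate order).
import torch
import torch.nn as nn
import torch.nn.functional as F

N_ACTIONS = 2          # 0 = pass the order, 1 = mitigate (assumed action space)
STATE_DIM = 16         # assumed order-feature dimensionality
GAMMA, ALIGN_WEIGHT = 0.99, 0.5

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def offline_update(batch):
    """One gradient step on a batch of logged transitions."""
    s, a, r, s_next, done, correct = batch  # `correct` flags logged actions judged right in hindsight
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * (1 - done) * q_net(s_next).max(dim=1).values
    td_loss = F.smooth_l1_loss(q_sa, target)

    # Auxiliary alignment term: cross-entropy toward logged actions that were
    # correct, steering policy exploration toward decisions known to be good.
    logits = q_net(s)
    align_loss = (F.cross_entropy(logits, a, reduction="none") * correct).mean()

    loss = td_loss + ALIGN_WEIGHT * align_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with a synthetic batch of 32 logged transitions:
batch = (
    torch.randn(32, STATE_DIM),                # states (order features)
    torch.randint(0, N_ACTIONS, (32,)),        # logged actions
    torch.randn(32),                           # rewards (fraud loss vs. satisfaction proxy)
    torch.randn(32, STATE_DIM),                # next states
    torch.zeros(32),                           # done flags
    torch.randint(0, 2, (32,)).float(),        # "correct action" flags
)
print(offline_update(batch))
```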
Related papers
- Large Language Model driven Policy Exploration for Recommender Systems [50.70228564385797]
Offline RL policies trained on static user data are vulnerable to distribution shift when deployed in dynamic online environments.
Online RL-based recommender systems also face challenges in production deployment due to the risk of exposing users to untrained or unstable policies.
Large Language Models (LLMs) offer a promising solution to mimic user objectives and preferences for pre-training policies offline.
We propose an Interaction-Augmented Learned Policy (iALP) that utilizes user preferences distilled from an LLM.
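A toy illustration of the core idea, under heavy simplification: treat an LLM-derived preference score as the reward and act greedily on it. The llm_preference_score stub below is hypothetical and stands in for a real LLM call; iALP itself trains an RL policy against such distilled preferences rather than picking items greedily.

```python
import random

def llm_preference_score(user_profile: str, item: str) -> float:
    """Placeholder for an LLM prompt like 'How much would this user like this item?'."""
    random.seed(hash((user_profile, item)) % (2**32))
    return random.random()  # pretend preference score in [0, 1]

def pretrain_policy(logged_interactions, items):
    """Score candidate items with the LLM-derived reward and keep, per user,
    the highest-scoring item as the policy's greedy choice."""
    policy = {}
    for user, _clicked_item in logged_interactions:
        policy[user] = max(items, key=lambda it: llm_preference_score(user, it))
    return policy

logs = [("sports fan", "running shoes"), ("home cook", "cast-iron pan")]
catalog = ["running shoes", "cast-iron pan", "noise-cancelling headphones"]
print(pretrain_policy(logs, catalog))
```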
arXiv Detail & Related papers (2025-01-23T16:37:44Z)
- Bayesian Design Principles for Offline-to-Online Reinforcement Learning [50.97583504192167]
Offline-to-online fine-tuning is crucial for real-world applications where exploration can be costly or unsafe.
In this paper, we tackle the dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop.
We show that Bayesian design principles are crucial in solving such a dilemma.
arXiv Detail & Related papers (2024-05-31T16:31:07Z)
- Transaction Fraud Detection via an Adaptive Graph Neural Network [64.9428588496749]
We propose an Adaptive Sampling and Aggregation-based Graph Neural Network (ASA-GNN) that learns discriminative representations to improve the performance of transaction fraud detection.
A neighbor sampling strategy filters noisy nodes and supplements information for fraudulent nodes.
Experiments on three real financial datasets demonstrate that the proposed ASA-GNN outperforms state-of-the-art methods.
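As a minimal sketch of this kind of neighbor filtering (not the actual ASA-GNN sampler), one can drop neighbors whose embeddings are dissimilar to the target node before aggregating; the cosine criterion and threshold below are assumptions.

```python
import numpy as np

def filtered_mean_aggregate(node_emb, neighbor_embs, sim_threshold=0.2):
    """Aggregate only neighbours whose cosine similarity to the node exceeds a threshold."""
    kept = []
    for emb in neighbor_embs:
        cos = float(node_emb @ emb / (np.linalg.norm(node_emb) * np.linalg.norm(emb) + 1e-8))
        if cos >= sim_threshold:
            kept.append(emb)
    if not kept:                      # fall back to the node itself if everything was filtered
        return node_emb
    return np.mean(kept, axis=0)

rng = np.random.default_rng(0)
node = rng.normal(size=8)
neighbors = [rng.normal(size=8) for _ in range(5)]
print(filtered_mean_aggregate(node, neighbors).shape)  # (8,)
```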
arXiv Detail & Related papers (2023-07-11T07:48:39Z)
- Towards Generalizable Reinforcement Learning for Trade Execution [25.199192981742744]
Reinforcement learning (RL) has been applied to optimized trade execution to learn smarter policies from market data.
We find that many existing RL methods exhibit considerable overfitting, which prevents them from being deployed in practice.
We propose to learn compact representations for context to address the overfitting problem, either by leveraging prior knowledge or in an end-to-end manner.
arXiv Detail & Related papers (2023-05-12T02:41:11Z)
- Application of Deep Reinforcement Learning to Payment Fraud [0.0]
A typical fraud detection system employs standard supervised learning methods where the focus is on maximizing the fraud recall rate.
We argue that such a formulation can lead to suboptimal solutions.
We formulate fraud detection as a sequential decision-making problem by including the utility within the model in the form of the reward function.
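A toy reward function makes the point concrete: encode the monetary utility of each decision so that an agent trades off fraud losses against the cost of declining legitimate customers. The dollar figures and the flat friction penalty below are invented for illustration.

```python
def fraud_reward(action: str, is_fraud: bool, order_value: float) -> float:
    """action is 'pass' or 'block'; reward is the (signed) utility of that decision."""
    if action == "pass":
        # Passing fraud costs the full order value (chargeback); passing a good
        # order earns a small margin on it.
        return -order_value if is_fraud else 0.1 * order_value
    # Blocking fraud avoids the loss; blocking a good customer costs future
    # business, modelled here as a flat long-term penalty.
    return 0.0 if is_fraud else -25.0

print(fraud_reward("pass", is_fraud=True, order_value=300.0))    # -300.0
print(fraud_reward("block", is_fraud=False, order_value=300.0))  # -25.0
```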
arXiv Detail & Related papers (2021-12-08T11:30:53Z)
- Curriculum Offline Imitation Learning [72.1015201041391]
Offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment.
We propose Curriculum Offline Imitation Learning (COIL), which utilizes an experience picking strategy for imitating from adaptive neighboring policies with a higher return.
On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids merely learning mediocre behavior on mixed datasets but is also competitive with state-of-the-art offline RL methods.
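A much-simplified sketch of the experience-picking idea: from a mixed dataset, imitate only trajectories that look close to the current policy's behavior and achieve a higher return than it does. The agreement proxy and thresholds below are assumptions, not COIL's actual criteria.

```python
def pick_trajectories(dataset, current_policy, current_return, agreement_min=0.7):
    """dataset: list of (trajectory, return); trajectory: list of (state, action) pairs."""
    picked = []
    for trajectory, traj_return in dataset:
        agree = sum(current_policy(s) == a for s, a in trajectory) / len(trajectory)
        if agree >= agreement_min and traj_return > current_return:
            picked.append(trajectory)
    return picked  # imitate these, re-estimate the policy's return, and repeat

policy = lambda state: int(state > 0)            # toy deterministic policy on scalar states
data = [
    ([(1, 1), (2, 1), (-1, 0)], 10.0),           # close to policy, higher return -> picked
    ([(1, 0), (2, 0), (-1, 1)], 50.0),           # high return but far from policy -> skipped
]
print(len(pick_trajectories(data, policy, current_return=5.0)))  # 1
```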
arXiv Detail & Related papers (2021-11-03T08:02:48Z)
- Offline Meta-Reinforcement Learning with Online Self-Supervision [66.42016534065276]
We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy.
Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data.
We find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
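A minimal sketch of the self-supervision step: fit a reward model on offline transitions that carry reward labels, then use it to label reward-free data gathered online. The linear least-squares model below stands in for the paper's learned distribution over reward functions; that simplification is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
offline_sa = rng.normal(size=(200, 6))                          # offline (state, action) features
true_w = rng.normal(size=6)
offline_r = offline_sa @ true_w + 0.01 * rng.normal(size=200)   # logged rewards

w_hat, *_ = np.linalg.lstsq(offline_sa, offline_r, rcond=None)  # fit reward model offline

online_sa = rng.normal(size=(50, 6))                            # online transitions, no reward labels
self_labels = online_sa @ w_hat                                 # self-generated reward labels
print(self_labels[:3])
```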
arXiv Detail & Related papers (2021-07-08T17:01:32Z)
- Adaptive Stress Testing for Adversarial Learning in a Financial Environment [0.0]
We develop a model for credit card fraud detection based on historical payment transaction data.
We apply the reinforcement learning model known as Adaptive Stress Testing to train an agent to find the most likely path to system failure.
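A small illustration of the Adaptive Stress Testing idea: an adversarial search chooses disturbances, and sequences that drive the system to failure are ranked by how likely the disturbances are, so the most probable failure path surfaces. The detector, disturbance model, and random search below (standing in for an RL agent) are toy assumptions.

```python
import math
import random

def detector_fails(perturbations):
    """Toy stand-in for 'system failure': fraud slips through once the total
    perturbation applied to the transaction exceeds a limit."""
    return sum(perturbations) > 2.5

def log_likelihood(perturbations, scale=1.0):
    """Log-probability of the disturbance sequence under a simple exponential
    model: larger perturbations are less likely."""
    return sum(-p / scale for p in perturbations)

def most_likely_failure(n_rollouts=5000, horizon=5, seed=0):
    random.seed(seed)
    best, best_ll = None, -math.inf
    for _ in range(n_rollouts):                  # random search in place of an RL agent
        rollout = [random.expovariate(1.0) for _ in range(horizon)]
        if detector_fails(rollout):
            ll = log_likelihood(rollout)
            if ll > best_ll:
                best, best_ll = rollout, ll
    return best, best_ll

path, ll = most_likely_failure()
print(ll, path)
```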
arXiv Detail & Related papers (2021-07-08T03:19:40Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- Deep Q-Network-based Adaptive Alert Threshold Selection Policy for Payment Fraud Systems in Retail Banking [9.13755431537592]
We propose an enhanced threshold selection policy for fraud alert systems.
The proposed approach formulates the threshold selection as a sequential decision making problem and uses Deep Q-Network based reinforcement learning.
Experimental results show that this adaptive approach outperforms the current static solutions by reducing the fraud losses as well as improving the operational efficiency of the alert system.
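One way to picture the formulation: a sketch environment whose state tracks the current threshold and recent alert rate, whose actions nudge the threshold, and whose reward is the negative of missed-fraud losses plus review costs. The cost model below is invented; a DQN agent would be trained against this (state, action, reward) interface in place of a static threshold.

```python
import random

ACTIONS = {0: -0.05, 1: 0.0, 2: +0.05}   # lower / keep / raise the alert threshold

class ThresholdEnv:
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def step(self, action: int):
        self.threshold = min(0.95, max(0.05, self.threshold + ACTIONS[action]))
        scores = [random.random() for _ in range(100)]        # model risk scores this period
        frauds = [random.random() < 0.2 * s for s in scores]  # toy ground truth, correlated with score
        alerts = [s >= self.threshold for s in scores]
        missed_loss = sum(100.0 for f, a in zip(frauds, alerts) if f and not a)
        review_cost = sum(2.0 for a in alerts if a)
        reward = -(missed_loss + review_cost)                 # agent minimises total cost
        state = (self.threshold, sum(alerts) / 100.0)         # threshold + recent alert rate
        return state, reward

env = ThresholdEnv()
print(env.step(2))   # raise the threshold one notch and observe the reward
```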
arXiv Detail & Related papers (2020-10-21T15:10:57Z)
- ARMS: Automated rules management system for fraud detection [1.7499351967216341]
We address online fraud detection, which consists of classifying incoming transactions as either legitimate or fraudulent in real-time.
Modern fraud detection systems consist of a machine learning model and rules defined by human experts.
We propose ARMS, an automated rules management system that evaluates the contribution of individual rules and optimizes the set of active rules using search and a user-defined loss function.
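A toy sketch of the rule-management idea: score candidate rule subsets with a user-defined loss on historical transactions and keep only rules that reduce it. ARMS uses more sophisticated search; the greedy loop, costs, and example rules below are illustrative assumptions.

```python
def loss(active_rules, transactions, fn_cost=10.0, fp_cost=1.0):
    """A transaction is flagged if any active rule fires on it."""
    total = 0.0
    for tx in transactions:
        flagged = any(rule(tx) for rule in active_rules)
        if tx["fraud"] and not flagged:
            total += fn_cost        # missed fraud
        elif not tx["fraud"] and flagged:
            total += fp_cost        # blocked legitimate customer
    return total

def greedy_select(rules, transactions):
    active = []
    for rule in rules:              # keep a rule only if it lowers the loss
        if loss(active + [rule], transactions) < loss(active, transactions):
            active.append(rule)
    return active

rules = [lambda tx: tx["amount"] > 500, lambda tx: tx["country"] != tx["card_country"]]
txs = [
    {"amount": 800, "country": "US", "card_country": "US", "fraud": True},
    {"amount": 40, "country": "BR", "card_country": "US", "fraud": False},
]
print(len(greedy_select(rules, txs)))  # 1: only the high-amount rule pays off here
```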
arXiv Detail & Related papers (2020-02-14T15:29:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.