Reinforcement Learning for Durable Algorithmic Recourse
- URL: http://arxiv.org/abs/2509.22102v1
- Date: Fri, 26 Sep 2025 09:24:12 GMT
- Title: Reinforcement Learning for Durable Algorithmic Recourse
- Authors: Marina Ceccon, Alessandro Fabris, Goran Radanović, Asia J. Biega, Gian Antonio Susto
- Abstract summary: We present a time-aware framework for algorithmic recourse, explicitly modeling how candidate populations adapt in response to recommendations. We also introduce a novel reinforcement learning (RL)-based recourse algorithm that captures the evolving dynamics of the environment.
- Score: 49.54997446851335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Algorithmic recourse seeks to provide individuals with actionable recommendations that increase their chances of receiving favorable outcomes from automated decision systems (e.g., loan approvals). While prior research has emphasized robustness to model updates, considerably less attention has been given to the temporal dynamics of recourse--particularly in competitive, resource-constrained settings where recommendations shape future applicant pools. In this work, we present a novel time-aware framework for algorithmic recourse, explicitly modeling how candidate populations adapt in response to recommendations. Additionally, we introduce a novel reinforcement learning (RL)-based recourse algorithm that captures the evolving dynamics of the environment to generate recommendations that are both feasible and valid. We design our recommendations to be durable, supporting validity over a predefined time horizon T. This durability allows individuals to confidently reapply after taking time to implement the suggested changes. Through extensive experiments in complex simulation environments, we show that our approach substantially outperforms existing baselines, offering a superior balance between feasibility and long-term validity. Together, these results underscore the importance of incorporating temporal and behavioral dynamics into the design of practical recourse systems.
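The durability idea in the abstract can be illustrated with a toy check. This is a minimal sketch, not the paper's algorithm: it assumes a fixed top-k acceptance rule, and the function names, the pools, and the value of k are all hypothetical. A recommended target score counts as durable here if it would still clear the acceptance bar in every round of the horizon T, even as competing applicants improve.

```python
from typing import Sequence


def acceptance_threshold(scores: Sequence[float], k: int) -> float:
    """Return the k-th highest score in the pool (the bar when only k slots exist)."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the pool size")
    return sorted(scores, reverse=True)[k - 1]


def is_durable(target_score: float, pools: Sequence[Sequence[float]], k: int) -> bool:
    """True if target_score meets or exceeds the top-k bar of every pool in the horizon."""
    return all(target_score >= acceptance_threshold(pool, k) for pool in pools)


# Hypothetical competitor pools for T = 3 future rounds; scores drift upward
# as other applicants act on their own recommendations (an assumption).
pools = [
    [0.2, 0.5, 0.6, 0.7],
    [0.3, 0.6, 0.7, 0.8],
    [0.4, 0.7, 0.8, 0.9],
]
k = 2

print(is_durable(0.65, pools, k))  # clears the bar in round 1 only
print(is_durable(0.85, pools, k))  # clears the bar in all three rounds
```

In this toy setting, a recommendation that is valid only against today's pool fails once competitors adapt, which is the gap a time-aware recourse method is meant to close.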
Related papers
- Generative Actor Critic [74.04971271003869]
Generative Actor Critic (GAC) is a novel framework that decouples sequential decision-making by reframing policy evaluation as learning a generative model of the joint distribution over trajectories and returns. Experiments on Gym-MuJoCo and Maze2D benchmarks demonstrate GAC's strong offline performance and significantly enhanced offline-to-online improvement compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-12-25T06:31:11Z)
- OmniQuality-R: Advancing Reward Models Through All-Encompassing Quality Assessment [55.59322229889159]
We propose OmniQuality-R, a unified reward modeling framework that transforms multi-task quality reasoning into continuous and interpretable reward signals. We use a reasoning-enhanced reward modeling dataset to form a reliable chain-of-thought dataset for supervised fine-tuning. We evaluate OmniQuality-R on three key IQA tasks: aesthetic quality assessment, technical quality evaluation, and text-image alignment.
arXiv Detail & Related papers (2025-10-12T13:46:28Z)
- Adaptive Reinforcement Learning for Dynamic Configuration Allocation in Pre-Production Testing [4.370892281528124]
We introduce a novel reinforcement learning framework that recasts configuration allocation as a sequential decision-making problem. Our method is the first to integrate Q-learning with a hybrid reward design that fuses simulated outcomes and real-time feedback.
arXiv Detail & Related papers (2025-10-02T05:12:28Z)
- STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning [54.28691219536054]
We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities. We develop anchored reinforcement training, a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping. Experiments on MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines.
arXiv Detail & Related papers (2025-08-26T08:47:58Z)
- Towards Human-like Preference Profiling in Sequential Recommendation [42.100841285901474]
RecPO is a preference optimization framework for sequential recommendation. It exploits adaptive reward margins based on inferred preference hierarchies and temporal signals. It mirrors key characteristics of human decision-making: favoring timely satisfaction, maintaining coherent preferences, and exercising discernment under shifting contexts.
arXiv Detail & Related papers (2025-06-02T21:09:29Z)
- Value Function Decomposition in Markov Recommendation Process [19.082512423102855]
We propose an online reinforcement learning framework to improve recommender performance. We show that these two factors can be separately approximated by decomposing the original temporal difference loss. The disentangled learning framework can achieve a more accurate estimation with faster learning and improved robustness against action exploration.
arXiv Detail & Related papers (2025-01-29T04:22:29Z)
- Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation. Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy. Results show significant performance improvement from our method compared with several well-known baselines.
arXiv Detail & Related papers (2024-09-11T17:01:06Z)
- DNS-Rec: Data-aware Neural Architecture Search for Recommender Systems [79.76519917171261]
This paper addresses the computational overhead and resource inefficiency prevalent in Sequential Recommender Systems (SRSs). We introduce an innovative approach combining pruning methods with advanced model designs. Our principal contribution is the development of a Data-aware Neural Architecture Search for Recommender System (DNS-Rec).
arXiv Detail & Related papers (2024-02-01T07:22:52Z)
- AURO: Reinforcement Learning for Adaptive User Retention Optimization in Recommender Systems [25.18963930580529]
Reinforcement Learning (RL) has garnered increasing attention for its ability to optimize user retention in recommender systems. This paper introduces a novel approach called Adaptive User Retention Optimization (AURO) to address this challenge.
arXiv Detail & Related papers (2023-10-06T02:45:21Z)
- Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective [11.31980071390936]
We present a novel podcast recommender system deployed at industrial scale.
In deviating from the pervasive industry practice of optimizing machine learning algorithms for short-term proxy metrics, the system substantially improves long-term performance in A/B tests.
arXiv Detail & Related papers (2023-02-07T16:17:25Z)
- D2RLIR : an improved and diversified ranking function in interactive recommendation systems based on deep reinforcement learning [0.3058685580689604]
This paper proposes a deep reinforcement learning based recommendation system by utilizing Actor-Critic architecture.
The proposed model is able to generate a diverse yet relevant recommendation list based on the user's preferences.
arXiv Detail & Related papers (2021-10-28T13:11:29Z)
- Recommendation Fairness: From Static to Dynamic [12.080824433982993]
We discuss how fairness could be baked into reinforcement learning techniques for recommendation.
We argue that in order to make further progress in recommendation fairness, we may want to consider multi-agent (game-theoretic) optimization, multi-objective (Pareto) optimization.
arXiv Detail & Related papers (2021-09-05T21:38:05Z)
- Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL [56.20835219296896]
We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility.
We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions.
arXiv Detail & Related papers (2021-06-01T15:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.