Inverse Reinforcement Learning using Revealed Preferences and Passive Stochastic Optimization
- URL: http://arxiv.org/abs/2507.04396v1
- Date: Sun, 06 Jul 2025 13:56:02 GMT
- Title: Inverse Reinforcement Learning using Revealed Preferences and Passive Stochastic Optimization
- Authors: Vikram Krishnamurthy
- Abstract summary: The first two chapters view inverse reinforcement learning (IRL) through the lens of revealed preferences from microeconomics. The third chapter studies adaptive IRL via Langevin dynamics stochastic gradient algorithms.
- Score: 15.878313629774269
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This monograph, spanning three chapters, explores Inverse Reinforcement Learning (IRL). The first two chapters view IRL through the lens of revealed preferences from microeconomics, while the third chapter studies adaptive IRL via Langevin dynamics stochastic gradient algorithms. Chapter 1 uses classical revealed preference theory (Afriat's theorem and extensions) to identify constrained utility maximizers based on observed agent actions. This allows for the reconstruction of set-valued estimates of an agent's utility. We illustrate this procedure by identifying the presence of a cognitive radar and reconstructing its utility function. The chapter also addresses the construction of a statistical detector for utility maximization behavior when agent actions are corrupted by noise. Chapter 2 studies Bayesian IRL. It investigates how an analyst can determine if an observed agent is a rationally inattentive Bayesian utility maximizer (i.e., simultaneously optimizing its utility and observation likelihood). The chapter discusses inverse stopping-time problems, focusing on reconstructing the continuation and stopping costs of a Bayesian agent operating over a random horizon. We then apply this IRL methodology to identify the presence of a Bayes-optimal sequential detector. Additionally, Chapter 2 provides a concise overview of discrete choice models, inverse Bayesian filtering, and inverse stochastic gradient algorithms for adaptive IRL. Finally, Chapter 3 introduces an adaptive IRL approach utilizing passive Langevin dynamics. This method aims to track time-varying utility functions given noisy and misspecified gradients. In essence, the adaptive IRL algorithms presented in Chapter 3 can be conceptualized as inverse stochastic gradient algorithms, as they learn the utility function in real time while a stochastic gradient algorithm is in operation.
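As a concrete illustration of the Chapter 1 machinery, the following is a minimal sketch of the Afriat feasibility test: given probe (price) vectors p_t and observed agent responses x_t, the data are consistent with maximization of some monotone concave utility exactly when the Afriat inequalities admit a solution, which reduces to a linear-program feasibility check. The function name and the SciPy-based formulation below are illustrative assumptions, not the monograph's code.

```python
import numpy as np
from scipy.optimize import linprog

def afriat_feasible(P, X):
    """Afriat feasibility test for observations (p_t, x_t), t = 1..T.

    P, X: arrays of shape (T, m) holding probe (price) vectors and the
    agent's responses.  Returns True iff there exist u_t and lambda_t >= 1
    satisfying  u_s <= u_t + lambda_t * p_t @ (x_s - x_t)  for all s, t,
    i.e. the data are rationalizable by a monotone concave utility.
    """
    T, _ = P.shape
    n_var = 2 * T                      # variables: [u_1..u_T, lambda_1..lambda_T]
    A_ub, b_ub = [], []
    for t in range(T):
        for s in range(T):
            if s == t:
                continue
            # Afriat inequality rewritten as  u_s - u_t - lambda_t * p_t.(x_s - x_t) <= 0
            row = np.zeros(n_var)
            row[s] = 1.0
            row[t] = -1.0
            row[T + t] = -(P[t] @ (X[s] - X[t]))
            A_ub.append(row)
            b_ub.append(0.0)
    bounds = [(None, None)] * T + [(1.0, None)] * T   # u_t free, lambda_t >= 1 (normalization)
    res = linprog(c=np.zeros(n_var), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=bounds, method="highs")
    return res.status == 0             # 0 = feasible (and trivially optimal, since c = 0)
```

When the LP is feasible, Afriat's construction additionally yields an explicit piecewise-linear concave utility u(x) = min_t { u_t + lambda_t * p_t·(x - x_t) } that rationalizes the data, one member of the set-valued utility estimate mentioned in the abstract.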
Related papers
- Slow Feature Analysis on Markov Chains from Goal-Directed Behavior [0.0]
This work investigates the effects of goal-directed behavior on value-function approximation in an idealized setting. Three correction routes, which can potentially alleviate detrimental scaling effects, are evaluated and discussed.
arXiv Detail & Related papers (2025-06-01T19:57:41Z)
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
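For context, the gradient-free idea can be illustrated by a standard two-point zeroth-order estimator; this generic sketch (names and constants are placeholders) is not the accelerated algorithm analyzed in the cited paper.

```python
import numpy as np

def zo_gradient(f, x, h=1e-4, rng=None):
    """Two-point zeroth-order gradient estimate of f at x.

    Samples a random unit direction e and approximates the directional
    derivative by a finite difference, so only function values are needed.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    e = rng.standard_normal(d)
    e /= np.linalg.norm(e)
    return d * (f(x + h * e) - f(x - h * e)) / (2 * h) * e

def zo_sgd(f, x0, lr=0.05, iters=2000):
    """Plain zeroth-order SGD loop built on the two-point estimator."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x -= lr * zo_gradient(f, x)
    return x

# Example: minimize a simple quadratic without ever evaluating its gradient.
x_hat = zo_sgd(lambda x: float(np.sum((x - 3.0) ** 2)), np.zeros(5))
```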
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
- Understanding Optimization in Deep Learning with Central Flows [53.66160508990508]
We show that RMSProp's implicit behavior can be explicitly captured by a "central flow": a differential equation.
We show that these flows can empirically predict long-term optimization trajectories of generic neural networks.
arXiv Detail & Related papers (2024-10-31T17:58:13Z)
- ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages [37.12048108122337]
This paper proposes a step toward approximate Bayesian inference in on-policy actor-critic deep reinforcement learning.
It is implemented through three changes to the Asynchronous Advantage Actor-Critic (A3C) algorithm.
arXiv Detail & Related papers (2023-06-02T11:37:22Z)
- Finite-Sample Bounds for Adaptive Inverse Reinforcement Learning using Passive Langevin Dynamics [13.440621354486906]
This paper provides a finite-sample analysis of a passive stochastic gradient Langevin dynamics (PSGLD) algorithm. Adaptive IRL aims to estimate the cost function of a forward learner running a stochastic gradient algorithm.
arXiv Detail & Related papers (2023-04-18T16:39:51Z)
- Adaptive LASSO estimation for functional hidden dynamic geostatistical model [69.10717733870575]
We propose a novel model selection algorithm based on a penalized maximum likelihood estimator (PMLE) for functional hidden dynamic geostatistical models (f-HDGM).
The algorithm is based on iterative optimisation and uses an adaptive least absolute shrinkage and selection operator (GMSOLAS) penalty function, wherein the weights are obtained from the unpenalised f-HDGM maximum-likelihood estimators.
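To illustrate the adaptive weighting described here (penalty weights taken from an unpenalised fit), the sketch below implements adaptive LASSO for an ordinary linear model via the standard column-rescaling trick; the cited f-HDGM estimator is considerably more elaborate, and all names below are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def adaptive_lasso(X, y, lam=0.1, gamma=1.0):
    """Adaptive LASSO for a plain linear model (illustrative sketch).

    Weights w_j = 1 / |beta_ols_j|**gamma come from the unpenalised fit; the
    weighted L1 problem  min ||y - X b||^2 + lam * sum_j w_j |b_j|  is solved
    by rescaling the columns of X and running an ordinary LASSO.
    """
    beta_ols = LinearRegression(fit_intercept=False).fit(X, y).coef_
    w = 1.0 / (np.abs(beta_ols) ** gamma + 1e-8)   # adaptive penalty weights
    X_scaled = X / w                               # column j divided by w_j
    fit = Lasso(alpha=lam, fit_intercept=False).fit(X_scaled, y)
    return fit.coef_ / w                           # undo the rescaling
```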
arXiv Detail & Related papers (2022-08-10T19:17:45Z)
- Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability that a user will click on an item, has become increasingly significant in recommender systems.
Recent deep learning models that automatically extract user interest from user behavior have achieved great success.
We propose a novel approach under the framework of the wrapper method, named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z)
- Gleo-Det: Deep Convolution Feature-Guided Detector with Local Entropy Optimization for Salient Points [5.955667705173262]
We propose to impose a fine constraint based on the repeatability requirement and a coarse constraint guided by deep convolution features.
With the guidance of convolution features, we define the cost function from both the positive and negative sides.
arXiv Detail & Related papers (2022-04-27T12:40:21Z) - Sensing Cox Processes via Posterior Sampling and Positive Bases [56.82162768921196]
We study adaptive sensing of point processes, a widely used model from spatial statistics.
We model the intensity function as a sample from a truncated Gaussian process, represented in a specially constructed positive basis.
Our adaptive sensing algorithms use Langevin dynamics and are based on posterior sampling (Cox-Thompson) and top-two posterior sampling (Top2) principles.
arXiv Detail & Related papers (2021-10-21T14:47:06Z)
- Langevin Dynamics for Adaptive Inverse Reinforcement Learning of Stochastic Gradient Algorithms [21.796874356469644]
Inverse reinforcement learning (IRL) aims to estimate the reward function of optimizing agents by observing their responses.
We present a generalized Langevin dynamics algorithm to estimate the reward function $R(\theta)$.
The proposed IRL algorithms use kernel-based passive learning schemes and generate samples from the distribution proportional to $\exp(R(\theta))$.
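Since this entry (and Chapter 3 of the monograph above) centres on passive Langevin dynamics, here is a heavily hedged sketch of a kernel-weighted Langevin recursion of that flavour; the kernel, step size, and temperature below are illustrative placeholders, not the paper's exact algorithm or constants.

```python
import numpy as np

def passive_langevin_irl(observed, step=1e-2, bandwidth=0.3, beta=1.0, rng=None):
    """Kernel-based passive Langevin IRL recursion (illustrative sketch).

    `observed` is a sequence of pairs (theta_k, g_k): the forward learner's
    evaluation point theta_k and its noisy gradient estimate g_k of the
    reward R at theta_k.  The analyst cannot choose theta_k (hence "passive"),
    so a Gaussian kernel discounts observations far from the analyst's own
    iterate alpha; injected Gaussian noise makes the recursion a Langevin
    scheme whose iterates concentrate around the density proportional to
    exp(beta * R(alpha)).
    """
    rng = np.random.default_rng() if rng is None else rng
    alpha = np.array(observed[0][0], dtype=float)   # start at the first observed point
    d = alpha.size
    samples = []
    for theta, g in observed:
        # Kernel weight: use the observed gradient only when theta is near alpha.
        w = np.exp(-np.sum((theta - alpha) ** 2) / (2 * bandwidth ** 2))
        noise = rng.standard_normal(d)
        alpha = alpha + step * beta * w * g + np.sqrt(2.0 * step) * noise
        samples.append(alpha.copy())
    return np.array(samples)   # approximate samples from exp(beta * R(.))
```

A histogram (or kernel density estimate) of the returned iterates then serves as an estimate of the density proportional to exp(beta * R), from which the reward can be read off up to an additive constant.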
arXiv Detail & Related papers (2020-06-20T23:12:11Z)
- SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.