Non-Parametric Stochastic Sequential Assignment With Random Arrival Times
- URL: http://arxiv.org/abs/2106.04944v1
- Date: Wed, 9 Jun 2021 09:41:38 GMT
- Title: Non-Parametric Stochastic Sequential Assignment With Random Arrival Times
- Authors: Danial Dervovic, Parisa Hassanzadeh, Samuel Assefa, Prashant Reddy
- Abstract summary: We consider a problem wherein jobs arrive at random times and assume random values.
We propose an algorithm, Non-Parametric Sequential Allocation (NPSA), for solving this problem.
We prove that the expected reward returned by the NPSA algorithm converges in probability to optimality as $M$ grows large.
- Score: 3.871148938060281
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a problem wherein jobs arrive at random times and assume random
values. Upon each job arrival, the decision-maker must decide immediately
whether or not to accept the job and gain the value on offer as a reward, with
the constraint that they may only accept at most $n$ jobs over some reference
time period. The decision-maker only has access to $M$ independent realisations
of the job arrival process. We propose an algorithm, Non-Parametric Sequential
Allocation (NPSA), for solving this problem. Moreover, we prove that the
expected reward returned by the NPSA algorithm converges in probability to
optimality as $M$ grows large. We demonstrate the effectiveness of the
algorithm empirically on synthetic data and on public fraud-detection datasets,
from which the motivation for this work is derived.
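The abstract describes the problem setting but not the internals of NPSA. As a rough illustration of the kind of non-parametric, sample-based acceptance policy the abstract points to, the sketch below learns per-time-bin acceptance thresholds from $M$ simulated realisations of the arrival process and then applies them online. The Poisson/exponential arrival model, the binning scheme, and the helper names (simulate_arrivals, empirical_thresholds, run_policy) are assumptions made for illustration, not the paper's actual NPSA construction.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate_arrivals(rate=5.0, horizon=1.0):
    """One realisation of the job arrival process: Poisson-many arrivals,
    uniform arrival times on [0, horizon], i.i.d. exponential job values.
    These distributional choices are illustrative, not from the paper."""
    n_jobs = rng.poisson(rate * horizon)
    times = np.sort(rng.uniform(0.0, horizon, size=n_jobs))
    values = rng.exponential(1.0, size=n_jobs)
    return times, values


def empirical_thresholds(samples, n_accept, n_bins=20, horizon=1.0):
    """Estimate a per-time-bin acceptance threshold from M sample paths.

    Heuristic: in each bin, pick the value quantile that would leave roughly
    n_accept acceptances among the jobs expected to arrive afterwards.  This
    stands in for the paper's threshold construction, which the abstract does
    not spell out."""
    edges = np.linspace(0.0, horizon, n_bins + 1)
    thresholds = np.zeros(n_bins)
    for b in range(n_bins):
        remaining = [v[t >= edges[b]] for t, v in samples if np.any(t >= edges[b])]
        pooled = np.concatenate(remaining) if remaining else np.array([0.0])
        mean_remaining = np.mean([np.sum(t >= edges[b]) for t, _ in samples])
        q = max(0.0, 1.0 - n_accept / max(mean_remaining, 1.0))
        thresholds[b] = np.quantile(pooled, q)
    return edges, thresholds


def run_policy(times, values, edges, thresholds, n_accept):
    """Accept an arriving job iff its value clears the current bin's threshold
    and the budget of n_accept acceptances is not yet exhausted."""
    reward, accepted = 0.0, 0
    for t, v in zip(times, values):
        if accepted >= n_accept:
            break
        b = min(np.searchsorted(edges, t, side="right") - 1, len(thresholds) - 1)
        if v >= thresholds[b]:
            reward += v
            accepted += 1
    return reward


# Learn thresholds from M historical realisations, then evaluate on fresh ones.
M, n_accept = 200, 3
samples = [simulate_arrivals() for _ in range(M)]
edges, thresholds = empirical_thresholds(samples, n_accept)
rewards = [run_policy(*simulate_arrivals(), edges, thresholds, n_accept)
           for _ in range(1000)]
print(f"mean reward of the empirical threshold policy: {np.mean(rewards):.3f}")
```

The paper's guarantee, that the expected reward of NPSA converges in probability to optimality as $M$ grows, is a property of NPSA itself; the heuristic above only illustrates how $M$ historical realisations of the arrival process can be turned into a non-parametric acceptance rule.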
Related papers
- Online POMDP Planning with Anytime Deterministic Guarantees [11.157761902108692]
Planning under uncertainty can be mathematically formalized using partially observable Markov decision processes (POMDPs).
Finding an optimal plan for POMDPs can be computationally expensive and is feasible only for small tasks.
We derive a deterministic relationship between a simplified solution that is easier to obtain and the theoretically optimal one.
arXiv Detail & Related papers (2023-10-03T04:40:38Z)
- Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space [0.0]
We study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes.
We propose an algorithm based on Thompson sampling with dynamically-sized episodes.
We show that our algorithm can be applied to develop approximately optimal control algorithms.
arXiv Detail & Related papers (2023-06-05T03:57:16Z)
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
- Active Sampling of Multiple Sources for Sequential Estimation [92.37271004438406]
The objective is to design an active sampling algorithm for sequentially estimating parameters in order to form reliable estimates.
This paper adopts conditional estimation cost functions, leading to a sequential estimation approach that was recently shown to render the analysis tractable.
arXiv Detail & Related papers (2022-08-10T15:58:05Z)
- Efficient Algorithms for Planning with Participation Constraints [74.74967476995572]
We consider the problem of planning with participation constraints introduced in [Zhang et al., 2022].
In this problem, a principal chooses actions in a decision process, resulting in separate utilities for the principal and the agent.
We provide the first exact algorithm for this problem in finite-horizon settings, where previously only an additive $\varepsilon$-approximation was known.
arXiv Detail & Related papers (2022-05-16T15:47:41Z)
- Optimal Admission Control for Multiclass Queues with Time-Varying Arrival Rates via State Abstraction [16.99621896314678]
We consider a novel queuing problem where the decision-maker must choose to accept or reject randomly arriving tasks.
The objective is to decide which tasks to accept so that the total price of tasks processed is maximised over a finite horizon.
We show that the optimal value function has a specific structure, which enables us to solve the hybrid MDP exactly.
arXiv Detail & Related papers (2022-03-14T12:38:13Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims to identify the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
- Learning to Schedule [3.5408022972081685]
This paper proposes a learning and scheduling algorithm to minimize the expected cumulative holding cost incurred by jobs.
In each time slot, the server can process a job while receiving the realized random holding costs of the jobs remaining in the system.
arXiv Detail & Related papers (2021-05-28T08:04:06Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.