Scheduling with Uncertain Holding Costs and its Application to Content Moderation
- URL: http://arxiv.org/abs/2505.21331v1
- Date: Tue, 27 May 2025 15:26:24 GMT
- Title: Scheduling with Uncertain Holding Costs and its Application to Content Moderation
- Authors: Caner Gocmen, Thodoris Lykouris, Deeksha Sinha, Wentao Weng,
- Abstract summary: In content moderation for social media platforms, the cost of delaying the review of a content is proportional to its view trajectory, which fluctuates and is apriori unknown.<n>We consider a queueing model where job states evolve based on a Markov chain with state-dependent instantaneous holding costs.<n>By viewing each job as a Markovian ski-rental problem, we develop an index-based algorithm that adjusts to the opportunity of serving jobs in the future when uncertainty partly resolves.
- Score: 4.2130745016804205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In content moderation for social media platforms, the cost of delaying the review of a content is proportional to its view trajectory, which fluctuates and is apriori unknown. Motivated by such uncertain holding costs, we consider a queueing model where job states evolve based on a Markov chain with state-dependent instantaneous holding costs. We demonstrate that in the presence of such uncertain holding costs, the two canonical algorithmic principles, instantaneous-cost ($c\mu$-rule) and expected-remaining-cost ($c\mu/\theta$-rule), are suboptimal. By viewing each job as a Markovian ski-rental problem, we develop a new index-based algorithm, Opportunity-adjusted Remaining Cost (OaRC), that adjusts to the opportunity of serving jobs in the future when uncertainty partly resolves. We show that the regret of OaRC scales as $\tilde{O}(L^{1.5}\sqrt{N})$, where $L$ is the maximum length of a job's holding cost trajectory and $N$ is the system size. This regret bound shows that OaRC achieves asymptotic optimality when the system size $N$ scales to infinity. Moreover, its regret is independent of the state-space size, which is a desirable property when job states contain contextual information. We corroborate our results with an extensive simulation study based on two holding cost patterns (online ads and user-generated content) that arise in content moderation for social media platforms. Our simulations based on synthetic and real datasets demonstrate that OaRC consistently outperforms existing practice, which is based on the two canonical algorithmic principles.
Related papers
- Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence [10.189658648290257]
We study reinforcement learning in infinite-horizon discounted Markov decision processes with continuous state spaces.<n>We propose the novel Q-Measure-Learning, which learns a signed empirical measure supported on visited state-action pairs.
arXiv Detail & Related papers (2026-03-03T21:15:22Z) - The Headless Firm: How AI Reshapes Enterprise Boundaries [5.856292656853395]
We argue that agentic AI induces a structural change in how coordination costs scale.<n>We formalize this claim as a coordination cost model with two falsifiable empirical predictions.
arXiv Detail & Related papers (2026-02-24T22:13:14Z) - $V_0$: A Generalist Value Model for Any Policy at State Zero [80.7505802128501]
Policy methods rely on a baseline to measure the relative advantage of an action.<n>This baseline is typically estimated by a Value Model (Critic) often as large as the policy model itself.<n>We propose a Generalist Value Model capable of estimating the expected performance of any model on unseen prompts.
arXiv Detail & Related papers (2026-02-03T14:35:23Z) - EvoRoute: Experience-Driven Self-Routing LLM Agent Systems [100.64399490164959]
EvoRoute is a self-evolving model routing paradigm that transcends static, pre-defined model assignments.<n> Experiments on challenging agentic benchmarks demonstrate that EvoRoute, when integrated into off-the-shelf agentic systems, not only sustains or enhances system performance but also reduces execution cost by up to $80%$ and latency by over $70%$.
arXiv Detail & Related papers (2026-01-06T04:06:46Z) - SGD with Dependent Data: Optimal Estimation, Regret, and Inference [3.038061705362137]
gradient descent (SGD) is shown to accommodate both independent and dependent information under a broad class of stepsize schedules and exploration rate schemes.<n>We show that SGD simultaneously achieves statistically optimal estimation error and regret, extending and improving existing results.<n>For online sparse regression, we develop a new SGD-based algorithm that uses only $d$ units of storage and requires $O(d)$ flops per iteration.
arXiv Detail & Related papers (2026-01-04T04:52:11Z) - The Alignment Bottleneck [0.0]
We model the loop as a two-stage cascade $U to H to Y$ given $S$, with cognitive capacity $C_textcog|S$ and average total capacity $barC_texttot|S$.<n>It pairs a data size-independent Fano lower bound proved on a separable codebook mixture with a PAC-Bayes upper bound whose KL term is controlled by the same channel via $m, barC_texttot|S$.
arXiv Detail & Related papers (2025-09-19T12:38:30Z) - No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need! [56.80767500991973]
We focus on two canonical settings: $(i)$ online resource allocation where rewards and costs are observed before action selection, and $(ii)$ online learning with resource constraints where they are observed after action selection, under full feedback or bandit feedback.<n>It is well known that achieving sublinear regret in these settings is impossible when reward and cost distributions may change arbitrarily over time.<n>We design general (primal-)dual methods that achieve sublinear regret with respect to baselines that follow the spending plan. Crucially, the performance of our algorithms improves when the spending plan ensures a well-balanced distribution of the budget
arXiv Detail & Related papers (2025-06-16T08:42:31Z) - Learning While Repositioning in On-Demand Vehicle Sharing Networks [4.724825031148413]
We consider a network inventory problem motivated by one-way, on-demand vehicle sharing services.<n>We show that a natural Lipschitz-bandit approach achieves a regret guarantee of $widetildeO(Tfracnn+1)$, which suffers from the exponential dependence on $n$.<n>Motivated by these challenges, we propose an Online Gradient Repositioning algorithm that relies solely on censored demand.
arXiv Detail & Related papers (2025-01-31T15:16:02Z) - Learning-Augmented Competitive Algorithms for Spatiotemporal Online Allocation with Deadline Constraints [11.029788598491077]
We introduce and study a new online problem motivated by emerging challenges in sustainability and energy.<n>In $mathsfSOAD, an online player completes a workload by allocating and scheduling it on the of a metric space $(, d) per point.<n>At each time step, a service cost function is revealed that represents the cost of the workload at each point, and the player must irrevocably decide the current allocation of work to points.
arXiv Detail & Related papers (2024-08-14T22:08:06Z) - Learning to Schedule Online Tasks with Bandit Feedback [7.671139712158846]
Online task scheduling serves an integral role for task-intensive applications in cloud computing and crowdsourcing.
We propose a double-optimistic learning based Robbins-Monro (DOL-RM) algorithm.
DOL-RM integrates a learning module that incorporates optimistic estimation for reward-to-cost ratio and a decision module.
arXiv Detail & Related papers (2024-02-26T10:11:28Z) - Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space [0.0]
We study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes.
We propose an algorithm based on Thompson sampling with dynamically-sized episodes.
We show that our algorithm can be applied to develop approximately optimal control algorithms.
arXiv Detail & Related papers (2023-06-05T03:57:16Z) - Efficient Reinforcement Learning with Impaired Observability: Learning
to Act with Delayed and Missing State Observations [92.25604137490168]
This paper introduces a theoretical investigation into efficient reinforcement learning in control systems.
We present algorithms and establish near-optimal regret upper and lower bounds, of the form $tildemathcalO(sqrtrm poly(H) SAK)$, for RL in the delayed and missing observation settings.
arXiv Detail & Related papers (2023-06-02T02:46:39Z) - Linear Stochastic Bandits over a Bit-Constrained Channel [37.01818450308119]
We introduce a new linear bandit formulation over a bit-constrained channel.
The goal of the server is to take actions based on estimates of an unknown model parameter to minimize cumulative regret.
We prove that when the unknown model is $d$-dimensional, a channel capacity of $O(d)$ bits suffices to achieve order-optimal regret.
arXiv Detail & Related papers (2022-03-02T15:54:03Z) - Online Learning with Knapsacks: the Best of Both Worlds [54.28273783164608]
We casting online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$m resource constraints.
Our framework allows the decision maker to handle its evidence flexibility and costoretic functions.
arXiv Detail & Related papers (2022-02-28T12:10:48Z) - Scheduling Servers with Stochastic Bilinear Rewards [7.519872646378837]
A system optimization problem arises in multi-class, multi-server queueing system scheduling.
We propose a scheduling algorithm based on weighted proportional fair allocation criteria augmented with marginal costs for reward.
Our algorithm sub-linear regret and sublinear mean holding cost (and queue length bound) with respect to the time horizon, thus guaranteeing queueing system stability.
arXiv Detail & Related papers (2021-12-13T00:37:20Z) - Stochastic Shortest Path: Minimax, Parameter-Free and Towards
Horizon-Free Regret [144.6358229217845]
We study the problem of learning in the shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state.
We design a novel model-based algorithm EB-SSP that carefully skews the empirical transitions and perturbs the empirical costs with an exploration bonus.
We prove that EB-SSP achieves the minimax regret rate $widetildeO(B_star sqrtS A K)$, where $K$ is the number of episodes, $S$ is the number of states, $A$
arXiv Detail & Related papers (2021-04-22T17:20:48Z) - Acting in Delayed Environments with Non-Stationary Markov Policies [57.52103323209643]
We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that are executed with a delay of $m$ steps.
We prove that with execution delay, deterministic Markov policies in the original state-space are sufficient for attaining maximal reward, but need to be non-stationary.
We devise a non-stationary Q-learning style model-based algorithm that solves delayed execution tasks without resorting to state-augmentation.
arXiv Detail & Related papers (2021-01-28T13:35:37Z) - Stochastic Shortest Path with Adversarially Changing Costs [57.90236104782219]
shortest path (SSP) is a well-known problem in planning and control.
We present the adversarial SSP model that also accounts for adversarial changes in the costs over time.
We are the first to consider this natural setting of adversarial SSP and obtain sub-linear regret for it.
arXiv Detail & Related papers (2020-06-20T12:10:35Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with
Adversarial Loss [145.54544979467872]
We consider online learning for episodically constrained Markov decision processes (CMDPs)
We propose a new emphupper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.