Optimal Scoring Rule Design under Partial Knowledge
- URL: http://arxiv.org/abs/2107.07420v3
- Date: Fri, 11 Oct 2024 19:34:17 GMT
- Title: Optimal Scoring Rule Design under Partial Knowledge
- Authors: Yiling Chen, Fang-Yi Yu
- Abstract summary: We study optimal scoring rules when the principal has partial knowledge of an agent's signal distribution.
In our setting, the principal only knows a set of distributions to which the agent's signal distribution belongs.
We propose an efficient algorithm to compute an optimal scoring rule when the set of distributions is finite.
- Score: 9.759870160862205
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies the design of optimal proper scoring rules when the principal has partial knowledge of an agent's signal distribution. Recent work characterizes the proper scoring rules that maximize the increase in an agent's payoff when the agent chooses to access a costly signal to refine a posterior belief from her prior prediction, under the assumption that the agent's signal distribution is fully known to the principal. In our setting, the principal only knows a set of distributions to which the agent's signal distribution belongs. We formulate the scoring rule design problem as a max-min optimization that maximizes the worst-case increase in payoff across the set of distributions. We propose an efficient algorithm to compute an optimal scoring rule when the set of distributions is finite, and devise a fully polynomial-time approximation scheme that accommodates various infinite sets of distributions. We further remark that widely used scoring rules, such as the quadratic and log rules, as well as previously identified optimal scoring rules under full knowledge, can be far from optimal in our partial-knowledge settings.
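To make the max-min objective concrete, the sketch below (not the paper's algorithm) evaluates the worst-case expected payoff increase of a few candidate proper scoring rules over a hypothetical finite set of signal distributions for a binary state. The specific distributions, the rescaling of each rule's expected-score function, and the piecewise-linear candidate are all illustrative assumptions.

```python
import math

# A proper scoring rule for a binary state is summarized by its convex
# expected-score function G(q): truthfully reporting belief q earns G(q)
# in expectation. For a signal distribution with posteriors {(q_i, pi_i)}
# and prior mu = sum_i pi_i * q_i, the agent's payoff increase from
# acquiring the signal is  sum_i pi_i * G(q_i) - G(mu).
# The max-min design problem picks G (within a bounded class) to maximize
# the minimum of this quantity over a set of candidate distributions.

PRIOR = 0.5  # hypothetical prior belief that the state is 1

# Each G below is affinely rescaled so that G(PRIOR) = 0 and G(0) = G(1) = 1,
# a crude illustrative stand-in for a boundedness constraint on scores.

def G_quadratic(q):
    # Brier-style rule S(p, w) = 1 - 2*(w - p)^2, rescaled as above.
    return 1.0 - 4.0 * q * (1.0 - q)

def G_log(q):
    # Log rule: 1 minus the binary entropy of q (in bits). The raw log
    # score is unbounded at q in {0, 1}; the posteriors below avoid that.
    if q <= 0.0 or q >= 1.0:
        return 1.0
    return 1.0 + q * math.log2(q) + (1.0 - q) * math.log2(1.0 - q)

def G_vshape(q):
    # Piecewise-linear G, in the spirit of full-knowledge optimal rules
    # that concentrate incentives on extreme posteriors (illustrative form).
    return abs(q - PRIOR) / max(PRIOR, 1.0 - PRIOR)

def payoff_increase(G, posteriors):
    """Expected payoff increase for one signal distribution, given as a
    list of (posterior, probability) pairs whose mean equals PRIOR."""
    return sum(pi * G(q) for q, pi in posteriors) - G(PRIOR)

def worst_case_increase(G, distributions):
    return min(payoff_increase(G, D) for D in distributions)

# Hypothetical finite set of signal distributions, all consistent with PRIOR.
DISTRIBUTIONS = [
    [(0.1, 0.5), (0.9, 0.5)],      # highly informative signal
    [(0.4, 0.5), (0.6, 0.5)],      # weakly informative signal
    [(0.3, 2 / 3), (0.9, 1 / 3)],  # asymmetric signal
]

if __name__ == "__main__":
    for name, G in [("quadratic", G_quadratic), ("log", G_log), ("V-shaped", G_vshape)]:
        print(f"{name:>10}: worst-case increase = {worst_case_increase(G, DISTRIBUTIONS):.3f}")
```

Under these made-up distributions, the worst case for the quadratic and log rules is driven by the weakly informative signal, while the piecewise-linear candidate retains a noticeably larger guaranteed increase, echoing the remark that common rules can be far from the max-min optimum. The paper's algorithm instead optimizes over bounded proper scoring rules for a finite set of distributions, with an FPTAS covering various infinite sets.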
Related papers
- A Principled Approach to Randomized Selection under Uncertainty: Applications to Peer Review and Grant Funding [68.43987626137512]
We propose a principled framework for randomized decision-making based on interval estimates of the quality of each item. We introduce MERIT, an optimization-based method that maximizes the worst-case expected number of top candidates selected. We prove that MERIT satisfies desirable axiomatic properties not guaranteed by existing approaches.
arXiv Detail & Related papers (2025-06-23T19:59:30Z) - A Differential Perspective on Distributional Reinforcement Learning [7.028778922533688]
We extend distributional reinforcement learning to the average-reward setting, where an agent aims to optimize the reward received per time-step. In particular, we utilize a quantile-based approach to develop the first set of algorithms that can successfully learn and/or optimize the long-run per-step reward distribution.
arXiv Detail & Related papers (2025-06-03T19:26:25Z) - To Each Metric Its Decoding: Post-Hoc Optimal Decision Rules of Probabilistic Hierarchical Classifiers [43.97690773039761]
We propose a framework for the optimal decoding of an output probability distribution with respect to a target metric. We derive optimal decision rules for increasingly complex prediction settings, providing universal algorithms when candidates are limited to the set of nodes.
arXiv Detail & Related papers (2025-06-02T11:29:40Z) - Optimizing Return Distributions with Distributional Dynamic Programming [38.11199286025947]
We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution.
To go beyond expected utilities, we combine distributional DP with stock augmentation, a technique previously introduced for classic DP in the context of risk-sensitive RL.
We describe a number of applications outlining how to use distributional DP to solve different stock-augmented return distribution optimization problems.
arXiv Detail & Related papers (2025-01-22T17:20:43Z) - Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z) - Are Bounded Contracts Learnable and Approximately Optimal? [8.121834515103243]
This paper considers the hidden-action model of the principal-agent problem, in which a principal incentivizes an agent to work on a project using a contract.
We investigate whether contracts with bounded payments are learnable and approximately optimal.
arXiv Detail & Related papers (2024-02-22T12:19:19Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Bi-discriminator Domain Adversarial Neural Networks with Class-Level Gradient Alignment [87.8301166955305]
We propose a novel bi-discriminator domain adversarial neural network (BACG) with class-level gradient alignment.
BACG leverages gradient signals and second-order probability estimation to better align the domain distributions.
In addition, inspired by contrastive learning, we develop a memory bank-based variant, Fast-BACG, which can greatly shorten the training process.
arXiv Detail & Related papers (2023-10-21T09:53:17Z) - FIRE: An Optimization Approach for Fast Interpretable Rule Extraction [7.538482310185135]
We present FIRE, Fast Interpretable Rule Extraction, an optimization-based framework to extract a small collection of decision rules from tree ensembles.
We show in our experiments that FIRE outperforms state-of-the-art ensemble algorithms at building sparse rule sets, and can deliver more interpretable models compared to existing methods.
arXiv Detail & Related papers (2023-06-12T21:27:39Z) - Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model [64.94131130042275]
We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf.
We design a provably sample-efficient algorithm that tailors the UCB algorithm to our model.
Our algorithm features a delicate estimation procedure for the principal's optimal profit and a conservative correction scheme that ensures the desired actions of the agent are incentivized.
arXiv Detail & Related papers (2023-03-15T13:40:16Z) - Computing the optimal distributionally-robust strategy to commit to [32.1464237233989]
We show that a distributionally-robust Stackelberg equilibrium always exists across a wide array of uncertainty models.
We present two algorithms to compute a distributionally-robust strong Stackelberg equilibrium.
Experiments substantiate the tractability of our algorithms on a classical Stackelberg game.
arXiv Detail & Related papers (2022-09-15T23:20:26Z) - Better Short than Greedy: Interpretable Models through Optimal Rule Boosting [10.938624307941197]
Rule ensembles are designed to provide a useful trade-off between predictive accuracy and model interpretability.
We present a novel approach aiming to fit rule ensembles of maximal predictive power for a given ensemble size.
arXiv Detail & Related papers (2021-01-21T01:03:48Z) - A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms [67.67377846416106]
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes.
We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z) - Optimal Change-Point Detection with Training Sequences in the Large and Moderate Deviations Regimes [72.68201611113673]
This paper investigates a novel offline change-point detection problem from an information-theoretic perspective.
We assume that the underlying pre- and post-change distributions are not known and can only be learned from the available training sequences.
arXiv Detail & Related papers (2020-03-13T23:39:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.