Decision Making under Imperfect Recall: Algorithms and Benchmarks
- URL: http://arxiv.org/abs/2602.15252v1
- Date: Mon, 16 Feb 2026 23:19:01 GMT
- Title: Decision Making under Imperfect Recall: Algorithms and Benchmarks
- Authors: Emanuel Tewolde, Brian Hu Zhang, Ioannis Anagnostides, Tuomas Sandholm, Vincent Conitzer
- Abstract summary: We introduce the first benchmark suite for imperfect-recall decision problems. Our benchmarks capture a variety of problem types, including ones concerning privacy in AI systems. We evaluate the performance of different algorithms for finding first-order optimal strategies in such problems.
- Score: 77.12503122836422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In game theory, imperfect-recall decision problems model situations in which an agent forgets information it held before. They encompass games such as the ``absentminded driver'' and team games with limited communication. In this paper, we introduce the first benchmark suite for imperfect-recall decision problems. Our benchmarks capture a variety of problem types, including ones concerning privacy in AI systems that elicit sensitive information, and AI safety via testing of agents in simulation. Across 61 problem instances generated using this suite, we evaluate the performance of different algorithms for finding first-order optimal strategies in such problems. In particular, we introduce the family of regret matching (RM) algorithms for nonlinear constrained optimization. This class of parameter-free algorithms has enjoyed tremendous success in solving large two-player zero-sum games, but, surprisingly, they were hitherto relatively unexplored beyond that setting. Our key finding is that RM algorithms consistently outperform commonly employed first-order optimizers such as projected gradient descent, often by orders of magnitude. This establishes, for the first time, the RM family as a formidable approach to large-scale constrained optimization problems.
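The abstract's central claim is that parameter-free regret matching (RM) algorithms outperform first-order optimizers such as projected gradient descent. As a rough illustration of the RM update, here is a minimal sketch for maximizing a smooth objective over a probability simplex; the function name, the restriction to a simplex constraint, and the iterate-averaging scheme are our own assumptions for illustration, not the paper's exact algorithms or benchmark setup.

```python
import numpy as np

def regret_matching_simplex(grad, dim, iters=1000):
    """Parameter-free regret matching over the probability simplex (sketch).

    `grad` returns the gradient of the objective being maximized at x.
    """
    R = np.zeros(dim)                    # cumulative per-coordinate regrets
    x = np.full(dim, 1.0 / dim)          # current strategy (uniform start)
    avg = np.zeros(dim)                  # running average of iterates
    for t in range(1, iters + 1):
        g = grad(x)
        r = g - g.dot(x)                 # instantaneous regret of each pure action
        R += r
        pos = np.maximum(R, 0.0)
        total = pos.sum()
        # play proportionally to positive regrets; fall back to uniform
        x = pos / total if total > 0 else np.full(dim, 1.0 / dim)
        avg += (x - avg) / t
    return avg
```

Each step accumulates per-coordinate regrets against the linearized objective and renormalizes their positive parts; no step size needs tuning, which is the sense in which the family is parameter-free.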
Related papers
- Exposing the Copycat Problem of Imitation-based Planner: A Novel Closed-Loop Simulator, Causal Benchmark and Joint IL-RL Baseline [49.51385135697656]
Within machine learning-based planning, imitation learning (IL) is a common approach. It primarily learns driving policies directly from supervised trajectory data. It remains challenging to determine whether the learned policy truly understands fundamental driving principles. This work proposes a novel closed-loop simulator supporting both imitation and reinforcement learning.
arXiv Detail & Related papers (2025-04-20T18:51:26Z)
- Learning Multiple Initial Solutions to Optimization Problems [52.9380464408756]
Sequentially solving similar optimization problems under strict runtime constraints is essential for many applications. We propose learning to predict multiple diverse initial solutions given parameters that define the problem instance. We find significant and consistent improvement with our method across all evaluation settings and demonstrate that it scales efficiently with the number of initial solutions required.
arXiv Detail & Related papers (2024-11-04T15:17:19Z)
- Efficient Learning of Decision-Making Models: A Penalty Block Coordinate Descent Algorithm for Data-Driven Inverse Optimization [12.610576072466895]
We consider the inverse problem where we use prior decision data to uncover the underlying decision-making process.
This statistical learning problem is referred to as data-driven inverse optimization.
We propose an efficient block coordinate descent-based algorithm to solve large problem instances.
arXiv Detail & Related papers (2022-10-27T12:52:56Z)
- Stochastic Differentially Private and Fair Learning [7.971065005161566]
We provide the first differentially private algorithm for fair learning that is guaranteed to converge.
Our framework is flexible enough to permit different fairness notions, including demographic parity and equalized odds.
Our algorithm can be applied to non-binary classification tasks with multiple (non-binary) sensitive attributes.
arXiv Detail & Related papers (2022-10-17T06:54:57Z)
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
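To make the censored-feedback setting concrete, here is a hedged sketch of a bandit-style algorithm selector that only ever observes runtimes truncated at a cutoff. The epsilon-greedy rule and all names below are illustrative simplifications of ours, not the adapted multi-armed bandit algorithms the paper actually studies.

```python
import random

def select_algorithms(run, n_algos, cutoff, horizon, eps=0.1, seed=0):
    """Epsilon-greedy online algorithm selection under a runtime cutoff (sketch)."""
    rng = random.Random(seed)
    counts = [0] * n_algos
    loss_sums = [0.0] * n_algos
    choices = []
    for _ in range(horizon):
        if 0 in counts:                  # try every algorithm once first
            a = counts.index(0)
        elif rng.random() < eps:         # occasional exploration
            a = rng.randrange(n_algos)
        else:                            # exploit lowest observed mean loss
            a = min(range(n_algos), key=lambda i: loss_sums[i] / counts[i])
        observed = min(run(a), cutoff)   # runtime is censored at the cutoff
        counts[a] += 1
        loss_sums[a] += observed
        choices.append(a)
    return choices
```

Note how the censoring biases the observed losses: a slow algorithm and a moderately slow one look identical once both exceed the cutoff, which is precisely the difficulty the paper's runtime-oriented losses are designed to handle.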
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
- Double Coverage with Machine-Learned Advice [100.23487145400833]
We study the fundamental online $k$-server problem in a learning-augmented setting.
We show that our algorithm achieves, for any $k$, an almost optimal consistency-robustness tradeoff.
arXiv Detail & Related papers (2021-03-02T11:04:33Z)
- Constraint-Handling Techniques for Particle Swarm Optimization Algorithms [0.0]
Population-based methods can cope with a variety of different problems, including problems of remarkably higher complexity than those traditional methods can handle.
The aim here is to develop and compare different constraint-handling techniques (CHTs) suitable for particle swarm optimizers (PSOs), incorporated into an algorithm with general-purpose settings.
arXiv Detail & Related papers (2021-01-25T01:49:10Z)
- Multi-armed Bandits with Cost Subsidy [1.6631602844999724]
We present two applications: the intelligent SMS routing problem and the ad audience optimization problem.
We show that naive generalizations of existing MAB algorithms do not perform well for this problem.
We also present a simple variant of explore-then-commit and establish near-optimal regret bounds for this algorithm.
arXiv Detail & Related papers (2020-11-03T05:38:42Z)
- Run2Survive: A Decision-theoretic Approach to Algorithm Selection based on Survival Analysis [75.64261155172856]
Survival analysis (SA) naturally supports censored data and offers appropriate ways to use such data for learning distributional models of algorithm runtime.
We leverage such models as a basis of a sophisticated decision-theoretic approach to algorithm selection, which we dub Run2Survive.
In an extensive experimental study with the standard benchmark ASlib, our approach is shown to be highly competitive and in many cases even superior to state-of-the-art AS approaches.
arXiv Detail & Related papers (2020-07-06T15:20:17Z)
- Towards Solving Large-scale Expensive Optimization Problems Efficiently Using Coordinate Descent Algorithm [3.1600708674008393]
We propose a modified Coordinate Descent algorithm (MCD) to tackle large-scale expensive global optimization (LSEGO) problems with a limited computational budget.
Our proposed algorithm proceeds in two main steps: first locating the region of interest, and then shrinking the search space by repeatedly folding it in half, at an exponential rate.
The proposed algorithm is compared with cooperative co-evolution with delta grouping on 20 benchmark functions of dimension 1000.
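As a loose illustration of the idea of shrinking the search space by folding it in half, here is a hedged sketch of coordinate descent in which the per-coordinate step interval is halved every sweep. This is our own simplification for illustration, not the paper's MCD algorithm.

```python
def halving_coordinate_descent(f, lo, hi, dim, rounds=40):
    """Coordinate descent whose step interval is folded in half each sweep (sketch)."""
    x = [(lo + hi) / 2.0] * dim
    width = hi - lo
    for _ in range(rounds):
        for i in range(dim):
            # try a step of half the current width in each direction
            for cand in (x[i] - width / 2.0, x[i] + width / 2.0):
                trial = list(x)
                trial[i] = min(max(cand, lo), hi)   # stay inside the box
                if f(trial) < f(x):
                    x = trial
        width /= 2.0                                # exponential shrinkage
    return x
```

Because the interval width decays geometrically, the number of function evaluations per digit of accuracy stays roughly constant, which is attractive when each evaluation is expensive.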
arXiv Detail & Related papers (2020-03-07T22:48:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.