Pseudo-Boolean Proof Logging for Optimal Classical Planning
- URL: http://arxiv.org/abs/2504.18443v1
- Date: Fri, 25 Apr 2025 15:54:09 GMT
- Title: Pseudo-Boolean Proof Logging for Optimal Classical Planning
- Authors: Simon Dold, Malte Helmert, Jakob Nordström, Gabriele Röger, Tanja Schindler,
- Abstract summary: We describe a general framework for generating lower-bound certificates based on pseudo-Boolean constraints.<n>We show how to modify the $A*$ algorithm to produce proofs of optimality with modest overhead.
- Score: 9.688823478031852
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce lower-bound certificates for classical planning tasks, which can be used to prove the unsolvability of a task or the optimality of a plan in a way that can be verified by an independent third party. We describe a general framework for generating lower-bound certificates based on pseudo-Boolean constraints, which is agnostic to the planning algorithm used. As a case study, we show how to modify the $A^{*}$ algorithm to produce proofs of optimality with modest overhead, using pattern database heuristics and $h^\textit{max}$ as concrete examples. The same proof logging approach works for any heuristic whose inferences can be efficiently expressed as reasoning over pseudo-Boolean constraints.
Related papers
- Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment [54.787826863212146]
Inference-time computation offers a powerful axis for scaling the performance of language models.<n>We analyze the performance of inference-time alignment algorithms in terms of (i) response quality, and (ii) compute.<n>We introduce $textttInferenceTimePessimism$, a new algorithm which mitigates reward hacking through deliberate use of inference-time compute.
arXiv Detail & Related papers (2025-03-27T18:00:08Z) - Certifying Pareto-Optimality in Multi-Objective Maximum Satisfiability [9.078413809849447]
We present proof logging based on the VeriPB proof format for multi-objective MaxSAT (MO-MaxSAT) optimization techniques.<n>By implementing VeriPB proof logging into a state-of-the-art multi-objective MaxSAT solver, we show empirically that proof logging can be made scalable for MO-MaxSAT with reasonable overhead.
arXiv Detail & Related papers (2025-01-29T09:01:26Z) - Graph-Structured Speculative Decoding [52.94367724136063]
Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models.
We introduce an innovative approach utilizing a directed acyclic graph (DAG) to manage the drafted hypotheses.
We observe a remarkable speedup of 1.73$times$ to 1.96$times$, significantly surpassing standard speculative decoding.
arXiv Detail & Related papers (2024-07-23T06:21:24Z) - e-COP : Episodic Constrained Optimization of Policies [12.854752753529151]
We present the first policy optimization algorithm for constrained Reinforcement Learning (RL) in episodic (finite horizon) settings.
We show that our algorithm has similar or better performance than SoTA (non-episodic) algorithms adapted for the episodic setting.
arXiv Detail & Related papers (2024-06-13T20:12:09Z) - Stopping Bayesian Optimization with Probabilistic Regret Bounds [1.4141453107129403]
We investigate replacing a de facto stopping rule with criteria based on the probability that a point satisfies a given set of conditions.<n>We give a practical algorithm for evaluating Monte Carlo stopping rules in a manner that is both sample efficient and robust to estimation error.
arXiv Detail & Related papers (2024-02-26T18:34:58Z) - An Efficient Rehearsal Scheme for Catastrophic Forgetting Mitigation during Multi-stage Fine-tuning [55.467047686093025]
A common approach to alleviate such forgetting is to rehearse samples from prior tasks during fine-tuning.
We propose a sampling scheme, textttbf mix-cd, that prioritizes rehearsal of collateral damage'' samples.
Our approach is computationally efficient, easy to implement, and outperforms several leading continual learning methods in compute-constrained settings.
arXiv Detail & Related papers (2024-02-12T22:32:12Z) - Forward LTLf Synthesis: DPLL At Work [1.370633147306388]
We propose a new AND-OR graph search framework for synthesis of Linear Temporal Logic on finite traces (LTLf)
Within such framework, we devise a procedure inspired by the Davis-Putnam-Logemann-Loveland (DPLL) algorithm to generate the next available agent-environment moves.
We also propose a novel equivalence check for search nodes based on syntactic equivalence of state formulas.
arXiv Detail & Related papers (2023-02-27T14:33:50Z) - Faster Exact MPE and Constrained Optimization with Deterministic Finite
State Automata [2.1777837784979273]
We exploit our concise representation within Bucket Elimination (BE)
Results on most probable explanation and weighted constraint satisfaction benchmarks show that FABE often outperforms the state of the art.
arXiv Detail & Related papers (2021-08-09T09:31:46Z) - Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(dsqrtT + varepsilonsqrtdT)$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z) - Bayesian Algorithm Execution: Estimating Computable Properties of
Black-box Functions Using Mutual Information [78.78486761923855]
In many real world problems, we want to infer some property of an expensive black-box function f, given a budget of T function evaluations.
We present a procedure, InfoBAX, that sequentially chooses queries that maximize mutual information with respect to the algorithm's output.
On these problems, InfoBAX uses up to 500 times fewer queries to f than required by the original algorithm.
arXiv Detail & Related papers (2021-04-19T17:22:11Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Provably Efficient Reward-Agnostic Navigation with Linear Value
Iteration [143.43658264904863]
We show how iteration under a more standard notion of low inherent Bellman error, typically employed in least-square value-style algorithms, can provide strong PAC guarantees on learning a near optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z) - Approximate Weighted First-Order Model Counting: Exploiting Fast
Approximate Model Counters and Symmetry [16.574508244279098]
We present ApproxWFOMC, a novel anytime method for efficiently bounding the weighted first-order model count.
The algorithm has applications to inference in a variety of first-order probabilistic representations.
We show how our algorithm outperforms existing approximate and exact techniques for inference in first-order probabilistic models.
arXiv Detail & Related papers (2020-01-15T12:21:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.