Related papers: Pseudo-Boolean Proof Logging for Optimal Classical Planning

Related papers

Optimal and Practical Batched Linear Bandit Algorithm [8.087699764574788]
We study the linear bandit problem under limited adaptivity, known as the batched linear bandit.<n>We propose textttBLAE, a novel batched algorithm that integrates arm elimination with regularized G-optimal design.<n>Our analysis introduces new techniques for batch-wise optimal design and refined concentration bounds.
arXiv Detail & Related papers (2025-07-11T09:29:28Z)
On Monotonicity in AI Alignment [10.244128221542228]
This paper investigates the root causes of (non) monotonicity, for a general comparison-based preference learning framework.<n>Under mild assumptions, we prove that such methods still satisfy what we call local pairwise monotonicity.<n>We also provide a bouquet of formalizations of monotonicity, and identify sufficient conditions for their guarantee, thereby providing a toolbox to evaluate how prone learning models are to monotonicity violations.
arXiv Detail & Related papers (2025-06-10T17:17:48Z)
Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment [54.787826863212146]
Inference-time computation offers a powerful axis for scaling the performance of language models.<n>We analyze the performance of inference-time alignment algorithms in terms of (i) response quality, and (ii) compute.<n>We introduce $textttInferenceTimePessimism$, a new algorithm which mitigates reward hacking through deliberate use of inference-time compute.
arXiv Detail & Related papers (2025-03-27T18:00:08Z)
Certifying Pareto-Optimality in Multi-Objective Maximum Satisfiability [9.078413809849447]
We present proof logging based on the VeriPB proof format for multi-objective MaxSAT (MO-MaxSAT) optimization techniques.<n>By implementing VeriPB proof logging into a state-of-the-art multi-objective MaxSAT solver, we show empirically that proof logging can be made scalable for MO-MaxSAT with reasonable overhead.
arXiv Detail & Related papers (2025-01-29T09:01:26Z)
Graph-Structured Speculative Decoding [52.94367724136063]
Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models. We introduce an innovative approach utilizing a directed acyclic graph (DAG) to manage the drafted hypotheses. We observe a remarkable speedup of 1.73$times$ to 1.96$times$, significantly surpassing standard speculative decoding.
arXiv Detail & Related papers (2024-07-23T06:21:24Z)
e-COP : Episodic Constrained Optimization of Policies [12.854752753529151]
We present the first policy optimization algorithm for constrained Reinforcement Learning (RL) in episodic (finite horizon) settings. We show that our algorithm has similar or better performance than SoTA (non-episodic) algorithms adapted for the episodic setting.
arXiv Detail & Related papers (2024-06-13T20:12:09Z)
Stopping Bayesian Optimization with Probabilistic Regret Bounds [1.4141453107129403]
We investigate replacing a de facto stopping rule with criteria based on the probability that a point satisfies a given set of conditions.<n>We give a practical algorithm for evaluating Monte Carlo stopping rules in a manner that is both sample efficient and robust to estimation error.
arXiv Detail & Related papers (2024-02-26T18:34:58Z)
An Efficient Rehearsal Scheme for Catastrophic Forgetting Mitigation during Multi-stage Fine-tuning [55.467047686093025]
A common approach to alleviate such forgetting is to rehearse samples from prior tasks during fine-tuning. We propose a sampling scheme, textttbf mix-cd, that prioritizes rehearsal of collateral damage'' samples. Our approach is computationally efficient, easy to implement, and outperforms several leading continual learning methods in compute-constrained settings.
arXiv Detail & Related papers (2024-02-12T22:32:12Z)
Forward LTLf Synthesis: DPLL At Work [1.370633147306388]
We propose a new AND-OR graph search framework for synthesis of Linear Temporal Logic on finite traces (LTLf) Within such framework, we devise a procedure inspired by the Davis-Putnam-Logemann-Loveland (DPLL) algorithm to generate the next available agent-environment moves. We also propose a novel equivalence check for search nodes based on syntactic equivalence of state formulas.
arXiv Detail & Related papers (2023-02-27T14:33:50Z)
Faster Exact MPE and Constrained Optimization with Deterministic Finite State Automata [2.1777837784979273]
We exploit our concise representation within Bucket Elimination (BE) Results on most probable explanation and weighted constraint satisfaction benchmarks show that FABE often outperforms the state of the art.
arXiv Detail & Related papers (2021-08-09T09:31:46Z)
Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $varepsilon$-misspecified contextual bandits. We obtain the first algorithm that achieves the optimal $O(dsqrtT + varepsilonsqrtdT)$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z)
Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information [78.78486761923855]
In many real world problems, we want to infer some property of an expensive black-box function f, given a budget of T function evaluations. We present a procedure, InfoBAX, that sequentially chooses queries that maximize mutual information with respect to the algorithm's output. On these problems, InfoBAX uses up to 500 times fewer queries to f than required by the original algorithm.
arXiv Detail & Related papers (2021-04-19T17:22:11Z)
Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model. The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how iteration under a more standard notion of low inherent Bellman error, typically employed in least-square value-style algorithms, can provide strong PAC guarantees on learning a near optimal value function. We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
Approximate Weighted First-Order Model Counting: Exploiting Fast Approximate Model Counters and Symmetry [16.574508244279098]
We present ApproxWFOMC, a novel anytime method for efficiently bounding the weighted first-order model count. The algorithm has applications to inference in a variety of first-order probabilistic representations. We show how our algorithm outperforms existing approximate and exact techniques for inference in first-order probabilistic models.
arXiv Detail & Related papers (2020-01-15T12:21:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.