DPO: Dynamic-Programming Optimization on Hybrid Constraints
- URL: http://arxiv.org/abs/2205.08632v1
- Date: Tue, 17 May 2022 21:18:54 GMT
- Title: DPO: Dynamic-Programming Optimization on Hybrid Constraints
- Authors: Vu H. N. Phan and Moshe Y. Vardi
- Abstract summary: In Bayesian inference, the most probable explanation (MPE) problem requests a variable instantiation with the highest probability given some evidence.
It is known that Boolean MPE can be solved via reduction to (weighted partial) MaxSAT.
We build on DPMC and introduce DPO, a dynamic-programming that exactly solves MPE.
- Score: 26.704502486686128
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In Bayesian inference, the most probable explanation (MPE) problem requests a
variable instantiation with the highest probability given some evidence. Since
a Bayesian network can be encoded as a literal-weighted CNF formula $\varphi$,
we study Boolean MPE, a more general problem that requests a model $\tau$ of
$\varphi$ with the highest weight, where the weight of $\tau$ is the product of
weights of literals satisfied by $\tau$. It is known that Boolean MPE can be
solved via reduction to (weighted partial) MaxSAT. Recent work proposed DPMC, a
dynamic-programming model counter that leverages graph-decomposition techniques
to construct project-join trees. A project-join tree is an execution plan that
specifies how to conjoin clauses and project out variables. We build on DPMC
and introduce DPO, a dynamic-programming optimizer that exactly solves Boolean
MPE. By using algebraic decision diagrams (ADDs) to represent pseudo-Boolean
(PB) functions, DPO is able to handle disjunctive clauses as well as XOR
clauses. (Cardinality constraints and PB constraints may also be compactly
represented by ADDs, so one can further extend DPO's support for hybrid
inputs.) To test the competitiveness of DPO, we generate random XOR-CNF
formulas. On these hybrid benchmarks, DPO significantly outperforms MaxHS,
UWrMaxSat, and GaussMaxHS, which are state-of-the-art exact solvers for MaxSAT.
Related papers
- Transfer Q Star: Principled Decoding for LLM Alignment [105.89114186982972]
Transfer $Q*$ estimates the optimal value function for a target reward $r$ through a baseline model.
Our approach significantly reduces the sub-optimality gap observed in prior SoTA methods.
arXiv Detail & Related papers (2024-05-30T21:36:12Z) - Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation [53.17668583030862]
We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation.
We propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP)
We show that LOOP achieves a sublinear $tildemathcalO(mathrmpoly(d, mathrmsp(V*)) sqrtTbeta )$ regret, where $d$ and $beta$ correspond to AGEC and log-covering number of the hypothesis class respectively
arXiv Detail & Related papers (2024-04-19T06:24:22Z) - DiNADO: Norm-Disentangled Neurally-Decomposed Oracles for Controlling Language Models [55.06442277395388]
NeurAlly-Decomposed Oracle (NADO) is a powerful approach for controllable generation with large language models.
Vanilla NADO suffers from vanishing gradient for low-probability control signals.
We propose a improved version of the NADO algorithm, namely DiNADO, which improves the performance of the NADO algorithm.
arXiv Detail & Related papers (2023-06-20T18:36:52Z) - Near Sample-Optimal Reduction-based Policy Learning for Average Reward
MDP [58.13930707612128]
This work considers the sample complexity of obtaining an $varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP)
We prove an upper bound of $widetilde O(H varepsilon-3 ln frac1delta)$ samples per state-action pair, where $H := sp(h*)$ is the span of bias of any optimal policy, $varepsilon$ is the accuracy and $delta$ is the failure probability.
arXiv Detail & Related papers (2022-12-01T15:57:58Z) - B$^3$RTDP: A Belief Branch and Bound Real-Time Dynamic Programming
Approach to Solving POMDPs [17.956744635160568]
We propose an extension to the RTDP-Bel algorithm which we call Belief Branch and Bound RTDP (B$3$RTDP)
Our algorithm uses a bounded value function representation and takes advantage of this in two novel ways.
We empirically demonstrate that B$3$RTDP can achieve greater returns in less time than the state-of-the-art SARSOP solver on known POMDP problems.
arXiv Detail & Related papers (2022-10-22T21:42:59Z) - DPER: Dynamic Programming for Exist-Random Stochastic SAT [26.704502486686128]
We present the exist-random satisfiability (ER-SSAT) problem, which combines the satisfiability (SAT) and weighted model counting (WMC) problems.
We extend this WPMC framework to exactly solve ER-SSAT and implement a dynamic-programming solver named DPER.
arXiv Detail & Related papers (2022-05-19T19:54:34Z) - DPMS: An ADD-Based Symbolic Approach for Generalized MaxSAT Solving [45.21499915442282]
We propose a novel dynamic-programming approach for solving generalized MaxSAT problems with hybrid constraints.
Our versatile framework admits many generalizations of CNF-MaxSAT, such as MaxSAT, Min-MaxSAT, and MinSAT with hybrid constraints.
Empirical results indicate that DPMS is able to solve certain problems quickly, where other algorithms based on various techniques all fail.
arXiv Detail & Related papers (2022-05-08T00:31:53Z) - Introducing the Expohedron for Efficient Pareto-optimal Fairness-Utility
Amortizations in Repeated Rankings [9.066817876491053]
We consider the problem of computing a sequence of rankings that maximizes consumer-side utility while minimizing producer-side individual unfairness of exposure.
We introduce a geometrical object, a polytope that we call expohedron, whose points represent all achievable exposures of items for a Position Based Model.
Our approach compares favorably to linear or quadratic programming baselines in terms of algorithmic complexity and empirical runtime.
Our solution can be expressed as a distribution over only $n$ permutations, instead of the $(n-1)2 + 1$ achieved with BvN decompositions.
arXiv Detail & Related papers (2022-02-07T14:43:35Z) - Projection-free Graph-based Classifier Learning using Gershgorin Disc
Perfect Alignment [59.87663954467815]
In graph-based binary learning, a subset of known labels $hatx_i$ are used to infer unknown labels.
When restricting labels $x_i$ to binary values, the problem is NP-hard.
We propose a fast projection-free method by solving a sequence of linear programs (LP) instead.
arXiv Detail & Related papers (2021-06-03T07:22:48Z) - Your GAN is Secretly an Energy-based Model and You Should use
Discriminator Driven Latent Sampling [106.68533003806276]
We show that sampling in latent space can be achieved by sampling in latent space according to an energy-based model induced by the sum of the latent prior log-density and the discriminator output score.
We show that Discriminator Driven Latent Sampling(DDLS) is highly efficient compared to previous methods which work in the high-dimensional pixel space.
arXiv Detail & Related papers (2020-03-12T23:33:50Z) - Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints [38.2783003051101]
This paper considers the peak Constrained Markov Decision Process (PCMDP), where the agent chooses the policy to maximize total reward in the finite horizon as well as satisfy constraints at each epoch with probability 1.
We propose a model-free algorithm that converts PCMDP problem to an unconstrained problem and a Q-learning based approach is applied.
arXiv Detail & Related papers (2020-03-11T23:23:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.