Markov Constraint as Large Language Model Surrogate
- URL: http://arxiv.org/abs/2406.10269v1
- Date: Tue, 11 Jun 2024 16:09:53 GMT
- Title: Markov Constraint as Large Language Model Surrogate
- Authors: Alexandre Bonlarron, Jean-Charles Régin
- Abstract summary: NgramMarkov is dedicated to text generation in constraint programming (CP).
It limits the product of the probabilities of the n-grams of a sentence.
A real-world problem has been solved for the first time using 4-grams instead of 5-grams.
- Score: 49.86129209397701
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper presents NgramMarkov, a variant of the Markov constraints, dedicated to text generation in constraint programming (CP). It involves a set of n-grams (i.e., sequences of n words) associated with probabilities given by a large language model (LLM). It limits the product of the probabilities of the n-grams of a sentence. The propagator of this constraint can be seen as an extension of the ElementaryMarkov constraint propagator, incorporating the LLM distribution instead of the maximum likelihood estimation of n-grams. It uses a gliding threshold, i.e., it rejects n-grams whose local probabilities are too low, to guarantee balanced solutions. It can also be combined with a "look-ahead" approach to remove n-grams that are very unlikely to lead to acceptable sentences within a fixed-length horizon. This idea is based on the MDDMarkovProcess constraint propagator, but without explicitly using an MDD (Multi-Valued Decision Diagram). The experimental results show that the generated text is valued similarly to the LLM perplexity function. Using this new constraint dramatically reduces the number of candidate sentences produced, improves computation times, and allows larger corpora or smaller n-grams to be used. A real-world problem has been solved for the first time using 4-grams instead of 5-grams.
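To make the filtering idea concrete, here is a minimal sketch, not the paper's CP propagator: it assumes a small dictionary of 4-gram log-probabilities coming from an LLM, prunes any n-gram whose local probability falls below a gliding threshold, and bounds the product of the n-gram probabilities of a sentence. All names and probability values (ngram_logprob, MIN_LOCAL_PROB, MIN_SENTENCE_LOGPROB) are illustrative assumptions.

```python
import math

# Hypothetical LLM-derived 4-gram log-probabilities (all values are made up).
ngram_logprob = {
    ("the", "cat", "sat", "on"): math.log(0.02),
    ("cat", "sat", "on", "the"): math.log(0.03),
    ("sat", "on", "the", "mat"): math.log(0.04),
}

N = 4                                    # n-gram order
MIN_LOCAL_PROB = 1e-4                    # gliding per-n-gram threshold: prune locally unlikely n-grams
MIN_SENTENCE_LOGPROB = math.log(1e-6)    # bound on the product of the n-gram probabilities

def sentence_logprob(words):
    """Sum of n-gram log-probs, i.e. the log of the product of n-gram probabilities."""
    total = 0.0
    for i in range(len(words) - N + 1):
        gram = tuple(words[i:i + N])
        lp = ngram_logprob.get(gram, math.log(1e-9))   # unseen n-grams get a tiny probability
        if lp < math.log(MIN_LOCAL_PROB):
            return None                                # local filtering: prune this sentence
        total += lp
    return total

def acceptable(words):
    lp = sentence_logprob(words)
    return lp is not None and lp >= MIN_SENTENCE_LOGPROB

print(acceptable("the cat sat on the mat".split()))
```

In the CP setting, the same two tests (local threshold and global product bound) would be applied during propagation to prune word domains rather than to reject complete sentences after the fact.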
Related papers
- Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection [49.15148871877941]
Next-token distribution outputs offer a theoretically appealing approach for detecting text generated by large language models (LLMs).
We propose the Perplexity Attention Weighted Network (PAWN), which uses the last hidden states of the LLM and positions to weight the sum of a series of features based on metrics from the next-token distribution across the sequence length.
PAWN shows competitive and even better performance in-distribution than the strongest baselines with a fraction of their trainable parameters.
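For intuition only, a toy sketch of the weighting pattern described above, not the PAWN architecture: random arrays stand in for the LLM's last hidden states and next-token log-probabilities, and an untrained linear head produces softmax weights over positions that pool per-token features into a single vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for quantities derived from an LLM (all values here are made up):
T, H = 12, 8                                     # sequence length, hidden size
token_logprob = rng.normal(-3.0, 1.0, size=T)    # next-token log-probabilities along the sequence
hidden = rng.normal(size=(T, H))                 # last hidden states
pos = np.arange(T)[:, None] / T                  # normalized positions

# Per-token features based on the next-token distribution (log-prob and a perplexity-like term).
features = np.stack([token_logprob, np.exp(-token_logprob)], axis=1)   # shape (T, 2)

# A random, untrained scoring head turning hidden state + position into attention weights.
w = rng.normal(size=H + 1)
scores = np.concatenate([hidden, pos], axis=1) @ w
weights = np.exp(scores - scores.max())
weights /= weights.sum()                         # softmax over sequence positions

# Weighted sum of features across the sequence -> a small vector fed to a detection classifier.
pooled = weights @ features
print(pooled)
```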
arXiv Detail & Related papers (2025-01-07T17:00:49Z)
- Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models [79.70436109672599]
We derive non-vacuous generalization bounds for large language models as large as LLaMA2-70B.
Our work achieves the first non-vacuous bounds for models that are deployed in practice and generate high-quality text.
arXiv Detail & Related papers (2024-07-25T16:13:58Z)
- A Method of Moments Embedding Constraint and its Application to Semi-Supervised Learning [2.8266810371534152]
Discriminative deep learning models with a linear+softmax final layer have a problem.
The latent space only predicts the conditional probabilities $p(Y|X)$ but not the full joint distribution $p(Y,X)$.
This exacerbates model over-confidence, impacting many problems such as hallucinations, confounding biases, and dependence on large datasets.
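A tiny illustration (not from the paper) of why a linear+softmax head only captures $p(Y|X)$: the softmax is invariant to adding a constant to the logits, so any information that could encode the input density $p(X)$, and hence the joint $p(Y,X) = p(Y|X)\,p(X)$, is discarded.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5])
shifted = logits + 7.3   # an arbitrary constant shift

# Identical conditional distributions: the shift, which could have carried
# unnormalized-density information about the input, is lost by the softmax.
print(np.allclose(softmax(logits), softmax(shifted)))   # True
```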
arXiv Detail & Related papers (2024-04-27T18:41:32Z)
- Graph Cuts with Arbitrary Size Constraints Through Optimal Transport [18.338458637795263]
We propose a new graph cut algorithm for partitioning graphs under arbitrary size constraints.
We solve it using an accelerated proximal GD algorithm which guarantees global convergence to a critical point.
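The summary above refers to an accelerated proximal gradient method; below is a generic FISTA-style sketch on a toy l1-regularized least-squares problem, purely to illustrate the algorithmic template rather than the paper's optimal-transport formulation of graph cuts.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 10))
b = rng.normal(size=30)
lam = 0.5

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part's gradient
x = y = np.zeros(10)
t = 1.0
for _ in range(200):
    grad = A.T @ (A @ y - b)           # gradient of the smooth part at the extrapolated point
    x_new = soft_threshold(y - grad / L, lam / L)
    t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
    y = x_new + ((t - 1) / t_new) * (x_new - x)   # Nesterov-style extrapolation step
    x, t = x_new, t_new

print(np.round(x, 3))
```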
arXiv Detail & Related papers (2024-02-07T10:33:09Z)
- A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning.
We show how to maximize the likelihood of a symbolic constraint w.r.t. the neural network's output distribution.
We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
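As a toy version of the quantity being maximized, not the paper's pseudo-semantic loss itself: under a factorized output distribution, the probability that a logical constraint holds can sometimes be computed in closed form. Here the constraint is "exactly one of the binary outputs is 1", and its log-probability could serve as a differentiable training signal.

```python
import numpy as np

def log_prob_exactly_one(p):
    """log P(exactly one Y_i = 1) under independent Bernoulli outputs with probabilities p."""
    p = np.asarray(p, dtype=float)
    total = 0.0
    for i in range(len(p)):
        total += p[i] * np.prod(np.delete(1.0 - p, i))   # Y_i = 1, all others = 0
    return np.log(total)

# Network output probabilities for 4 binary variables (illustrative values).
probs = np.array([0.7, 0.1, 0.1, 0.2])
print(log_prob_exactly_one(probs))   # maximizing this pushes mass onto constraint-satisfying outputs
```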
arXiv Detail & Related papers (2023-12-06T20:58:07Z)
- Constraints First: A New MDD-based Model to Generate Sentences Under Constraints [45.498315114762484]
This paper introduces a new approach to generating strongly constrained texts.
We use multivalued decision diagrams (MDD), a well-known data structure to deal with constraints.
We obtain hundreds of bona fide candidate sentences, compared with the few dozen sentences usually available for the well-known vision screening test (MNREAD).
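A deliberately tiny sketch of the idea: one small word domain per sentence position plus a transition constraint, with all solutions enumerated. The vocabulary and constraints are made up, and the brute-force enumeration merely stands in for the MDD, which would represent the valid assignments compactly instead of expanding the Cartesian product.

```python
from itertools import product

# One small domain of candidate words per sentence position (illustrative).
layers = [
    ["the", "a"],
    ["quick", "lazy"],
    ["fox", "dog"],
    ["runs", "sleeps"],
]

# A toy transition constraint between consecutive words (stands in for n-gram/grammar constraints).
forbidden = {("lazy", "fox"), ("quick", "dog")}

def valid(sentence):
    return all((u, v) not in forbidden for u, v in zip(sentence, sentence[1:]))

solutions = [" ".join(s) for s in product(*layers) if valid(s)]
print(len(solutions))
for s in solutions:
    print(s)
```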
arXiv Detail & Related papers (2023-09-21T18:29:52Z)
- Efficient Graph Laplacian Estimation by Proximal Newton [12.05527862797306]
A graph learning problem can be formulated as a maximum likelihood estimation (MLE) of the precision matrix.
We develop a second-order approach to obtain an efficient solver utilizing several algorithmic features.
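For context, the Gaussian MLE form this summary alludes to can be written as below, a standard sketch rather than necessarily this paper's exact formulation; S is the sample covariance and the feasible set encodes the graph-Laplacian structure.

```latex
\hat{\Theta} \;=\; \arg\max_{\Theta \in \mathcal{S}_{\mathrm{Laplacian}}} \;\; \log\det(\Theta) \;-\; \operatorname{tr}(S\,\Theta)
```

In Laplacian-constrained formulations the ordinary determinant is typically replaced by a generalized (pseudo-)determinant, since a graph Laplacian is singular; a proximal Newton solver applies second-order updates to this objective under the structural constraints.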
arXiv Detail & Related papers (2023-02-13T15:13:22Z)
- Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
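As a toy illustration of the setting only, not the paper's minimax-optimal scheme: fit a linear model from noisy measurements, quantize its weights uniformly to a few bits, and compare the empirical prediction risk before and after quantization.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, bits = 200, 5, 4

X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)    # linear model learned from measurements

# Uniform scalar quantization of each weight to 2**bits levels over the observed range.
lo, hi = w_hat.min(), w_hat.max()
levels = 2 ** bits - 1
w_q = lo + np.round((w_hat - lo) / (hi - lo) * levels) / levels * (hi - lo)

risk = lambda w: np.mean((X @ w - y) ** 2)
print(f"risk(unquantized)={risk(w_hat):.4f}  risk(quantized, {bits} bits)={risk(w_q):.4f}")
```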
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
- Stochastic Bundle Adjustment for Efficient and Scalable 3D Reconstruction [43.736296034673124]
Current bundle adjustment solvers such as the Levenberg-Marquardt (LM) algorithm are limited by the bottleneck of solving the Reduced Camera System (RCS), whose dimension is proportional to the number of cameras.
We propose a bundle adjustment algorithm that approximately decomposes the RCS inside the LM iterations to improve efficiency and scalability.
arXiv Detail & Related papers (2020-08-02T10:26:09Z)
- Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations [154.51144248210338]
Batch Nuclear-norm Maximization (BNM) is proposed to boost learning in label-insufficient scenarios.
BNM outperforms competitors and works well with existing well-known methods.
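For reference, a minimal sketch of the quantity BNM maximizes as commonly described (toy values, not the paper's training setup): the nuclear norm of the batch prediction matrix, i.e. the sum of its singular values; larger values encourage predictions that are both confident (discriminable) and diverse across the batch.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(3)
logits = rng.normal(size=(16, 5))            # a batch of 16 samples, 5 classes (toy values)
P = softmax(logits)                          # batch prediction matrix

nuclear_norm = np.linalg.svd(P, compute_uv=False).sum()   # sum of singular values
print(nuclear_norm)   # adding the negative of this term to the loss maximizes it during training
```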
arXiv Detail & Related papers (2020-03-27T05:04:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.