Markov Constraint as Large Language Model Surrogate
- URL: http://arxiv.org/abs/2406.10269v1
- Date: Tue, 11 Jun 2024 16:09:53 GMT
- Title: Markov Constraint as Large Language Model Surrogate
- Authors: Alexandre Bonlarron, Jean-Charles Régin
- Abstract summary: NgramMarkov is dedicated to text generation in constraint programming (CP).
It limits the product of the probabilities of the n-grams of a sentence.
A real-world problem has been solved for the first time using 4-grams instead of 5-grams.
- Score: 49.86129209397701
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper presents NgramMarkov, a variant of the Markov constraints, dedicated to text generation in constraint programming (CP). It involves a set of n-grams (i.e., sequences of n words) associated with probabilities given by a large language model (LLM). It limits the product of the probabilities of the n-grams of a sentence. The propagator of this constraint can be seen as an extension of the ElementaryMarkov constraint propagator, incorporating the LLM distribution instead of the maximum likelihood estimation of n-grams. It uses a gliding threshold, i.e., it rejects n-grams whose local probabilities are too low, to guarantee balanced solutions. It can also be combined with a "look-ahead" approach to remove n-grams that are very unlikely to lead to acceptable sentences within a fixed-length horizon. This idea is based on the MDDMarkovProcess constraint propagator, but without explicitly using an MDD (Multi-Valued Decision Diagram). The experimental results show that the generated text is valued similarly to the LLM perplexity function. Using this new constraint dramatically reduces the number of candidate sentences produced, improves computation times, and allows larger corpora or smaller n-grams to be used. A real-world problem has been solved for the first time using 4-grams instead of 5-grams.
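To make the filtering idea concrete, here is a minimal sketch, not the paper's CP propagator: it assumes a small dictionary of 4-gram log-probabilities coming from an LLM, prunes any n-gram whose local probability falls below a gliding threshold, and bounds the product of the n-gram probabilities of a sentence. All names and probability values (ngram_logprob, MIN_LOCAL_PROB, MIN_SENTENCE_LOGPROB) are illustrative assumptions.

```python
import math

# Hypothetical LLM-derived 4-gram log-probabilities (all values are made up).
ngram_logprob = {
    ("the", "cat", "sat", "on"): math.log(0.02),
    ("cat", "sat", "on", "the"): math.log(0.03),
    ("sat", "on", "the", "mat"): math.log(0.04),
}

N = 4                                    # n-gram order
MIN_LOCAL_PROB = 1e-4                    # gliding per-n-gram threshold: prune locally unlikely n-grams
MIN_SENTENCE_LOGPROB = math.log(1e-6)    # bound on the product of the n-gram probabilities

def sentence_logprob(words):
    """Sum of n-gram log-probs, i.e. the log of the product of n-gram probabilities."""
    total = 0.0
    for i in range(len(words) - N + 1):
        gram = tuple(words[i:i + N])
        lp = ngram_logprob.get(gram, math.log(1e-9))   # unseen n-grams get a tiny probability
        if lp < math.log(MIN_LOCAL_PROB):
            return None                                # local filtering: prune this sentence
        total += lp
    return total

def acceptable(words):
    lp = sentence_logprob(words)
    return lp is not None and lp >= MIN_SENTENCE_LOGPROB

print(acceptable("the cat sat on the mat".split()))
```

In the CP setting, the same two tests (local threshold and global product bound) would be applied during propagation to prune word domains rather than to reject complete sentences after the fact.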
Related papers
- Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection [49.15148871877941]
Next-token distribution outputs offer a theoretically appealing approach for detecting text generated by large language models (LLMs).
We propose the Perplexity Attention Weighted Network (PAWN), which uses the last hidden states of the LLM and positions to weight the sum of a series of features based on metrics from the next-token distribution across the sequence length.
PAWN shows competitive and even better performance in-distribution than the strongest baselines with a fraction of their trainable parameters.
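For intuition only, a toy sketch of the weighting pattern described above, not the PAWN architecture: random arrays stand in for the LLM's last hidden states and next-token log-probabilities, and an untrained linear head produces softmax weights over positions that pool per-token features into a single vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for quantities derived from an LLM (all values here are made up):
T, H = 12, 8                                     # sequence length, hidden size
token_logprob = rng.normal(-3.0, 1.0, size=T)    # next-token log-probabilities along the sequence
hidden = rng.normal(size=(T, H))                 # last hidden states
pos = np.arange(T)[:, None] / T                  # normalized positions

# Per-token features based on the next-token distribution (log-prob and a perplexity-like term).
features = np.stack([token_logprob, np.exp(-token_logprob)], axis=1)   # shape (T, 2)

# A random, untrained scoring head turning hidden state + position into attention weights.
w = rng.normal(size=H + 1)
scores = np.concatenate([hidden, pos], axis=1) @ w
weights = np.exp(scores - scores.max())
weights /= weights.sum()                         # softmax over sequence positions

# Weighted sum of features across the sequence -> a small vector fed to a detection classifier.
pooled = weights @ features
print(pooled)
```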
arXiv Detail & Related papers (2025-01-07T17:00:49Z)
- Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models [79.70436109672599]
We derive non-vacuous generalization bounds for large language models as large as LLaMA2-70B.
Our work achieves the first non-vacuous bounds for models that are deployed in practice and generate high-quality text.
arXiv Detail & Related papers (2024-07-25T16:13:58Z)
- A Method of Moments Embedding Constraint and its Application to Semi-Supervised Learning [2.8266810371534152]
Discriminative deep learning models with a linear+softmax final layer have a problem.
The latent space only predicts the conditional probabilities $p(Y|X)$ but not the full joint distribution $p(Y,X)$.
This exacerbates model over-confidence, impacting many problems such as hallucinations, confounding biases, and dependence on large datasets.
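A tiny illustration (not from the paper) of why a linear+softmax head only captures $p(Y|X)$: the softmax is invariant to adding a constant to the logits, so any information that could encode the input density $p(X)$, and hence the joint $p(Y,X) = p(Y|X)\,p(X)$, is discarded.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5])
shifted = logits + 7.3   # an arbitrary constant shift

# Identical conditional distributions: the shift, which could have carried
# unnormalized-density information about the input, is lost by the softmax.
print(np.allclose(softmax(logits), softmax(shifted)))   # True
```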
arXiv Detail & Related papers (2024-04-27T18:41:32Z)
- Graph Cuts with Arbitrary Size Constraints Through Optimal Transport [18.338458637795263]
We propose a new graph cut algorithm for partitioning graphs under arbitrary size constraints.
We solve it using an accelerated proximal GD algorithm which guarantees global convergence to a critical point.
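The summary above refers to an accelerated proximal gradient method; below is a generic FISTA-style sketch on a toy l1-regularized least-squares problem, purely to illustrate the algorithmic template rather than the paper's optimal-transport formulation of graph cuts.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 10))
b = rng.normal(size=30)
lam = 0.5

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part's gradient
x = y = np.zeros(10)
t = 1.0
for _ in range(200):
    grad = A.T @ (A @ y - b)           # gradient of the smooth part at the extrapolated point
    x_new = soft_threshold(y - grad / L, lam / L)
    t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
    y = x_new + ((t - 1) / t_new) * (x_new - x)   # Nesterov-style extrapolation step
    x, t = x_new, t_new

print(np.round(x, 3))
```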
arXiv Detail & Related papers (2024-02-07T10:33:09Z)
- A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning.
We show how to maximize the likelihood of a symbolic constraint w.r.t. the neural network's output distribution.
We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
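As a toy version of the quantity being maximized, not the paper's pseudo-semantic loss itself: under a factorized output distribution, the probability that a logical constraint holds can sometimes be computed in closed form. Here the constraint is "exactly one of the binary outputs is 1", and its log-probability could serve as a differentiable training signal.

```python
import numpy as np

def log_prob_exactly_one(p):
    """log P(exactly one Y_i = 1) under independent Bernoulli outputs with probabilities p."""
    p = np.asarray(p, dtype=float)
    total = 0.0
    for i in range(len(p)):
        total += p[i] * np.prod(np.delete(1.0 - p, i))   # Y_i = 1, all others = 0
    return np.log(total)

# Network output probabilities for 4 binary variables (illustrative values).
probs = np.array([0.7, 0.1, 0.1, 0.2])
print(log_prob_exactly_one(probs))   # maximizing this pushes mass onto constraint-satisfying outputs
```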
arXiv Detail & Related papers (2023-12-06T20:58:07Z)
- Constraints First: A New MDD-based Model to Generate Sentences Under Constraints [45.498315114762484]
This paper introduces a new approach to generating strongly constrained texts.
We use multivalued decision diagrams (MDD), a well-known data structure to deal with constraints.
We obtain hundreds of bona fide candidate sentences, compared with the few dozen sentences usually available for the well-known vision screening test (MNREAD).
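A deliberately tiny sketch of the idea: one small word domain per sentence position plus a transition constraint, with all solutions enumerated. The vocabulary and constraints are made up, and the brute-force enumeration merely stands in for the MDD, which would represent the valid assignments compactly instead of expanding the Cartesian product.

```python
from itertools import product

# One small domain of candidate words per sentence position (illustrative).
layers = [
    ["the", "a"],
    ["quick", "lazy"],
    ["fox", "dog"],
    ["runs", "sleeps"],
]

# A toy transition constraint between consecutive words (stands in for n-gram/grammar constraints).
forbidden = {("lazy", "fox"), ("quick", "dog")}

def valid(sentence):
    return all((u, v) not in forbidden for u, v in zip(sentence, sentence[1:]))

solutions = [" ".join(s) for s in product(*layers) if valid(s)]
print(len(solutions))
for s in solutions:
    print(s)
```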
arXiv Detail & Related papers (2023-09-21T18:29:52Z)
- Efficient Graph Laplacian Estimation by Proximal Newton [12.05527862797306]
A graph learning problem can be formulated as a maximum likelihood estimation (MLE) of the precision matrix.
We develop a second-order approach to obtain an efficient solver utilizing several algorithmic features.
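For context, the Gaussian MLE form this summary alludes to can be written as below, a standard sketch rather than necessarily this paper's exact formulation; S is the sample covariance and the feasible set encodes the graph-Laplacian structure.

```latex
\hat{\Theta} \;=\; \arg\max_{\Theta \in \mathcal{S}_{\mathrm{Laplacian}}} \;\; \log\det(\Theta) \;-\; \operatorname{tr}(S\,\Theta)
```

In Laplacian-constrained formulations the ordinary determinant is typically replaced by a generalized (pseudo-)determinant, since a graph Laplacian is singular; a proximal Newton solver applies second-order updates to this objective under the structural constraints.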
arXiv Detail & Related papers (2023-02-13T15:13:22Z)
- Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
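As a toy illustration of the setting only, not the paper's minimax-optimal scheme: fit a linear model from noisy measurements, quantize its weights uniformly to a few bits, and compare the empirical prediction risk before and after quantization.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, bits = 200, 5, 4

X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)    # linear model learned from measurements

# Uniform scalar quantization of each weight to 2**bits levels over the observed range.
lo, hi = w_hat.min(), w_hat.max()
levels = 2 ** bits - 1
w_q = lo + np.round((w_hat - lo) / (hi - lo) * levels) / levels * (hi - lo)

risk = lambda w: np.mean((X @ w - y) ** 2)
print(f"risk(unquantized)={risk(w_hat):.4f}  risk(quantized, {bits} bits)={risk(w_q):.4f}")
```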
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
- Stochastic Bundle Adjustment for Efficient and Scalable 3D Reconstruction [43.736296034673124]
Current bundle adjustment solvers such as the Levenberg-Marquardt (LM) algorithm are limited by the bottleneck of solving the Reduced Camera System (RCS), whose dimension is proportional to the number of cameras.
We propose a bundle adjustment algorithm that approximately decomposes the RCS inside the LM iterations to improve efficiency and scalability.
arXiv Detail & Related papers (2020-08-02T10:26:09Z)
- Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations [154.51144248210338]
Batch Nuclear-norm Maximization (BNM) is proposed to boost learning in label-insufficient scenarios.
BNM outperforms competitors and works well with existing well-known methods.
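For reference, a minimal sketch of the quantity BNM maximizes as commonly described (toy values, not the paper's training setup): the nuclear norm of the batch prediction matrix, i.e. the sum of its singular values; larger values encourage predictions that are both confident (discriminable) and diverse across the batch.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(3)
logits = rng.normal(size=(16, 5))            # a batch of 16 samples, 5 classes (toy values)
P = softmax(logits)                          # batch prediction matrix

nuclear_norm = np.linalg.svd(P, compute_uv=False).sum()   # sum of singular values
print(nuclear_norm)   # adding the negative of this term to the loss maximizes it during training
```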
arXiv Detail & Related papers (2020-03-27T05:04:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.