Priority Sampling of Large Language Models for Compilers
- URL: http://arxiv.org/abs/2402.18734v1
- Date: Wed, 28 Feb 2024 22:27:49 GMT
- Title: Priority Sampling of Large Language Models for Compilers
- Authors: Dejan Grubisic, Chris Cummins, Volker Seeker, Hugh Leather
- Abstract summary: Priority Sampling is a simple and deterministic sampling technique that produces unique samples ordered by the model's confidence.
It supports generation constrained by regular expressions, providing a controllable and structured exploration process.
With just 30 samples, it outperforms the autotuner that was used to generate the labels for training the original model.
- Score: 4.2266182821287135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models show great potential in generating and optimizing code.
Widely used sampling methods such as Nucleus Sampling increase the diversity of
generation but often produce repeated samples for low temperatures and
incoherent samples for high temperatures. Furthermore, the temperature
coefficient has to be tuned for each task, limiting its usability. We present
Priority Sampling, a simple and deterministic sampling technique that produces
unique samples ordered by the model's confidence. Each new sample expands the
unexpanded token with the highest probability in the augmented search tree.
Additionally, Priority Sampling supports generation constrained by regular
expressions, providing a controllable and structured exploration process.
Priority Sampling outperforms Nucleus Sampling for any number of samples,
boosting the original model's improvement over -Oz from 2.87% to 5%. Moreover,
within just 30 samples it outperforms the autotuner that was used to generate
the labels for training the original model.
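The mechanism described in the abstract can be sketched in a few lines. The following is a minimal illustration only, not the paper's implementation: the toy next_token_logprobs model, the top_k branching limit, and the LLVM-flag token names are assumptions, and the regex-constrained generation mode is omitted.
```python
import heapq
import math

EOS = "<eos>"

def next_token_logprobs(prefix):
    """Hypothetical toy model standing in for an LLM's next-token distribution."""
    if len(prefix) >= 2:
        return {EOS: math.log(0.7), "-O3": math.log(0.2), "-Oz": math.log(0.1)}
    return {"-Oz": math.log(0.5), "-O3": math.log(0.3), "-mem2reg": math.log(0.2)}

def priority_sample(num_samples, top_k=3, max_len=16):
    # Max-heap (log-probs negated) of unexpanded prefixes in the search tree.
    heap = [(0.0, ())]
    samples = []
    while heap and len(samples) < num_samples:
        neg_lp, prefix = heapq.heappop(heap)
        lp, seq = -neg_lp, list(prefix)
        # Greedily complete the most confident unexpanded prefix; the
        # alternatives skipped at each step are pushed back onto the heap and
        # seed later samples, so returned sequences are unique and arrive in
        # (roughly) decreasing order of model confidence.
        while len(seq) < max_len and (not seq or seq[-1] != EOS):
            dist = sorted(next_token_logprobs(tuple(seq)).items(),
                          key=lambda kv: kv[1], reverse=True)
            best_tok, best_lp = dist[0]
            for tok, tok_lp in dist[1:top_k]:
                heapq.heappush(heap, (-(lp + tok_lp), tuple(seq) + (tok,)))
            seq.append(best_tok)
            lp += best_lp
        samples.append((tuple(seq), lp))
    return samples

if __name__ == "__main__":
    for seq, lp in priority_sample(5):
        print(f"{math.exp(lp):.3f}  {' '.join(seq)}")
```
Because the expansion order depends only on the model's probabilities, the procedure is deterministic, and every sample is unique by construction: each queued prefix differs from the greedy continuation at its branch token.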
Related papers
- Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation [51.127054971591924]
We introduce a new generative self-evaluation scheme designed to adaptively reduce the number of generated samples.
We demonstrate that 74% of the improvement from using 16 samples can be achieved with only 1.2 samples on average.
arXiv Detail & Related papers (2024-10-03T17:47:29Z)
- Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs [4.122612309805664]
Large Language Models (LLMs) generate text by sampling the next token from a probability distribution over the vocabulary at each decoding step.
We propose min-p sampling, a dynamic truncation method that adjusts the sampling threshold based on the model's confidence by scaling according to the top token's probability.
We conduct extensive experiments on benchmarks including GPQA, GSM8K, and AlpacaEval Creative Writing, demonstrating that min-p sampling improves both the quality and diversity of generated text, particularly at high temperatures.
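The truncation rule summarized above is simple enough to sketch directly. This is an illustrative sketch over a plain {token: probability} distribution; the parameter name p_base and the toy numbers are assumed rather than taken from the paper.
```python
import random

def min_p_sample(probs, p_base=0.1):
    """Sample one token after min-p truncation of a {token: prob} distribution."""
    # The truncation threshold scales with the top token's probability, so a
    # confident model prunes the tail aggressively while a flat distribution
    # keeps most of its candidates.
    threshold = p_base * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    tokens, weights = zip(*kept.items())
    return random.choices(tokens, weights=weights)[0]

# A confident distribution: the 0.1 * 0.80 = 0.08 threshold keeps only "the" and "a".
print(min_p_sample({"the": 0.80, "a": 0.12, "cat": 0.05, "zzz": 0.03}))
```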
arXiv Detail & Related papers (2024-07-01T08:37:25Z)
- Foundation Model Makes Clustering A Better Initialization For Cold-Start Active Learning [5.609241010973952]
We propose to integrate foundation models with clustering methods to select samples for cold-start active learning.
Foundation models are those trained on massive datasets under a self-supervised paradigm.
For a comprehensive comparison, we included a classic ImageNet-supervised model to acquire embeddings.
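A rough sketch of that cold-start selection recipe, under assumptions the summary does not spell out: k-means as the clustering method, nearest-to-centroid picks as the representatives, and random vectors standing in for foundation-model embeddings.
```python
import numpy as np
from sklearn.cluster import KMeans

def cold_start_select(embeddings: np.ndarray, budget: int, seed: int = 0):
    """Pick `budget` diverse samples: cluster the embeddings, then take the
    point closest to each cluster centroid as the initial set to label."""
    km = KMeans(n_clusters=budget, n_init=10, random_state=seed).fit(embeddings)
    chosen = []
    for c, center in enumerate(km.cluster_centers_):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[members] - center, axis=1)
        chosen.append(int(members[np.argmin(dists)]))
    return chosen

# Embeddings would come from a self-supervised foundation model's encoder;
# random vectors stand in here so the sketch runs end to end.
rng = np.random.default_rng(0)
print(cold_start_select(rng.normal(size=(500, 64)), budget=10))
```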
arXiv Detail & Related papers (2024-02-04T16:27:37Z)
- A Block Metropolis-Hastings Sampler for Controllable Energy-based Text Generation [78.81021361497311]
We develop a novel Metropolis-Hastings (MH) sampler that proposes re-writes of the entire sequence in each step via iterative prompting of a large language model.
Our new sampler (a) allows for more efficient and accurate sampling from a target distribution and (b) allows generation length to be determined through the sampling procedure rather than fixed in advance.
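The accept/reject rule behind such a sampler can be sketched generically. The toy energy and proposal below are placeholders (the paper's proposal comes from iteratively prompting an LLM), so this only illustrates the Metropolis-Hastings acceptance step and how a whole-sequence proposal also determines length, not the paper's system.
```python
import math
import random

def mh_step(x, energy, propose):
    """One MH step: propose a full rewrite of `x`, then accept or reject it.

    `propose(x)` returns (x_new, log_q_fwd, log_q_rev), the candidate plus the
    forward/reverse proposal log-probabilities; `energy(x)` defines the target
    distribution p(x) proportional to exp(-energy(x)).
    """
    x_new, log_q_fwd, log_q_rev = propose(x)
    log_alpha = (-energy(x_new) + log_q_rev) - (-energy(x) + log_q_fwd)
    if random.random() < math.exp(min(0.0, log_alpha)):
        return x_new          # accept the rewrite
    return x                  # keep the current sequence

# Toy stand-ins: the target favors short sequences, and the "rewriter"
# resamples a sequence of random length, so the proposal also sets the length.
def toy_energy(seq):
    return 0.5 * len(seq)

def toy_propose(seq):
    new = ["tok"] * random.randint(1, 10)
    return new, math.log(1 / 10), math.log(1 / 10)   # symmetric proposal

x = ["tok"] * 5
for _ in range(100):
    x = mh_step(x, toy_energy, toy_propose)
print(len(x))   # short sequences should dominate under the toy target
```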
arXiv Detail & Related papers (2023-12-07T18:30:15Z)
- Generating High Fidelity Synthetic Data via Coreset selection and Entropic Regularization [15.866662428675054]
We propose using a combination of coreset selection methods and entropic regularization to select the highest-fidelity samples.
In a semi-supervised learning scenario, we show that augmenting the labeled dataset with our selected subset of samples leads to improved accuracy.
arXiv Detail & Related papers (2023-01-31T22:59:41Z)
- Boost Test-Time Performance with Closed-Loop Inference [85.43516360332646]
We propose to predict hard-classified test samples in a looped manner to boost the model performance.
We first devise a filtering criterion to identify those hard-classified test samples that need additional inference loops.
For each hard sample, we construct an additional auxiliary learning task based on its original top-$K$ predictions to calibrate the model.
arXiv Detail & Related papers (2022-03-21T10:20:21Z)
- Reparameterized Sampling for Generative Adversarial Networks [71.30132908130581]
We propose REP-GAN, a novel sampling method that allows general dependent proposals by reparameterizing the Markov chains into the latent space of the generator.
Empirically, extensive experiments on synthetic and real datasets demonstrate that our REP-GAN largely improves the sample efficiency and obtains better sample quality simultaneously.
arXiv Detail & Related papers (2021-07-01T10:34:55Z)
- One for More: Selecting Generalizable Samples for Generalizable ReID Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z)
- Incremental Sampling Without Replacement for Sequence Models [39.3035292844624]
We present an elegant procedure for sampling without replacement from a broad class of randomized programs.
Our approach is incremental, i.e., samples can be drawn one at a time, allowing for increased flexibility.
arXiv Detail & Related papers (2020-02-21T00:12:01Z)
- Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior [53.69310441063162]
This paper proposes a sequential prior in a discrete latent space which can generate more naturally sounding samples.
We evaluate the approach using listening tests, objective metrics of automatic speech recognition (ASR) performance, and measurements of prosody attributes.
arXiv Detail & Related papers (2020-02-06T12:35:50Z)