Stay on topic with Classifier-Free Guidance
- URL: http://arxiv.org/abs/2306.17806v1
- Date: Fri, 30 Jun 2023 17:07:02 GMT
- Title: Stay on topic with Classifier-Free Guidance
- Authors: Guillaume Sanchez, Honglu Fan, Alexander Spangher, Elad Levi, Pawan
Sasanka Ammanamanchi, Stella Biderman
- Abstract summary: We show that CFG can be used broadly as an inference-time technique in pure language modeling.
We show that CFG improves the performance of Pythia, GPT-2 and LLaMA-family models across an array of tasks.
- Score: 57.28934343207042
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classifier-Free Guidance (CFG) has recently emerged in text-to-image
generation as a lightweight technique to encourage prompt-adherence in
generations. In this work, we demonstrate that CFG can be used broadly as an
inference-time technique in pure language modeling. We show that CFG (1)
improves the performance of Pythia, GPT-2 and LLaMA-family models across an
array of tasks: Q&A, reasoning, code generation, and machine translation,
achieving SOTA on LAMBADA with LLaMA-7B over PaLM-540B; (2) brings improvements
equivalent to a model with twice the parameter-count; (3) can stack alongside
other inference-time methods like Chain-of-Thought and Self-Consistency,
yielding further improvements in difficult tasks; (4) can be used to increase
the faithfulness and coherence of assistants in challenging form-driven and
content-driven prompts: in a human evaluation we show a 75% preference for
GPT4All using CFG over baseline.
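The abstract does not spell out the decoding rule, but CFG for language models is typically applied at each decoding step by contrasting a conditional pass (prompt plus continuation) with an unconditional pass (continuation only) and reweighting the next-token distribution toward the prompt. Below is a minimal sketch of that idea for a Hugging Face causal LM; the choice of GPT-2, greedy decoding, and the guidance scale gamma = 1.5 are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def cfg_generate(prompt, max_new_tokens=32, gamma=1.5):
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    generated = prompt_ids
    prompt_len = prompt_ids.shape[1]
    for _ in range(max_new_tokens):
        with torch.no_grad():
            # conditional pass: the model sees the prompt plus the continuation so far
            cond_logits = model(generated).logits[:, -1, :]
            # unconditional pass: drop the prompt, keep only the continuation
            # (fall back to the BOS token while the continuation is still empty)
            uncond_ids = generated[:, prompt_len:]
            if uncond_ids.shape[1] == 0:
                uncond_ids = torch.tensor([[tokenizer.bos_token_id]])
            uncond_logits = model(uncond_ids).logits[:, -1, :]
        # classifier-free guidance on next-token log-probabilities:
        # guided = uncond + gamma * (cond - uncond)
        log_cond = F.log_softmax(cond_logits, dim=-1)
        log_uncond = F.log_softmax(uncond_logits, dim=-1)
        guided = log_uncond + gamma * (log_cond - log_uncond)
        next_token = guided.argmax(dim=-1, keepdim=True)  # greedy, for brevity
        generated = torch.cat([generated, next_token], dim=-1)
    return tokenizer.decode(generated[0][prompt_len:], skip_special_tokens=True)

print(cfg_generate("Translate to French: The weather is nice today.\nTranslation:"))
```

With gamma = 1 this reduces to ordinary decoding; larger values push generations more strongly toward the prompt, which is the "stay on topic" effect the paper evaluates.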
Related papers
- Diversity-Rewarded CFG Distillation [62.08448835625036]
We introduce diversity-rewarded CFG distillation, a novel finetuning procedure that distills the strengths of CFG while addressing its limitations.
Our approach optimises two training objectives: (1) a distillation objective, encouraging the model alone (without CFG) to imitate the CFG-augmented predictions, and (2) an RL objective with a diversity reward, promoting the generation of diverse outputs for a given prompt; a rough sketch of both objectives follows this entry.
arXiv Detail & Related papers (2024-10-08T14:40:51Z)
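The distillation entry above names two training objectives. The sketch below is a rough, hypothetical rendering of them (a KL-based distillation term toward CFG-augmented predictions, and a diversity reward based on pairwise dissimilarity of generations), not that paper's actual implementation; the function names and the cosine-similarity reward are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def cfg_teacher_logits(cond_logits, uncond_logits, gamma=1.5):
    # CFG-augmented predictions that the student is encouraged to imitate
    return uncond_logits + gamma * (cond_logits - uncond_logits)

def distillation_loss(student_logits, cond_logits, uncond_logits, gamma=1.5):
    # objective (1): the model alone (without CFG) matches the CFG-augmented teacher
    teacher_probs = F.softmax(cfg_teacher_logits(cond_logits, uncond_logits, gamma), dim=-1)
    return F.kl_div(F.log_softmax(student_logits, dim=-1), teacher_probs,
                    reduction="batchmean")

def diversity_reward(sample_embeddings):
    # objective (2), illustrated with embeddings of several generations for one
    # prompt: reward low average pairwise cosine similarity, i.e. diverse outputs
    n = sample_embeddings.shape[0]
    sims = F.cosine_similarity(sample_embeddings.unsqueeze(0),
                               sample_embeddings.unsqueeze(1), dim=-1)
    off_diagonal = sims[~torch.eye(n, dtype=torch.bool)]
    return 1.0 - off_diagonal.mean()
```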
- Leveraging Fine-Tuned Retrieval-Augmented Generation with Long-Context Support: For 3GPP Standards [4.334100270812517]
Large language models (LLMs) struggle with technical standards in telecommunications.
We propose a fine-tuned retrieval-augmented generation (RAG) system based on the Phi-2 small language model (SLM).
Our experiments demonstrate substantial improvements over existing question-answering approaches in the telecom domain.
arXiv Detail & Related papers (2024-08-21T17:00:05Z)
- Adaptable Logical Control for Large Language Models [68.27725600175013]
Ctrl-G is an adaptable framework that facilitates tractable and flexible control of model generation at inference time.
We show that Ctrl-G, when applied to a TULU2-7B model, outperforms GPT3.5 and GPT4 on the task of interactive text editing.
arXiv Detail & Related papers (2024-06-19T23:47:59Z)
- Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-in-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z)
- Investigating the Efficacy of Large Language Models for Code Clone Detection [2.0749231618270803]
Large Language Models (LLMs) have demonstrated remarkable success in various natural language processing and software engineering tasks.
In this study, we investigated the applicability of LLMs for Code Clone Detection (CCD), a non-generative task.
ChatGPT surpasses the baselines in cross-language CCD, attaining an F1-score of 0.877, and achieves comparable performance to fully fine-tuned models for mono-lingual CCD.
arXiv Detail & Related papers (2024-01-24T20:43:36Z)
- APoLLo: Unified Adapter and Prompt Learning for Vision Language Models [58.9772868980283]
We present APoLLo, a unified multi-modal approach that combines Adapter and Prompt learning for Vision-Language models.
APoLLo achieves a relative gain up to 6.03% over MaPLe (SOTA) on novel classes for 10 diverse image recognition datasets.
arXiv Detail & Related papers (2023-12-04T01:42:09Z)
- Contrastive Decoding Improves Reasoning in Large Language Models [55.16503283583076]
We show that Contrastive Decoding achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks.
We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark.
arXiv Detail & Related papers (2023-09-17T00:29:32Z)
- Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance [35.38549845444575]
Chain-of-Thought Hub is an open-source evaluation suite on the multi-step reasoning capabilities of large language models.
arXiv Detail & Related papers (2023-05-26T23:46:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.