Language Model Cascades
- URL: http://arxiv.org/abs/2207.10342v1
- Date: Thu, 21 Jul 2022 07:35:18 GMT
- Title: Language Model Cascades
- Authors: David Dohan, Winnie Xu, Aitor Lewkowycz, Jacob Austin, David Bieber,
Raphael Gontijo Lopes, Yuhuai Wu, Henryk Michalewski, Rif A. Saurous, Jascha
Sohl-Dickstein, Kevin Murphy, Charles Sutton
- Abstract summary: Repeated interactions at test-time with a single model, or the composition of multiple models together, further expands capabilities.
Cases with control flow and dynamic structure require techniques from probabilistic programming.
We formalize several existing techniques from this perspective, including scratchpads / chain of thought, verifiers, STaR, selection-inference, and tool use.
- Score: 72.18809575261498
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prompted models have demonstrated impressive few-shot learning abilities.
Repeated interactions at test-time with a single model, or the composition of
multiple models together, further expands capabilities. These compositions are
probabilistic models, and may be expressed in the language of graphical models
with random variables whose values are complex data types such as strings.
Cases with control flow and dynamic structure require techniques from
probabilistic programming, which allow implementing disparate model structures
and inference strategies in a unified language. We formalize several existing
techniques from this perspective, including scratchpads / chain of thought,
verifiers, STaR, selection-inference, and tool use. We refer to the resulting
programs as language model cascades.
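The graphical-model framing above translates directly into code: a cascade is a program that samples string-valued random variables from language models. Below is a minimal sketch of scratchpad / chain of thought as a two-variable probabilistic program, with self-consistency voting as a crude approximate marginalization over the latent thought. The `sample_lm` function is a hypothetical placeholder for a real language model call, not an API from the paper.

```python
import random

def sample_lm(prompt: str) -> str:
    """Hypothetical stand-in for a language model call: prompt -> sampled
    continuation. Returns canned strings purely for illustration."""
    if prompt.rstrip().endswith("A:"):
        return random.choice(["391", "381"])  # noisy final answers
    return "17 * 23 = 17 * 20 + 17 * 3"       # a sampled 'thought' string

def chain_of_thought(question: str) -> str:
    # Two string-valued random variables, as in the paper's graphical-model
    # view: thought ~ p(T | question), answer ~ p(A | question, thought).
    thought = sample_lm(f"Q: {question}\nLet's think step by step:")
    return sample_lm(f"Q: {question}\nThought: {thought}\nA:")

def self_consistency(question: str, n: int = 5) -> str:
    # Approximately marginalize out the thought: run the cascade several
    # times and majority-vote over the final answers.
    answers = [chain_of_thought(question) for _ in range(n)]
    return max(set(answers), key=answers.count)

print(self_consistency("What is 17 * 23?"))
```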
Related papers
- Explaining Datasets in Words: Statistical Models with Natural Language Parameters [66.69456696878842]
We introduce a family of statistical models -- including clustering, time series, and classification models -- parameterized by natural language predicates.
We apply our framework to a wide range of problems: taxonomizing user chat dialogues, characterizing how they evolve across time, and finding categories where one language model performs better than another; see the sketch below.
arXiv Detail & Related papers (2024-09-13T01:40:20Z)
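As a toy illustration of the entry above, the sketch below treats cluster parameters as natural language predicates. The `predicate_holds` function is a hypothetical keyword-matching stand-in for the language model that would actually judge whether a predicate applies to a text; the predicates and dialogues are invented.

```python
def predicate_holds(predicate: str, text: str) -> bool:
    # Placeholder for an LM judgment; here, naive keyword overlap.
    return any(word in text.lower() for word in predicate.lower().split())

# Cluster "parameters" are human-readable natural language predicates.
predicates = ["asks for coding help", "discusses travel plans"]

dialogues = [
    "How do I reverse a list in Python? Any coding help appreciated.",
    "I am planning travel to Kyoto next month.",
]

# Assign each dialogue to the cluster(s) whose predicate it satisfies,
# so the learned structure is directly interpretable.
for d in dialogues:
    matched = [p for p in predicates if predicate_holds(p, d)]
    print(f"{d[:45]!r} -> {matched}")
```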
- Multilingual Models for Check-Worthy Social Media Posts Detection [0.552480439325792]
The study includes a comprehensive analysis of different models, with a special focus on multilingual models.
The novelty of this work lies in multi-label multilingual classification models that can simultaneously and efficiently detect harmful posts and posts containing verifiable factual claims; a generic multi-label sketch follows below.
arXiv Detail & Related papers (2024-08-13T08:55:28Z)
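The multi-label setup above can be illustrated generically: one binary classifier per label, trained on the same posts. This sketch uses scikit-learn with invented English-only data; the actual paper uses multilingual transformer models, which this does not attempt to reproduce.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

posts = [
    "Vaccines cause illness in 90% of recipients",  # verifiable claim, harmful
    "The city council meets every Tuesday",         # verifiable claim, benign
    "You people are all idiots",                    # no claim, harmful
    "What a lovely sunny day",                      # no claim, benign
]
# Label columns: [contains_verifiable_claim, harmful]
labels = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])

# Multi-label classification: an independent binary classifier per column.
clf = OneVsRestClassifier(LogisticRegression())
X = TfidfVectorizer().fit_transform(posts)
clf.fit(X, labels)
print(clf.predict(X))  # one (claim, harmful) pair per post
```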
- DiSK: A Diffusion Model for Structured Knowledge [12.472921856815942]
The Diffusion Model of Structured Knowledge (DiSK) is a new architecture and training approach specialized for structured data.
DiSK handles text, categorical, and continuous numerical data using a Gaussian mixture model approach.
arXiv Detail & Related papers (2023-12-08T18:59:14Z)
- Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings is a significant challenge.
We introduce a family of synthetic context-free grammars (CFGs) with hierarchical rules, capable of generating lengthy sentences.
We demonstrate that generative models like GPT can accurately learn this CFG language and generate sentences based on it; a toy CFG sampler is sketched below.
arXiv Detail & Related papers (2023-05-23T04:28:16Z)
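To make the setup concrete, here is a minimal sampler for a toy CFG; each nonterminal expands by choosing one of its productions uniformly at random. The grammar itself is invented and far shallower than the synthetic grammars the paper studies.

```python
import random

# Toy grammar: nonterminals map to lists of productions.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["Det", "N", "PP"]],
    "VP":  [["V", "NP"]],
    "PP":  [["P", "NP"]],
    "Det": [["the"], ["a"]],
    "N":   [["cat"], ["dog"], ["telescope"]],
    "V":   [["saw"], ["chased"]],
    "P":   [["with"], ["near"]],
}

def generate(symbol: str = "S") -> list:
    """Recursively expand a symbol; terminals fall through unchanged."""
    if symbol not in GRAMMAR:
        return [symbol]
    production = random.choice(GRAMMAR[symbol])
    return [token for sym in production for token in generate(sym)]

for _ in range(3):
    print(" ".join(generate()))
```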
- Artificial Interrogation for Attributing Language Models [0.0]
The challenge provides twelve open-sourced base versions of popular language models and twelve fine-tuned language models for text generation.
The goal of the contest is to identify which fine-tuned models originated from which base model.
We employed four distinct approaches for measuring the resemblance between the responses generated by the models of both sets; one simple resemblance measure is sketched below.
arXiv Detail & Related papers (2022-11-20T05:46:29Z)
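One simple resemblance measure in the spirit of the entry above (not necessarily one of the four approaches the authors actually used) is mean pairwise string similarity over a shared prompt set. The model responses below are invented for illustration.

```python
from difflib import SequenceMatcher

def response_similarity(responses_a, responses_b) -> float:
    """Mean normalized edit similarity between paired responses."""
    ratios = [SequenceMatcher(None, a, b).ratio()
              for a, b in zip(responses_a, responses_b)]
    return sum(ratios) / len(ratios)

# Hypothetical responses to the same prompts from a fine-tuned model
# and from two candidate base models.
fine_tuned = ["Paris is the capital of France.", "2 + 2 equals 4."]
candidates = {
    "base_1": ["Paris is the capital of France.", "2 + 2 is 4."],
    "base_2": ["The capital? Paris!", "four"],
}

# Attribute the fine-tuned model to the most similar base model.
scores = {name: response_similarity(fine_tuned, resp)
          for name, resp in candidates.items()}
print(max(scores, key=scores.get), scores)
```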
- Multi-Model Probabilistic Programming [0.0]
We present an extension of probabilistic programming that lets each program represent a network of interrelated probabilistic models.
We give a formal semantics for these multi-model probabilistic programs, a collection of efficient algorithms for network-of-model operations, and an example implementation built on top of the popular probabilistic programming language Stan.
This network-of-models representation opens many doors, including search and automation in model-space, tracking and communication of model development, and explicit modeler degrees of freedom to mitigate issues like p-hacking; a minimal sketch of the representation follows below.
arXiv Detail & Related papers (2022-08-12T15:38:15Z)
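A hypothetical, Stan-free sketch of the network-of-models idea: each node stores a simplified model specification plus provenance edges to the models it was derived from, so operations over model-space become graph traversals. Names and fields here are invented, not the paper's API.

```python
from dataclasses import dataclass, field

@dataclass
class ModelNode:
    name: str
    predictors: frozenset          # the model's specification (simplified)
    parents: list = field(default_factory=list)  # provenance edges

network = {}

def add_model(name, predictors, parent=None):
    network[name] = ModelNode(name, frozenset(predictors),
                              [parent] if parent else [])

add_model("m0", {"intercept"})
add_model("m1", {"intercept", "age"}, parent="m0")           # add a predictor
add_model("m2", {"intercept", "age", "income"}, parent="m1")

def descendants(root):
    """Every model derived, directly or transitively, from `root`."""
    out, frontier = [], [root]
    while frontier:
        cur = frontier.pop()
        children = [m.name for m in network.values() if cur in m.parents]
        out += children
        frontier += children
    return out

print(descendants("m0"))  # ['m1', 'm2']
```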
- Low-Rank Constraints for Fast Inference in Structured Models [110.38427965904266]
This work demonstrates a simple approach to reduce the computational and memory complexity of a large class of structured models.
Experiments with neural parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models even at large state spaces; the core low-rank trick is sketched below.
arXiv Detail & Related papers (2022-01-08T00:47:50Z)
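The core trick, sketched with numpy under the assumption of an exactly low-rank transition matrix: if the n-by-n matrix factors as T = U @ V.T with rank r, one message-passing step costs O(n*r) instead of O(n^2). The paper's actual method covers a broader class of structured models.

```python
import numpy as np

n, r = 1000, 16
rng = np.random.default_rng(0)
U = rng.random((n, r)) / n   # T = U @ V.T is an n x n, rank-r matrix
V = rng.random((n, r))
alpha = rng.random(n)        # forward message over n hidden states

# Naive step materializes T: O(n^2) time and memory.
T = U @ V.T
step_full = T.T @ alpha

# Low-rank step never forms T: O(n*r) time and memory.
step_lowrank = V @ (U.T @ alpha)

print(np.allclose(step_full, step_lowrank))  # True
```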
- Model-agnostic multi-objective approach for the evolutionary discovery of mathematical models [55.41644538483948]
In modern data science, it is often not enough to obtain a model with good predictive quality; it is more interesting to understand the properties of the model and which of its parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the desired properties; the Pareto-front selection at its core is sketched below.
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
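A hedged sketch of the multi-objective ingredient above: candidate composite models are scored on competing objectives (the error and complexity numbers are invented), and the Pareto-optimal set is kept rather than a single winner. The evolutionary search loop itself is omitted.

```python
candidates = [
    {"model": "linear",        "error": 0.30, "complexity": 2},
    {"model": "quadratic",     "error": 0.12, "complexity": 4},
    {"model": "cubic",         "error": 0.12, "complexity": 7},   # dominated
    {"model": "deep ensemble", "error": 0.10, "complexity": 50},
]

def dominates(a, b):
    """a dominates b: no worse on every objective, better on at least one."""
    no_worse = a["error"] <= b["error"] and a["complexity"] <= b["complexity"]
    better = a["error"] < b["error"] or a["complexity"] < b["complexity"]
    return no_worse and better

# Keep every candidate that no other candidate dominates.
pareto = [c for c in candidates
          if not any(dominates(o, c) for o in candidates if o is not c)]
print([c["model"] for c in pareto])  # ['linear', 'quadratic', 'deep ensemble']
```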
- Explicitly Modeling Syntax in Language Models with Incremental Parsing and a Dynamic Oracle [88.65264818967489]
We propose a new syntax-aware language model: Syntactic Ordered Memory (SOM).
The model explicitly models the structure with an incremental parser and maintains the conditional probability setting of a standard language model.
Experiments show that SOM can achieve strong results in language modeling, incremental parsing and syntactic generalization tests.
arXiv Detail & Related papers (2020-10-21T17:39:15Z)
- Overestimation of Syntactic Representation in Neural Language Models [16.765097098482286]
One popular method for determining a model's ability to induce syntactic structure trains a model on strings generated according to a template, then tests the model's ability to distinguish such strings from superficially similar ones with different syntax.
We illustrate a fundamental problem with this approach by reproducing positive results from a recent paper with two non-syntactic baseline language models; the failure mode is illustrated in the sketch below.
arXiv Detail & Related papers (2020-04-10T15:13:03Z)
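A toy illustration of the failure mode (data and baseline invented): a purely positional model, with no syntactic representation at all, can still separate template-generated strings from shuffled ones, so passing such a test overestimates what the model has learned.

```python
grammatical   = ["the cat saw the dog", "the dog saw the cat"]
ungrammatical = ["cat the dog saw the", "saw the the cat dog"]

def position_score(sentence: str, reference: list) -> int:
    """Count words appearing in a (position, word) slot seen in the
    reference set -- no hierarchy or syntax involved."""
    seen = {(i, w) for s in reference for i, w in enumerate(s.split())}
    return sum((i, w) in seen for i, w in enumerate(sentence.split()))

for s in grammatical + ungrammatical:
    print(position_score(s, grammatical), s)
# Grammatical strings score 5; shuffled ones score 0 or 1, so the
# "syntax test" is passed without any syntactic representation.
```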