Textbooks Are All You Need II: phi-1.5 technical report
- URL: http://arxiv.org/abs/2309.05463v1
- Date: Mon, 11 Sep 2023 14:01:45 GMT
- Title: Textbooks Are All You Need II: phi-1.5 technical report
- Authors: Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar, Yin Tat Lee
- Abstract summary: We create a new 1.3 billion parameter model named phi-1.5 with performance on natural language tasks comparable to models 5x larger.
phi-1.5 exhibits many of the traits of much larger Large Language Models.
We open-source phi-1.5 to promote further research on these urgent topics.
- Score: 55.6940110946465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We continue the investigation into the power of smaller Transformer-based
language models as initiated by \textbf{TinyStories} -- a 10 million parameter
model that can produce coherent English -- and the follow-up work on
\textbf{phi-1}, a 1.3 billion parameter model with Python coding performance
close to the state-of-the-art. The latter work proposed to use existing Large
Language Models (LLMs) to generate ``textbook quality" data as a way to enhance
the learning process compared to traditional web data. We follow the
``Textbooks Are All You Need" approach, focusing this time on common sense
reasoning in natural language, and create a new 1.3 billion parameter model
named \textbf{phi-1.5}, with performance on natural language tasks comparable
to models 5x larger, and surpassing most non-frontier LLMs on more complex
reasoning tasks such as grade-school mathematics and basic coding. More
generally, \textbf{phi-1.5} exhibits many of the traits of much larger LLMs,
both good -- such as the ability to ``think step by step" or perform some
rudimentary in-context learning -- and bad, including hallucinations and the
potential for toxic and biased generations -- encouragingly though, we are
seeing improvement on that front thanks to the absence of web data. We
open-source \textbf{phi-1.5} to promote further research on these urgent
topics.
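Since phi-1.5 is released openly, a minimal usage sketch with the Hugging Face transformers library is shown below; the microsoft/phi-1_5 checkpoint identifier, dtype, and generation settings are assumptions for illustration, not details taken from the report.
```python
# Minimal sketch: prompting phi-1.5 via Hugging Face transformers.
# The checkpoint id "microsoft/phi-1_5" is an assumption based on the open-source
# release mentioned in the abstract; adjust it to the actual repository name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

# A grade-school-math style question, one of the task families the abstract highlights.
prompt = "Question: Alice has 12 apples and gives 5 to Bob. How many are left?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```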
Related papers
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
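As a rough illustration of code-assisted quality control, the toy filter below executes a candidate Python solution and keeps the (question, code) pair only if the result matches a reference answer; this is a generic sketch of the pattern, not SIaM's actual critic model.
```python
# Toy sketch of a code-based critic for question-code data filtering (not SIaM's code):
# run a candidate solution in an isolated namespace and keep it only if its
# computed answer matches the reference answer for the question.
def code_critic(candidate_code: str, reference_answer: str) -> bool:
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)            # execute the generated solution
        predicted = str(namespace.get("answer"))   # convention: solution stores its result in `answer`
    except Exception:
        return False                               # crashing code is rejected outright
    return predicted.strip() == reference_answer.strip()

# Example: keep this (question, code) pair only if the critic accepts it.
sample_code = "answer = 12 - 5"
print(code_critic(sample_code, "7"))  # True -> pair passes quality control
```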
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- 1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data [0.0]
This paper presents a compute-efficient approach to pre-training a language model, "1.5-Pints", in only 9 days.
Based on MT-Bench (a benchmark that emulates human judgments), 1.5-Pints outperforms Apple's OpenELM and Microsoft's Phi.
This is achieved by a carefully curated pre-training dataset of 57 billion tokens, using a mix of automated and manual human review.
arXiv Detail & Related papers (2024-08-07T02:14:52Z)
- Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning [68.43706033424378]
This study introduces an innovative method designed to efficiently increase the in-context text length of multi-modal large language models (MLLMs).
We present Visualized In-Context Text Processing (VisInContext), which processes long in-context text using visual tokens.
This technique significantly reduces GPU memory usage and floating point operations (FLOPs) for both the training and inference stages.
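As a rough, hypothetical illustration of representing in-context text as visual tokens, the snippet below renders a long passage to an image and slices it into fixed-size patches of the kind a vision encoder would embed; it is not the VisInContext pipeline itself.
```python
# Rough sketch of the "text as visual tokens" idea (not the actual VisInContext code):
# draw a long passage onto an image and cut it into fixed-size patches, which a
# vision encoder would then embed in place of thousands of ordinary text tokens.
from PIL import Image, ImageDraw, ImageFont

def text_to_patches(text: str, width: int = 448, line_height: int = 14, patch: int = 14):
    font = ImageFont.load_default()
    chars_per_line = width // 7                      # crude monospace estimate
    lines = [text[i:i + chars_per_line] for i in range(0, len(text), chars_per_line)]
    img = Image.new("L", (width, line_height * len(lines)), color=255)
    draw = ImageDraw.Draw(img)
    for row, line in enumerate(lines):
        draw.text((0, row * line_height), line, font=font, fill=0)
    # Split the rendered page into patch x patch tiles ("visual tokens").
    tiles = [img.crop((x, y, x + patch, y + patch))
             for y in range(0, img.height, patch)
             for x in range(0, img.width, patch)]
    return tiles

patches = text_to_patches("some very long in-context document " * 200)
print(len(patches), "visual tokens for the rendered passage")
```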
arXiv Detail & Related papers (2024-06-04T17:59:25Z)
- Multilingual E5 Text Embeddings: A Technical Report [63.503320030117145]
Three embedding models of different sizes are provided, offering a balance between inference efficiency and embedding quality.
We introduce a new instruction-tuned embedding model, whose performance is on par with state-of-the-art, English-only models of similar sizes.
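A hedged usage sketch for such embedding models follows, using sentence-transformers; the intfloat/multilingual-e5-base identifier and the "query:"/"passage:" prefix convention are taken from the E5 model cards rather than from this summary.
```python
# Hedged sketch: multilingual embeddings with an E5 checkpoint via sentence-transformers.
# The model id and the "query:"/"passage:" prefixes are assumptions drawn from the
# E5 model cards, not from this summary.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-base")
query = "query: how do small language models learn to reason?"
passages = [
    "passage: phi-1.5 is a 1.3 billion parameter model trained on textbook-quality data.",
    "passage: Le modèle est entraîné sur des données synthétiques de haute qualité.",
]
q_emb = model.encode([query], normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)
scores = np.dot(q_emb, p_emb.T)   # cosine similarity, since embeddings are normalized
print(scores)
```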
arXiv Detail & Related papers (2024-02-08T13:47:50Z)
- Towards General Text Embeddings with Multi-stage Contrastive Learning [20.803769345818456]
GTE is a general-purpose text embedding model trained with multi-stage contrastive learning.
We train a unified text embedding model by employing contrastive learning over a diverse mixture of datasets from multiple sources.
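The training signal behind such models is typically an in-batch contrastive (InfoNCE) objective over query-document pairs; the PyTorch sketch below shows that generic loss, not GTE's exact multi-stage recipe or hyperparameters.
```python
# Generic in-batch contrastive (InfoNCE) loss for text-embedding training.
# This illustrates the standard objective used in this line of work, not GTE's
# exact multi-stage recipe or temperature settings.
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor, doc_emb: torch.Tensor, temperature: float = 0.05):
    """query_emb, doc_emb: [batch, dim]; row i of doc_emb is the positive for query i."""
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature        # cosine similarities scaled by temperature
    labels = torch.arange(q.size(0))      # in-batch negatives: the diagonal is the positive
    return F.cross_entropy(logits, labels)

loss = info_nce_loss(torch.randn(8, 768), torch.randn(8, 768))
print(loss.item())
```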
arXiv Detail & Related papers (2023-08-07T03:52:59Z)
- Evaluating Generative Models for Graph-to-Text Generation [0.0]
We explore the capability of generative models to generate descriptive text from graph data in a zero-shot setting.
Our results demonstrate that generative models are capable of generating fluent and coherent text.
However, our error analysis reveals that generative models still struggle with understanding the semantic relations between entities.
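In practice, zero-shot graph-to-text amounts to linearizing the graph's triples into a prompt for a general-purpose LLM; the sketch below shows one such linearization with a placeholder generate() callable, since the paper's exact prompt format is not given here.
```python
# Sketch of zero-shot graph-to-text: linearize (subject, relation, object) triples
# into a prompt for a general-purpose LLM. The prompt wording and the generate()
# placeholder are illustrative assumptions, not the paper's exact setup.
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]

def graph_to_prompt(triples: List[Triple]) -> str:
    facts = "\n".join(f"({s} | {r} | {o})" for s, r, o in triples)
    return (
        "Describe the following knowledge graph in fluent English, "
        "covering every fact exactly once:\n" + facts + "\nDescription:"
    )

def describe_graph(triples: List[Triple], generate: Callable[[str], str]) -> str:
    return generate(graph_to_prompt(triples))

print(graph_to_prompt([("phi-1.5", "parameter count", "1.3 billion"),
                       ("phi-1.5", "trained on", "textbook-quality data")]))
```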
arXiv Detail & Related papers (2023-07-27T09:03:05Z)
- Pre-Training to Learn in Context [138.0745138788142]
The in-context learning ability of language models is not fully exploited because they are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x as many parameters.
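In the spirit of this idea, the sketch below packs several examples of the same task into a single pretraining sequence so the model practices reading demonstrations before predicting; PICL's actual retrieval of intrinsic tasks from raw corpora and its loss details are more involved than this.
```python
# Sketch in the spirit of pre-training for in-context learning: pack several
# examples of one underlying task into a single training sequence so the model
# practices "read demonstrations, then predict". PICL's retrieval of intrinsic
# tasks and its exact loss masking are more involved than this toy packer.
from typing import List, Tuple

def pack_icl_instance(examples: List[Tuple[str, str]], separator: str = "\n\n") -> str:
    """examples: (input, output) pairs sharing one task; the last output is the target."""
    demos = [f"Input: {x}\nOutput: {y}" for x, y in examples[:-1]]
    query_x, query_y = examples[-1]
    context = separator.join(demos + [f"Input: {query_x}\nOutput:"])
    return context + " " + query_y        # the LM loss would focus on this final answer

print(pack_icl_instance([("2+3", "5"), ("7+1", "8"), ("4+4", "8")]))
```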
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- Foundation Models for Natural Language Processing -- Pre-trained Language Models Integrating Media [0.0]
Foundation Models are pre-trained language models for Natural Language Processing.
They can be applied to a wide range of different media and problem domains, ranging from image and video processing to robot control learning.
This book provides a comprehensive overview of the state of the art in research and applications of Foundation Models.
arXiv Detail & Related papers (2023-02-16T20:42:04Z)
- Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning [18.932100477957462]
Recent work like GPT-3 has demonstrated excellent performance of Zero-Shot and Few-Shot learning on many natural language processing (NLP) tasks.
We propose a method that incorporates large-scale distributed training performance into model architecture design.
arXiv Detail & Related papers (2021-10-10T07:40:22Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
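One simplified, deterministic reading of Dynamic Blocking is sketched below: when the decoder has just emitted a token that also appears in the source, the token that immediately follows it in the source is blocked at the next step, discouraging verbatim copying; the authors' actual procedure (sampled block decisions over subwords) differs from this toy version.
```python
# Simplified, deterministic sketch of the Dynamic Blocking idea: if the decoder
# just emitted a word that occurs in the source sentence, forbid the word that
# immediately follows it in the source at the next step, so the paraphrase cannot
# copy the source verbatim. The real algorithm samples block decisions and works
# on subword ids; this toy version works on whole words.
from typing import List, Set

def blocked_next_words(source: List[str], last_generated: str) -> Set[str]:
    blocked = set()
    for i, word in enumerate(source[:-1]):
        if word == last_generated:
            blocked.add(source[i + 1])   # block the source continuation of this word
    return blocked

source = "the quick brown fox jumps over the lazy dog".split()
print(blocked_next_words(source, "the"))   # {'quick', 'lazy'} cannot follow "the"
```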
arXiv Detail & Related papers (2020-10-24T11:55:28Z)