GPT-NeoX-20B: An Open-Source Autoregressive Language Model
- URL: http://arxiv.org/abs/2204.06745v1
- Date: Thu, 14 Apr 2022 04:00:27 GMT
- Title: GPT-NeoX-20B: An Open-Source Autoregressive Language Model
- Authors: Sid Black and Stella Biderman and Eric Hallahan and Quentin Anthony
and Leo Gao and Laurence Golding and Horace He and Connor Leahy and Kyle
McDonell and Jason Phang and Michael Pieler and USVSN Sai Prashanth and
Shivanshu Purohit and Laria Reynolds and Jonathan Tow and Ben Wang and Samuel
Weinbach
- Abstract summary: GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile.
Weights will be made freely and openly available to the public through a permissive license.
- Score: 16.27825182552061
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language
model trained on the Pile, whose weights will be made freely and openly
available to the public through a permissive license. It is, to the best of our
knowledge, the largest dense autoregressive model that has publicly available
weights at the time of submission. In this work, we describe \model{}'s
architecture and training and evaluate its performance on a range of
language-understanding, mathematics, and knowledge-based tasks. We find that
GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in
performance when evaluated five-shot than similarly sized GPT-3 and FairSeq
models. We open-source the training and evaluation code, as well as the model
weights, at https://github.com/EleutherAI/gpt-neox.
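The released weights can be exercised with standard tooling. Below is a minimal sketch (not from the paper) of loading the checkpoint and running a toy five-shot prompt of the kind the abstract alludes to; it assumes the Hugging Face Transformers port of the weights under the hub ID "EleutherAI/gpt-neox-20b" (the paper itself points only to the GitHub repository above) and enough memory to hold a 20-billion-parameter model.

```python
# Minimal sketch (assumptions noted): load the publicly released GPT-NeoX-20B
# weights via the assumed Hugging Face hub ID and run a toy five-shot prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "EleutherAI/gpt-neox-20b"  # assumed hub ID for the released checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",   # roughly 40 GB of memory in fp16
    device_map="auto",    # requires the `accelerate` package
)

# Five in-context examples followed by a query, mirroring the few-shot
# evaluation setup in spirit; the real tasks come from the open-sourced
# evaluation code, not ad-hoc prompts like this one.
prompt = (
    "Q: What is 12 + 7?\nA: 19\n\n"
    "Q: What is 8 + 5?\nA: 13\n\n"
    "Q: What is 21 + 4?\nA: 25\n\n"
    "Q: What is 9 + 16?\nA: 25\n\n"
    "Q: What is 14 + 3?\nA: 17\n\n"
    "Q: What is 6 + 18?\nA:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=4, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Greedy decoding keeps this toy example deterministic; the paper's reported few-shot numbers come from its open-sourced evaluation code rather than hand-written prompts.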
Related papers
- Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models [146.18107944503436]
Molmo is a new family of vision-language models (VLMs) that are state-of-the-art in their class of openness.
Our key innovation is a novel, highly detailed image caption dataset collected entirely from human annotators.
We will be releasing all of our model weights, captioning and fine-tuning data, and source code in the near future.
arXiv Detail & Related papers (2024-09-25T17:59:51Z)
- ChuXin: 1.6B Technical Report [7.03872473285061]
ChuXin is an entirely open-source language model with a size of 1.6 billion parameters.
We have made everything needed to train a model available, including the training data, the training process, and the evaluation code.
arXiv Detail & Related papers (2024-05-08T05:54:44Z)
- SoTaNa: The Open-Source Software Development Assistant [81.86136560157266]
SoTaNa is an open-source software development assistant.
It generates high-quality instruction-based data for the domain of software engineering.
It employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA.
arXiv Detail & Related papers (2023-08-25T14:56:21Z)
- What Language Model to Train if You Have One Million GPU Hours? [54.32062236748831]
We study different modeling practices and their impact on zero-shot generalization.
We also study the performance of a multilingual model and how it compares to the English-only one.
All our models and code are open-sourced at https://huggingface.co/bigscience.
arXiv Detail & Related papers (2022-10-27T13:43:27Z)
- GLM-130B: An Open Bilingual Pre-trained Model [56.694470924635624]
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters.
It is an attempt to open-source a 100B-scale model at least as good as GPT-3 (davinci) and unveil how models of such a scale can be successfully pre-trained.
arXiv Detail & Related papers (2022-10-05T17:34:44Z)
- mGPT: Few-Shot Learners Go Multilingual [1.4354798873010843]
This paper introduces two autoregressive GPT-like models with 1.3 billion and 13 billion parameters trained on 60 languages.
We reproduce the GPT-3 architecture using GPT-2 sources and the sparse attention mechanism.
The resulting models show performance on par with the recently released XGLM models by Facebook.
arXiv Detail & Related papers (2022-04-15T13:02:33Z)
- A Systematic Evaluation of Large Language Models of Code [88.34057460577957]
Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions.
The current state-of-the-art code LMs are not publicly available, leaving many questions about their model and data design decisions.
Although Codex is not open-source, we find that existing open-source models do achieve close results in some programming languages.
We release a new model, PolyCoder, with 2.7B parameters based on the GPT-2 architecture, which was trained on 249GB of code across 12 programming languages on a single machine.
arXiv Detail & Related papers (2022-02-26T15:53:55Z)
- bert2BERT: Towards Reusable Pretrained Language Models [51.078081486422896]
We propose bert2BERT, which can effectively transfer the knowledge of an existing smaller pre-trained model to a large model.
bert2BERT saves about 45% and 47% of the computational cost of pre-training BERT_BASE and GPT_BASE, respectively, by reusing existing models of almost half their sizes.
arXiv Detail & Related papers (2021-10-14T04:05:25Z)
- Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese [33.83704598544326]
Mengzi stands for a family of discriminative, generative, domain-specific, and multimodal pre-trained model variants.
Compared with public Chinese PLMs, Mengzi is simple but more powerful.
Our lightweight model has achieved new state-of-the-art results on the widely-used CLUE benchmark.
arXiv Detail & Related papers (2021-10-13T13:14:32Z)
- ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation [25.430130072811075]
We propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge enhanced models.
It fuses an auto-regressive network and an auto-encoding network, so that the trained model can be easily tailored for both natural language understanding and generation tasks.
We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph.
arXiv Detail & Related papers (2021-07-05T16:54:59Z)