XGen-7B Technical Report
- URL: http://arxiv.org/abs/2309.03450v1
- Date: Thu, 7 Sep 2023 02:20:03 GMT
- Title: XGen-7B Technical Report
- Authors: Erik Nijkamp, Tian Xie, Hiroaki Hayashi, Bo Pang, Congying Xia, Chen
Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil
Purushwalkam, Tong Niu, Wojciech Kry\'sci\'nski, Lidiya Murakhovs'ka,
Prafulla Kumar Choubey, Alex Fabbri, Ye Liu, Rui Meng, Lifu Tu, Meghana Bhat,
Chien-Sheng Wu, Silvio Savarese, Yingbo Zhou, Shafiq Joty, Caiming Xiong
- Abstract summary: XGen is a series of 7B parameter models trained on up to 8K sequence length for up to 1.5T tokens.
We open-source our models for both research advancements and commercial applications.
- Score: 138.71625147048377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have become ubiquitous across various domains,
transforming the way we interact with information and conduct research.
However, most high-performing LLMs remain confined behind proprietary walls,
hindering scientific progress. Most open-source LLMs, on the other hand, are
limited in their ability to support longer sequence lengths, which is a key
requirement for many tasks that require inference over an input context. To
address this, we have trained XGen, a series of 7B parameter models on up to 8K
sequence length for up to 1.5T tokens. We have also finetuned the XGen models
on public-domain instructional data, creating their instruction-tuned
counterparts (XGen-Inst). We open-source our models for both research
advancements and commercial applications. Our evaluation on standard benchmarks
shows that XGen models achieve comparable or better results when compared with
state-of-the-art open-source LLMs. Our targeted evaluation on long sequence
modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence
open-source LLMs.
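Because the abstract notes that the models are open-sourced for research and commercial use, a minimal usage sketch may help readers get started. The snippet below is an assumption-laden illustration, not part of the paper: it assumes the checkpoints are published on the Hugging Face Hub under identifiers such as Salesforce/xgen-7b-8k-base (base) and Salesforce/xgen-7b-8k-inst (instruction-tuned), and that the standard transformers causal-LM API applies; the exact repository names, tokenizer requirements, and license terms should be verified against the official release.
```python
# Hypothetical sketch (not from the paper): loading an XGen-7B checkpoint with
# Hugging Face Transformers, assuming a Hub identifier like the one below.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Salesforce/xgen-7b-8k-base"  # assumed identifier; the instruction-tuned
                                         # variant would use a different repo name

# trust_remote_code=True is assumed because the released tokenizer may be custom;
# drop it if the standard tokenizer class is sufficient.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Long-context summarization requires"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```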
Related papers
- MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series [86.31735321970481]
We open-source MAP-Neo, a bilingual language model with 7B parameters trained from scratch on 4.5T high-quality tokens.
Our MAP-Neo is the first fully open-sourced bilingual LLM with performance comparable to existing state-of-the-art LLMs.
arXiv Detail & Related papers (2024-05-29T17:57:16Z)
- Generative Representational Instruction Tuning [89.76840377003178]
GritLM 7B sets a new state of the art on the Massive Text Embedding Benchmark (MTEB)
GritLM 8x7B outperforms all open generative language models that we tried while still being among the best embedding models.
arXiv Detail & Related papers (2024-02-15T12:12:19Z)
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism [76.90033862238728]
We present findings that facilitate the scaling of large-scale models in two commonly used open-source configurations, 7B and 67B.
Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective.
arXiv Detail & Related papers (2024-01-05T18:59:13Z)
- Herd: Using multiple, smaller LLMs to match the performances of proprietary, large LLMs via an intelligent composer [1.0878040851637998]
We show that a herd of open source models can match or exceed the performance of proprietary models via an intelligent router.
In cases where GPT is unable to answer a query, Herd can identify a model that can, at least 40% of the time.
arXiv Detail & Related papers (2023-10-30T18:11:02Z)
- Prompt2Model: Generating Deployable Models from Natural Language Instructions [74.19816829003729]
Large language models (LLMs) enable system builders to create competent NLP systems through prompting.
In other ways, LLMs are a step backward from traditional special-purpose NLP models.
We propose Prompt2Model, a general-purpose method that takes a natural language task description, like the prompts provided to LLMs, and uses it to train a special-purpose model that is conducive to deployment.
arXiv Detail & Related papers (2023-08-23T17:28:21Z)
- Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation [4.310519298899164]
In this work, we evaluate 10 open-source instructed LLMs on four representative code comprehension and generation tasks.
For the zero-shot setting, instructed LLMs are very competitive on code comprehension and generation tasks.
For the few-shot setting, we find that adding demonstration examples substantially helps instructed LLMs perform better.
arXiv Detail & Related papers (2023-08-02T15:54:22Z)
- Empower Your Model with Longer and Better Context Comprehension [15.377707808279908]
We investigate the nature of information transfer within Large Language Models (LLMs).
We propose a novel technique called Attention Transition to empower models to achieve longer and better context comprehension.
Our experiments are conducted on the challenging XSum dataset using the LLaMa-7b model with context token lengths ranging from 800 to 1900.
arXiv Detail & Related papers (2023-07-25T09:34:42Z)
- Augmenting Interpretable Models with LLMs during Training [73.40079895413861]
We propose Augmented Interpretable Models (Aug-imodels) to build efficient and interpretable models.
Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency.
We explore two instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM, which augments a generalized additive model with decoupled embeddings from an LLM and (ii) Aug-Tree, which augments a decision tree with LLM feature expansions.
arXiv Detail & Related papers (2022-09-23T18:36:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.