XGen-7B Technical Report
- URL: http://arxiv.org/abs/2309.03450v1
- Date: Thu, 7 Sep 2023 02:20:03 GMT
- Title: XGen-7B Technical Report
- Authors: Erik Nijkamp, Tian Xie, Hiroaki Hayashi, Bo Pang, Congying Xia, Chen
Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil
Purushwalkam, Tong Niu, Wojciech Kry\'sci\'nski, Lidiya Murakhovs'ka,
Prafulla Kumar Choubey, Alex Fabbri, Ye Liu, Rui Meng, Lifu Tu, Meghana Bhat,
Chien-Sheng Wu, Silvio Savarese, Yingbo Zhou, Shafiq Joty, Caiming Xiong
- Abstract summary: XGen is a series of 7B parameter models on up to 8K sequence length for up to 1.5T tokens.
We open-source our models for both research advancements and commercial applications.
- Score: 138.71625147048377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have become ubiquitous across various domains,
transforming the way we interact with information and conduct research.
However, most high-performing LLMs remain confined behind proprietary walls,
hindering scientific progress. Most open-source LLMs, on the other hand, are
limited in their ability to support longer sequence lengths, which is a key
requirement for many tasks that require inference over an input context. To
address this, we have trained XGen, a series of 7B parameter models on up to 8K
sequence length for up to 1.5T tokens. We have also finetuned the XGen models
on public-domain instructional data, creating their instruction-tuned
counterparts (XGen-Inst). We open-source our models for both research
advancements and commercial applications. Our evaluation on standard benchmarks
shows that XGen models achieve comparable or better results when compared with
state-of-the-art open-source LLMs. Our targeted evaluation on long sequence
modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence
open-source LLMs.
Related papers
- Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation [48.17611255751571]
Post-training is essential for enabling large language models to follow human instructions.
We leverage multi-agent simulation to automatically generate diverse text-based scenarios.
We introduce a novel scenario-driven instruction generator MATRIX-Gen for controllable and highly realistic data synthesis.
arXiv Detail & Related papers (2024-10-18T08:01:39Z) - xLAM: A Family of Large Action Models to Empower AI Agent Systems [111.5719694445345]
We release xLAM, a series of large action models designed for AI agent tasks.
xLAM consistently delivers exceptional performance across multiple agent ability benchmarks.
arXiv Detail & Related papers (2024-09-05T03:22:22Z) - xGen-MM (BLIP-3): A Family of Open Large Multimodal Models [157.44696790158784]
This report introduces xGen-MM, a framework for developing Large Multimodal Models (LMMs)
The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs.
Our models undergo rigorous evaluation across a range of tasks, including both single and multi-image benchmarks.
arXiv Detail & Related papers (2024-08-16T17:57:01Z) - ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities [53.97515452727115]
ChatQA 2 is a Llama 3.0-based model with a 128K context window.
We present a training recipe to extend the context window of Llama3-70B-base from 8K to 128K tokens.
Our results demonstrate that the Llama3-ChatQA-2-70B model outperforms most existing state-of-the-art models.
arXiv Detail & Related papers (2024-07-19T17:35:47Z) - Prompt2Model: Generating Deployable Models from Natural Language
Instructions [74.19816829003729]
Large language models (LLMs) enable system builders to create competent NLP systems through prompting.
In other ways, LLMs are a step backward from traditional special-purpose NLP models.
We propose Prompt2Model, a general-purpose method that takes a natural language task description like the prompts provided to LLMs.
arXiv Detail & Related papers (2023-08-23T17:28:21Z) - Augmenting Interpretable Models with LLMs during Training [73.40079895413861]
We propose Augmented Interpretable Models (Aug-imodels) to build efficient and interpretable models.
Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency.
We explore two instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM, which augments a generalized additive model with decoupled embeddings from an LLM and (ii) Aug-Tree, which augments a decision tree with LLM feature expansions.
arXiv Detail & Related papers (2022-09-23T18:36:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.