XGen-7B Technical Report
- URL: http://arxiv.org/abs/2309.03450v1
- Date: Thu, 7 Sep 2023 02:20:03 GMT
- Title: XGen-7B Technical Report
- Authors: Erik Nijkamp, Tian Xie, Hiroaki Hayashi, Bo Pang, Congying Xia, Chen
Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil
Purushwalkam, Tong Niu, Wojciech Kry\'sci\'nski, Lidiya Murakhovs'ka,
Prafulla Kumar Choubey, Alex Fabbri, Ye Liu, Rui Meng, Lifu Tu, Meghana Bhat,
Chien-Sheng Wu, Silvio Savarese, Yingbo Zhou, Shafiq Joty, Caiming Xiong
- Abstract summary: XGen is a series of 7B parameter models trained on up to 8K sequence length for up to 1.5T tokens.
We open-source our models for both research advancements and commercial applications.
- Score: 138.71625147048377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have become ubiquitous across various domains,
transforming the way we interact with information and conduct research.
However, most high-performing LLMs remain confined behind proprietary walls,
hindering scientific progress. Most open-source LLMs, on the other hand, are
limited in their ability to support longer sequence lengths, which is a key
requirement for many tasks that require inference over an input context. To
address this, we have trained XGen, a series of 7B parameter models on up to 8K
sequence length for up to 1.5T tokens. We have also finetuned the XGen models
on public-domain instructional data, creating their instruction-tuned
counterparts (XGen-Inst). We open-source our models for both research
advancements and commercial applications. Our evaluation on standard benchmarks
shows that XGen models achieve comparable or better results when compared with
state-of-the-art open-source LLMs. Our targeted evaluation on long sequence
modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence
open-source LLMs.
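Because the abstract notes that the models are open-sourced for research and commercial use, a minimal usage sketch may help readers get started. The snippet below is an assumption-laden illustration, not part of the paper: it assumes the checkpoints are published on the Hugging Face Hub under identifiers such as Salesforce/xgen-7b-8k-base (base) and Salesforce/xgen-7b-8k-inst (instruction-tuned), and that the standard transformers causal-LM API applies; the exact repository names, tokenizer requirements, and license terms should be verified against the official release.
```python
# Hypothetical sketch (not from the paper): loading an XGen-7B checkpoint with
# Hugging Face Transformers, assuming a Hub identifier like the one below.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Salesforce/xgen-7b-8k-base"  # assumed identifier; the instruction-tuned
                                         # variant would use a different repo name

# trust_remote_code=True is assumed because the released tokenizer may be custom;
# drop it if the standard tokenizer class is sufficient.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Long-context summarization requires"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```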
Related papers
- MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series [86.31735321970481]
We open-source MAP-Neo, a bilingual language model with 7B parameters trained from scratch on 4.5T high-quality tokens.
Our MAP-Neo is the first fully open-sourced bilingual LLM with performance comparable to existing state-of-the-art LLMs.
arXiv Detail & Related papers (2024-05-29T17:57:16Z)
- Generative Representational Instruction Tuning [89.76840377003178]
GritLM 7B sets a new state of the art on the Massive Text Embedding Benchmark (MTEB)
GritLM 8x7B outperforms all open generative language models that we tried while still being among the best embedding models.
arXiv Detail & Related papers (2024-02-15T12:12:19Z)
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism [76.90033862238728]
We present findings that facilitate the scaling of large-scale models in two commonly used open-source configurations, 7B and 67B.
Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective.
arXiv Detail & Related papers (2024-01-05T18:59:13Z)
- Herd: Using multiple, smaller LLMs to match the performances of proprietary, large LLMs via an intelligent composer [1.0878040851637998]
We show that a herd of open source models can match or exceed the performance of proprietary models via an intelligent router.
In cases where GPT is unable to answer a query, Herd can identify a model that can, at least 40% of the time.
arXiv Detail & Related papers (2023-10-30T18:11:02Z)
- Prompt2Model: Generating Deployable Models from Natural Language Instructions [74.19816829003729]
Large language models (LLMs) enable system builders to create competent NLP systems through prompting.
In other ways, LLMs are a step backward from traditional special-purpose NLP models.
We propose Prompt2Model, a general-purpose method that takes a natural language task description, like the prompts provided to LLMs, and uses it to train a special-purpose model that is conducive to deployment.
arXiv Detail & Related papers (2023-08-23T17:28:21Z)
- Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation [4.310519298899164]
In this work, we evaluate 10 open-source instructed LLMs on four representative code comprehension and generation tasks.
For the zero-shot setting, instructed LLMs are very competitive on code comprehension and generation tasks.
For the few-shot setting, we find that adding demonstration examples substantially helps instructed LLMs perform better.
arXiv Detail & Related papers (2023-08-02T15:54:22Z)
- Empower Your Model with Longer and Better Context Comprehension [15.377707808279908]
We investigate the nature of information transfer within Large Language Models (LLMs).
We propose a novel technique called Attention Transition to empower models to achieve longer and better context comprehension.
Our experiments are conducted on the challenging XSum dataset using the LLaMa-7b model with context token lengths ranging from 800 to 1900.
arXiv Detail & Related papers (2023-07-25T09:34:42Z)
- Augmenting Interpretable Models with LLMs during Training [73.40079895413861]
We propose Augmented Interpretable Models (Aug-imodels) to build efficient and interpretable models.
Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency.
We explore two instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM, which augments a generalized additive model with decoupled embeddings from an LLM and (ii) Aug-Tree, which augments a decision tree with LLM feature expansions.
arXiv Detail & Related papers (2022-09-23T18:36:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.