OLMo: Accelerating the Science of Language Models
- URL: http://arxiv.org/abs/2402.00838v4
- Date: Fri, 7 Jun 2024 21:59:52 GMT
- Title: OLMo: Accelerating the Science of Language Models
- Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
- Abstract summary: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings.
As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces.
We believe it is essential for the research community to have access to powerful, truly open LMs.
We have built OLMo, a competitive, truly Open Language Model, to enable the scientific study of language models.
- Score: 165.16277690540363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs. To this end, we have built OLMo, a competitive, truly Open Language Model, to enable the scientific study of language models. Unlike most prior efforts that have only released model weights and inference code, we release OLMo alongside open training data and training and evaluation code. We hope this release will empower the open research community and inspire a new wave of innovation.
Related papers
- OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models [61.14336781917986]
We introduce OpenR, an open-source framework for enhancing the reasoning capabilities of large language models (LLMs)
OpenR unifies data acquisition, reinforcement learning training, and non-autoregressive decoding into a cohesive software platform.
Our work is the first to provide an open-source framework that explores the core techniques of OpenAI's o1 model with reinforcement learning.
arXiv Detail & Related papers (2024-10-12T23:42:16Z) - Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models [146.18107944503436]
Molmo is a new family of VLMs that are state-of-the-art in their class of openness.
Our key innovation is a novel, highly detailed image caption dataset collected entirely from human annotators.
We will be releasing all of our model weights, captioning and fine-tuning data, and source code in the near future.
arXiv Detail & Related papers (2024-09-25T17:59:51Z) - A Survey on Mixture of Experts [11.801185267119298]
The mixture of experts (MoE) has emerged as an effective method for substantially scaling up model capacity with minimal overhead.
This survey seeks to bridge that gap, serving as an essential resource for researchers delving into the intricacies of MoE.
arXiv Detail & Related papers (2024-06-26T16:34:33Z) - MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series [86.31735321970481]
We open-source MAP-Neo, a bilingual language model with 7B parameters trained from scratch on 4.5T high-quality tokens.
MAP-Neo is the first fully open-sourced bilingual LLM with performance comparable to existing state-of-the-art LLMs.
arXiv Detail & Related papers (2024-05-29T17:57:16Z) - OpenELM: An Efficient Language Model Family with Open Training and Inference Framework [26.741510071520658]
We release OpenELM, a state-of-the-art open language model.
With a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo.
arXiv Detail & Related papers (2024-04-22T23:12:03Z) - Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research [139.69207791947738]
Dolma is a three-trillion-token English corpus built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials.
We document Dolma, including its design principles, details about its construction, and a summary of its contents.
We present analyses and experimental results on intermediate states of Dolma to share what we have learned about important data curation practices.
arXiv Detail & Related papers (2024-01-31T20:29:50Z) - The Quo Vadis of the Relationship between Language and Large Language Models [3.10770247120758]
Recent advances in Large Language Models (LLMs) have encouraged their adoption as scientific models of language.
We identify the most important theoretical and empirical risks brought about by the adoption of scientific models that lack transparency.
We conclude that, at their current stage of development, LLMs hardly offer any explanations for language.
arXiv Detail & Related papers (2023-10-17T10:54:24Z) - Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost.
Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-off between compute and performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z)