OLMo: Accelerating the Science of Language Models
- URL: http://arxiv.org/abs/2402.00838v4
- Date: Fri, 7 Jun 2024 21:59:52 GMT
- Title: OLMo: Accelerating the Science of Language Models
- Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
- Abstract summary: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings.
As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces.
We believe it is essential for the research community to have access to powerful, truly open LMs.
We have built OLMo, a competitive, truly Open Language Model, to enable the scientific study of language models.
- Score: 165.16277690540363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs. To this end, we have built OLMo, a competitive, truly Open Language Model, to enable the scientific study of language models. Unlike most prior efforts that have only released model weights and inference code, we release OLMo alongside open training data and training and evaluation code. We hope this release will empower the open research community and inspire a new wave of innovation.
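The release covers model weights together with training and evaluation code, so a natural first step is loading a released checkpoint for inference. The sketch below is a minimal example assuming the weights are published on the Hugging Face Hub under an identifier such as allenai/OLMo-7B and that a recent transformers version is installed; the exact model ID and loading flags are assumptions, not details stated in the abstract.

```python
# Minimal sketch: loading an OLMo checkpoint from the Hugging Face Hub.
# The model identifier "allenai/OLMo-7B" and the trust_remote_code flag are
# assumptions about the release artifacts, not details from the abstract.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B"  # assumed Hub identifier for the released weights
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Language models are"
inputs = tokenizer(prompt, return_tensors="pt")
# Greedy decoding of a short continuation, just to confirm the weights load.
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```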
Related papers
- A Survey on Mixture of Experts [11.801185267119298]
The mixture of experts (MoE) has emerged as an effective method for substantially scaling up model capacity with minimal overhead (a minimal gating sketch appears after this list).
This survey seeks to bridge that gap, serving as an essential resource for researchers delving into the intricacies of MoE.
arXiv Detail & Related papers (2024-06-26T16:34:33Z) - MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series [86.31735321970481]
We open-source MAP-Neo, a bilingual language model with 7B parameters trained from scratch on 4.5T high-quality tokens.
Our MAP-Neo is the first fully open-sourced bilingual LLM with performance comparable to existing state-of-the-art LLMs.
arXiv Detail & Related papers (2024-05-29T17:57:16Z) - OpenELM: An Efficient Language Model Family with Open Training and Inference Framework [26.741510071520658]
We release OpenELM, a state-of-the-art open language model.
With a parameter budget of approximately one billion, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo.
arXiv Detail & Related papers (2024-04-22T23:12:03Z) - Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research [139.69207791947738]
Dolma is a three-trillion-token English corpus built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials.
We document Dolma, including its design principles, details about its construction, and a summary of its contents.
We present analyses and experimental results on intermediate states of Dolma to share what we have learned about important data curation practices.
arXiv Detail & Related papers (2024-01-31T20:29:50Z) - MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications [46.337078949637345]
We present MindLLM, a novel series of bilingual lightweight large language models, trained from scratch.
We give a thorough account of the experience accrued during large-model development, covering every step of the process.
MindLLM consistently matches or surpasses the performance of larger open-source models on some public benchmarks.
arXiv Detail & Related papers (2023-10-24T12:22:34Z) - The Quo Vadis of the Relationship between Language and Large Language Models [3.10770247120758]
The success of Large Language Models (LLMs) has encouraged their adoption as scientific models of language.
We identify the most important theoretical and empirical risks brought about by the adoption of scientific models that lack transparency.
We conclude that, at their current stage of development, LLMs hardly offer any explanations for language.
arXiv Detail & Related papers (2023-10-17T10:54:24Z) - Learn From Model Beyond Fine-Tuning: A Survey [78.80920533793595]
Learn From Model (LFM) focuses on the research, modification, and design of foundation models (FM) based on the model interface.
The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta learning and model editing.
This paper gives a comprehensive review of the current methods based on FM from the perspective of LFM.
arXiv Detail & Related papers (2023-10-12T10:20:36Z) - A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades.
Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora.
To distinguish models by parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z) - Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost.
Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute and performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z)
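Two of the entries above describe mixture-of-experts (MoE) layers as a way to scale capacity at roughly constant per-token compute; the sketch below makes the idea concrete with a top-k routed feed-forward layer. The expert count, dimensions, and k value are illustrative assumptions and do not come from any of the listed papers.

```python
# Minimal sketch of a top-k gated mixture-of-experts (MoE) feed-forward layer.
# Hyperparameters (8 experts, k=2, hidden sizes) are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts)  # router over experts
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x)                        # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # route each token to its top-k experts
        weights = F.softmax(weights, dim=-1)         # normalize the selected routing weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


x = torch.randn(16, 512)        # a batch of 16 token representations
print(MoELayer()(x).shape)      # torch.Size([16, 512])
```

Only the k selected experts run for each token, which is why capacity grows with the number of experts while per-token compute stays close to that of a single dense feed-forward block.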
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.