AstroPT: Scaling Large Observation Models for Astronomy
- URL: http://arxiv.org/abs/2405.14930v1
- Date: Thu, 23 May 2024 18:00:00 GMT
- Title: AstroPT: Scaling Large Observation Models for Astronomy
- Authors: Michael J. Smith, Ryan J. Roberts, Eirini Angeloudi, Marc Huertas-Company,
- Abstract summary: We train a selection of foundation models of increasing size from 1 million to 2.1 billion parameters, and find that AstroPT follows a similar saturating log-log scaling law to textual models.
We believe that collaborative community development paves the best route towards realising an open source `Large Observation Model'.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This work presents AstroPT, an autoregressive pretrained transformer developed with astronomical use-cases in mind. The AstroPT models presented here have been pretrained on 8.6 million $512 \times 512$ pixel $grz$-band galaxy postage stamp observations from the DESI Legacy Survey DR8. We train a selection of foundation models of increasing size from 1 million to 2.1 billion parameters, and find that AstroPT follows a similar saturating log-log scaling law to textual models. We also find that the models' performance on downstream tasks, as measured by linear probing, improves with model size up to the model parameter saturation point. We believe that collaborative community development paves the best route towards realising an open source `Large Observation Model' -- a model trained on data taken from the observational sciences at the scale seen in natural language processing. To this end, we release the source code, weights, and dataset for AstroPT under the MIT license, and invite potential collaborators to join us in collectively building and researching these models.
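The abstract's claim of a saturating log-log scaling law can be made concrete with a small fit. Below is a minimal sketch, assuming a standard saturating power-law form $L(N) = L_\infty + a N^{-\alpha}$ and purely illustrative (parameter count, validation loss) pairs; neither the functional form nor the numbers are taken from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def saturating_power_law(n_params, loss_floor, a, alpha):
    """Loss falls as a power law in model size and flattens towards an
    irreducible floor, i.e. a saturating log-log scaling law."""
    return loss_floor + a * n_params ** (-alpha)

# Illustrative (parameter count, validation loss) pairs -- not AstroPT's measurements.
n_params = np.array([1e6, 1e7, 1e8, 1e9, 2.1e9])
val_loss = np.array([1.30, 1.10, 0.98, 0.93, 0.92])

popt, _ = curve_fit(saturating_power_law, n_params, val_loss, p0=[0.9, 5.0, 0.2])
loss_floor, a, alpha = popt
print(f"fitted floor={loss_floor:.3f}, coefficient={a:.2f}, exponent={alpha:.3f}")
```

On a log-log plot of (loss - floor) against parameter count, this form is a straight line that bends towards the floor, which is the saturation behaviour the abstract describes.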
Related papers
- OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery [0.3926357402982764]
Scaling models to billions of parameters has been shown to yield unprecedented benefits including emergent abilities.
We pair high-performance computing resources, including the Frontier supercomputer (America's first exascale system), with high-resolution optical remote sensing data to pretrain billion-scale FMs.
arXiv Detail & Related papers (2024-10-25T20:55:12Z)
- AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy [4.729846733874557]
This study aims to quantitatively assess specialized LLMs in astronomy.
We find that the previously released AstroLLaMA series, based on LLaMA-2-7B, underperforms compared to the base model.
Despite the observed catastrophic forgetting in smaller models, our results indicate that continual pretraining on the 70B model can yield significant improvements.
arXiv Detail & Related papers (2024-09-29T16:02:22Z)
- At First Sight: Zero-Shot Classification of Astronomical Images with Large Multimodal Models [0.0]
Vision-language multimodal models (VLMs) offer the possibility of zero-shot classification in astronomy.
We investigate two models, GPT-4o and LLaVA-NeXT, for zero-shot classification of low-surface brightness galaxies and artifacts.
We show that with natural language prompts these models achieve significant accuracy (typically above 80 percent) without additional training or fine-tuning.
arXiv Detail & Related papers (2024-06-24T18:17:54Z)
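The entry above reports prompt-based zero-shot classification with GPT-4o and LLaVA-NeXT. A minimal sketch of what such a call could look like is below, assuming the OpenAI Python client; the prompt wording, label set, and file path are illustrative and not the paper's exact setup.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical galaxy cutout; the path and the two labels are illustrative only.
with open("cutout.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Classify this survey cutout as 'low-surface-brightness galaxy' "
                     "or 'artifact'. Reply with the label only."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)  # e.g. "artifact"
```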
- Uni-Mol2: Exploring Molecular Pretraining Model at Scale [27.172011090947823]
We present Uni-Mol2, an innovative molecular pretraining model that integrates features at the atomic level, graph level, and geometry structure level.
We successfully scale Uni-Mol2 to 1.1 billion parameters through pretraining on 800 million conformations, making it the largest molecular pretraining model to date.
arXiv Detail & Related papers (2024-06-21T08:28:54Z)
- The Scaling Law in Stellar Light Curves [3.090476527764192]
We investigate the scaling law properties that emerge when learning from astronomical time series data using self-supervised techniques.
A self-supervised Transformer model achieves 3-10 times the sample efficiency compared to the state-of-the-art supervised learning model.
Our research lays the groundwork for analyzing stellar light curves by examining them through large-scale auto-regressive generative models.
arXiv Detail & Related papers (2024-05-27T13:31:03Z)
- Weak-to-Strong Extrapolation Expedites Alignment [135.12769233630362]
We propose a method called ExPO to boost models' alignment with human preference.
We demonstrate that ExPO consistently improves off-the-shelf DPO/RLHF models.
We shed light on the essence of ExPO: amplifying the reward signal learned during alignment training.
arXiv Detail & Related papers (2024-04-25T17:39:50Z)
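The summary does not spell out the ExPO mechanism; going only by the paper's title, one hedged reading of "weak-to-strong extrapolation" is a weight-space step that moves beyond an aligned checkpoint along the direction away from a weaker (e.g. supervised fine-tuned) checkpoint. The checkpoint pairing and coefficient below are assumptions, not the paper's exact recipe.

```python
def extrapolate_weights(weak_state, aligned_state, alpha=0.3):
    """Toy weight-space extrapolation: step past the aligned checkpoint along
    the (aligned - weak) direction. Both arguments are state dicts of tensors
    (e.g. from torch.load); alpha=0.3 is an illustrative value only."""
    return {
        name: aligned_state[name] + alpha * (aligned_state[name] - weak_state[name])
        for name in aligned_state
    }

# Usage sketch with hypothetical checkpoints:
# weak = torch.load("sft_model.pt")       # pre-alignment weights
# aligned = torch.load("dpo_model.pt")    # DPO/RLHF-aligned weights
# stronger = extrapolate_weights(weak, aligned, alpha=0.3)
```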
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute [66.84421705029624]
We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours.
We pre-process an existing large, diverse, and high-quality dataset of books that surpasses existing academic benchmarks in quality, diversity, and document length.
This work also provides two baseline models: a feed-forward model derived from the GPT-2 architecture and a recurrent model in the form of a novel LSTM with ten-fold throughput.
arXiv Detail & Related papers (2023-09-20T10:31:17Z)
- A Comparative Study on Generative Models for High Resolution Solar Observation Imaging [59.372588316558826]
This work investigates the capabilities of current state-of-the-art generative models to accurately capture the data distribution behind observed solar activity states.
Using distributed training on supercomputers, we are able to train generative models at up to 1024x1024 resolution that produce high-quality samples which human experts cannot reliably distinguish from real observations.
arXiv Detail & Related papers (2023-04-14T14:40:32Z)
- The effectiveness of MAE pre-pretraining for billion-scale pretraining [65.98338857597935]
We introduce an additional pre-pretraining stage that is simple and uses the self-supervised MAE technique to initialize the model.
We measure the effectiveness of pre-pretraining on 10 different visual recognition tasks spanning image classification, video recognition, object detection, low-shot classification and zero-shot recognition.
arXiv Detail & Related papers (2023-03-23T17:56:12Z)
- METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals [151.3601429216877]
We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model.
We propose a recipe, namely "Model generated dEnoising TRaining Objective" (METRO).
The resultant models, METRO-LM, consisting of up to 5.4 billion parameters, achieve new state-of-the-art on the GLUE, SuperGLUE, and SQuAD benchmarks.
arXiv Detail & Related papers (2022-04-13T21:39:15Z)
- Exploring Sparse Expert Models and Beyond [51.90860155810848]
Mixture-of-Experts (MoE) models can achieve promising results with an outrageously large number of parameters but constant computation cost.
We propose a simple method called expert prototyping that splits experts into different prototypes and applies $k$ top-$1$ routing.
This strategy improves model quality while maintaining constant computational cost, and our further exploration of extremely large-scale models shows that it is more effective for training larger models (a routing sketch follows below).
arXiv Detail & Related papers (2021-05-31T16:12:44Z)
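To make the "$k$ top-$1$ routing" of expert prototyping concrete, here is a minimal PyTorch sketch under my reading of the summary: experts are split into $k$ prototype groups, each group routes every token to its single best expert, and the selected outputs are summed. The layer sizes, softmax gate, and summation rule are assumptions, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypedMoE(nn.Module):
    """Sketch of expert prototyping: experts are split into k prototype groups,
    top-1 routing runs independently inside each group, and the k selected
    expert outputs are summed, keeping active compute close to plain top-1."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int, k: int):
        super().__init__()
        assert num_experts % k == 0, "experts must divide evenly into k prototypes"
        self.group_size = num_experts // k
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # One router per prototype group.
        self.routers = nn.ModuleList(nn.Linear(d_model, self.group_size) for _ in range(k))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (num_tokens, d_model)
        out = torch.zeros_like(x)
        for g, router in enumerate(self.routers):
            gate = F.softmax(router(x), dim=-1)   # (num_tokens, group_size)
            weight, idx = gate.max(dim=-1)        # top-1 expert within this group
            for j in range(self.group_size):
                chosen = idx == j
                if chosen.any():
                    expert = self.experts[g * self.group_size + j]
                    out[chosen] += weight[chosen, None] * expert(x[chosen])
        return out

# Usage sketch: 16 experts grouped into k=4 prototypes.
moe = PrototypedMoE(d_model=64, d_hidden=256, num_experts=16, k=4)
print(moe(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```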
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.