At First Sight: Zero-Shot Classification of Astronomical Images with Large Multimodal Models
- URL: http://arxiv.org/abs/2406.17057v1
- Date: Mon, 24 Jun 2024 18:17:54 GMT
- Title: At First Sight: Zero-Shot Classification of Astronomical Images with Large Multimodal Models
- Authors: Dimitrios Tanoglidis, Bhuvnesh Jain
- Abstract summary: Vision-Language multimodal Models (VLMs) offer the possibility for zero-shot classification in astronomy.
We investigate two models, GPT-4o and LLaVA-NeXT, for zero-shot classification of low-surface brightness galaxies and artifacts.
We show that with natural language prompts these models achieved significant accuracy (above 80 percent typically) without additional training/fine tuning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-Language multimodal Models (VLMs) offer the possibility for zero-shot classification in astronomy: i.e. classification via natural language prompts, with no training. We investigate two models, GPT-4o and LLaVA-NeXT, for zero-shot classification of low-surface brightness galaxies and artifacts, as well as morphological classification of galaxies. We show that with natural language prompts these models achieved significant accuracy (above 80 percent typically) without additional training/fine tuning. We discuss areas that require improvement, especially for LLaVA-NeXT, which is an open source model. Our findings aim to motivate the astronomical community to consider VLMs as a powerful tool for both research and pedagogy, with the prospect that future custom-built or fine-tuned models could perform better.
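The zero-shot workflow described in the abstract (a natural-language prompt, no training, accuracy measured against known labels) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline. The prompt wording, the label names, the helper functions, and `stub_model` are all hypothetical; a real run would replace `stub_model` with a call to a VLM such as GPT-4o or LLaVA-NeXT.

```python
from typing import Callable, Optional, Sequence

# Hypothetical label set for the galaxy-vs-artifact task (an assumption).
LABELS = ("galaxy", "artifact")


def build_prompt(labels: Sequence[str] = LABELS) -> str:
    """Build a plain-language classification prompt (wording is an assumption)."""
    options = " or ".join(f"'{label}'" for label in labels)
    return (
        "You are shown an astronomical image cutout. "
        f"Classify it as {options} and answer with a single word."
    )


def parse_label(reply: str, labels: Sequence[str] = LABELS) -> Optional[str]:
    """Map a free-text reply to a label; return None if the reply is ambiguous."""
    text = reply.lower()
    hits = [label for label in labels if label in text]
    return hits[0] if len(hits) == 1 else None


def zero_shot_accuracy(
    model: Callable[[str, str], str],
    images: Sequence[str],
    truths: Sequence[str],
    labels: Sequence[str] = LABELS,
) -> float:
    """Fraction of images whose parsed prediction matches the true label."""
    prompt = build_prompt(labels)
    correct = 0
    for image, truth in zip(images, truths):
        predicted = parse_label(model(image, prompt), labels)
        correct += int(predicted == truth)
    return correct / len(truths) if truths else 0.0


def stub_model(image: str, prompt: str) -> str:
    """Placeholder for a real VLM call; 'image' is just a string tag here."""
    return "This looks like a galaxy." if "smooth" in image else "Artifact"


if __name__ == "__main__":
    images = ["smooth_1", "streak_1", "smooth_2", "streak_2"]
    truths = ["galaxy", "artifact", "galaxy", "galaxy"]
    print(zero_shot_accuracy(stub_model, images, truths))
```

Counting ambiguous or unparseable replies as errors, as `parse_label` does here, is one possible design choice; it keeps the reported accuracy conservative when the model does not follow the requested answer format.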
Related papers
- LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
By evaluating on different benchmarks and using a proper strategy, even a small-scale 2.7B model can perform on par with larger models with 7B or 13B parameters.
arXiv Detail & Related papers (2024-07-28T06:10:47Z) - Using Galaxy Evolution as Source of Physics-Based Ground Truth for Generative Models [0.9701233658865522]
We build a conditional denoising diffusion probabilistic model (DDPM) and a conditional variational autoencoder (CVAE)
This is one of the first studies to probe these generative models using physically motivated metrics.
We find that both models produce comparable realistic galaxies based on human evaluation, but our physics-based metrics are better able to discern the strengths and weaknesses of the generative models.
arXiv Detail & Related papers (2024-07-09T21:01:08Z) - Large Language Model Pruning [0.0]
We suggest a model pruning technique specifically focused on LLMs.
The proposed methodology emphasizes the explainability of deep learning models.
We also explore the difference between pruning on large-scale models vs. pruning on small-scale models.
arXiv Detail & Related papers (2024-05-24T18:22:15Z) - AstroPT: Scaling Large Observation Models for Astronomy [0.0]
We train a selection of foundation models of increasing size from 1 million to 2.1 billion parameters, and find that AstroPT follows a similar saturating log-log scaling law to textual models.
We believe that collaborative community development paves the best route towards realising an open source 'Large Observation Model'
arXiv Detail & Related papers (2024-05-23T18:00:00Z) - YAYI 2: Multilingual Open-Source Large Language Models [53.92832054643197]
We propose YAYI 2, including both base and chat models, with 30 billion parameters.
YAYI 2 is pre-trained from scratch on a multilingual corpus which contains 2.65 trillion tokens filtered by our pre-training data processing pipeline.
The base model is aligned with human values through supervised fine-tuning with millions of instructions and reinforcement learning from human feedback.
arXiv Detail & Related papers (2023-12-22T17:34:47Z) - AstroLLaMA: Towards Specialized Foundation Models in Astronomy [1.1694367694169385]
We introduce AstroLLaMA, a 7-billion-parameter model fine-tuned from LLaMA-2 using over 300,000 astronomy abstracts from arXiv.
Our model generates more insightful and scientifically relevant text completions and embedding extraction than state-of-the-art foundation models.
Its public release aims to spur astronomy-focused research, including automatic paper summarization and conversational agent development.
arXiv Detail & Related papers (2023-09-12T11:02:27Z) - A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades.
Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora.
To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z) - What Language Model to Train if You Have One Million GPU Hours? [54.32062236748831]
We study different modeling practices and their impact on zero-shot generalization.
We also study the performance of a multilingual model and how it compares to the English-only one.
All our models and code are open-sourced at https://huggingface.co/bigscience.
arXiv Detail & Related papers (2022-10-27T13:43:27Z) - Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models.
A collection of pretrained encoders perceive diverse modalities (such as vision, and language)
We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z) - Ensembling Off-the-shelf Models for GAN Training [55.34705213104182]
We find that pretrained computer vision models can significantly improve performance when used in an ensemble of discriminators.
We propose an effective selection mechanism, by probing the linear separability between real and fake samples in pretrained model embeddings.
Our method can improve GAN training in both limited data and large-scale settings.
arXiv Detail & Related papers (2021-12-16T18:59:50Z) - Morphological classification of astronomical images with limited labelling [0.0]
We propose an effective semi-supervised approach for the galaxy morphology classification task, based on active learning of an adversarial autoencoder (AAE) model.
For a binary classification problem (the top-level question of the Galaxy Zoo 2 decision tree) we achieved an accuracy of 93.1% on the test part with only 0.86 million markup actions.
Our best model with additional markup reaches an accuracy of 95.5%.
arXiv Detail & Related papers (2021-04-27T19:26:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.