AstroLLaVA: towards the unification of astronomical data and natural language
- URL: http://arxiv.org/abs/2504.08583v1
- Date: Fri, 11 Apr 2025 14:36:31 GMT
- Title: AstroLLaVA: towards the unification of astronomical data and natural language
- Authors: Sharaf Zaman, Michael J. Smith, Pranav Khetarpal, Rishabh Chakrabarty, Michele Ginolfi, Marc Huertas-Company, Maja Jabłońska, Sandor Kruk, Matthieu Le Lain, Sergio José Rodríguez Méndez, Dimitrios Tanoglidis
- Abstract summary: We present AstroLLaVA, a vision language model for astronomy that enables interaction with astronomical imagery through natural dialogue. Our two-stage fine-tuning process adapts the model to both image captioning and visual question answering in the astronomy domain. We demonstrate AstroLLaVA's performance on an astronomical visual question answering benchmark and release the model weights, code, and training set to encourage further open source work.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present AstroLLaVA, a vision language model for astronomy that enables interaction with astronomical imagery through natural dialogue. By fine-tuning the LLaVA model on a diverse dataset of ~30k images with captions and question-answer pairs sourced from NASA's 'Astronomy Picture of the Day', the European Southern Observatory, and the NASA/ESA Hubble Space Telescope, we create a model capable of answering open-ended questions about astronomical concepts depicted visually. Our two-stage fine-tuning process adapts the model to both image captioning and visual question answering in the astronomy domain. We demonstrate AstroLLaVA's performance on an astronomical visual question answering benchmark and release the model weights, code, and training set to encourage further open source work in this space. Finally, we suggest a roadmap towards general astronomical data alignment with pre-trained language models, and provide an open space for collaboration towards this end for interested researchers.
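As a concrete illustration of the interaction the abstract describes, here is a minimal sketch of open-ended visual question answering with a LLaVA-style checkpoint via Hugging Face transformers. The hub id, image file, and prompt wording are placeholders: the base LLaVA 1.5 checkpoint stands in for the released AstroLLaVA weights.

```python
# Minimal VQA sketch with a LLaVA-style checkpoint.
# "llava-hf/llava-1.5-7b-hf" is a stand-in; swap in the released
# AstroLLaVA weights once you have their hub id.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image = Image.open("ngc1300.jpg")  # any astronomical image on disk
prompt = "USER: <image>\nWhat kind of galaxy is this, and which visual features tell you so?\nASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```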
Related papers
- At First Sight: Zero-Shot Classification of Astronomical Images with Large Multimodal Models
Vision-language multimodal models (VLMs) offer the possibility of zero-shot classification in astronomy.
We investigate two models, GPT-4o and LLaVA-NeXT, for zero-shot classification of low-surface brightness galaxies and artifacts.
We show that, with natural language prompts, these models achieve significant accuracy (typically above 80 percent) without additional training or fine-tuning.
arXiv Detail & Related papers (2024-06-24T18:17:54Z)
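The zero-shot setup above amounts to asking a general-purpose VLM to label a survey cutout directly. A minimal sketch with the OpenAI client follows; the prompt wording and file name are illustrative, not the paper's exact protocol.

```python
# Zero-shot classification of a survey cutout with GPT-4o.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("cutout.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Is this a low-surface-brightness galaxy or an imaging artifact? Answer with exactly one label."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```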
- A Foundation Model for the Earth System
We introduce Aurora, a large-scale foundation model for the Earth system trained on over a million hours of diverse data.
Aurora outperforms operational forecasts for air quality, ocean waves, tropical cyclone tracks, and high-resolution weather, at orders of magnitude lower computational cost than dedicated existing systems.
arXiv Detail & Related papers (2024-05-20T14:45:18Z)
- PAPERCLIP: Associating Astronomical Observations and Natural Language with Multi-Modal Models
We present a method which associates astronomical observations imaged by telescopes with natural language using a neural network model.
The model is fine-tuned from a pre-trained Contrastive Language-Image Pre-training (CLIP) model.
Using observations from the Hubble Space Telescope (HST) as an example, we show that the fine-tuned model embodies a meaningful joint representation between observations and natural language.
arXiv Detail & Related papers (2024-03-13T18:00:00Z)
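PAPERCLIP fine-tunes CLIP on observation/text pairs; the sketch below shows the kind of zero-shot scoring such a joint image-text embedding enables, using the public base CLIP checkpoint as a stand-in for the fine-tuned weights.

```python
# Scoring candidate captions against an image in a CLIP joint embedding.
# Base CLIP stands in for the fine-tuned PAPERCLIP model.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("hst_cutout.png")  # illustrative file name
texts = ["a spiral galaxy", "a globular cluster", "a planetary nebula"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # (1, num_texts)
print(logits.softmax(dim=-1))  # relative match probabilities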
- Astronomical Images Quality Assessment with Automated Machine Learning
Electronically Assisted Astronomy consists of capturing deep sky images with a digital camera coupled to a telescope to display views of celestial objects that would have been invisible through direct observation.
This practice generates a large quantity of data, which may then be enhanced with dedicated image editing software after observation sessions.
In this study, we show how Image Quality Assessment can be useful for automatically rating astronomical images, and we also develop a dedicated model by using Automated Machine Learning.
arXiv Detail & Related papers (2023-11-17T16:14:11Z)
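The AutoML angle can be sketched as an automated architecture search for a CNN regressor mapping an image to a quality score. The arrays and score scale below are hypothetical; the study's own dataset is not reproduced here.

```python
# AutoML sketch for image-quality rating with AutoKeras.
import numpy as np
import autokeras as ak

x_train = np.load("deep_sky_frames.npy")   # (N, H, W, 1) image stack; hypothetical file
y_train = np.load("quality_ratings.npy")   # (N,) human-assigned scores; hypothetical file

reg = ak.ImageRegressor(max_trials=5, overwrite=True)  # searches CNN architectures
reg.fit(x_train, y_train, epochs=10)
print(reg.predict(x_train[:1]))  # predicted quality for one frame
```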
- GeoVLN: Learning Geometry-Enhanced Visual Representation with Slot Attention for Vision-and-Language Navigation
We propose GeoVLN, which learns a geometry-enhanced visual representation based on slot attention for robust vision-and-language navigation.
We employ V&L BERT to learn a cross-modal representation that incorporates both language and vision information.
arXiv Detail & Related papers (2023-05-26T17:15:22Z)
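GeoVLN's full architecture isn't spelled out in this summary, but the slot-attention mechanism it builds on has a compact generic form. Below is a simplified PyTorch version of the standard slot-attention update (iterative competitive attention plus a GRU state update), omitting the usual per-iteration MLP residual; slot count, width, and iteration count are illustrative.

```python
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    """Simplified slot attention: slots compete for input features."""
    def __init__(self, num_slots: int, dim: int, iters: int = 3):
        super().__init__()
        self.num_slots, self.iters, self.scale = num_slots, iters, dim ** -0.5
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_sigma = nn.Parameter(torch.ones(1, 1, dim))
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.gru = nn.GRUCell(dim, dim)
        self.norm_in = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        b, n, d = inputs.shape
        inputs = self.norm_in(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        slots = self.slots_mu + self.slots_sigma * torch.randn(
            b, self.num_slots, d, device=inputs.device)
        for _ in range(self.iters):
            q = self.to_q(self.norm_slots(slots))
            # softmax over the slot axis: slots compete for each feature
            attn = (q @ k.transpose(1, 2) * self.scale).softmax(dim=1)
            attn = attn / attn.sum(dim=-1, keepdim=True)  # weighted mean
            updates = attn @ v
            slots = self.gru(updates.reshape(-1, d),
                             slots.reshape(-1, d)).reshape(b, self.num_slots, d)
        return slots

slots = SlotAttention(num_slots=4, dim=64)(torch.randn(2, 36, 64))  # -> (2, 4, 64)
```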
- A Comparative Study on Generative Models for High Resolution Solar Observation Imaging
This work investigates capabilities of current state-of-the-art generative models to accurately capture the data distribution behind observed solar activity states.
Using distributed training on supercomputers, we are able to train generative models at up to 1024x1024 resolution that produce high-quality samples which human experts find indistinguishable from real observations.
arXiv Detail & Related papers (2023-04-14T14:40:32Z)
- Improving astroBERT using Semantic Textual Similarity
We introduce astroBERT, a machine learning language model tailored to the text used in astronomy papers in NASA's Astrophysics Data System (ADS).
We show how astroBERT improves over existing public language models on astrophysics specific tasks.
We detail how ADS plans to harness the unique structure of scientific papers, the citation graph and citation context to further improve astroBERT.
arXiv Detail & Related papers (2022-11-29T16:15:32Z)
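astroBERT is published by ADS on the Hugging Face hub as adsabs/astroBERT. The snippet below is a generic semantic-textual-similarity baseline with it, mean-pooling token embeddings and taking a cosine similarity; the paper's actual STS fine-tuning setup may differ.

```python
# Mean-pooled cosine similarity between two astronomy sentences
# using the public astroBERT checkpoint.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("adsabs/astroBERT")
model = AutoModel.from_pretrained("adsabs/astroBERT")

sents = ["The quasar spectrum shows broad emission lines.",
         "Broad-line features are visible in the AGN spectrum."]
batch = tok(sents, padding=True, return_tensors="pt")
with torch.no_grad():
    emb = model(**batch).last_hidden_state.mean(dim=1)  # (2, hidden)
print(torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0))
```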
- Things not Written in Text: Exploring Spatial Commonsense from Visual Signals
We investigate whether models with visual signals learn more spatial commonsense than text-based models.
We propose a benchmark that focuses on the relative scales of objects, and the positional relationship between people and objects under different actions.
We find that image synthesis models are more capable of learning accurate and consistent spatial knowledge than other models.
arXiv Detail & Related papers (2022-03-15T17:02:30Z)
- Partial-Attribution Instance Segmentation for Astronomical Source Detection and Deblending
We introduce a new approach called Partial-Attribution Instances that enables source detection and deblending in a manner tractable for deep learning models.
We provide a novel neural network implementation as a demonstration of the method.
arXiv Detail & Related papers (2022-01-12T21:59:13Z)
- Processing Images from Multiple IACTs in the TAIGA Experiment with Convolutional Neural Networks
We use convolutional neural networks (CNNs) to analyze Monte Carlo-simulated images from the TAIGA experiment.
The analysis includes selecting the images corresponding to showers caused by gamma rays and estimating the energy of those gamma rays.
arXiv Detail & Related papers (2021-12-31T10:49:11Z)
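The TAIGA analysis pairs two tasks: gamma/hadron separation and energy estimation. A toy PyTorch model with a shared convolutional trunk and one head per task sketches that setup; the layer sizes and single-channel input are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ShowerNet(nn.Module):
    """Toy CNN: shared trunk, gamma/hadron head plus energy head."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.particle = nn.Linear(32, 2)  # gamma vs. hadron logits
        self.energy = nn.Linear(32, 1)    # log10(E) regression

    def forward(self, x):
        h = self.trunk(x)
        return self.particle(h), self.energy(h)

logits, log_e = ShowerNet()(torch.randn(8, 1, 64, 64))  # batch of camera images
```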
- First Full-Event Reconstruction from Imaging Atmospheric Cherenkov Telescope Real Data with Deep Learning
The Cherenkov Telescope Array is the future of ground-based gamma-ray astronomy.
Its first prototype telescope built on-site, the Large Size Telescope 1, is currently under commissioning and taking its first scientific data.
We present for the first time the development of a full-event reconstruction based on deep convolutional neural networks and its application to real data.
arXiv Detail & Related papers (2021-05-31T12:51:42Z)
- Geometry-Guided Street-View Panorama Synthesis from Satellite Imagery
We present a new approach for synthesizing a novel street-view panorama given an overhead satellite image.
Our method generates a Google-style omnidirectional street-view panorama, as if it were captured from the same geographical location as the center of the satellite patch.
arXiv Detail & Related papers (2021-03-02T10:27:05Z)