Textual interpretation of transient image classifications from large language models
- URL: http://arxiv.org/abs/2510.06931v1
- Date: Wed, 08 Oct 2025 12:12:46 GMT
- Title: Textual interpretation of transient image classifications from large language models
- Authors: Fiorenzo Stoppa, Turan Bulmus, Steven Bloemen, Stephen J. Smartt, Paul J. Groot, Paul Vreeswijk, Ken W. Smith, et al.
- Abstract summary: Large language models (LLMs) can approach the performance level of a convolutional neural network on three optical transient survey datasets. Google's LLM, Gemini, achieves a 93% average accuracy across datasets that span a range of resolutions and pixel scales.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern astronomical surveys deliver immense volumes of transient detections, yet distinguishing real astrophysical signals (for example, explosive events) from bogus imaging artefacts remains a challenge. Convolutional neural networks are effectively used for real versus bogus classification; however, their reliance on opaque latent representations hinders interpretability. Here we show that large language models (LLMs) can approach the performance level of a convolutional neural network on three optical transient survey datasets (Pan-STARRS, MeerLICHT and ATLAS) while simultaneously producing direct, human-readable descriptions for every candidate. Using only 15 examples and concise instructions, Google's LLM, Gemini, achieves a 93% average accuracy across datasets that span a range of resolutions and pixel scales. We also show that a second LLM can assess the coherence of the output of the first model, enabling iterative refinement by identifying problematic cases. This framework allows users to define the desired classification behaviour through natural language and examples, bypassing traditional training pipelines. Furthermore, by generating textual descriptions of observed features, LLMs enable users to query classifications as if navigating an annotated catalogue, rather than deciphering abstract latent spaces. As next-generation telescopes and surveys further increase the amount of data available, LLM-based classification could help bridge the gap between automated detection and transparent, human-level understanding.
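The two-stage workflow described in the abstract is simple enough to sketch. Below is a minimal illustration, assuming the publicly available google-generativeai Python SDK; the prompt wording, model name, file paths, and helper functions are placeholders, not the authors' actual pipeline.

```python
# A minimal sketch, assuming the google-generativeai SDK; prompts,
# model name, and the 15-example few-shot set are placeholders.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # assumption: standard API-key auth
model = genai.GenerativeModel("gemini-1.5-pro")  # placeholder model name

INSTRUCTIONS = (
    "You are vetting optical transient detections. For the candidate "
    "cutout below, decide REAL (astrophysical source) or BOGUS (artefact) "
    "and describe the image features that justify the label."
)

def build_fewshot(examples):
    """Interleave ~15 labelled cutouts with their labels as context."""
    parts = [INSTRUCTIONS]
    for path, label in examples:
        parts += [Image.open(path), f"Label: {label}"]
    return parts

def classify(candidate_path, examples):
    """First LLM pass: label the candidate and describe its features."""
    parts = build_fewshot(examples) + [
        Image.open(candidate_path), "Label and description:"
    ]
    return model.generate_content(parts).text

def check_coherence(description):
    """Second LLM pass: flag outputs whose description contradicts the label."""
    verdict = model.generate_content(
        "Does this description logically support its REAL/BOGUS label? "
        "Answer CONSISTENT or INCONSISTENT.\n\n" + description
    ).text
    return "INCONSISTENT" not in verdict
```

Candidates flagged as inconsistent by the second pass would be the "problematic cases" fed back for iterative refinement of the instructions or examples.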
Related papers
- Interpretable Text Classification Applied to the Detection of LLM-generated Creative Writing [0.20999222360659608]
We consider the problem of distinguishing human-written creative fiction (excerpts from novels) from similar text generated by an LLM. Our results show that, while human observers perform poorly (near chance levels) on this binary classification task, a variety of machine-learning models achieve accuracy in the range 0.93-0.98.
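For flavour, here is a hedged sketch of one interpretable baseline for this kind of task: character n-gram TF-IDF features with logistic regression, whose coefficients can be read directly as per-feature evidence. The loader and dataset variables are hypothetical, and this is not necessarily the paper's model.

```python
# A minimal interpretable text classifier; load_excerpts is a
# hypothetical loader returning (list[str], labels with 0=human, 1=LLM).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts, labels = load_excerpts()  # hypothetical

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), max_features=20000),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

# Inspect the most LLM-indicative n-grams via the learned weights.
vec = clf.named_steps["tfidfvectorizer"]
lr = clf.named_steps["logisticregression"]
top = lr.coef_[0].argsort()[-10:]
print([vec.get_feature_names_out()[i] for i in top])
```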
arXiv Detail & Related papers (2026-01-12T09:50:15Z)
- ForenX: Towards Explainable AI-Generated Image Detection with Multimodal Large Language Models [82.04858317800097]
We present ForenX, a novel method that not only identifies the authenticity of images but also provides explanations that resonate with human thoughts. ForenX employs powerful multimodal large language models (MLLMs) to analyze and interpret forensic cues. We introduce ForgReason, a dataset dedicated to descriptions of forgery evidence in AI-generated images.
arXiv Detail & Related papers (2025-08-02T15:21:26Z)
- Beyond the Visible: Multispectral Vision-Language Learning for Earth Observation [3.4719449211802456]
We introduce Llama3-MS-CLIP, the first vision-language model pre-trained with contrastive learning on a large-scale multispectral dataset. We present the largest-to-date image-caption dataset for multispectral data, consisting of one million Sentinel-2 samples. We evaluate Llama3-MS-CLIP on multispectral zero-shot image classification and retrieval using three datasets of varying complexity.
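The zero-shot protocol mentioned here is the standard CLIP-style one and can be sketched generically; the encoders below are stand-ins for Llama3-MS-CLIP, whose actual API is not shown in the abstract.

```python
# A minimal sketch of CLIP-style zero-shot classification over
# precomputed embeddings; encode_image/encode_text are hypothetical.
import torch
import torch.nn.functional as F

def zero_shot_classify(image_emb, class_prompts, encode_text):
    """Score one image embedding against text embeddings of class prompts."""
    text_emb = torch.stack([encode_text(p) for p in class_prompts])
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T  # cosine similarity per class
    return logits.argmax(dim=-1)

# Usage (hypothetical): pred = zero_shot_classify(
#     encode_image(s2_tile),
#     [f"a Sentinel-2 image of {c}" for c in classes],
#     encode_text)
```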
arXiv Detail & Related papers (2025-03-20T09:13:31Z)
- LED: LLM Enhanced Open-Vocabulary Object Detection without Human Curated Data Generation [52.58791563814837]
Large foundation models trained on large-scale vision-language data can boost Open-Vocabulary Object Detection (OVD). This paper presents a systematic method to enhance visual grounding by utilizing decoder layers of Large Language Models (LLMs). We find that intermediate LLM layers already encode rich spatial semantics; adapting only the early layers yields most of the gain.
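The core observation, that intermediate decoder layers can be tapped as grounding features, is easy to illustrate with the Hugging Face transformers API; the model name and layer index below are illustrative, not the paper's configuration.

```python
# Extract an intermediate hidden state from a causal LM; gpt2 stands in
# for a larger LLM, and layer 4 is an arbitrary early layer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
llm = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("a red traffic cone on the left", return_tensors="pt")
with torch.no_grad():
    out = llm(**inputs, output_hidden_states=True)

# hidden_states: tuple of (num_layers + 1) tensors [batch, seq, dim];
# an early/intermediate layer would be adapted for visual grounding.
early = out.hidden_states[4]
grounding_feat = early.mean(dim=1)  # pooled text feature for the detector
```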
arXiv Detail & Related papers (2025-03-18T00:50:40Z)
- SGC-Net: Stratified Granular Comparison Network for Open-Vocabulary HOI Detection [16.89965584177711]
Recent open-vocabulary human-object interaction (OV-HOI) detection methods rely on large language models (LLMs) to generate auxiliary descriptions and leverage knowledge distilled from CLIP to detect unseen interaction categories. Despite their effectiveness, these methods face two challenges: (1) feature granularity deficiency, due to reliance on last-layer visual features for text alignment, leading to the neglect of crucial object-level details from intermediate layers; (2) semantic similarity confusion, resulting from CLIP's inherent biases toward certain classes, while LLM-generated descriptions based solely on labels fail to adequately capture inter-class similarities.
arXiv Detail & Related papers (2025-03-01T09:26:05Z)
- Language Driven Occupancy Prediction [13.35971455725581]
We introduce LOcc, an effective and generalizable framework for open-vocabulary occupancy prediction. Our pipeline presents a feasible way to mine the valuable semantic information of images, transferring text labels from images to LiDAR point clouds and ultimately to voxels. By replacing the original prediction head of supervised occupancy models with a geometry head for binary occupancy states and a language head for language features, LOcc effectively uses the generated language ground truth to guide the learning of a 3D language volume.
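The head replacement described here has a simple schematic form; the sketch below uses placeholder dimensions and an arbitrary backbone, and is an illustration of the pattern rather than the paper's architecture.

```python
# A schematic dual-head occupancy model: a shared voxel backbone
# feeding a binary geometry head and a language-feature head.
import torch.nn as nn

class DualHeadOcc(nn.Module):
    def __init__(self, backbone, feat_dim=128, lang_dim=512):
        super().__init__()
        self.backbone = backbone                       # any voxel feature extractor
        self.geometry_head = nn.Linear(feat_dim, 1)    # occupied vs. free
        self.language_head = nn.Linear(feat_dim, lang_dim)  # CLIP-like features

    def forward(self, voxels):
        f = self.backbone(voxels)           # [num_voxels, feat_dim]
        occ_logits = self.geometry_head(f)  # supervised by binary occupancy
        lang_feat = self.language_head(f)   # supervised by generated text labels
        return occ_logits, lang_feat
```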
arXiv Detail & Related papers (2024-11-25T03:47:10Z)
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
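A label-verification step of this kind reduces to a constrained LLM query per candidate; the following sketch is hypothetical (the call_llm helper and prompt are not from the paper) and only illustrates the pattern of verification plus rationale capture.

```python
# Hedged sketch of LLM-based label verification for noisy web data.
def verify_label(image_caption, candidate_entity, call_llm):
    """Ask an LLM whether a candidate entity label fits the evidence."""
    prompt = (
        f"Caption: {image_caption}\n"
        f"Candidate entity: {candidate_entity}\n"
        "Does the caption support this entity label? Answer YES or NO, "
        "then give a one-sentence rationale."
    )
    answer = call_llm(prompt)  # hypothetical LLM wrapper
    keep = answer.strip().upper().startswith("YES")
    return keep, answer  # the rationale doubles as training metadata
```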
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- Large Language Models Understand Layout [6.732578061359833]
Large language models (LLMs) demonstrate extraordinary abilities in a wide range of natural language processing (NLP) tasks.
We show that, beyond text understanding, LLMs can process text layouts denoted by spatial markers.
We show that layout understanding ability is beneficial for building efficient visual question-answering (VQA) systems.
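One common way to hand layout to a plain LLM is to render positioned tokens onto a character grid so whitespace preserves the 2-D arrangement. The coordinates, grid size, and scaling below are assumptions for illustration, not the paper's encoding.

```python
# Render OCR-style (text, x, y) tokens into a layout-preserving string.
def render_layout(tokens, cols=80, rows=20, page_w=1000, page_h=400):
    """tokens: list of (text, x, y) in page pixels -> whitespace-layout text."""
    grid = [[" "] * cols for _ in range(rows)]
    for text, x, y in tokens:
        r = min(rows - 1, y * rows // page_h)
        c = min(cols - 1, x * cols // page_w)
        for i, ch in enumerate(text[: cols - c]):
            grid[r][c + i] = ch
    return "\n".join("".join(row).rstrip() for row in grid)

# Usage: the rendered string goes straight into the LLM prompt.
print(render_layout([("Invoice", 50, 20),
                     ("Total:", 50, 350),
                     ("$42.00", 300, 350)]))
```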
arXiv Detail & Related papers (2024-07-08T09:03:12Z)
- DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection [111.68263493302499]
We introduce DetCLIPv3, a high-performing detector that excels at both open-vocabulary object detection and hierarchical label generation.
DetCLIPv3 is characterized by three core designs: 1) Versatile model architecture; 2) High information density data; and 3) Efficient training strategy.
DetCLIPv3 demonstrates superior open-vocabulary detection performance, outperforming GLIPv2, GroundingDINO, and DetCLIPv2 by 18.0/19.6/6.6 AP, respectively.
arXiv Detail & Related papers (2024-04-14T11:01:44Z)
- Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification [59.99976102069976]
Fine-grained ship classification in remote sensing (RS-FGSC) poses a significant challenge due to the high similarity between classes and the limited availability of labeled data. Recent advancements in large pre-trained Vision-Language Models (VLMs) have demonstrated impressive capabilities in few-shot or zero-shot learning. This study delves into harnessing the potential of VLMs to enhance classification accuracy for unseen ship categories.
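Prompt tuning in this setting typically means learning a handful of context vectors while the VLM stays frozen (the CoOp pattern). The sketch below uses placeholder dimensions and a frozen class-embedding stand-in; it shows the pattern, not this paper's exact method.

```python
# A minimal CoOp-style prompt learner: only the context vectors train,
# which suits the low-label RS-FGSC regime described above.
import torch
import torch.nn as nn

class PromptLearner(nn.Module):
    def __init__(self, n_ctx=16, dim=512, n_classes=30):
        super().__init__()
        # Learnable context shared across classes.
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        # Frozen stand-in for the class-name token embeddings.
        self.class_emb = nn.Parameter(
            torch.randn(n_classes, 1, dim), requires_grad=False)

    def forward(self):
        ctx = self.ctx.unsqueeze(0).expand(self.class_emb.size(0), -1, -1)
        # Per-class prompt tokens, fed to the frozen text encoder.
        return torch.cat([ctx, self.class_emb], dim=1)
```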
arXiv Detail & Related papers (2024-03-13T05:48:58Z)
- ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful for triggering hallucinations in large language models.
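Prompt chaining for evidence perturbation can be sketched as two dependent LLM calls; the call_llm helper and prompt wording below are hypothetical, illustrating the idea rather than ReEval's actual chain.

```python
# Hedged sketch: rewrite the evidence so the old answer fails, then
# derive the new gold answer from the rewritten evidence.
def perturb_case(question, evidence, answer, call_llm):
    new_evidence = call_llm(
        f"Rewrite this passage so that it no longer supports the answer "
        f"'{answer}' to the question '{question}', while staying fluent "
        f"and plausible:\n\n{evidence}"
    )
    new_answer = call_llm(
        f"Using only this passage, answer the question "
        f"'{question}':\n\n{new_evidence}"
    )
    return new_evidence, new_answer  # a test case probing hallucination
```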
arXiv Detail & Related papers (2023-10-19T06:37:32Z)
- A generic self-supervised learning (SSL) framework for representation learning from spectra-spatial feature of unlabeled remote sensing imagery [4.397725469518669]
Self-supervised learning (SSL) enables models to learn a representation from orders of magnitude more unlabelled data.
This work designs a novel SSL framework capable of learning representations from both the spectral and spatial information of unlabelled data.
arXiv Detail & Related papers (2023-06-27T23:50:43Z)
- GSMFlow: Generation Shifts Mitigating Flow for Generalized Zero-Shot Learning [55.79997930181418]
Generalized Zero-Shot Learning aims to recognize images from both the seen and unseen classes by transferring semantic knowledge from seen to unseen classes.
Taking advantage of generative models to hallucinate realistic unseen samples, based on knowledge learned from the seen classes, is a promising solution.
We propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
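The conditional affine coupling layer named here is a standard normalizing-flow building block; the following sketch uses illustrative network sizes and conditioning, not GSMFlow's exact design.

```python
# A compact conditional affine coupling layer: half the dimensions pass
# through unchanged, the other half get an invertible affine transform
# whose scale/shift depend on the passed-through half and a condition.
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    def __init__(self, dim, cond_dim, hidden=256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x, cond):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, cond], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)                  # stabilise the log-scale
        y2 = x2 * torch.exp(s) + t         # invertible affine transform
        log_det = s.sum(dim=1)             # log-determinant of the Jacobian
        return torch.cat([x1, y2], dim=1), log_det
```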
arXiv Detail & Related papers (2022-07-05T04:04:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.