amc: The Automated Mission Classifier for Telescope Bibliographies
- URL: http://arxiv.org/abs/2512.11202v1
- Date: Fri, 12 Dec 2025 01:24:42 GMT
- Title: amc: The Automated Mission Classifier for Telescope Bibliographies
- Authors: John F. Wu, Joshua E. G. Peek, Sophie J. Miller, Jenny Novacescu, Achu J. Usha, Christopher A. Wilkinson
- Abstract summary: A modified version of amc performs well on the TRACS Kaggle challenge. amc can also be used to interrogate historical datasets and surface potential label errors.
- Score: 0.10262304700896198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Telescope bibliographies record the pulse of astronomy research by capturing publication statistics and citation metrics for telescope facilities. Robust and scalable bibliographies ensure that we can measure the scientific impact of our facilities and archives. However, the growing rate of publications threatens to outpace our ability to manually label astronomical literature. We therefore present the Automated Mission Classifier (amc), a tool that uses large language models (LLMs) to identify and categorize telescope references by processing large quantities of paper text. A modified version of amc performs well on the TRACS Kaggle challenge, achieving a macro $F_1$ score of 0.84 on the held-out test set. amc is valuable for other telescopes beyond TRACS; we developed the initial software for identifying papers that featured scientific results by NASA missions. Additionally, we investigate how amc can also be used to interrogate historical datasets and surface potential label errors. Our work demonstrates that LLM-based applications offer powerful and scalable assistance for library sciences.
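The macro $F_1$ score quoted in the abstract is the unweighted mean of the per-class $F_1$ scores, so rare classes count as much as common ones. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `macro_f1` and the labels `science`, `mention`, and `none` are hypothetical and are not taken from the TRACS challenge.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only
y_true = ["science", "science", "mention", "none"]
y_pred = ["science", "mention", "mention", "none"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

In this toy case the per-class scores are 2/3 (`science`), 2/3 (`mention`), and 1 (`none`), giving a macro average of 7/9.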
Related papers
- Encoder Fine-tuning with Stochastic Sampling Outperforms Open-weight GPT in Astronomy Knowledge Extraction [11.478263835391433]
We present an encoder-based system for extracting knowledge from astronomy articles.
Our system, despite its simplicity and low-cost implementation, significantly outperforms the open-weight GPT baseline.
arXiv Detail & Related papers (2025-11-11T13:08:30Z)
- AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy [39.94582666929051]
We introduce AstroVisBench, the first benchmark for both scientific computing and visualization in the astronomy domain.
We present an evaluation of state-of-the-art language models, showing a significant gap in their ability to engage in astronomy research as useful assistants.
arXiv Detail & Related papers (2025-05-26T21:49:18Z)
- Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol [83.90769864167301]
Literature review tables are essential for summarizing and comparing collections of scientific papers.
We explore the task of generating tables that best fulfill a user's informational needs given a collection of scientific papers.
Our contributions focus on three key challenges encountered in real-world use: (i) User prompts are often under-specified; (ii) Retrieved candidate papers frequently contain irrelevant content; and (iii) Task evaluation should move beyond shallow text similarity techniques.
arXiv Detail & Related papers (2025-04-14T14:52:28Z)
- SciAgent: Tool-augmented Language Models for Scientific Reasoning [129.51442677710452]
We introduce a new task setting named tool-augmented scientific reasoning.
This setting supplements Large Language Models with scalable toolsets.
We construct a tool-augmented training corpus named MathFunc which encompasses over 30,000 samples and roughly 6,000 tools.
Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving.
arXiv Detail & Related papers (2024-02-18T04:19:44Z)
- The Semantic Scholar Open Data Platform [92.2948743167744]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature.
We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction.
The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z)
- Building astroBERT, a language model for Astronomy & Astrophysics [1.4587241287997816]
We are applying modern machine learning and natural language processing techniques to the NASA Astrophysics Data System (ADS) dataset.
We are training astroBERT, a deeply contextual language model based on research at Google.
Using astroBERT, we aim to enrich the ADS dataset and improve its discoverability, and in particular we are developing our own named entity recognition tool.
arXiv Detail & Related papers (2021-12-01T16:01:46Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- First Full-Event Reconstruction from Imaging Atmospheric Cherenkov Telescope Real Data with Deep Learning [55.41644538483948]
The Cherenkov Telescope Array is the future of ground-based gamma-ray astronomy.
Its first prototype telescope built on-site, the Large Size Telescope 1, is currently under commissioning and taking its first scientific data.
We present for the first time the development of a full-event reconstruction based on deep convolutional neural networks and its application to real data.
arXiv Detail & Related papers (2021-05-31T12:51:42Z)
- Self-Supervised Representation Learning for Astronomical Images [1.0499611180329804]
Self-supervised learning recovers representations of sky survey images that are semantically useful.
We show that our approach can achieve the accuracy of supervised models while using 2-4 times fewer labels for training.
arXiv Detail & Related papers (2020-12-24T03:25:36Z)
- DeepShadows: Separating Low Surface Brightness Galaxies from Artifacts using Deep Learning [70.80563014913676]
We investigate the use of convolutional neural networks (CNNs) for the problem of separating low-surface-brightness galaxies from artifacts in survey images.
We show that CNNs offer a very promising path in the quest to study the low-surface-brightness universe.
arXiv Detail & Related papers (2020-11-24T22:51:08Z)
- Smart obervation method with wide field small aperture telescopes for real time transient detection [8.751383520994425]
We propose ARGUS (Astronomical taRGets detection framework for Unified telescopes) for real-time transient detection.
ARGUS uses a deep-learning-based astronomical detection algorithm, implemented on embedded devices in each wide field small aperture telescope (WFSAT), to detect astronomical targets.
We use simulated data to test the performance of ARGUS and find that ARGUS can increase the performance of WFSATs in transient detection tasks robustly.
arXiv Detail & Related papers (2020-11-20T13:48:32Z)