Universal Spectral Tokenization via Self-Supervised Panchromatic Representation Learning
- URL: http://arxiv.org/abs/2510.17959v2
- Date: Mon, 10 Nov 2025 16:51:19 GMT
- Title: Universal Spectral Tokenization via Self-Supervised Panchromatic Representation Learning
- Authors: Jeff Shen, Francois Lanusse, Liam Holden Parker, Ollie Liu, Tom Hehir, Leopoldo Sarra, Lucas Meyer, Micah Bowles, Sebastian Wagner-Carena, Helen Qu, Siavash Golkar, Alberto Bietti, Hatim Bourfoune, Nathan Cassereau, Pierre Cornette, Keiya Hirashima, Geraud Krawezik, Ruben Ohana, Nicholas Lourie, Michael McCabe, Rudy Morel, Payel Mukhopadhyay, Mariel Pettee, Bruno Régaldo-Saint Blancard, Kyunghyun Cho, Miles Cranmer, Shirley Ho
- Abstract summary: Sequential scientific data span many resolutions and domains, and unifying them into a common representation is a key step toward developing foundation models for the sciences. We present a deep learning model that jointly learns from heterogeneous spectra in a self-supervised manner. For the first time, we demonstrate that a single model can unify spectral data across resolutions and domains.
- Score: 39.14992490784682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequential scientific data span many resolutions and domains, and unifying them into a common representation is a key step toward developing foundation models for the sciences. Astronomical spectra exemplify this challenge: massive surveys have collected millions of spectra across a wide range of wavelengths and resolutions, yet analyses remain fragmented across spectral domains (e.g., optical vs. infrared) and object types (e.g., stars vs. galaxies), limiting the ability to pool information across datasets. We present a deep learning model that jointly learns from heterogeneous spectra in a self-supervised manner. Our universal spectral tokenizer processes spectra from a variety of object types and resolutions directly on their native wavelength grids, producing intrinsically aligned, homogeneous, and physically meaningful representations that can be efficiently adapted to achieve competitive performance across a range of downstream tasks. For the first time, we demonstrate that a single model can unify spectral data across resolutions and domains, suggesting that our model can serve as a powerful building block for foundation models in astronomy -- and potentially extend to other scientific domains with heterogeneous sequential data, such as climate and healthcare.
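The core idea in the abstract, processing spectra of any length directly on their native wavelength grids into fixed-dimension tokens, can be sketched minimally. The single shared tanh embedding below is an illustrative stand-in for the paper's learned tokenizer; the dimensions, random weights, and synthetic spectra are assumptions, not the authors' implementation.

```python
import numpy as np

def tokenize_spectrum(wavelength, flux, W, b):
    """Map each (log-wavelength, flux) sample to a d-dim token.

    A shared per-sample embedding keeps the spectrum on its native
    wavelength grid: sequences of any length yield tokens of the same
    dimensionality, so heterogeneous spectra share one token space.
    """
    x = np.stack([np.log(wavelength), flux], axis=-1)  # (N, 2)
    return np.tanh(x @ W + b)                          # (N, d)

rng = np.random.default_rng(0)
d = 8                                  # assumed token dimension
W = rng.normal(scale=0.1, size=(2, d))  # stand-in for learned weights
b = np.zeros(d)

# An "optical" spectrum with 4000 samples and an "infrared" spectrum
# with 150 samples (wavelengths in Angstroms) map into the same space.
opt = tokenize_spectrum(np.linspace(3800, 9200, 4000),
                        rng.normal(size=4000), W, b)
ir = tokenize_spectrum(np.linspace(1.0e4, 2.5e4, 150),
                       rng.normal(size=150), W, b)
```

Both token sequences have shape `(N, 8)` despite differing lengths and wavelength ranges, which is the alignment property a downstream transformer would rely on.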
Related papers
- Augmenting representations with scientific papers [0.820984376071696]
Astronomers have acquired vast repositories of multimodal data, including images, spectra, and time series. These data sources are rarely systematically integrated. This work introduces a contrastive learning framework designed to align X-ray spectra with domain knowledge extracted from scientific literature.
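Contrastive alignment of this kind is typically trained with a symmetric InfoNCE objective over paired embeddings. The NumPy sketch below illustrates that generic objective under that assumption; it is not the paper's actual loss, encoders, or data.

```python
import numpy as np

def _xent(logits, labels):
    """Row-wise softmax cross-entropy."""
    logits = logits - logits.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def info_nce(spec_emb, text_emb, temperature=0.1):
    """Symmetric InfoNCE: row i of each matrix is a matched
    spectrum/literature pair; matched pairs are pulled together,
    mismatched pairs pushed apart."""
    s = spec_emb / np.linalg.norm(spec_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = s @ t.T / temperature          # (B, B) cosine similarities
    labels = np.arange(len(s))
    return 0.5 * (_xent(logits, labels) + _xent(logits.T, labels))

rng = np.random.default_rng(0)
spec = rng.normal(size=(16, 32))            # toy spectral embeddings
aligned_loss = info_nce(spec, spec + 0.01 * rng.normal(size=spec.shape))
random_loss = info_nce(spec, rng.normal(size=spec.shape))
```

Well-aligned pairs yield a much lower loss than randomly paired embeddings, which is what training drives toward.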
arXiv Detail & Related papers (2026-03-04T19:04:45Z) - OmniSpectra: A Unified Foundation Model for Native Resolution Astronomical Spectra [4.254099382808598]
We present OmniSpectra, the first native-resolution foundation model for astronomical spectra. Unlike traditional models, which are limited to fixed-length inputs or configurations, OmniSpectra handles spectra of any length at their original size. This transfer-learning capability makes the model state-of-the-art across various astronomy tasks, including source classification, redshift estimation, and property prediction for stars and galaxies.
arXiv Detail & Related papers (2026-01-21T04:39:32Z) - DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models [68.19129717255053]
We present DiffSpectra, a generative framework that formulates molecular structure elucidation as a conditional generation process. Our experiments demonstrate that DiffSpectra accurately elucidates molecular structures, achieving 40.76% top-1 and 99.49% top-10 accuracy.
arXiv Detail & Related papers (2025-07-09T13:57:20Z) - SpecCLIP: Aligning and Translating Spectroscopic Measurements for Stars [1.4217538206528657]
We present SpecCLIP, a foundation model framework that extends LLM-inspired methodologies to stellar spectral analysis. By training foundation models on large-scale spectral datasets, our goal is to learn robust and informative embeddings that support diverse downstream applications. We demonstrate that fine-tuning these models on moderate-sized labeled datasets improves adaptability to tasks such as stellar-parameter estimation and chemical-abundance determination.
arXiv Detail & Related papers (2025-07-02T17:49:52Z) - CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis [69.02751635551724]
Spectral imaging offers promising applications across diverse domains, including medicine and urban scene understanding. However, variability in channel dimensionality and captured wavelengths among spectral cameras impedes the development of AI-driven methodologies. We introduce CARL, a model for Camera-Agnostic Representation Learning across RGB, multispectral, and hyperspectral imaging modalities.
arXiv Detail & Related papers (2025-04-27T13:06:40Z) - Shared Stochastic Gaussian Process Latent Variable Models: A Multi-modal Generative Model for Quasar Spectra [2.3099448395832956]
We focus on an application in astrophysics where data sets typically contain both observed spectral features and scientific properties of astrophysical objects such as galaxies or exoplanets. We study the spectra of very luminous galaxies known as quasars, along with their properties, in multiple observation spaces. A single data point is then characterized by different classes of observations, each with different likelihoods.
arXiv Detail & Related papers (2025-02-27T06:57:23Z) - Datacube segmentation via Deep Spectral Clustering [76.48544221010424]
Extended Vision techniques often pose interpretation challenges.
The high dimensionality of datacube spectra makes their statistical interpretation a complex task.
In this paper, we explore the possibility of applying unsupervised clustering methods in encoded space.
A statistical dimensional reduction is performed by an ad hoc trained (Variational) AutoEncoder, while the clustering process is performed by a (learnable) iterative K-Means clustering algorithm.
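The pipeline described, dimensional reduction by an autoencoder followed by iterative K-Means in the encoded space, can be illustrated with a minimal sketch. A fixed linear projection stands in for the trained (Variational) AutoEncoder, and plain K-Means replaces the paper's learnable variant; all data and dimensions below are synthetic assumptions.

```python
import numpy as np

def kmeans(Z, k, iters=20, seed=0):
    """Plain iterative K-Means on encoded spectra Z of shape (n, d)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = np.argmin(((Z[:, None, :] - centers[None]) ** 2).sum(-1),
                           axis=1)
        # Recompute centers as cluster means.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(1)
# Stand-in for the trained encoder: 64 channels down to 4 latents.
enc = rng.normal(scale=0.2, size=(64, 4))

# Two synthetic, well-separated populations of 64-channel "spectra".
a = rng.normal(loc=0.0, size=(50, 64))
b = rng.normal(loc=5.0, size=(50, 64))
Z = np.vstack([a, b]) @ enc        # encoded (latent) representation
labels = kmeans(Z, k=2)
```

Clustering in the low-dimensional encoded space rather than on the raw datacube is what makes the statistical interpretation tractable.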
arXiv Detail & Related papers (2024-01-31T09:31:28Z) - SpectralGPT: Spectral Remote Sensing Foundation Model [60.023956954916414]
A universal RS foundation model, named SpectralGPT, is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT).
Compared to existing foundation models, SpectralGPT accommodates input images with varying sizes, resolutions, time series, and regions in a progressive training fashion, enabling full utilization of extensive RS big data.
Our evaluation highlights significant performance improvements with pretrained SpectralGPT models, signifying substantial potential in advancing spectral RS big data applications within the field of geoscience.
arXiv Detail & Related papers (2023-11-13T07:09:30Z) - HoloNets: Spectral Convolutions do extend to Directed Graphs [59.851175771106625]
Conventional wisdom dictates that spectral convolutional networks may only be deployed on undirected graphs.
Here we show this traditional reliance on the graph Fourier transform to be superfluous.
We provide a frequency-response interpretation of newly developed filters, investigate the influence of the basis used to express filters and discuss the interplay with characteristic operators on which networks are based.
arXiv Detail & Related papers (2023-10-03T17:42:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.