Hunting for "Oddballs" with Machine Learning: Detecting Anomalous Exoplanets Using a Deep-Learned Low-Dimensional Representation of Transit Spectra with Autoencoders
- URL: http://arxiv.org/abs/2601.02324v1
- Date: Mon, 05 Jan 2026 18:15:53 GMT
- Title: Hunting for "Oddballs" with Machine Learning: Detecting Anomalous Exoplanets Using a Deep-Learned Low-Dimensional Representation of Transit Spectra with Autoencoders
- Authors: Alexander Roman, Emilie Panek, Roy T. Forestano, Eyup B. Unlu, Katia Matcheva, Konstantin T. Matchev,
- Abstract summary: This study explores the application of autoencoder-based machine learning techniques for anomaly detection to identify exoplanet atmospheres with unconventional chemical signatures.<n>We use the Atmospheric Big Challenge (ABC) database to construct an anomaly detection scenario by defining CO2-rich atmospheres as anomalies and CO2-poor atmospheres as the normal class.<n>We benchmarked four different anomaly detection strategies: Autoencoder Reconstruction Loss, One-Class Support Vector Machine (1 class-SVM), K-means Clustering, and Local Outlier Factor (LOF)
- Score: 35.61099185492068
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study explores the application of autoencoder-based machine learning techniques for anomaly detection to identify exoplanet atmospheres with unconventional chemical signatures using a low-dimensional data representation. We use the Atmospheric Big Challenge (ABC) database, a publicly available dataset with over 100,000 simulated exoplanet spectra, to construct an anomaly detection scenario by defining CO2-rich atmospheres as anomalies and CO2-poor atmospheres as the normal class. We benchmarked four different anomaly detection strategies: Autoencoder Reconstruction Loss, One-Class Support Vector Machine (1 class-SVM), K-means Clustering, and Local Outlier Factor (LOF). Each method was evaluated in both the original spectral space and the autoencoder's latent space using Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) metrics. To test the performance of the different methods under realistic conditions, we introduced Gaussian noise levels ranging from 10 to 50 ppm. Our results indicate that anomaly detection is consistently more effective when performed within the latent space across all noise levels. Specifically, K-means clustering in the latent space emerged as a stable and high-performing method. We demonstrate that this anomaly detection approach is robust to noise levels up to 30 ppm (consistent with realistic space-based observations) and remains viable even at 50 ppm when leveraging latent space representations. On the other hand, the performance of the anomaly detection methods applied directly in the raw spectral space degrades significantly with increasing the level of noise. This suggests that autoencoder-driven dimensionality reduction offers a robust methodology for flagging chemically anomalous targets in large-scale surveys where exhaustive retrievals are computationally prohibitive.
Related papers
- Efficient reduction of stellar contamination and noise in planetary transmission spectra using neural networks [0.0]
We present a methodology to reduce stellar contamination and instrument-specific noise in exoplanet spectra using denoising autoencoders.<n>We design and train denoising autoencoder architectures on large synthetic datasets of terrestrial (TRAPPIST-1e analogues) and sub-Neptune (K2-18b analogues) planets.
arXiv Detail & Related papers (2026-02-10T22:07:18Z) - Hyperspectral Variational Autoencoders for Joint Data Compression and Component Extraction [5.157309015481045]
We present a variational autoencoder (VAE) approach that achieves x514 compression of NASA's TEMPO satellite hyperspectral observations (1028 channels, 290-490nm)<n>This dramatic data volume reduction enables efficient archival and sharing of satellite observations while preserving spectral fidelity.
arXiv Detail & Related papers (2025-11-23T16:26:09Z) - Isolation-based Spherical Ensemble Representations for Anomaly Detection [60.989157958972356]
Anomaly detection is a critical task in data mining and management with applications spanning fraud detection, network security, and log monitoring.<n>Existing unsupervised anomaly detection methods face fundamental challenges including conflicting distributional assumptions, computational inefficiency, and difficulty handling different anomaly types.<n>We propose ISER (Isolation-based Spherical Ensemble Representations) that extends existing isolation-based methods by using hypersphere radii as proxies for local density characteristics while maintaining linear time and constant space complexity.
arXiv Detail & Related papers (2025-10-15T09:00:05Z) - CLIP Meets Diffusion: A Synergistic Approach to Anomaly Detection [49.11819337853632]
Anomaly detection is a complex problem due to the ambiguity in defining anomalies, the diversity of anomaly types, and the scarcity of training data.<n>We propose CLIPfusion, a method that leverages both discriminative and generative foundation models.<n>We believe that our method underscores the effectiveness of multi-modal and multi-model fusion in tackling the multifaceted challenges of anomaly detection.
arXiv Detail & Related papers (2025-06-13T13:30:15Z) - Enhancing anomaly detection with topology-aware autoencoders [0.0]
Autoencoders provide a signal-agnostic approach but are limited by the topology of their latent space.<n>We construct autoencoders with spherical ($Sn$), product ($S2 otimes S2$), and projective ($mathbbRP2$) latent spaces.<n>Applying our approach to simulated hadronic top-quark decays, we show that latent spaces with appropriate topological constraints enhance sensitivity and robustness in detecting anomalous events.
arXiv Detail & Related papers (2025-02-14T13:50:46Z) - Adaptive Signal Analysis for Automated Subsurface Defect Detection Using Impact Echo in Concrete Slabs [0.0]
This pilot study presents a novel, automated, and scalable methodology for detecting subsurface defect-prone regions in concrete slabs.<n>The approach integrates advanced signal processing, clustering, and visual analytics to identify subsurface anomalies.<n>The results demonstrate the robustness of the methodology, consistently identifying defect-prone areas with minimal false positives and few missed defects.
arXiv Detail & Related papers (2024-12-23T20:05:53Z) - A Classifier-Based Approach to Multi-Class Anomaly Detection Applied to Astronomical Time-Series [0.0]
anomaly detection is an open problem in many scientific fields.
Most anomaly detection algorithms for astronomical time-series rely either on hand-crafted features or on features generated through unsupervised representation learning.
We introduce a novel approach that leverages the latent space of a neural network classifier for anomaly detection.
arXiv Detail & Related papers (2024-08-05T18:00:00Z) - A Classifier-Based Approach to Multi-Class Anomaly Detection for Astronomical Transients [0.0]
Real-time anomaly detection essential for identifying rare transients.<n>Modern survey telescopes generating tens of thousands of alerts per night.<n>Future telescopes, such as the Vera C. Rubin Observatory, projected to increase this number dramatically.
arXiv Detail & Related papers (2024-03-21T18:00:00Z) - Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation : A Unified Approach [49.995833831087175]
This work proposes a novel method for generating generic Video-temporal PAs by inpainting a masked out region of an image.
In addition, we present a simple unified framework to detect real-world anomalies under the OCC setting.
Our method performs on par with other existing state-of-the-art PAs generation and reconstruction based methods under the OCC setting.
arXiv Detail & Related papers (2023-11-27T13:14:06Z) - Decision Forest Based EMG Signal Classification with Low Volume Dataset
Augmented with Random Variance Gaussian Noise [51.76329821186873]
We produce a model that can classify six different hand gestures with a limited number of samples that generalizes well to a wider audience.
We appeal to a set of more elementary methods such as the use of random bounds on a signal, but desire to show the power these methods can carry in an online setting.
arXiv Detail & Related papers (2022-06-29T23:22:18Z) - Self-Supervised Training with Autoencoders for Visual Anomaly Detection [61.62861063776813]
We focus on a specific use case in anomaly detection where the distribution of normal samples is supported by a lower-dimensional manifold.
We adapt a self-supervised learning regime that exploits discriminative information during training but focuses on the submanifold of normal examples.
We achieve a new state-of-the-art result on the MVTec AD dataset -- a challenging benchmark for visual anomaly detection in the manufacturing domain.
arXiv Detail & Related papers (2022-06-23T14:16:30Z) - SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier
Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples.
We propose a modular acceleration system, called SUOD, to address it.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.