Related papers: CIPHER: Scalable Time Series Analysis for Physical Sciences with Application to Solar Wind Phenomena

URL: http://arxiv.org/abs/2510.21022v1
Date: Thu, 23 Oct 2025 22:11:29 GMT
Title: CIPHER: Scalable Time Series Analysis for Physical Sciences with Application to Solar Wind Phenomena
Authors: Jasmine R. Kobayashi, Daniela Martin, Valmir P Moraes Filho, Connor O'Brien, Jinsu Hong, Sudeshna Boro Saikia, Hala Lamdouar, Nathan D. Miles, Marcella Scoczynski, Mavis Stone, Sairam Sundaresan, Anna Jungbluth, Andrés Muñoz-Jaramillo, Evangelia Samara, Joseph Gallego
Abstract summary: We present the Clustering and Indexation Pipeline with Human Evaluation for Recognition (CIPHER). CIPHER is a framework designed to accelerate large-scale labeling of complex time series in physics. We evaluate CIPHER on the task of classifying solar wind phenomena in OMNI data, a central challenge in space weather research.
Score: 3.0717901664567857
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Labeling or classifying time series is a persistent challenge in the physical sciences, where expert annotations are scarce, costly, and often inconsistent. Yet robust labeling is essential to enable machine learning models for understanding, prediction, and forecasting. We present the Clustering and Indexation Pipeline with Human Evaluation for Recognition (CIPHER), a framework designed to accelerate large-scale labeling of complex time series in physics. CIPHER integrates indexable Symbolic Aggregate approXimation (iSAX) for interpretable compression and indexing, density-based clustering (HDBSCAN) to group recurring phenomena, and a human-in-the-loop step for efficient expert validation. Representative samples are labeled by domain scientists, and these annotations are propagated across clusters to yield systematic, scalable classifications. We evaluate CIPHER on the task of classifying solar wind phenomena in OMNI data, a central challenge in space weather research, showing that the framework recovers meaningful phenomena such as coronal mass ejections and stream interaction regions. Beyond this case study, CIPHER highlights a general strategy for combining symbolic representations, unsupervised learning, and expert knowledge to address label scarcity in time series across the physical sciences. The code and configuration files used in this study are publicly available to support reproducibility.
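
The three stages named in the abstract (symbolic compression, grouping, and propagation of expert labels) can be illustrated with a rough, standard-library-only sketch. This is not the authors' code: it uses plain SAX rather than iSAX, and it groups windows by identical SAX word as a simple stand-in for HDBSCAN. The function names, the toy windows, and the labels "ramp-up" and "ramp-down" are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import NormalDist, mean, pstdev


def sax_word(series, n_segments=4, alphabet="abcd"):
    """Plain SAX (not iSAX): z-normalize, average per segment, discretize."""
    mu, sigma = mean(series), pstdev(series)
    # A constant series has zero variance; map it to all zeros.
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Assumes len(series) is a multiple of n_segments.
    seg_len = len(z) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]
    # Breakpoints split the standard normal into equiprobable bins.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


def group_by_word(windows):
    """Stand-in for clustering: windows sharing a SAX word form one group."""
    groups = defaultdict(list)
    for idx, window in enumerate(windows):
        groups[sax_word(window)].append(idx)
    return dict(groups)


def propagate_labels(groups, expert_labels, n_windows):
    """Copy the expert's label for each group to every member of it."""
    labels = [None] * n_windows  # None marks windows in unlabeled groups
    for word, members in groups.items():
        for idx in members:
            labels[idx] = expert_labels.get(word)
    return labels


# Toy windows (hypothetical data, not OMNI measurements).
windows = [
    [0, 0, 1, 1, 4, 4, 5, 5],  # rising
    [1, 1, 2, 2, 6, 6, 7, 7],  # rising, different offset and scale
    [5, 5, 4, 4, 1, 1, 0, 0],  # falling
]
groups = group_by_word(windows)

# An expert inspects one representative per group and names it.
expert_labels = {"aadd": "ramp-up", "ddaa": "ramp-down"}
labels = propagate_labels(groups, expert_labels, len(windows))
print(groups)
print(labels)
```

Because SAX z-normalizes each window before discretizing, the two rising windows receive the same word despite their different offsets and scales, so a single expert label covers both. A density-based clusterer such as HDBSCAN would additionally group similar but non-identical representations, which this sketch does not attempt.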

Related papers

Spatio-temporal Decoupled Knowledge Compensator for Few-Shot Action Recognition [92.22104713961431]
Few-Shot Action Recognition (FSAR) is a challenging task that requires recognizing novel action categories with a few labeled videos. Recent works typically apply semantically coarse category names as auxiliary contexts to guide the learning of discriminative visual features. We propose DiST, an innovative De-incorporation framework for FSAR that makes use of decoupled Spatial knowledge.
arXiv Detail & Related papers (2026-02-20T07:52:57Z)
Opportunities in AI/ML for the Rubin LSST Dark Energy Science Collaboration [63.61423859450929]
This white paper surveys the current landscape of AI/ML across DESC's primary cosmological probes and cross-cutting analyses. We identify key methodological research priorities, including Bayesian inference at scale, physics-informed methods, validation frameworks, and active learning for discovery.
arXiv Detail & Related papers (2026-01-20T18:46:42Z)
SIGMA: Scalable Spectral Insights for LLM Collapse [51.863164847253366]
We introduce SIGMA (Spectral Inequalities for Gram Matrix Analysis), a unified framework for model collapse. By deriving deterministic bounds on the matrix's spectrum, SIGMA provides a mathematically grounded metric to track the contraction of the representation space. We demonstrate that SIGMA effectively captures the transition towards states, offering theoretical insights into the mechanics of collapse.
arXiv Detail & Related papers (2026-01-06T19:47:11Z)
SigTime: Learning and Visually Explaining Time Series Signatures [22.200677868580204]
We introduce a novel learning framework that jointly trains two Transformer models using complementary time series representations. The learned shapelets serve as interpretable signatures that differentiate time series across classification labels. We develop a visual analytics system, SigTime, with coordinated views to facilitate exploration of time series signatures.
arXiv Detail & Related papers (2025-12-12T22:47:34Z)
Learning Representations of Event Time Series with Sparse Autoencoders for Anomaly Detection, Similarity Search, and Unsupervised Classification [0.005439329219803859]
Event time series are sequences of discrete events occurring at irregular time intervals. They are common in domains such as high-energy astrophysics, computational social science, cybersecurity, finance, healthcare, neuroscience, and seismology. We propose novel two- and three-dimensional tensor representations for event time series, coupled with sparse autoencoders that learn physically meaningful latent representations.
arXiv Detail & Related papers (2025-07-15T18:01:03Z)
CRIA: A Cross-View Interaction and Instance-Adapted Pre-training Framework for Generalizable EEG Representations [52.251569042852815]
CRIA is an adaptive framework that utilizes variable-length and variable-channel coding to achieve a unified representation of EEG data across different datasets. The model employs a cross-attention mechanism to fuse temporal, spectral, and spatial features effectively. Experimental results on the Temple University EEG corpus and the CHB-MIT dataset show that CRIA outperforms existing methods with the same pre-training conditions.
arXiv Detail & Related papers (2025-06-19T06:31:08Z)
Hallucination Detection in LLMs with Topological Divergence on Attention Graphs [60.83579255387347]
Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models. We introduce TOHA, a TOpology-based HAllucination detector in the RAG setting.
arXiv Detail & Related papers (2025-04-14T10:06:27Z)
TopoCL: Topological Contrastive Learning for Time Series [1.8434042562191815]
We propose Topological Contrastive Learning for time series (TopoCL). TopoCL mitigates information loss by incorporating persistent homology. We conduct experiments on four downstream tasks: classification, anomaly detection, forecasting, and transfer learning.
arXiv Detail & Related papers (2025-02-05T06:37:35Z)
Deep Temporal Graph Clustering [77.02070768950145]
We propose a general framework for deep Temporal Graph Clustering (TGC). TGC introduces deep clustering techniques to suit the interaction sequence-based batch-processing pattern of temporal graphs. Our framework can effectively improve the performance of existing temporal graph learning methods.
arXiv Detail & Related papers (2023-05-18T06:17:50Z)
Representation Learning for Person or Entity-centric Knowledge Graphs: An Application in Healthcare [0.757843972001219]
This paper presents an end-to-end representation learning framework to extract entity-centric KGs from structured and unstructured data. We introduce a star-shaped classifier to represent the multiple facets of a person and use it to guide KG creation. We highlight that this approach has several potential applications across domains and is open-sourced.
arXiv Detail & Related papers (2023-05-09T17:39:45Z)
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation [124.74256749281625]
We introduce a text diversification strategy that generates a set of synonyms for each training category. We also employ a text-guided knowledge distillation method to preserve the generalizable knowledge of CLIP. Our proposed model achieves robust generalization performance across various datasets.
arXiv Detail & Related papers (2023-03-16T09:51:41Z)
ShapeWordNet: An Interpretable Shapelet Neural Network for Physiological Signal Classification [16.82411861562806]
We propose a more effective and interpretable scheme tailored for the physiological signal classification task. We exploit time series shapelets to extract prominent local patterns and perform interpretable sequence discretization. We name our method ShapeWordNet and conduct extensive experiments on three real-world datasets to investigate its effectiveness.
arXiv Detail & Related papers (2023-02-10T02:30:31Z)
Unsupervised Machine Learning for Exploratory Data Analysis of Exoplanet Transmission Spectra [68.8204255655161]
We focus on unsupervised techniques for analyzing spectral data from transiting exoplanets. We show that there is a high degree of correlation in the spectral data, which calls for appropriate low-dimensional representations. We uncover interesting structures in the principal component basis, namely, well-defined branches corresponding to different chemical regimes.
arXiv Detail & Related papers (2022-01-07T22:26:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.