Contrastive Heliophysical Image Pretraining for Solar Dynamics Observatory Records
- URL: http://arxiv.org/abs/2511.22958v1
- Date: Fri, 28 Nov 2025 08:03:46 GMT
- Title: Contrastive Heliophysical Image Pretraining for Solar Dynamics Observatory Records
- Authors: Shiyu Shen, Zhe Gao, Taifeng Chai, Yang Huang, Bin Pan
- Abstract summary: SolarCHIP is a family of contrastively pretrained visual backbones tailored to multi-instrument SDO observations. SolarCHIP addresses three key challenges in solar imaging: multimodal sensing across AIA and HMI instruments, weak inter-class separability due to slow temporal evolution, and strong intra-class variability with sparse activity signals.
- Score: 9.239205316203245
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has revolutionized solar image analysis, yet most approaches train task-specific encoders from scratch or rely on natural-image pretraining that ignores the unique characteristics of Solar Dynamics Observatory (SDO) data. We introduce SolarCHIP, a family of contrastively pretrained visual backbones tailored to multi-instrument SDO observations. SolarCHIP addresses three key challenges in solar imaging: multimodal sensing across AIA and HMI instruments, weak inter-class separability due to slow temporal evolution, and strong intra-class variability with sparse activity signals. Our pretraining framework employs a multi-granularity contrastive objective that jointly aligns (1) global class tokens across co-temporal AIA-HMI pairs to enhance temporal discrimination, (2) local patch tokens at fixed spatial indices to enforce position-consistent, modality-invariant features, and (3) intra-sample patches across different spatial locations to preserve fine-grained spatial structure. We train both CNN- and Vision Transformer-based autoencoders and demonstrate their effectiveness on two downstream tasks: cross-modal translation between HMI and AIA passbands via ControlNet, and full-disk flare classification. Experimental results show that SolarCHIP achieves state-of-the-art performance across both tasks, with particularly strong gains in low-resource settings where labeled data is limited. Ablation studies confirm that each contrastive component contributes essential discriminative capacity at different granularities. By publicly releasing pretrained weights and training code, we provide the heliophysics community with a practical, plug-and-play feature extractor that reduces computational requirements, improves label efficiency, and establishes a reusable foundation for diverse solar imaging applications.
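As a reading aid, below is a minimal PyTorch sketch of a multi-granularity contrastive objective of the kind the abstract describes: global class tokens of co-temporal AIA-HMI pairs are aligned, patch tokens at matching spatial indices are aligned across modalities, and patches within a sample contrast against other spatial locations. The function names, temperature, equal loss weighting, and the exact form of the intra-sample term are assumptions for illustration; the authors' released code is the authoritative reference.

```python
# Sketch of a multi-granularity contrastive objective (illustrative, not the paper's exact loss).
import torch
import torch.nn.functional as F

def info_nce(q: torch.Tensor, k: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: row i of `q` matches row i of `k`; all other rows are negatives."""
    q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
    logits = q @ k.t() / temperature
    targets = torch.arange(q.size(0), device=q.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def multi_granularity_loss(aia_cls, hmi_cls, aia_patch, hmi_patch, n_positions: int = 16):
    """
    aia_cls, hmi_cls:     (B, D)    global class tokens of co-temporal AIA / HMI frames
    aia_patch, hmi_patch: (B, P, D) patch tokens on a shared spatial grid
    """
    B, P, D = aia_patch.shape
    # (1) Global: co-temporal AIA-HMI class tokens are positives; other timestamps are negatives.
    loss_global = info_nce(aia_cls, hmi_cls)
    # (2) Local cross-modal: at a fixed spatial index, AIA and HMI patches of the same
    #     observation are positives; the same index in other samples supplies negatives.
    idx = torch.randperm(P)[:n_positions]
    loss_local = torch.stack([info_nce(aia_patch[:, p], hmi_patch[:, p]) for p in idx]).mean()
    # (3) Intra-sample: within one observation, patches at other spatial locations act as
    #     negatives, encouraging fine-grained, position-aware structure.
    loss_intra = torch.stack([info_nce(aia_patch[b], hmi_patch[b]) for b in range(B)]).mean()
    return loss_global + loss_local + loss_intra

# Example with random tensors: 4 co-temporal pairs, 256 patches, 768-dim tokens.
loss = multi_granularity_loss(torch.randn(4, 768), torch.randn(4, 768),
                              torch.randn(4, 256, 768), torch.randn(4, 256, 768))
```

In practice the three terms would likely be weighted and computed on projected embeddings rather than raw tokens; the equal weighting here is purely illustrative.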
Related papers
- Nüwa: Mending the Spatial Integrity Torn by VLM Token Pruning [82.39668822222386]
Vision token pruning has proven to be an effective acceleration technique for efficient Vision Language Models (VLMs).
We propose Nüwa, a two-stage token pruning framework that enables efficient feature aggregation while maintaining spatial integrity.
Experiments demonstrate that Nüwa achieves SOTA performance on multiple VQA benchmarks (from 94% to 95%) and yields substantial improvements on visual grounding tasks (from 7% to 47%).
arXiv Detail & Related papers (2026-02-03T00:51:03Z)
- Latent Representation Learning in Heavy-Ion Collisions with MaskPoint Transformer [2.6610943214001765]
We introduce a Transformer-based autoencoder trained with a two-stage paradigm: self-supervised pre-training followed by supervised fine-tuning.
The encoder learns latent representations directly from unlabeled HIC data, providing a compact and information-rich feature space.
Results establish our two-stage framework as a general and robust foundation for feature learning in HIC, opening the door to more powerful analyses of quark-gluon plasma properties.
arXiv Detail & Related papers (2025-10-08T06:27:10Z)
- From Bias to Balance: Exploring and Mitigating Spatial Bias in LVLMs [57.01486941224062]
Large Vision-Language Models (LVLMs) have achieved remarkable success across a wide range of multimodal tasks.
We focus on how models respond when identical key visual information is placed at different locations within an image.
We introduce Balanced Position Assignment (BaPA), a simple yet effective mechanism that assigns identical position embeddings to all image tokens.
arXiv Detail & Related papers (2025-09-26T07:07:03Z)
- EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization [17.622013322533423]
We introduce EVA02-AT, a suite of EVA02-based video-language foundation models tailored to egocentric video understanding tasks.
EVA02-AT efficiently transfers an image-based CLIP model into a unified video encoder via a single-stage pretraining.
We introduce the Symmetric Multi-Similarity (SMS) loss and a novel training framework that advances all soft labels for both positive and negative pairs.
arXiv Detail & Related papers (2025-06-17T09:51:51Z)
- Spatial-Temporal-Spectral Unified Modeling for Remote Sensing Dense Prediction [20.1863553357121]
Current deep learning architectures for remote sensing are fundamentally rigid.
We introduce the Spatial-Temporal-Spectral Unified Network (STSUN) for unified modeling.
STSUN can adapt to input and output data with arbitrary spatial sizes, temporal lengths, and spectral bands.
It unifies various dense prediction tasks and diverse semantic class predictions.
arXiv Detail & Related papers (2025-05-18T07:39:17Z)
- SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [62.433137130087445]
SuperFlow++ is a novel framework that integrates pretraining and downstream tasks using consecutive camera pairs.
We show that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions.
With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving.
arXiv Detail & Related papers (2025-03-25T17:59:57Z)
- ST-ReP: Learning Predictive Representations Efficiently for Spatial-Temporal Forecasting [7.637123047745445]
Self-supervised methods are increasingly adapted to learn spatial-temporal representations.
Current value reconstruction and future value prediction are integrated into the pre-training framework.
Multi-time scale analysis is incorporated into the self-supervised loss to enhance predictive capability.
arXiv Detail & Related papers (2024-12-19T05:33:55Z)
- EXCON: Extreme Instance-based Contrastive Representation Learning of Severely Imbalanced Multivariate Time Series for Solar Flare Prediction [1.024113475677323]
This work presents EXCON, a contrastive representation learning framework designed to enhance classification performance amidst such imbalances.
Results include evaluations on the benchmark solar flare dataset and multiple time series archive datasets with binary and multiclass labels.
arXiv Detail & Related papers (2024-11-18T02:36:19Z)
- Solar synthetic imaging: Introducing denoising diffusion probabilistic models on SDO/AIA data [0.0]
This study proposes using generative deep learning models, specifically a Denoising Diffusion Probabilistic Model (DDPM), to create synthetic images of solar phenomena.
By employing a dataset from the AIA instrument aboard the SDO spacecraft, we aim to address the data scarcity issue.
The DDPM's performance is evaluated using cluster metrics, Fréchet Inception Distance (FID), and F1-score, showcasing promising results in generating realistic solar imagery; a minimal FID sketch appears after the list below.
arXiv Detail & Related papers (2024-04-03T08:18:45Z)
- Self-Promoted Supervision for Few-Shot Transformer [178.52948452353834]
Self-promoted sUpervisioN (SUN) is a few-shot learning framework for vision transformers (ViTs).
SUN pretrains the ViT on the few-shot learning dataset and then uses it to generate individual location-specific supervision for guiding each patch token.
Experiments show that SUN with ViTs significantly surpasses other ViT-based few-shot learning frameworks, and is the first to outperform CNN state-of-the-art methods.
arXiv Detail & Related papers (2022-03-14T12:53:27Z)
- UniVIP: A Unified Framework for Self-Supervised Visual Pre-training [50.87603616476038]
We propose a novel self-supervised framework to learn versatile visual representations on either single-centric-object or non-iconic datasets.
Massive experiments show that UniVIP pre-trained on non-iconic COCO achieves state-of-the-art transfer performance.
Our method can also exploit single-centric-object datasets such as ImageNet, and outperforms BYOL by 2.5% in linear probing with the same pre-training epochs.
arXiv Detail & Related papers (2022-03-14T10:04:04Z)
- Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework achieves significant performance gains compared with existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)
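The SDO/AIA diffusion entry above evaluates generated images with the Fréchet Inception Distance (FID). The sketch below computes the Fréchet distance between two sets of pre-extracted features; standard FID uses InceptionV3 pool activations, and using a domain-specific solar backbone instead is an assumption, as are the array names and shapes.

```python
# Minimal Fréchet distance sketch: fit a Gaussian to each feature set and compare the two.
# Features are assumed to be pre-extracted (N, D) activations, e.g. InceptionV3 pool features.
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    mu_r, cov_r = feats_real.mean(axis=0), np.cov(feats_real, rowvar=False)
    mu_f, cov_f = feats_fake.mean(axis=0), np.cov(feats_fake, rowvar=False)
    diff = mu_r - mu_f
    # Matrix square root of the covariance product; tiny imaginary parts from
    # numerical error are discarded.
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))

# Example with random features standing in for real and generated solar images.
rng = np.random.default_rng(0)
fid_like = frechet_distance(rng.normal(size=(500, 64)), rng.normal(size=(500, 64)))
```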