Using an expert deviation carrying the knowledge of climate data in
usual clustering algorithms
- URL: http://arxiv.org/abs/2006.05603v1
- Date: Wed, 10 Jun 2020 01:42:40 GMT
- Title: Using an expert deviation carrying the knowledge of climate data in
usual clustering algorithms
- Authors: Emmanuel Biabiany, Vincent Page, Didier Bernard, Hélène
Paugam-Moisy
- Abstract summary: We identify spatio-temporal configurations using clustering analysis on wind speed and cumulative rainfall datasets.
We show that using the L2 norm in conventional clustering methods can induce undesirable effects.
We propose to replace the Euclidean distance (L2) with a dissimilarity measure named Expert Deviation (ED).
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In order to help physicists to expand their knowledge of the climate in the
Lesser Antilles, we aim to identify the spatio-temporal configurations using
clustering analysis on wind speed and cumulative rainfall datasets. But we show
that using the L2 norm in conventional clustering methods such as K-Means (KMS) and
Hierarchical Agglomerative Clustering (HAC) can induce undesirable effects. So,
we propose to replace Euclidean distance (L2) by a dissimilarity measure named
Expert Deviation (ED). Based on the symmetrized Kullback-Leibler divergence,
the ED integrates the properties of the observed physical parameters and
climate knowledge. This measure helps compare histograms of four patches,
corresponding to geographical zones, that are influenced by atmospheric
structures. The combined evaluation of the internal homogeneity and the
separation of the clusters obtained using ED and L2 was performed. The results,
which are compared using the silhouette index, show five clusters with high
indexes. For the two available datasets one can see that, unlike KMS-L2, KMS-ED
discriminates the daily situations favorably, giving more physical meaning to
the clusters discovered by the algorithm. The effect of patches is observed in
the spatial analysis of representative elements for KMS-ED. The ED is able to
produce different configurations which makes the usual atmospheric structures
clearly identifiable. Atmospheric physicists can interpret the locations of the
impact of each cluster on a specific zone according to atmospheric structures.
KMS-L2 does not lead to such interpretability, because the situations
represented are spatially quite smooth. This climatological study illustrates
the advantage of using ED as a new approach.
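
The abstract describes ED only at a high level: a dissimilarity based on the symmetrized Kullback-Leibler divergence, applied to histograms of four patches. The following is a minimal Python sketch of that idea, not the authors' implementation. The function names, the `eps` smoothing term, and the choice to sum the per-patch divergences are all assumptions made for illustration; the exact way ED combines patches and encodes climate knowledge is not given here.

```python
import numpy as np


def symmetrized_kl(p, q, eps=1e-10):
    """Symmetrized KL divergence KL(p||q) + KL(q||p) between two histograms.

    eps is an assumed smoothing constant that avoids log(0) on empty bins.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()  # normalize counts to probabilities
    q /= q.sum()
    # KL(p||q) + KL(q||p) simplifies to sum((p - q) * log(p / q)).
    return float(np.sum((p - q) * np.log(p / q)))


def patch_deviation(day_a, day_b, eps=1e-10):
    """Hypothetical per-day dissimilarity: sum of divergences over patches.

    day_a and day_b have shape (n_patches, n_bins), one histogram per patch.
    """
    return sum(symmetrized_kl(pa, pb, eps) for pa, pb in zip(day_a, day_b))


def pairwise_deviation(days):
    """Symmetric dissimilarity matrix for an array of shape (n_days, n_patches, n_bins)."""
    n = len(days)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = patch_deviation(days[i], days[j])
    return d


# Synthetic example: 6 days, 4 patches, 8 histogram bins per patch.
rng = np.random.default_rng(0)
days = rng.integers(1, 50, size=(6, 4, 8))
D = pairwise_deviation(days)
```

A matrix such as `D` could then be passed to a clustering or evaluation routine that accepts precomputed dissimilarities, for example scikit-learn's `silhouette_score` with `metric="precomputed"`, to compare cluster quality against an L2 baseline.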
Related papers
- Accurate and Efficient Hybrid-Ensemble Atmospheric Data Assimilation in Latent Space with Uncertainty Quantification [20.877039031702605]
We propose a three-dimensional hybrid-ensemble DA method that operates in an atmospheric latent space learned via an autoencoder (AE).
HLOBA maps both model forecasts and observations into a shared latent space via the AE encoder and an end-to-end Observation-to-Latent-space mapping network (O2Lnet).
Experiments show that this uncertainty highlights large-error regions and captures their seasonal variability.
arXiv Detail & Related papers (2026-03-04T18:58:27Z)
- Geographically Weighted Canonical Correlation Analysis: Local Spatial Associations Between Two Sets of Variables [47.652697094546994]
This article critically assesses the utility of the classical statistical technique of Canonical Correlation Analysis (CCA) for studying spatial associations.
We propose Geographically Weighted Canonical Correlation Analysis (GWCCA) as a new technique for exploring local spatial associations between two sets of variables.
The results indicate that GWCCA has broad potential applications in spatial data-intensive fields such as urban planning, environmental science, public health, and transportation.
arXiv Detail & Related papers (2026-02-10T19:36:49Z)
- Attention in Geometry: Scalable Spatial Modeling via Adaptive Density Fields and FAISS-Accelerated Kernels [0.0]
This work introduces Adaptive Density Fields (ADF), a geometric attention framework that formulates spatial aggregation as a query-conditioned, metric-induced attention operator in continuous space.
We demonstrate the framework through a case study on aircraft trajectory analysis in the Chengdu region, extracting trajectory-conditioned Zones of Influence (ZOI) to reveal recurrent airspace structures and localized deviations.
arXiv Detail & Related papers (2026-01-05T05:42:40Z)
- Isolation-based Spherical Ensemble Representations for Anomaly Detection [60.989157958972356]
Anomaly detection is a critical task in data mining and management with applications spanning fraud detection, network security, and log monitoring.
Existing unsupervised anomaly detection methods face fundamental challenges including conflicting distributional assumptions, computational inefficiency, and difficulty handling different anomaly types.
We propose ISER (Isolation-based Spherical Ensemble Representations) that extends existing isolation-based methods by using hypersphere radii as proxies for local density characteristics while maintaining linear time and constant space complexity.
arXiv Detail & Related papers (2025-10-15T09:00:05Z)
- Discovering Spatial Correlations of Earth Observations for weather forecasting by using Graph Structure Learning [4.794822439017277]
This study aims to improve the accuracy of weather predictions by discovering spatial correlations between Earth observations and atmospheric states.
We employ spatiotemporal graph neural networks (STGNNs) with structure learning to solve this problem.
We validated the effectiveness of the proposed method using real-world atmospheric state and observation data from East Asia.
arXiv Detail & Related papers (2025-08-11T06:14:31Z)
- Aligned Manifold Property and Topology Point Clouds for Learning Molecular Properties [55.2480439325792]
This work introduces AMPTCR, a molecular surface representation that combines local quantum-derived scalar fields and custom topological descriptors within an aligned point cloud format.
For molecular weight, results confirm that AMPTCR encodes physically meaningful data, with a validation R2 of 0.87.
In the bacterial inhibition task, AMPTCR enables both classification and direct regression of E. coli inhibition values.
arXiv Detail & Related papers (2025-07-22T04:35:50Z)
- Learning Identifiable Structures Helps Avoid Bias in DNN-based Supervised Causal Learning [56.22841701016295]
Supervised Causal Learning (SCL) is an emerging paradigm in this field.
Existing Deep Neural Network (DNN)-based methods commonly adopt the "Node-Edge approach".
arXiv Detail & Related papers (2025-02-15T19:10:35Z)
- Spatiotemporal Density Correction of Multivariate Global Climate Model Projections using Deep Learning [1.0801976288811024]
Global Climate Models (GCMs) are numerical models that simulate complex physical processes within the Earth's climate system.
GCMs suffer from systemic biases due to simplifications made to the underlying physical processes.
We propose a new semi-parametric conditional density estimation (SPCDE) for density correction of the joint distribution of daily precipitation and maximum temperature data.
arXiv Detail & Related papers (2024-11-27T22:55:48Z)
- Improved Anomaly Detection through Conditional Latent Space VAE Ensembles [49.1574468325115]
Conditional Latent space Variational Autoencoder (CL-VAE) improved pre-processing for anomaly detection on data with known inlier classes and unknown outlier classes.
Model shows increased accuracy in anomaly detection, achieving an AUC of 97.4% on the MNIST dataset.
In addition, the CL-VAE shows increased benefits from ensembling, a more interpretable latent space, and an increased ability to learn patterns in complex data with limited model sizes.
arXiv Detail & Related papers (2024-10-16T07:48:53Z)
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- IME: Integrating Multi-curvature Shared and Specific Embedding for Temporal Knowledge Graph Completion [97.58125811599383]
Temporal Knowledge Graphs (TKGs) incorporate a temporal dimension, allowing for a precise capture of the evolution of knowledge.
We propose a novel Multi-curvature shared and specific Embedding (IME) model for TKGC tasks.
IME incorporates two key properties, namely space-shared property and space-specific property.
arXiv Detail & Related papers (2024-03-28T23:31:25Z)
- A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction [66.21060114843202]
We propose a more general heat kernel based manifold embedding method that we call heat geodesic embeddings.
Results show that our method outperforms existing state of the art in preserving ground truth manifold distances.
We also showcase our method on single cell RNA-sequencing datasets with both continuum and cluster structure.
arXiv Detail & Related papers (2023-05-30T13:58:50Z)
- AWT -- Clustering Meteorological Time Series Using an Aggregated Wavelet Tree [9.470649284657483]
AWT is a clustering algorithm for time series data that also performs implicit outlier detection during the clustering.
We apply AWT to crowd sourced 2-m temperature data with an hourly resolution from the city of Vienna to detect outliers.
It is shown that both the outlier detection and the implicit mapping to land-use characteristic is possible with AWT.
arXiv Detail & Related papers (2022-12-13T15:25:29Z)
- Weather2vec: Representation Learning for Causal Inference with Non-Local Confounding in Air Pollution and Climate Studies [3.0616624345970975]
Estimating the causal effects of a spatially-varying intervention may be subject to non-local confounding (NLC).
This paper first formalizes NLC using the potential outcomes framework, providing a comparison with the related phenomenon of causal interference.
Then, it proposes a broadly applicable framework, termed "weather2vec", that uses the theory of balancing scores to learn representations of non-local information.
arXiv Detail & Related papers (2022-09-25T20:40:19Z)
- Deep Learning Models of the Discrete Component of the Galactic Interstellar Gamma-Ray Emission [61.26321023273399]
A significant point-like component from the small scale (or discrete) structure in the H2 interstellar gas might be present in the Fermi-LAT data.
We show that deep learning may be effectively employed to model the gamma-ray emission traced by these rare H2 proxies within statistical significance in data-rich regions.
arXiv Detail & Related papers (2022-06-06T18:00:07Z)
- Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z)
- Predicting Solar Energetic Particles Using SDO/HMI Vector Magnetic Data Products and a Bidirectional LSTM Network [6.759687230043489]
Solar energetic particles (SEPs) are an essential source of space radiation, which are hazards for humans in space, spacecraft, and technology in general.
We propose a deep learning method to predict if an active region (AR) would produce an SEP event given that (i) the AR will produce an M- or X-class flare and a coronal mass ejection associated with the flare, or (ii) the AR will produce an M- or X-class flare regardless of whether or not the flare is associated with a CME.
arXiv Detail & Related papers (2022-03-27T21:06:08Z)
- Sparse Generalized Yule-Walker Estimation for Large Spatio-temporal Autoregressions with an Application to NO2 Satellite Data [0.0]
We consider sparse estimation of a class of high-dimensional models.
We estimate the relationships governing both the spatial and temporal dependence in a fully data-driven way by penalizing a set of Yule-Walker equations.
A satellite simulation exercise shows strong finite sample performance compared to competing procedures.
arXiv Detail & Related papers (2021-08-05T21:51:45Z)
- Coarse-Grain Cluster Analysis of Tensors with Application to Climate Biome Identification [0.27998963147546146]
We use the discrete wavelet transform to analyze the effects of coarse-graining on clustering tensor data.
We are particularly interested in understanding how scale affects clustering of the Earth's climate system.
arXiv Detail & Related papers (2020-01-22T00:28:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.