Using an expert deviation carrying the knowledge of climate data in
usual clustering algorithms
- URL: http://arxiv.org/abs/2006.05603v1
- Date: Wed, 10 Jun 2020 01:42:40 GMT
- Title: Using an expert deviation carrying the knowledge of climate data in
usual clustering algorithms
- Authors: Emmanuel Biabiany, Vincent Page, Didier Bernard, Hélène Paugam-Moisy
- Abstract summary: We aim to identify spatio-temporal configurations using clustering analysis on wind speed and cumulative rainfall datasets.
We show that using the L2 norm in conventional clustering methods can induce undesirable effects.
We propose to replace the Euclidean distance (L2) with a dissimilarity measure named Expert Deviation (ED).
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In order to help physicists to expand their knowledge of the climate in the
Lesser Antilles, we aim to identify the spatio-temporal configurations using
clustering analysis on wind speed and cumulative rainfall datasets. But we show
that using the L2 norm in conventional clustering methods such as K-Means (KMS) and
Hierarchical Agglomerative Clustering (HAC) can induce undesirable effects. So,
we propose to replace Euclidean distance (L2) by a dissimilarity measure named
Expert Deviation (ED). Based on the symmetrized Kullback-Leibler divergence,
the ED integrates the properties of the observed physical parameters and
climate knowledge. This measure helps compare histograms of four patches,
corresponding to geographical zones, that are influenced by atmospheric
structures. The combined evaluation of the internal homogeneity and the
separation of the clusters obtained using ED and L2 was performed. The results,
which are compared using the silhouette index, show five clusters with high
indexes. For the two available datasets one can see that, unlike KMS-L2, KMS-ED
discriminates the daily situations favorably, giving more physical meaning to
the clusters discovered by the algorithm. The effect of patches is observed in
the spatial analysis of representative elements for KMS-ED. The ED is able to
produce different configurations, which makes the usual atmospheric structures
clearly identifiable. Atmospheric physicists can interpret the locations of the
impact of each cluster on a specific zone according to atmospheric structures.
KMS-L2 does not lead to such interpretability, because the situations
represented are spatially quite smooth. This climatological study illustrates
the advantage of using ED as a new approach.
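
The abstract describes ED only at a high level: a dissimilarity based on the symmetrized Kullback-Leibler divergence, applied to histograms of four geographical patches. The following is a minimal sketch of that general idea, not the authors' ED. The function names, the quadrant-shaped patches, the bin edges, the smoothing constant `eps`, and the choice to sum the per-patch divergences are all assumptions made for illustration.

```python
import numpy as np


def symmetrized_kl(p, q, eps=1e-12):
    """Symmetrized KL divergence KL(p||q) + KL(q||p) between two histograms.

    `eps` is an assumed smoothing constant that avoids log(0) on empty bins.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()  # normalize counts to probabilities
    q /= q.sum()
    # KL(p||q) + KL(q||p) simplifies to sum((p - q) * log(p / q)).
    return float(np.sum((p - q) * np.log(p / q)))


def patch_histograms(field, patches, bins):
    """Histogram the values of a 2-D field inside each rectangular patch."""
    return [np.histogram(field[rows, cols].ravel(), bins=bins)[0]
            for rows, cols in patches]


def patch_dissimilarity(field_a, field_b, patches, bins):
    """Hypothetical dissimilarity: sum of per-patch symmetrized KL divergences."""
    hists_a = patch_histograms(field_a, patches, bins)
    hists_b = patch_histograms(field_b, patches, bins)
    return sum(symmetrized_kl(ha, hb) for ha, hb in zip(hists_a, hists_b))


# Synthetic stand-ins for two daily fields (e.g. cumulative rainfall grids).
rng = np.random.default_rng(0)
day_a = rng.gamma(shape=2.0, scale=2.0, size=(8, 8))
day_b = rng.gamma(shape=2.0, scale=4.0, size=(8, 8))

# Four assumed patches: the quadrants of the 8x8 grid.
patches = [
    (slice(0, 4), slice(0, 4)),
    (slice(0, 4), slice(4, 8)),
    (slice(4, 8), slice(0, 4)),
    (slice(4, 8), slice(4, 8)),
]
bins = np.linspace(0.0, 20.0, 11)  # assumed bin edges

print(patch_dissimilarity(day_a, day_b, patches, bins))
```

A pairwise matrix of such values could be passed to a clustering routine that accepts precomputed dissimilarities. Note that K-Means relies on cluster means, so swapping in a non-Euclidean measure also requires deciding how representative elements are computed, which the abstract does not specify.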
Related papers
- Improved Anomaly Detection through Conditional Latent Space VAE Ensembles [49.1574468325115]
Conditional Latent space Variational Autoencoder (CL-VAE) improved pre-processing for anomaly detection on data with known inlier classes and unknown outlier classes.
The model shows increased accuracy in anomaly detection, achieving an AUC of 97.4% on the MNIST dataset.
In addition, the CL-VAE shows increased benefits from ensembling, a more interpretable latent space, and an increased ability to learn patterns in complex data with limited model sizes.
arXiv Detail & Related papers (2024-10-16T07:48:53Z)
- Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z)
- IME: Integrating Multi-curvature Shared and Specific Embedding for Temporal Knowledge Graph Completion [97.58125811599383]
Temporal Knowledge Graphs (TKGs) incorporate a temporal dimension, allowing for a precise capture of the evolution of knowledge.
We propose a novel Multi-curvature shared and specific Embedding (IME) model for TKGC tasks.
IME incorporates two key properties, namely space-shared property and space-specific property.
arXiv Detail & Related papers (2024-03-28T23:31:25Z)
- A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction [66.21060114843202]
We propose a more general heat kernel based manifold embedding method that we call heat geodesic embeddings.
Results show that our method outperforms existing state of the art in preserving ground truth manifold distances.
We also showcase our method on single cell RNA-sequencing datasets with both continuum and cluster structure.
arXiv Detail & Related papers (2023-05-30T13:58:50Z)
- AWT -- Clustering Meteorological Time Series Using an Aggregated Wavelet Tree [9.470649284657483]
AWT is a clustering algorithm for time series data that also performs implicit outlier detection during the clustering.
We apply AWT to crowd sourced 2-m temperature data with an hourly resolution from the city of Vienna to detect outliers.
It is shown that both the outlier detection and the implicit mapping to land-use characteristic is possible with AWT.
arXiv Detail & Related papers (2022-12-13T15:25:29Z)
- Weather2vec: Representation Learning for Causal Inference with Non-Local Confounding in Air Pollution and Climate Studies [3.0616624345970975]
Estimating the causal effects of a spatially-varying intervention may be subject to non-local confounding (NLC).
This paper first formalizes NLC using the potential outcomes framework, providing a comparison with the related phenomenon of causal interference.
Then, it proposes a broadly applicable framework, termed "weather2vec", that uses the theory of balancing scores to learn representations of non-local information.
arXiv Detail & Related papers (2022-09-25T20:40:19Z)
- Deep Learning Models of the Discrete Component of the Galactic Interstellar Gamma-Ray Emission [61.26321023273399]
A significant point-like component from the small scale (or discrete) structure in the H2 interstellar gas might be present in the Fermi-LAT data.
We show that deep learning may be effectively employed to model the gamma-ray emission traced by these rare H2 proxies within statistical significance in data-rich regions.
arXiv Detail & Related papers (2022-06-06T18:00:07Z)
- Perfect Spectral Clustering with Discrete Covariates [68.8204255655161]
We propose a spectral algorithm that achieves perfect clustering with high probability on a class of large, sparse networks.
Our method is the first to offer a guarantee of consistent latent structure recovery using spectral clustering.
arXiv Detail & Related papers (2022-05-17T01:41:06Z)
- Predicting Solar Energetic Particles Using SDO/HMI Vector Magnetic Data Products and a Bidirectional LSTM Network [6.759687230043489]
Solar energetic particles (SEPs) are an essential source of space radiation, which are hazards for humans in space, spacecraft, and technology in general.
We propose a deep learning method to predict if an active region (AR) would produce an SEP event given that (i) the AR will produce an M- or X-class flare and a coronal mass ejection associated with the flare, or (ii) the AR will produce an M- or X-class flare regardless of whether or not the flare is associated with a CME.
arXiv Detail & Related papers (2022-03-27T21:06:08Z)
- Sparse Generalized Yule-Walker Estimation for Large Spatio-temporal Autoregressions with an Application to NO2 Satellite Data [0.0]
We consider sparse estimation of a class of high-dimensional models.
We estimate the relationships governing both the spatial and temporal dependence in a fully data-driven way by penalizing a set of Yule-Walker equations.
A satellite simulation exercise shows strong finite sample performance compared to competing procedures.
arXiv Detail & Related papers (2021-08-05T21:51:45Z)
- Coarse-Grain Cluster Analysis of Tensors with Application to Climate Biome Identification [0.27998963147546146]
We use the discrete wavelet transform to analyze the effects of coarse-graining on clustering tensor data.
We are particularly interested in understanding how scale affects clustering of the Earth's climate system.
arXiv Detail & Related papers (2020-01-22T00:28:58Z)