Improving Self-supervised Molecular Representation Learning using
Persistent Homology
- URL: http://arxiv.org/abs/2311.17327v1
- Date: Wed, 29 Nov 2023 02:58:30 GMT
- Title: Improving Self-supervised Molecular Representation Learning using
Persistent Homology
- Authors: Yuankai Luo, Lei Shi, Veronika Thost
- Abstract summary: Self-supervised learning (SSL) has great potential for molecular representation learning.
In this paper, we study SSL based on persistent homology (PH), a mathematical tool for modeling topological features of data that persist across multiple scales.
- Score: 6.263470141349622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL) has great potential for molecular
representation learning given the complexity of molecular graphs, the large
amounts of unlabelled data available, the considerable cost of obtaining labels
experimentally, and the consequently often small training datasets. The
importance of the topic is reflected in the variety of paradigms and
architectures that have been investigated recently. Yet the differences in
performance seem often minor and are barely understood to date. In this paper,
we study SSL based on persistent homology (PH), a mathematical tool for
modeling topological features of data that persist across multiple scales. It
has several unique features which particularly suit SSL, naturally offering:
different views of the data, stability in terms of distance preservation, and
the opportunity to flexibly incorporate domain knowledge. We (1) investigate an
autoencoder, which shows the general representational power of PH, and (2)
propose a contrastive loss that complements existing approaches. We rigorously
evaluate our approach for molecular property prediction and demonstrate its
particular features in improving the embedding space: after SSL, the
representations are better and offer considerably more predictive power than
the baselines across different probing tasks; our loss increases baseline
performance, sometimes markedly; and we often obtain substantial improvements
on very small datasets, a common scenario in practice.
Related papers
- An Information Criterion for Controlled Disentanglement of Multimodal Data [39.601584166020274]
Multimodal representation learning seeks to relate and decompose information inherent in multiple modalities.
Disentangled Self-Supervised Learning (DisentangledSSL) is a novel self-supervised approach for learning disentangled representations.
arXiv Detail & Related papers (2024-10-31T14:57:31Z)
- Disentangling Interpretable Factors with Supervised Independent Subspace Principal Component Analysis [0.9558392439655012]
Supervised Independent Subspace Principal Component Analysis (sisPCA) is a PCA extension designed for multi-subspace learning.
We demonstrate its ability to identify and separate hidden data structures through extensive applications, including breast cancer diagnosis.
Our results reveal distinct functional pathways associated with malaria colonization, underscoring the importance of explainable representations in high-dimensional data analysis.
arXiv Detail & Related papers (2024-10-31T03:09:40Z)
- A Survey of the Self Supervised Learning Mechanisms for Vision Transformers [5.152455218955949]
The application of self-supervised learning (SSL) in vision tasks has gained significant attention.
We develop a comprehensive taxonomy that systematically classifies SSL techniques.
We discuss the motivations behind SSL, review popular pre-training tasks, and highlight the challenges and advancements in this field.
arXiv Detail & Related papers (2024-08-30T07:38:28Z)
- Learning Invariant Molecular Representation in Latent Discrete Space [52.13724532622099]
We propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts.
Our model achieves stronger generalization than state-of-the-art baselines in the presence of various distribution shifts.
arXiv Detail & Related papers (2023-10-22T04:06:44Z)
- Explaining, Analyzing, and Probing Representations of Self-Supervised Learning Models for Sensor-based Human Activity Recognition [2.2082422928825136]
Self-supervised learning (SSL) frameworks have been extensively applied to sensor-based Human Activity Recognition (HAR).
In this paper, we aim to analyze deep representations of two recent SSL frameworks, namely SimCLR and VICReg.
arXiv Detail & Related papers (2023-04-14T07:53:59Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- The Geometry of Self-supervised Learning Models and its Impact on Transfer Learning [62.601681746034956]
Self-supervised learning (SSL) has emerged as a desirable paradigm in computer vision.
We propose a data-driven geometric strategy to analyze different SSL models using local neighborhoods in the feature space induced by each.
arXiv Detail & Related papers (2022-09-18T18:15:38Z)
- Evaluating Self-Supervised Learning for Molecular Graph Embeddings [38.65102126919387]
Graph Self-Supervised Learning (GSSL) provides a robust pathway for acquiring embeddings without expert labelling.
"MOLGRAPHEVAL" generates detailed profiles of molecular graph embeddings with interpretable and diversified attributes.
arXiv Detail & Related papers (2022-06-16T09:01:53Z)
- CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results on vision, natural language processing, and time-series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
- Self-supervised Learning is More Robust to Dataset Imbalance [65.84339596595383]
We investigate self-supervised learning under dataset imbalance.
Off-the-shelf self-supervised representations are already more robust to class imbalance than supervised representations.
We devise a re-weighted regularization technique that consistently improves the SSL representation quality on imbalanced datasets.
arXiv Detail & Related papers (2021-10-11T06:29:56Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.