Surrogate- and invariance-boosted contrastive learning for data-scarce
applications in science
- URL: http://arxiv.org/abs/2110.08406v1
- Date: Fri, 15 Oct 2021 23:08:24 GMT
- Title: Surrogate- and invariance-boosted contrastive learning for data-scarce
applications in science
- Authors: Charlotte Loh, Thomas Christensen, Rumen Dangovski, Samuel Kim and
Marin Soljacic
- Abstract summary: We introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework which incorporates three ``inexpensive'' and easily obtainable auxiliary information sources to overcome data scarcity.
We demonstrate SIB-CL's effectiveness and generality on various scientific problems, e.g., predicting the density-of-states of 2D photonic crystals and solving the 3D time-independent Schrödinger equation.
- Score: 2.959890389883449
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning techniques have been increasingly applied to the natural
sciences, e.g., for property prediction and optimization or material discovery.
A fundamental ingredient of such approaches is the vast quantity of labelled
data needed to train the model; this poses severe challenges in data-scarce
settings where obtaining labels requires substantial computational or labor
resources. Here, we introduce surrogate- and invariance-boosted contrastive
learning (SIB-CL), a deep learning framework which incorporates three
``inexpensive'' and easily obtainable auxiliary information sources to overcome
data scarcity. Specifically, these are: 1)~abundant unlabeled data, 2)~prior
knowledge of symmetries or invariances and 3)~surrogate data obtained at
near-zero cost. We demonstrate SIB-CL's effectiveness and generality on various
scientific problems, e.g., predicting the density-of-states of 2D photonic
crystals and solving the 3D time-independent Schrödinger equation. SIB-CL
consistently results in orders of magnitude reduction in the number of labels
needed to achieve the same network accuracies.
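The contrastive pretraining at the core of SIB-CL can be illustrated with a minimal NumPy sketch of the standard NT-Xent loss, where two "views" of each sample are embeddings of an input and a symmetry-transformed copy it should be invariant to. This is an illustrative sketch of the generic contrastive objective, not the paper's exact architecture; the batch, embeddings, and augmentation below are hypothetical.

```python
import numpy as np

def nt_xent_loss(z1, z2, tau=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.

    z1, z2: (N, d) embeddings of two 'views' of the same N samples,
    e.g. an input and its symmetry-transformed (invariance) copy.
    """
    z = np.concatenate([z1, z2], axis=0)              # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize
    sim = z @ z.T / tau                               # scaled cosine similarities
    n = z1.shape[0]
    np.fill_diagonal(sim, -np.inf)                    # mask self-similarity
    # the positive partner of sample i is i+n (and vice versa)
    pos = np.concatenate([np.arange(n) + n, np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
# aligned views (maximally similar positives) give a lower loss than
# unrelated views, which is what the contrastive stage exploits
loss_aligned = nt_xent_loss(x, x)
loss_random = nt_xent_loss(x, rng.normal(size=(8, 16)))
```

In SIB-CL, the second view would come from applying a known symmetry of the problem or from cheap surrogate data rather than a generic perturbation.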
Related papers
- S-MolSearch: 3D Semi-supervised Contrastive Learning for Bioactive Molecule Search [30.071862398889774]
We propose S-MolSearch, the first framework to leverage molecular 3D information and affinity information in contrastive learning for virtual screening.
S-MolSearch efficiently processes both labeled and unlabeled data, training molecular structural encoders while generating soft labels for the unlabeled data.
It surpasses both structure-based and ligand-based virtual screening methods in enrichment factors at 0.5%, 1% and 5%.
arXiv Detail & Related papers (2024-08-27T14:51:11Z)
- Physics-Informed Deep Learning and Partial Transfer Learning for Bearing Fault Diagnosis in the Presence of Highly Missing Data [0.0]
This paper presents the PTPAI method, which uses a physics-informed deep learning-based technique to generate synthetic labeled data.
It addresses imbalanced class problems and partial-set fault diagnosis hurdles.
Experimental outcomes on the CWRU and JNU datasets indicate that the proposed approach effectively addresses these problems.
arXiv Detail & Related papers (2024-06-16T17:36:53Z)
- Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization [56.95046107046027]
We propose to leverage Neural Radiance Fields (NeRF) to generate training samples for scene coordinate regression.
Despite NeRF's efficiency in rendering, many of the rendered data are polluted by artifacts or only contain minimal information gain.
arXiv Detail & Related papers (2023-10-10T20:11:13Z)
- A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation [42.2398858786125]
Deep learning in computer vision has achieved great success at the price of large-scale labeled training data.
The uncontrollable data collection process produces non-IID training and test data, where undesired duplication may exist.
To circumvent these issues, an alternative is to generate synthetic data via 3D rendering with domain randomization.
arXiv Detail & Related papers (2023-03-16T09:03:52Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- Open-Set Semi-Supervised Learning for 3D Point Cloud Understanding [62.17020485045456]
It is commonly assumed in semi-supervised learning (SSL) that the unlabeled data are drawn from the same distribution as that of the labeled ones.
We propose to selectively utilize unlabeled data through sample weighting, so that only conducive unlabeled data would be prioritized.
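The selective-weighting idea above can be sketched with a hypothetical confidence-based rule (a generic scheme in NumPy, not the paper's exact method): weight each unlabeled sample by the peakedness of the model's predicted class distribution, so out-of-distribution samples with diffuse predictions are excluded.

```python
import numpy as np

def sample_weights(logits, threshold=0.8):
    """Down-weight unlabeled samples the model is unsure about.

    logits: (N, C) unnormalized class scores for N unlabeled samples.
    Returns per-sample weights in [0, 1]; samples whose peak predicted
    probability falls below `threshold` get weight 0 (excluded).
    """
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    probs = e / e.sum(axis=1, keepdims=True)
    conf = probs.max(axis=1)                                # peak probability
    return np.where(conf >= threshold, conf, 0.0)

logits = np.array([[4.0, 0.0, 0.0],    # confident prediction -> kept
                   [0.4, 0.3, 0.3]])   # diffuse prediction -> weight 0
w = sample_weights(logits)
```

The weights would then multiply each unlabeled sample's loss term, so only conducive unlabeled data influences training.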
arXiv Detail & Related papers (2022-05-02T16:09:17Z)
- An Empirical Evaluation of the t-SNE Algorithm for Data Visualization in Structural Engineering [2.4493299476776773]
The t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm is used to reduce the dimensionality of an earthquake-related dataset for visualization purposes.
The Synthetic Minority Oversampling Technique (SMOTE) is used to tackle the imbalanced nature of the dataset.
We show that by using t-SNE on the imbalanced data and SMOTE on the training dataset, neural network classifiers achieve promising results without sacrificing accuracy.
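The SMOTE step can be illustrated with a minimal NumPy sketch (a simplified variant, not the imbalanced-learn implementation): each synthetic minority sample is interpolated between a real minority sample and one of its k nearest minority-class neighbors.

```python
import numpy as np

def smote_like(X_min, n_new, k=3, seed=0):
    """Generate `n_new` synthetic minority samples (simplified SMOTE).

    X_min: (N, d) minority-class samples. Each synthetic point lies on
    the segment between a random base sample and one of its k nearest
    minority neighbors: x + lam * (neighbor - x), with lam ~ U(0, 1).
    """
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    # pairwise squared distances within the minority class
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # exclude self-matches
    nn = np.argsort(d2, axis=1)[:, :k]           # k nearest neighbors
    base = rng.integers(0, n, size=n_new)        # random base samples
    nbr = nn[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))
    return X_min[base] + lam * (X_min[nbr] - X_min[base])

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_syn = smote_like(X_min, n_new=5)
```

Because each synthetic point is a convex combination of two real minority samples, the oversampled class stays inside the region the minority data already occupies.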
arXiv Detail & Related papers (2021-09-18T01:24:39Z)
- SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks [81.64530401885476]
We propose a self-supervised LiDAR odometry method, dubbed SelfVoxeLO, to tackle these two difficulties.
Specifically, we propose a 3D convolution network to process the raw LiDAR data directly, which extracts features that better encode the 3D geometric patterns.
We evaluate our method's performances on two large-scale datasets, i.e., KITTI and Apollo-SouthBay.
arXiv Detail & Related papers (2020-10-19T09:23:39Z)
- Uncovering the structure of clinical EEG signals with self-supervised learning [64.4754948595556]
Supervised learning paradigms are often limited by the amount of labeled data that is available.
This phenomenon is particularly problematic in clinically relevant data, such as electroencephalography (EEG).
By extracting information from unlabeled data, it might be possible to reach competitive performance with deep neural networks.
arXiv Detail & Related papers (2020-07-31T14:34:47Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.