TransPeakNet: Solvent-Aware 2D NMR Prediction via Multi-Task Pre-Training and Unsupervised Learning
- URL: http://arxiv.org/abs/2403.11353v4
- Date: Mon, 16 Dec 2024 00:31:21 GMT
- Title: TransPeakNet: Solvent-Aware 2D NMR Prediction via Multi-Task Pre-Training and Unsupervised Learning
- Authors: Yunrui Li, Hao Xu, Ambrish Kumar, Duosheng Wang, Christian Heiss, Parastoo Azadi, Pengyu Hong,
- Abstract summary: We introduce an unsupervised training framework for predicting cross-peaks in 2D NMR.
Our approach pretrains an ML model on an annotated 1D dataset of 1H and 13C shifts, then finetunes it in an unsupervised manner.
Evaluation on 479 expert-annotated HSQC spectra demonstrates our model's superiority over traditional methods.
- Score: 5.7279868722119325
- License:
- Abstract: Nuclear Magnetic Resonance (NMR) spectroscopy is essential for revealing molecular structure, electronic environment, and dynamics. Accurate NMR shift prediction allows researchers to validate structures by comparing predicted and observed shifts. While Machine Learning (ML) has improved one-dimensional (1D) NMR shift prediction, predicting 2D NMR remains challenging due to limited annotated data. To address this, we introduce an unsupervised training framework for predicting cross-peaks in 2D NMR, specifically Heteronuclear Single Quantum Coherence (HSQC).Our approach pretrains an ML model on an annotated 1D dataset of 1H and 13C shifts, then finetunes it in an unsupervised manner using unlabeled HSQC data, which simultaneously generates cross-peak annotations. Our model also adjusts for solvent effects. Evaluation on 479 expert-annotated HSQC spectra demonstrates our model's superiority over traditional methods (ChemDraw and Mestrenova), achieving Mean Absolute Errors (MAEs) of 2.05 ppm and 0.165 ppm for 13C shifts and 1H shifts respectively. Our algorithmic annotations show a 95.21% concordance with experts' assignments, underscoring the approach's potential for structural elucidation in fields like organic chemistry, pharmaceuticals, and natural products.
Related papers
- Equivariant Masked Position Prediction for Efficient Molecular Representation [6.761418610103767]
Graph neural networks (GNNs) have shown considerable promise in computational chemistry.
We introduce a novel self-supervised approach termed Equivariant Masked Position Prediction (EMPP)
EMPP formulates a nuanced position prediction task that is more well-defined and enhances the learning of quantum mechanical features.
arXiv Detail & Related papers (2025-02-12T08:39:26Z) - Graph-neural-network predictions of solid-state NMR parameters from spherical tensor decomposition [0.0]
Nuclear magnetic resonance (NMR) is a powerful spectroscopic technique that is sensitive to the local atomic structure of matter.
Machine learning (ML) has emerged as an efficient route to making such predictions.
arXiv Detail & Related papers (2024-12-19T17:11:07Z) - Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based underlineem Molecular underlineem Language underlineem Model, which randomly masking SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z) - Denoise Pretraining on Nonequilibrium Molecules for Accurate and
Transferable Neural Potentials [8.048439531116367]
We propose denoise pretraining on nonequilibrium molecular conformations to achieve more accurate and transferable GNN potential predictions.
Our models pretrained on small molecules demonstrate remarkable transferability, improving performance when fine-tuned on diverse molecular systems.
arXiv Detail & Related papers (2023-03-03T21:15:22Z) - Generative structured normalizing flow Gaussian processes applied to
spectroscopic data [4.0773490083614075]
In the physical sciences, limited training data may not adequately characterize future observed data.
It is critical that models adequately indicate uncertainty, particularly when they may be asked to extrapolate.
We demonstrate the methodology on laser-induced breakdown spectroscopy data from the ChemCam instrument onboard the Mars rover Curiosity.
arXiv Detail & Related papers (2022-12-14T23:57:46Z) - Towards Robust k-Nearest-Neighbor Machine Translation [72.9252395037097]
k-Nearest-Neighbor Machine Translation (kNN-MT) becomes an important research direction of NMT in recent years.
Its main idea is to retrieve useful key-value pairs from an additional datastore to modify translations without updating the NMT model.
The underlying retrieved noisy pairs will dramatically deteriorate the model performance.
We propose a confidence-enhanced kNN-MT model with robust training to alleviate the impact of noise.
arXiv Detail & Related papers (2022-10-17T07:43:39Z) - Semi-Supervised Junction Tree Variational Autoencoder for Molecular
Property Prediction [0.0]
This research modifies state-of-the-art molecule generation method - Junction Tree Variational Autoencoder (JT-VAE) to facilitate semi-supervised learning on chemical property prediction.
We leverage JT-VAE architecture to learn an interpretable representation optimal for tasks ranging from molecule property prediction to conditional molecule generation.
arXiv Detail & Related papers (2022-08-10T03:06:58Z) - Graph Neural Networks for Temperature-Dependent Activity Coefficient
Prediction of Solutes in Ionic Liquids [58.720142291102135]
We present a GNN to predict temperature-dependent infinite dilution ACs of solutes in ILs.
We train the GNN on a database including more than 40,000 AC values and compare it to a state-of-the-art MCM.
The GNN and MCM achieve similar high prediction performance, with the GNN additionally enabling high-quality predictions for ACs of solutions that contain ILs and solutes not considered during training.
arXiv Detail & Related papers (2022-06-23T15:27:29Z) - Chemical-Reaction-Aware Molecule Representation Learning [88.79052749877334]
We propose using chemical reactions to assist learning molecule representation.
Our approach is proven effective to 1) keep the embedding space well-organized and 2) improve the generalization ability of molecule embeddings.
Experimental results demonstrate that our method achieves state-of-the-art performance in a variety of downstream tasks.
arXiv Detail & Related papers (2021-09-21T00:08:43Z) - A Systematic Approach to Featurization for Cancer Drug Sensitivity
Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z) - Assessing Graph-based Deep Learning Models for Predicting Flash Point [52.931492216239995]
Graph-based deep learning (GBDL) models were implemented in predicting flash point for the first time.
Average R2 and Mean Absolute Error (MAE) scores of MPNN are, respectively, 2.3% lower and 2.0 K higher than previous comparable studies.
arXiv Detail & Related papers (2020-02-26T06:10:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.