Probabilistic Transformer: Modelling Ambiguities and Distributions for RNA Folding and Molecule Design
- URL: http://arxiv.org/abs/2205.13927v1
- Date: Fri, 27 May 2022 12:11:38 GMT
- Title: Probabilistic Transformer: Modelling Ambiguities and Distributions for RNA Folding and Molecule Design
- Authors: J\"org K. H. Franke, Frederic Runge, Frank Hutter
- Abstract summary: We propose a hierarchical latent distribution to enhance one of the most successful deep learning models, the Transformer.
We show the benefits of our approach on a synthetic task, with state-of-the-art results in RNA folding, and demonstrate its generative capabilities on property-based molecule design.
- Score: 38.46798525594529
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Our world is ambiguous and this is reflected in the data we use to train our
algorithms. This is especially true when we try to model natural processes
where collected data is affected by noisy measurements and differences in
measurement techniques. Sometimes, the process itself can be ambiguous, such as
in the case of RNA folding, where a single nucleotide sequence can fold into
multiple structures. This ambiguity suggests that a predictive model should
have similar probabilistic characteristics to match the data it models.
Therefore, we propose a hierarchical latent distribution to enhance one of the
most successful deep learning models, the Transformer, to accommodate
ambiguities and data distributions. We show the benefits of our approach on a
synthetic task, with state-of-the-art results in RNA folding, and demonstrate
its generative capabilities on property-based molecule design, outperforming
existing work.
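A rough sketch of the idea (not the authors' exact architecture; the layer layout, the diagonal-Gaussian parameterisation, and all names below are illustrative assumptions) is to give each Transformer layer its own latent variable, sampled with the reparameterisation trick, so that repeated forward passes over the same input can yield different, equally valid outputs:

```python
import torch
import torch.nn as nn

class LatentTransformerLayer(nn.Module):
    """Illustrative Transformer block with a per-layer Gaussian latent.

    Sampling z anew on every forward pass makes the model's output
    distributional: one RNA sequence can map to several structures.
    """
    def __init__(self, d_model: int, n_heads: int, d_latent: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.to_mu = nn.Linear(d_model, d_latent)      # latent mean
        self.to_logvar = nn.Linear(d_model, d_latent)  # latent log-variance
        self.from_z = nn.Linear(d_latent, d_model)

    def forward(self, x):
        h = self.norm(x + self.attn(x, x, x)[0])
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
        return h + self.from_z(z), mu, logvar  # mu/logvar feed a KL loss term

layers = nn.ModuleList(LatentTransformerLayer(64, 4, 16) for _ in range(3))
x = torch.randn(2, 10, 64)  # (batch, sequence length, d_model)
for layer in layers:        # one latent per layer -> a hierarchical distribution
    x, mu, logvar = layer(x)
```

Stacking such layers yields the hierarchy of latents; training would add a per-layer KL regulariser against a prior, as in hierarchical variational models.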
Related papers
- Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design [56.957070405026194]
We propose an algorithm that enables direct backpropagation of rewards through entire trajectories generated by diffusion models.
DRAKES can generate sequences that are both natural-like and yield high rewards.
arXiv Detail & Related papers (2024-10-17T15:10:13Z) - MING: A Functional Approach to Learning Molecular Generative Models [46.189683355768736]
- MING: A Functional Approach to Learning Molecular Generative Models [46.189683355768736]
This paper introduces a novel paradigm for learning molecule generative models based on functional representations.
We propose Molecular Implicit Neural Generation (MING), a diffusion-based model that learns molecular distributions in function space.
arXiv Detail & Related papers (2024-10-16T13:02:02Z) - Semantically Rich Local Dataset Generation for Explainable AI in Genomics [0.716879432974126]
Black box deep learning models trained on genomic sequences excel at predicting the outcomes of different gene regulatory mechanisms.
We propose using Genetic Programming to generate datasets by evolving perturbations in sequences that contribute to their semantic diversity.
arXiv Detail & Related papers (2024-07-03T10:31:30Z) - Synthetic location trajectory generation using categorical diffusion
models [50.809683239937584]
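A bare-bones version of the evolve-perturbations loop described above could look as follows; the `score` function is a placeholder assumption, standing in for a semantic-diversity objective computed with the black-box genomic model.

```python
import random

BASES = "ACGT"

def mutate(seq: str, rate: float = 0.05) -> str:
    """Apply random point mutations -- the 'perturbation' being evolved."""
    return "".join(random.choice(BASES) if random.random() < rate else b for b in seq)

def score(seq: str) -> float:
    """Placeholder fitness; a real setup would query the black-box model and
    reward mutations that change its predictions in semantically diverse ways."""
    return sum(b == "G" for b in seq) / len(seq)

seed = "".join(random.choice(BASES) for _ in range(60))
population = [mutate(seed) for _ in range(20)]

for generation in range(10):
    population.sort(key=score, reverse=True)
    survivors = population[:5]                      # selection
    population = survivors + [mutate(s) for s in survivors for _ in range(3)]

dataset = population[:10]  # locally perturbed, semantically diverse variants
```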
- Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z) - Splicing Up Your Predictions with RNA Contrastive Learning [4.35360799431127]
- Splicing Up Your Predictions with RNA Contrastive Learning [4.35360799431127]
We extend contrastive learning techniques to genomic data by utilizing similarities between functional sequences generated through alternative splicing and gene duplication.
We validate their utility on downstream tasks such as RNA half-life and mean ribosome load prediction.
Our exploration of the learned latent space reveals that our contrastive objective yields semantically meaningful representations.
arXiv Detail & Related papers (2023-10-12T21:51:25Z) - Variational Autoencoding Molecular Graphs with Denoising Diffusion
Probabilistic Model [0.0]
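Contrastive objectives of this kind are typically InfoNCE-style losses over paired embeddings; a minimal sketch, assuming some sequence encoder has already produced the embeddings (the paper's exact loss and augmentation pipeline may differ):

```python
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """InfoNCE loss: embeddings of paired sequences (e.g. two splice isoforms
    of one gene) attract; all other pairs in the batch repel."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / tau                   # (batch, batch) similarities
    targets = torch.arange(z_a.size(0))            # positives on the diagonal
    return F.cross_entropy(logits, targets)

# encoder(...) would embed each RNA sequence; random vectors stand in here.
z_a, z_b = torch.randn(32, 128), torch.randn(32, 128)
loss = info_nce(z_a, z_b)
```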
- Variational Autoencoding Molecular Graphs with Denoising Diffusion Probabilistic Model [0.0]
We propose a novel deep generative model that incorporates a hierarchical structure into the probabilistic latent vectors.
In experiments on small datasets of physical properties and activity, we demonstrate that our model can design effective molecular latent vectors for molecular property prediction.
arXiv Detail & Related papers (2023-07-02T17:29:41Z) - Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z) - Goal-directed Generation of Discrete Structures with Conditional
Generative Models [85.51463588099556]
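The amortisation idea above is that a single network, trained across many datasets with known graphs, predicts a causal structure for a new dataset in one forward pass; the sketch below is illustrative only, with layers and shapes that are assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class StructurePredictor(nn.Module):
    """Amortised variational model: maps a dataset of observations over d
    variables to per-edge Bernoulli probabilities of a causal graph."""
    def __init__(self, d: int, hidden: int = 64):
        super().__init__()
        self.embed = nn.Linear(1, hidden)
        self.edge_logits = nn.Bilinear(hidden, hidden, 1)

    def forward(self, data: torch.Tensor) -> torch.Tensor:  # data: (n, d)
        # Summarise each variable by pooling its embedded samples over n.
        h = self.embed(data.unsqueeze(-1)).mean(dim=0)       # (d, hidden)
        d_vars = h.size(0)
        rows = h.unsqueeze(1).expand(-1, d_vars, -1).reshape(-1, h.size(1))
        cols = h.unsqueeze(0).expand(d_vars, -1, -1).reshape(-1, h.size(1))
        logits = self.edge_logits(rows, cols).view(d_vars, d_vars)
        return torch.sigmoid(logits)                         # P(edge i -> j)

model = StructurePredictor(d=5)
probs = model(torch.randn(200, 5))  # trained against known graphs, then
                                    # applied zero-shot to new datasets
```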
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - Variational Mixture of Normalizing Flows [0.0]
- Variational Mixture of Normalizing Flows [0.0]
Deep generative models, such as generative adversarial networks (GANs), variational autoencoders (VAEs), and their variants, have seen wide adoption for the task of modelling complex data distributions.
Normalizing flows have overcome this limitation by leveraging the change-of-variables formula for probability density functions.
The present work overcomes this by using normalizing flows as components in a mixture model and devising an end-to-end training procedure for such a model.
arXiv Detail & Related papers (2020-09-01T17:20:08Z) - Partially Conditioned Generative Adversarial Networks [75.08725392017698]
- Partially Conditioned Generative Adversarial Networks [75.08725392017698]
Generative Adversarial Networks (GANs) let one synthesise artificial datasets by implicitly modelling the underlying probability distribution of a real-world training dataset.
With the introduction of Conditional GANs and their variants, these methods were extended to generating samples conditioned on ancillary information available for each sample within the dataset.
In this work, we argue that standard Conditional GANs are not suitable for the setting where conditioning information is only partially available, and propose a new Adversarial Network architecture and training strategy.
arXiv Detail & Related papers (2020-07-06T15:59:28Z)