De Novo Generation of Hit-like Molecules from Gene Expression Profiles via Deep Learning
- URL: http://arxiv.org/abs/2412.19422v2
- Date: Thu, 17 Apr 2025 08:28:21 GMT
- Title: De Novo Generation of Hit-like Molecules from Gene Expression Profiles via Deep Learning
- Authors: Chen Li, Yoshihiro Yamanishi
- Abstract summary: We propose a hybrid neural network, HNN2Mol, to generate new molecules with potential bioactivities and drug-like properties. Experimental results and case studies demonstrate that the proposed HNN2Mol model can produce new molecules with potential bioactivities and drug-like properties.
- Score: 3.9518122220368905
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: De novo generation of hit-like molecules is a challenging task in the drug discovery process. Most methods in previous studies learn the semantics and syntax of molecular structures by analyzing molecular graphs or simplified molecular input line entry system (SMILES) strings; however, they do not take into account the drug responses of the biological systems consisting of genes and proteins. In this study, we propose a hybrid neural network, HNN2Mol, which utilizes gene expression profiles to generate molecular structures with desirable phenotypes for arbitrary target proteins. In the algorithm, a variational autoencoder (VAE) is employed as a feature extractor to learn the latent feature distribution of the gene expression profiles. Then, a long short-term memory (LSTM) network is leveraged as the chemical generator to produce syntactically valid SMILES strings that satisfy the feature conditions of the gene expression profile extracted by the feature extractor. Experimental results and case studies demonstrate that the proposed HNN2Mol model can produce new molecules with potential bioactivities and drug-like properties.
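The two-stage design described in the abstract (a VAE-style encoder that maps a gene expression profile to a latent vector, followed by an LSTM that emits SMILES tokens conditioned on that vector) can be illustrated with a minimal, untrained sketch in plain Python. This is not the HNN2Mol implementation: the class name `ToyExpressionToSmiles`, the toy vocabulary, the layer sizes, and the choice to inject the latent vector through the initial hidden state are all assumptions made for illustration. Because the weights are random, the emitted strings are generally not valid SMILES.

```python
import math
import random

# Toy SMILES alphabet (assumption). Index 0 doubles as start and end marker.
VOCAB = ["<eos>", "C", "N", "O", "c", "1", "(", ")", "="]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyExpressionToSmiles:
    """Hypothetical, untrained stand-in for an encoder + LSTM generator."""

    def __init__(self, n_genes, latent_dim, hidden_dim, seed=0):
        self.rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_mu = rand_matrix(latent_dim, n_genes, self.rng)
        self.w_logvar = rand_matrix(latent_dim, n_genes, self.rng)
        self.w_init = rand_matrix(hidden_dim, latent_dim, self.rng)
        in_dim = len(VOCAB) + hidden_dim
        # One weight matrix per gate: input, forget, output, candidate.
        self.w_gates = [rand_matrix(hidden_dim, in_dim, self.rng) for _ in range(4)]
        self.w_out = rand_matrix(len(VOCAB), hidden_dim, self.rng)

    def encode(self, profile):
        """Feature extractor: sample z = mu + sigma * eps (reparameterization)."""
        mu = matvec(self.w_mu, profile)
        logvar = matvec(self.w_logvar, profile)
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def lstm_step(self, token_id, h, c):
        """One LSTM cell update on the one-hot token concatenated with h."""
        x = [1.0 if i == token_id else 0.0 for i in range(len(VOCAB))] + h
        i_g, f_g, o_g, g = (matvec(w, x) for w in self.w_gates)
        c = [sigmoid(f) * c_old + sigmoid(i) * math.tanh(cand)
             for f, c_old, i, cand in zip(f_g, c, i_g, g)]
        h = [sigmoid(o) * math.tanh(c_new) for o, c_new in zip(o_g, c)]
        return h, c

    def generate(self, profile, max_len=20):
        """Chemical generator: sample tokens conditioned on the latent vector."""
        z = self.encode(profile)
        # Condition the generator by projecting z into the initial hidden state.
        h = [math.tanh(v) for v in matvec(self.w_init, z)]
        c = [0.0] * self.hidden_dim
        token_id, out = 0, []
        for _ in range(max_len):
            h, c = self.lstm_step(token_id, h, c)
            weights = [math.exp(l) for l in matvec(self.w_out, h)]  # unnormalized softmax
            token_id = self.rng.choices(range(len(VOCAB)), weights=weights)[0]
            if token_id == 0:  # end marker sampled
                break
            out.append(VOCAB[token_id])
        return "".join(out)


if __name__ == "__main__":
    model = ToyExpressionToSmiles(n_genes=5, latent_dim=3, hidden_dim=4, seed=1)
    print(model.generate([0.5, -1.2, 0.3, 0.0, 2.0]))
```

In a trained system, the encoder and generator weights would be learned jointly or in stages, and the sampled strings would be checked for chemical validity with a cheminformatics toolkit; neither step is shown here.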
Related papers
- Improved Molecular Generation through Attribute-Driven Integrative Embeddings and GAN Selectivity [0.0]
This paper introduces a transformer-based vector embedding generator combined with a modified Generative Adversarial Network (GAN) to generate molecules with desired properties.
The embedding generator utilizes a novel molecular descriptor, integrating Morgan fingerprints with global molecular attributes.
The approach is validated by generating novel odorant molecules using a labeled dataset of odorant and non-odorant compounds.
arXiv Detail & Related papers (2025-04-26T22:15:25Z)
- Machine Learning-Based Genomic Linguistic Analysis (Gene Sequence Feature Learning): A Case Study on Predicting Heavy Metal Response Genes in Rice [22.754584720614947]
We developed a hybrid model capable of extracting and learning meaningful features from gene sequences.
RNA-seq and qRT-PCR experiments conducted on rice leaves exposed to Hg0 revealed differential expression of genes associated with heavy metal responses.
Co-expression network analysis identified 103 related genes, and a literature review indicated that these genes are highly likely to be involved in heavy metal-related biological processes.
arXiv Detail & Related papers (2025-03-20T13:41:31Z)
- GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters.
Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks.
It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
- FARM: Functional Group-Aware Representations for Small Molecules [55.281754551202326]
We introduce Functional Group-Aware Representations for Small Molecules (FARM).
FARM is a foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs.
We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks.
arXiv Detail & Related papers (2024-10-02T23:04:58Z)
- When Molecular GAN Meets Byte-Pair Encoding [2.5398391570038736]
This study introduces a molecular GAN that integrates a byte-level byte-pair encoding tokenizer and employs reinforcement learning to enhance de novo molecular generation.
Specifically, the generator functions as an actor, producing SMILES strings, while the discriminator acts as a critic, evaluating their quality.
arXiv Detail & Related papers (2024-09-29T15:39:26Z)
- ChemSpaceAL: An Efficient Active Learning Methodology Applied to Protein-Specific Molecular Generation [0.0]
We present a computationally efficient active learning methodology that requires evaluation of only a subset of the generated data.
We demonstrate the applicability of this methodology to targeted molecular generation by fine-tuning a GPT-based molecular generator toward a protein with FDA-approved small-molecule inhibitors, c-Abl kinase.
Remarkably, the model learns to generate molecules similar to the inhibitors without prior knowledge of their existence, and even reproduces two of them exactly.
arXiv Detail & Related papers (2023-09-11T22:28:36Z)
- Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performance on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z)
- Towards Predicting Equilibrium Distributions for Molecular Systems with Deep Learning [60.02391969049972]
We introduce a novel deep learning framework, called Distributional Graphormer (DiG), in an attempt to predict the equilibrium distribution of molecular systems.
DiG employs deep neural networks to transform a simple distribution towards the equilibrium distribution, conditioned on a descriptor of a molecular system.
arXiv Detail & Related papers (2023-06-08T17:12:08Z)
- Domain-Agnostic Molecular Generation with Chemical Feedback [44.063584808910896]
MolGen is a pre-trained molecular language model tailored specifically for molecule generation.
It internalizes structural and grammatical insights through the reconstruction of over 100 million molecular SELFIES.
Our chemical feedback paradigm steers the model away from molecular hallucinations, ensuring alignment between the model's estimated probabilities and real-world chemical preferences.
arXiv Detail & Related papers (2023-01-26T17:52:56Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model [3.0643865202019698]
We propose a new solution named SemanticCAP to identify accessible regions of the genome.
It introduces a gene language model which models the context of gene sequences, thus being able to provide an effective representation of gene sequences.
Compared with other systems under public benchmarks, our model proved to have better performance.
arXiv Detail & Related papers (2022-04-05T11:47:58Z)
- De Novo Molecular Generation with Stacked Adversarial Model [24.83456726428956]
Conditional generative adversarial models have recently been proposed as promising approaches for de novo drug design.
We propose a new generative model which extends an existing adversarial autoencoder based model by stacking two models together.
Our stacked approach generates more valid molecules, as well as molecules that are more similar to known drugs.
arXiv Detail & Related papers (2021-10-24T14:23:16Z)