A reproducibility study of "Augmenting Genetic Algorithms with Deep
Neural Networks for Exploring the Chemical Space"
- URL: http://arxiv.org/abs/2102.00700v2
- Date: Tue, 2 Feb 2021 22:24:18 GMT
- Title: A reproducibility study of "Augmenting Genetic Algorithms with Deep
Neural Networks for Exploring the Chemical Space"
- Authors: Kevin Maik Jablonka, Fergus Mcilwaine, Susana Garcia, Berend Smit,
Brian Yoo
- Abstract summary: Nigam et al. reported a genetic algorithm (GA) utilizing the SELFIES representation and also proposed an adaptive, neural network-based penalty.
Overall, we were able to reproduce comparable results using the SELFIES-based GA.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nigam et al. reported a genetic algorithm (GA) utilizing the SELFIES
representation and also proposed an adaptive, neural network-based penalty that
is supposed to improve the diversity of the generated molecules. The main
claims of the paper are that this GA outperforms other generative techniques
(as measured by the penalized logP) and that a neural network-based adaptive
penalty increases the diversity of the generated molecules. In this work, we
investigated the reproducibility of their claims. Overall, we were able to
reproduce comparable results using the SELFIES-based GA, but mostly by
exploiting deficiencies of the (easily optimizable) fitness function (i.e.,
generating long, sulfur-containing chains). In addition, we also reproduce
results showing that the discriminator can be used to bias the generation of
molecules to ones that are similar to the reference set. Moreover, we also
attempted to quantify the evolution of the diversity, understand the influence
of some hyperparameters, and propose improvements to the adaptive penalty.
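The effect of a discriminator-based penalty on selection can be illustrated with a minimal sketch. It assumes an additive fitness of the form F(m) = J(m) + beta * D(m), where J is a property score such as the penalized logP and D is a discriminator output in [0, 1]; this form, the class and function names, and all numeric values below are illustrative assumptions, not taken from the abstract or from either paper's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str               # hypothetical label standing in for a SELFIES string
    property_score: float   # stand-in for the penalized logP, J(m)
    discriminator: float    # stand-in for the discriminator output D(m) in [0, 1]


def fitness(candidate: Candidate, beta: float) -> float:
    """Assumed additive form: F(m) = J(m) + beta * D(m)."""
    return candidate.property_score + beta * candidate.discriminator


def rank(population, beta):
    """Sort candidates from highest to lowest fitness."""
    return sorted(population, key=lambda c: fitness(c, beta), reverse=True)


# Made-up values: one candidate with a high property score that looks unlike
# the reference set, and one with a lower score that resembles it.
population = [
    Candidate("long_chain", property_score=10.0, discriminator=0.1),
    Candidate("reference_like", property_score=4.0, discriminator=0.9),
]

best_unpenalized = rank(population, beta=0.0)[0].name
best_biased = rank(population, beta=10.0)[0].name
print(best_unpenalized, best_biased)
```

With beta set to zero the ranking depends on the property score alone, so a candidate that exploits the fitness function wins; a sufficiently large beta shifts selection toward candidates the discriminator scores as similar to the reference set.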
Related papers
- PDE: Gene Effect Inspired Parameter Dynamic Evolution for Low-light Image Enhancement [22.73061476533364]
Low-light image enhancement (LLIE) is a fundamental task in computational photography, aiming to improve illumination, reduce noise, and enhance image quality. We observe a phenomenon: resetting certain parameters to random values unexpectedly improves enhancement performance for some images. The gene effect limits enhancement performance, as even random parameters can sometimes outperform learned ones, preventing models from fully utilizing their capacity. Inspired by biological evolution, where adaptation to new environments relies on gene mutation and recombination, we propose parameter dynamic evolution (PDE) to adapt to different images and mitigate the gene effect.
arXiv Detail & Related papers (2025-05-14T07:14:25Z) - GRAPE: Heterogeneous Graph Representation Learning for Genetic Perturbation with Coding and Non-Coding Biotype [51.58774936662233]
Building gene regulatory networks (GRNs) is essential to understand and predict the effects of genetic perturbations. In this work, we leverage a pre-trained large language model and a DNA sequence model to extract features from gene descriptions and DNA sequence data. We introduce gene biotype information for the first time in genetic perturbation, simulating the distinct roles of genes with different biotypes in regulating cellular processes.
arXiv Detail & Related papers (2025-05-06T03:35:24Z) - Learning to Discover Regulatory Elements for Gene Expression Prediction [59.470991831978516]
Seq2Exp is a Sequence to Expression network designed to discover and extract regulatory elements that drive target gene expression.
Our approach captures the causal relationship between epigenomic signals, DNA sequences and their associated regulatory elements.
arXiv Detail & Related papers (2025-02-19T03:25:49Z) - GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters.
Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks.
It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z) - GPO-VAE: Modeling Explainable Gene Perturbation Responses utilizing GRN-Aligned Parameter Optimization [15.892401495784078]
We propose GPO-VAE, an explainable variational autoencoder (VAE) enhanced by GRN-aligned parameter optimization.
Our key approach is to optimize the learnable parameters related to latent perturbation effects towards GRN-aligned explainability.
arXiv Detail & Related papers (2025-01-31T09:08:52Z) - Cross-Attention Graph Neural Networks for Inferring Gene Regulatory Networks with Skewed Degree Distribution [9.919024883502322]
We propose the Cross-Attention Complex Dual Graph Embedding Model (XATGRN).
Our model consistently outperforms existing state-of-the-art methods across various datasets.
arXiv Detail & Related papers (2024-12-18T10:56:40Z) - CSGDN: Contrastive Signed Graph Diffusion Network for Predicting Crop Gene-phenotype Associations [6.5678927417916455]
We propose a Contrastive Signed Graph Diffusion Network, CSGDN, to learn robust node representations with fewer training samples to achieve higher link prediction accuracy.
We conduct experiments to validate the performance of CSGDN on three crop datasets: Gossypium hirsutum, Brassica napus, and Triticum turgidum.
arXiv Detail & Related papers (2024-10-10T01:01:10Z) - Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer (BPGT) to improve genetic mutation prediction performance.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z) - VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
arXiv Detail & Related papers (2024-05-13T20:15:03Z) - GARA: A novel approach to Improve Genetic Algorithms' Accuracy and Efficiency by Utilizing Relationships among Genes [1.7226572355808027]
We propose the Gene Regulatory Genetic Algorithm (GRGA), which is the first to utilize relationships among genes for improving a GA's accuracy and efficiency.
We use a directed multipartite graph encapsulating the solution space, called RGGR, where each node corresponds to a gene in the solution and the edge represents the relationship between adjacent nodes.
The obtained RGGR is then employed to determine appropriate loci of crossover and mutation operators, thereby directing the evolutionary process toward faster and better convergence.
arXiv Detail & Related papers (2024-04-28T08:33:39Z) - Predicting loss-of-function impact of genetic mutations: a machine
learning approach [0.0]
This paper aims to train machine learning models on the attributes of a genetic mutation to predict LoFtool scores.
These attributes included, but were not limited to, the position of a mutation on a chromosome, changes in amino acids, and changes in codons caused by the mutation.
Models were evaluated using five-fold cross-validated averages of r-squared, mean squared error, root mean squared error, mean absolute error, and explained variance.
arXiv Detail & Related papers (2024-01-26T19:27:38Z) - PhyloGFN: Phylogenetic inference with generative flow networks [57.104166650526416]
We introduce the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference.
Because GFlowNets are well-suited for sampling complex structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies.
We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets.
arXiv Detail & Related papers (2023-10-12T23:46:08Z) - Unsupervised ensemble-based phenotyping helps enhance the
discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z) - Relation Embedding based Graph Neural Networks for Handling
Heterogeneous Graph [58.99478502486377]
We propose a simple yet efficient framework that gives homogeneous GNNs adequate ability to handle heterogeneous graphs.
Specifically, we propose Relation Embedding based Graph Neural Networks (RE-GNNs), which employ only one parameter per relation to embed the importance of edge type relations and self-loop connections.
arXiv Detail & Related papers (2022-09-23T05:24:18Z) - Deep neural networks with controlled variable selection for the
identification of putative causal genetic variants [0.43012765978447565]
We propose an interpretable neural network model, stabilized using ensembling, with controlled variable selection for genetic studies.
The merits of the proposed method include: (1) flexible modelling of the non-linear effect of genetic variants to improve statistical power; (2) multiple knockoffs in the input layer to rigorously control false discovery rate; (3) hierarchical layers to substantially reduce the number of weight parameters and activations to improve computational efficiency.
arXiv Detail & Related papers (2021-09-29T20:57:48Z) - IE-GAN: An Improved Evolutionary Generative Adversarial Network Using a
New Fitness Function and a Generic Crossover Operator [20.100388977505002]
We propose an improved E-GAN framework called IE-GAN, which introduces a new fitness function and a generic crossover operator.
In particular, the proposed fitness function can model the evolutionary process of individuals more accurately.
The crossover operator, which has been commonly adopted in evolutionary algorithms, can enable offspring to imitate the superior gene expression of their parents.
arXiv Detail & Related papers (2021-07-25T13:55:07Z) - Complexity-based speciation and genotype representation for
neuroevolution [81.21462458089142]
This paper introduces a speciation principle for neuroevolution where evolving networks are grouped into species based on the number of hidden neurons.
The proposed speciation principle is employed in several techniques designed to promote and preserve diversity within species and in the ecosystem as a whole.
arXiv Detail & Related papers (2020-10-11T06:26:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.