DNA mixture deconvolution using an evolutionary algorithm with multiple
populations, hill-climbing, and guided mutation
- URL: http://arxiv.org/abs/2012.00513v1
- Date: Tue, 1 Dec 2020 14:23:55 GMT
- Title: DNA mixture deconvolution using an evolutionary algorithm with multiple
populations, hill-climbing, and guided mutation
- Authors: Søren B. Vilsen, Torben Tvedebrink, and Poul Svante Eriksen
- Abstract summary: DNA samples from crime cases analysed in forensic genetics frequently contain DNA from multiple contributors.
In cases where one or more of the contributors were unknown, an objective of interest would be the separation, often called deconvolution, of these unknown profiles.
We introduced a multiple population evolutionary algorithm (MEA) to obtain deconvolutions of the unknown DNA profiles.
- Score: 0.8029049649310211
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: DNA samples from crime cases analysed in forensic genetics frequently contain DNA
from multiple contributors. These occur as convolutions of the DNA profiles of
the individual contributors to the DNA sample. Thus, in cases where one or more
of the contributors were unknown, an objective of interest would be the
separation, often called deconvolution, of these unknown profiles. In order to
obtain deconvolutions of the unknown DNA profiles, we introduced a multiple
population evolutionary algorithm (MEA). Because the fitness is based on a
probabilistic model, we allowed the mutation operator of the MEA to exploit
this: it was guided by the deviations between the observed and the expected
value for every element of the encoded individual. This guided mutation operator
(GM) was designed such that the larger the deviation, the higher the probability
of mutation. Furthermore, the GM was inhomogeneous in time, with the mutation
probability decreasing to a specified lower bound as the number of iterations
increased. We analysed 102 two-person DNA
mixture samples in varying mixture proportions. The samples were quantified
using two different DNA prep. kits: (1) Illumina ForenSeq Panel B (30 samples),
and (2) Applied Biosystems Precision ID Globalfiler NGS STR panel (72 samples).
The DNA mixtures were deconvoluted by the MEA and compared to the true DNA
profiles of the sample. We analysed three scenarios where we assumed: (1) the
DNA profile of the major contributor was unknown, (2) the DNA profile of the
minor contributor was unknown, and (3) both DNA profiles were unknown. Furthermore, we conducted
a series of sensitivity experiments on the ForenSeq panel by varying the
sub-population size, comparing a completely random homogeneous mutation
operator to the guided operator with varying mutation decay rates, and allowing
for hill-climbing of the parent population.
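The guided mutation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the exponential decay schedule, the normalisation of deviations, and all names (`mutation_rate`, `guided_mutation_probs`, `mutate`, `start`, `lower_bound`, `decay`) are assumptions made only for illustration. The sketch shows per-element mutation probabilities that grow with the deviation between observed and expected values, and an overall rate that shrinks towards a lower bound as iterations increase.

```python
import math
import random


def mutation_rate(iteration, start=0.5, lower_bound=0.05, decay=0.01):
    """Overall mutation rate at a given iteration.

    Assumed schedule: exponential decay from `start` towards `lower_bound`.
    The abstract only states that the rate decreases to a specified lower bound.
    """
    return lower_bound + (start - lower_bound) * math.exp(-decay * iteration)


def guided_mutation_probs(observed, expected, iteration, **schedule):
    """Per-element mutation probabilities.

    Elements with a larger absolute deviation |observed - expected| receive a
    larger probability. The probabilities are scaled so that their mean equals
    the current overall rate (before capping at 1), which is an assumption.
    """
    deviations = [abs(o - e) for o, e in zip(observed, expected)]
    total = sum(deviations)
    rate = mutation_rate(iteration, **schedule)
    n = len(deviations)
    if total == 0:
        # No deviations to guide the operator: fall back to a uniform rate.
        return [rate] * n
    return [min(1.0, rate * n * d / total) for d in deviations]


def mutate(individual, probs, alleles, rng=random):
    """Replace each element by a random allele with its own probability."""
    return [
        rng.choice(alleles) if rng.random() < p else gene
        for gene, p in zip(individual, probs)
    ]


if __name__ == "__main__":
    observed = [10, 50, 5]   # hypothetical observed values per element
    expected = [10, 20, 10]  # hypothetical expected values under the model
    for it in (0, 100, 1000):
        print(it, guided_mutation_probs(observed, expected, it))
```

In this sketch an element whose observed value matches the expected value is never mutated, while the element with the largest deviation is mutated most often; late in the run all probabilities shrink as the overall rate approaches `lower_bound`.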
Related papers
- Dy-mer: An Explainable DNA Sequence Representation Scheme using Sparse Recovery [6.733319363951907]
Dy-mer is an explainable and robust representation scheme based on sparse recovery.
It achieves state-of-the-art performance in DNA promoter classification, yielding a remarkable 13% increase in accuracy.
arXiv Detail & Related papers (2024-07-06T15:08:31Z)
- Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z)
- DNABERT-S: Pioneering Species Differentiation with Species-Aware DNA Embeddings [7.822348354050447]
We introduce DNABERT-S, a tailored genome model that develops species-aware embeddings to naturally cluster and segregate DNA sequences of different species.
Emerged results on 23 diverse datasets show DNABERT-S's effectiveness, especially in realistic label-scarce scenarios.
arXiv Detail & Related papers (2024-02-13T20:21:29Z)
- Predicting loss-of-function impact of genetic mutations: a machine learning approach [0.0]
This paper aims to train machine learning models on the attributes of a genetic mutation to predict LoFtool scores.
These attributes included, but were not limited to, the position of a mutation on a chromosome, changes in amino acids, and changes in codons caused by the mutation.
Models were evaluated using five-fold cross-validated averages of r-squared, mean squared error, root mean squared error, mean absolute error, and explained variance.
arXiv Detail & Related papers (2024-01-26T19:27:38Z)
- BEND: Benchmarking DNA Language Models on biologically meaningful tasks [7.005668635562045]
We introduce BEND, a Benchmark for DNA language models, featuring a collection of realistic and biologically meaningful downstream tasks.
We find that embeddings from current DNA LMs can approach performance of expert methods on some tasks, but only capture limited information about long-range features.
arXiv Detail & Related papers (2023-11-21T12:34:00Z)
- HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution [76.97231739317259]
We present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at the single nucleotide-level.
On fine-tuned benchmarks from the Nucleotide Transformer, HyenaDNA reaches state-of-the-art (SotA) on 12 of 18 datasets using a model with orders of magnitude less parameters and pretraining data.
arXiv Detail & Related papers (2023-06-27T20:46:34Z)
- Graph Neural Networks for Microbial Genome Recovery [64.91162205624848]
We propose to use Graph Neural Networks (GNNs) to leverage the assembly graph when learning contig representations for metagenomic binning.
Our method, VaeG-Bin, combines variational autoencoders for learning latent representations of the individual contigs, with GNNs for refining these representations by taking into account the neighborhood structure of the contigs in the assembly graph.
arXiv Detail & Related papers (2022-04-26T12:49:51Z)
- rfPhen2Gen: A machine learning based association study of brain imaging phenotypes to genotypes [71.1144397510333]
We learned machine learning models to predict SNPs using 56 brain imaging QTs.
SNPs within the known Alzheimer disease (AD) risk gene APOE had lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z)
- Private DNA Sequencing: Hiding Information in Discrete Noise [6.647959476396793]
We study the problem of hiding a binary random variable $X$ with the additive noise provided by mixing DNA samples.
We characterize upper and lower bounds to the solution of this problem, which are empirically shown to be very close.
arXiv Detail & Related papers (2021-01-28T17:13:26Z)
- A deep learning classifier for local ancestry inference [63.8376359764052]
Local ancestry inference identifies the ancestry of each segment of an individual's genome.
We develop a new LAI tool using a deep convolutional neural network with an encoder-decoder architecture.
We show that our model is able to learn admixture as a zero-shot task, yielding ancestry assignments that are nearly as accurate as those from the existing gold standard tool, RFMix.
arXiv Detail & Related papers (2020-11-04T00:42:01Z)
- Tracking disease outbreaks from sparse data with Bayesian inference [55.82986443159948]
The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
We propose a Bayesian framework which accommodates partial observability in a principled manner.
arXiv Detail & Related papers (2020-09-12T20:37:33Z)