DNA mixture deconvolution using an evolutionary algorithm with multiple
populations, hill-climbing, and guided mutation
- URL: http://arxiv.org/abs/2012.00513v1
- Date: Tue, 1 Dec 2020 14:23:55 GMT
- Title: DNA mixture deconvolution using an evolutionary algorithm with multiple
populations, hill-climbing, and guided mutation
- Authors: Søren B. Vilsen, Torben Tvedebrink, and Poul Svante Eriksen
- Abstract summary: DNA samples from crime cases analysed in forensic genetics frequently contain DNA from multiple contributors.
In cases where one or more of the contributors were unknown, an objective of interest would be the separation, often called deconvolution, of these unknown profiles.
We introduced a multiple population evolutionary algorithm (MEA) to obtain deconvolutions of the unknown DNA profiles.
- Score: 0.8029049649310211
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: DNA samples from crime cases analysed in forensic genetics frequently contain DNA
from multiple contributors. These occur as convolutions of the DNA profiles of
the individual contributors to the DNA sample. Thus, in cases where one or more
of the contributors were unknown, an objective of interest would be the
separation, often called deconvolution, of these unknown profiles. In order to
obtain deconvolutions of the unknown DNA profiles, we introduced a multiple
population evolutionary algorithm (MEA). Because the fitness is based on a
probabilistic model, we allowed the mutation operator of the MEA to be guided
by the deviations between the observed and the expected value for
every element of the encoded individual. This guided mutation operator (GM) was
designed such that the larger the deviation, the higher the probability of mutation.
Furthermore, the GM was inhomogeneous in time, decreasing to a specified lower
bound as the number of iterations increased. We analysed 102 two-person DNA
mixture samples in varying mixture proportions. The samples were quantified
using two different DNA prep. kits: (1) Illumina ForenSeq Panel B (30 samples),
and (2) Applied Biosystems Precision ID Globalfiler NGS STR panel (72 samples).
The DNA mixtures were deconvoluted by the MEA and compared to the true DNA
profiles of the sample. We analysed three scenarios where we assumed: (1) the
DNA profile of the major contributor was unknown, (2) the DNA profile of the minor
contributor was unknown, and (3) both DNA profiles were unknown. Furthermore, we conducted
a series of sensitivity experiments on the ForenSeq panel by varying the
sub-population size, comparing a completely random homogeneous mutation
operator to the guided operator with varying mutation decay rates, and allowing
for hill-climbing of the parent population.
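
The guided mutation idea described in the abstract can be illustrated with a minimal sketch. The abstract only states that larger deviations give higher mutation probabilities and that the operator decays over time to a lower bound; the exponential decay, the linear scaling by absolute deviation, and every name and default value below (`guided_mutation_probabilities`, `p_upper`, `p_lower`, `decay`, `mutate`) are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def guided_mutation_probabilities(observed, expected, iteration,
                                  p_upper=0.5, p_lower=0.05, decay=0.01):
    """Per-element mutation probabilities (hypothetical functional form).

    Elements with larger |observed - expected| get a higher probability.
    The highest attainable probability shrinks from p_upper towards
    p_lower as the iteration count grows.
    """
    # Assumed time-inhomogeneous cap: exponential decay towards p_lower.
    cap = p_lower + (p_upper - p_lower) * math.exp(-decay * iteration)
    deviations = [abs(o - e) for o, e in zip(observed, expected)]
    largest = max(deviations, default=0.0)
    if largest == 0.0:
        # No deviation anywhere: fall back to the lower bound.
        return [p_lower for _ in deviations]
    # Assumed linear scaling between p_lower and the current cap.
    return [p_lower + (cap - p_lower) * d / largest for d in deviations]


def mutate(individual, probabilities, alleles, rng):
    """Resample each element from `alleles` with its own probability."""
    return [rng.choice(alleles) if rng.random() < p else gene
            for gene, p in zip(individual, probabilities)]


if __name__ == "__main__":
    rng = random.Random(1)
    probs = guided_mutation_probabilities([10, 50, 5], [10, 30, 0], iteration=0)
    print(probs)
    print(mutate([11, 12, 13], probs, alleles=[10, 11, 12, 13, 14], rng=rng))
```

In this sketch the element with the largest deviation is mutated with the current cap probability, while a perfectly explained element is mutated only at the lower bound, so late iterations behave like a low-rate homogeneous operator.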
Related papers
- HybriDNA: A Hybrid Transformer-Mamba2 Long-Range DNA Language Model [70.69095062674944]
We propose HybriDNA, a decoder-only DNA language model that incorporates a hybrid Transformer-Mamba2 architecture.
This hybrid design enables HybriDNA to efficiently process DNA sequences up to 131kb in length with single-nucleotide resolution.
HybriDNA achieves state-of-the-art performance across 33 DNA understanding datasets curated from the BEND, GUE, and LRB benchmarks.
arXiv Detail & Related papers (2025-02-15T14:23:43Z)
- Survey and Improvement Strategies for Gene Prioritization with Large Language Models [61.24568051916653]
Large language models (LLMs) have performed well in medical exams, but their effectiveness in diagnosing rare genetic diseases has not been assessed.
We used multi-agent and Human Phenotype Ontology (HPO) classification to categorize patients based on phenotypes and solvability levels.
At baseline, GPT-4 outperformed other LLMs, achieving near 30% accuracy in ranking causal genes correctly.
arXiv Detail & Related papers (2025-01-30T23:03:03Z)
- Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA [44.630039477717624]
MxDNA is a novel framework where the model autonomously learns an effective DNA tokenization strategy through gradient descent.
We show that MxDNA learns a unique tokenization strategy distinct from those of previous methods and captures genomic functionalities at a token level during self-supervised pretraining.
arXiv Detail & Related papers (2024-12-18T10:55:43Z)
- deepNoC: A deep learning system to assign the number of contributors to a short tandem repeat DNA profile [0.0]
We develop an analysis pipeline that simulates the electrophoretic signal of an STR profile, allowing virtually unlimited, pre-labelled training material to be generated.
We show that by simulating 100 000 profiles and training a number-of-contributors estimation tool using a deep neural network architecture (in an algorithm named deepNoC), a high level of performance is achieved (89% for 1 to 10 contributors).
arXiv Detail & Related papers (2024-12-13T02:42:56Z)
- Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z)
- DNABERT-S: Pioneering Species Differentiation with Species-Aware DNA Embeddings [7.822348354050447]
We introduce DNABERT-S, a tailored genome model that develops species-aware embeddings to naturally cluster and segregate DNA sequences of different species.
Results on 23 diverse datasets show DNABERT-S's effectiveness, especially in realistic label-scarce scenarios.
arXiv Detail & Related papers (2024-02-13T20:21:29Z)
- BEND: Benchmarking DNA Language Models on biologically meaningful tasks [7.005668635562045]
We introduce BEND, a Benchmark for DNA language models, featuring a collection of realistic and biologically meaningful downstream tasks.
We find that embeddings from current DNA LMs can approach performance of expert methods on some tasks, but only capture limited information about long-range features.
arXiv Detail & Related papers (2023-11-21T12:34:00Z)
- HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution [76.97231739317259]
We present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at the single nucleotide-level.
On fine-tuned benchmarks from the Nucleotide Transformer, HyenaDNA reaches state-of-the-art (SotA) on 12 of 18 datasets using a model with orders of magnitude fewer parameters and less pretraining data.
arXiv Detail & Related papers (2023-06-27T20:46:34Z)
- rfPhen2Gen: A machine learning based association study of brain imaging phenotypes to genotypes [71.1144397510333]
We trained machine learning models to predict SNPs using 56 brain imaging QTs.
SNPs within the known Alzheimer disease (AD) risk gene APOE had the lowest RMSE for lasso and random forest.
Random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders.
arXiv Detail & Related papers (2022-03-31T20:15:22Z)
- Private DNA Sequencing: Hiding Information in Discrete Noise [6.647959476396793]
We study the problem of hiding a binary random variable $X$ with the additive noise provided by mixing DNA samples.
We characterize upper and lower bounds to the solution of this problem, which are empirically shown to be very close.
arXiv Detail & Related papers (2021-01-28T17:13:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.