deepNoC: A deep learning system to assign the number of contributors to a short tandem repeat DNA profile
- URL: http://arxiv.org/abs/2412.09803v1
- Date: Fri, 13 Dec 2024 02:42:56 GMT
- Title: deepNoC: A deep learning system to assign the number of contributors to a short tandem repeat DNA profile
- Authors: Duncan Taylor, Melissa A. Humphries
- Abstract summary: We develop an analysis pipeline that simulates the electrophoretic signal of an STR profile, allowing virtually unlimited, pre-labelled training material to be generated.
We show that by simulating 100 000 profiles and training a number-of-contributors estimation tool with a deep neural network architecture (an algorithm named deepNoC), a high level of performance is achieved (89% for 1 to 10 contributors).
- Score: 0.0
- Abstract: A common task in forensic biology is to interpret and evaluate short tandem repeat (STR) DNA profiles. The first step in these interpretations is to assign a number of contributors to the profiles, a task that is most often performed manually by a scientist using their knowledge of DNA profile behaviour. Studies using constructed DNA profiles have shown that as DNA profiles become more complex and the number of DNA-donating individuals increases, the ability of scientists to assign the target number decreases. A number of machine learning algorithms have been developed that seek to assign the number of contributors to a DNA profile; however, due to practical limitations in generating DNA profiles in a laboratory, the algorithms have been based on summaries of the available information. In this work we develop an analysis pipeline that simulates the electrophoretic signal of an STR profile, allowing virtually unlimited, pre-labelled training material to be generated. We show that by simulating 100 000 profiles and training a number-of-contributors estimation tool with a deep neural network architecture (an algorithm named deepNoC), a high level of performance is achieved (89% for 1 to 10 contributors). The trained network can then be fine-tuned with only a few hundred profiles to achieve the same accuracy within a specific laboratory. We also build into deepNoC secondary outputs that provide a level of explainability to a user of the algorithm, and show how they can be displayed in an intuitive manner.
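The idea of generating pre-labelled training material by simulation can be illustrated with a toy sketch. It is not the deepNoC pipeline: the paper simulates the electrophoretic signal, whereas this example only records which alleles appear at each locus. All function names, the uniform allele frequencies, and the locus and allele counts are assumptions made for illustration. The allele-count lower bound shown is a simple baseline, not the paper's neural network.

```python
import math
import random


def simulate_profile(n_contributors, n_loci=20, alleles_per_locus=10, rng=None):
    """Return, for each locus, the set of distinct alleles seen in a toy mixture.

    Assumption: alleles are drawn uniformly at random, which real
    population frequencies are not.
    """
    rng = rng or random.Random()
    profile = []
    for _ in range(n_loci):
        observed = set()
        for _ in range(n_contributors):
            # Each diploid contributor donates two alleles at every locus.
            observed.add(rng.randrange(alleles_per_locus))
            observed.add(rng.randrange(alleles_per_locus))
        profile.append(observed)
    return profile


def make_training_set(n_profiles, max_contributors=10, seed=0):
    """Generate (profile, label) pairs; the label is known by construction."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_profiles):
        label = rng.randint(1, max_contributors)
        data.append((simulate_profile(label, rng=rng), label))
    return data


def min_contributors(profile):
    """Lower bound on contributors: at most two alleles per person per locus."""
    return max(math.ceil(len(alleles) / 2) for alleles in profile)


if __name__ == "__main__":
    training_set = make_training_set(1000)
    exact = sum(min_contributors(p) == label for p, label in training_set)
    print(f"lower bound equals the true label in {exact} of {len(training_set)} toy profiles")
```

Because every profile is simulated from a known number of contributors, the label comes for free, which is what makes large training sets practical. In this toy setting the allele-count bound can never exceed the true label, but it may undercount when contributors share alleles.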
Related papers
- Simulating realistic short tandem repeat capillary electrophoretic signal using a generative adversarial network [0.0]
We develop a generative adversarial network, GAN, modified from the pix2pix GAN to achieve this task.
With 1078 DNA profiles we train the GAN and achieve the ability to simulate DNA profile information.
arXiv Detail & Related papers (2024-08-28T23:20:17Z) - Dy-mer: An Explainable DNA Sequence Representation Scheme using Sparse Recovery [6.733319363951907]
Dy-mer is an explainable and robust representation scheme based on sparse recovery.
It achieves state-of-the-art performance in DNA promoter classification, yielding a remarkable 13% increase in accuracy.
arXiv Detail & Related papers (2024-07-06T15:08:31Z) - BEND: Benchmarking DNA Language Models on biologically meaningful tasks [7.005668635562045]
We introduce BEND, a Benchmark for DNA language models, featuring a collection of realistic and biologically meaningful downstream tasks.
We find that embeddings from current DNA LMs can approach performance of expert methods on some tasks, but only capture limited information about long-range features.
arXiv Detail & Related papers (2023-11-21T12:34:00Z) - DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks [14.931476374660944]
DNAGPT is a generalized DNA pre-training model trained on over 200 billion base pairs from all mammals.
By enhancing the classic GPT model with a binary classification task, a numerical regression task, and a comprehensive token language, DNAGPT can handle versatile DNA analysis tasks.
arXiv Detail & Related papers (2023-07-11T06:30:43Z) - Efficient Automation of Neural Network Design: A Survey on Differentiable Neural Architecture Search [70.31239620427526]
Differentiable Neural Architecture Search (DNAS) rapidly imposed itself as the trending approach to automate the discovery of deep neural network architectures.
This rise is mainly due to the popularity of DARTS, one of the first major DNAS methods.
In this comprehensive survey, we focus specifically on DNAS and review recent approaches in this field.
arXiv Detail & Related papers (2023-04-11T13:15:29Z) - Natural language processing for clusterization of genes according to their functions [62.997667081978825]
We propose an approach that reduces the analysis of several thousand genes to analysis of several clusters.
The descriptions are encoded as vectors using the pretrained language model (BERT) and some text processing approaches.
arXiv Detail & Related papers (2022-07-17T12:59:34Z) - Deep metric learning improves lab of origin prediction of genetically engineered plasmids [63.05016513788047]
Genetic engineering attribution (GEA) is the ability to make sequence-lab associations.
We propose a method, based on metric learning, that ranks the most likely labs-of-origin.
We are able to extract key signatures in plasmid sequences for particular labs, allowing for an interpretable examination of the model's outputs.
arXiv Detail & Related papers (2021-11-24T16:29:03Z) - Exploring the Common Principal Subspace of Deep Features in Neural Networks [50.37178960258464]
We find that different Deep Neural Networks (DNNs) trained with the same dataset share a common principal subspace in latent spaces.
Specifically, we design a new metric, the $\mathcal{P}$-vector, to represent the principal subspace of deep features learned in a DNN.
Small angles (with cosine close to $1.0$) have been found in the comparisons between any two DNNs trained with different algorithms/architectures.
arXiv Detail & Related papers (2021-10-06T15:48:32Z) - DNA mixture deconvolution using an evolutionary algorithm with multiple populations, hill-climbing, and guided mutation [0.8029049649310211]
DNA samples from crime cases analysed in forensic genetics frequently contain DNA from multiple contributors.
In cases where one or more of the contributors were unknown, an objective of interest would be the separation, often called deconvolution, of these unknown profiles.
We introduced a multiple population evolutionary algorithm (MEA) to obtain deconvolutions of the unknown DNA profiles.
arXiv Detail & Related papers (2020-12-01T14:23:55Z) - A deep learning classifier for local ancestry inference [63.8376359764052]
Local ancestry inference identifies the ancestry of each segment of an individual's genome.
We develop a new LAI tool using a deep convolutional neural network with an encoder-decoder architecture.
We show that our model is able to learn admixture as a zero-shot task, yielding ancestry assignments that are nearly as accurate as those from the existing gold standard tool, RFMix.
arXiv Detail & Related papers (2020-11-04T00:42:01Z) - Deep Representational Similarity Learning for analyzing neural signatures in task-based fMRI dataset [81.02949933048332]
This paper develops Deep Representational Similarity Learning (DRSL), a deep extension of Representational Similarity Analysis (RSA).
DRSL is appropriate for analyzing similarities between various cognitive tasks in fMRI datasets with a large number of subjects.
arXiv Detail & Related papers (2020-09-28T18:30:14Z)