Ranking labs-of-origin for genetically engineered DNA using Metric Learning
- URL: http://arxiv.org/abs/2107.07878v1
- Date: Fri, 16 Jul 2021 13:06:47 GMT
- Title: Ranking labs-of-origin for genetically engineered DNA using Metric Learning
- Authors: I. Muniz, F. H. F. Camargo and A. Marques
- Abstract summary: We show our proposed method to rank the most likely labs-of-origin and generate embeddings for DNA sequences and labs.
These embeddings can also perform various other tasks, like clustering both DNA sequences and labs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the constant advancements of genetic engineering, a common concern is to
be able to identify the lab-of-origin of genetically engineered DNA sequences.
For that reason, AltLabs has hosted the Genetic Engineering Attribution
Challenge to gather many teams to propose new tools to solve this problem. Here
we show our proposed method to rank the most likely labs-of-origin and generate
embeddings for DNA sequences and labs. These embeddings can also perform
various other tasks, like clustering both DNA sequences and labs and using them
as features for Machine Learning models applied to solve other problems. This
work demonstrates that our method outperforms the classic training method for
this task while generating other helpful information.
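As an illustration of the ranking idea in the abstract, the sketch below orders labs by how close their embeddings are to a sequence embedding. It is a minimal sketch, not the authors' implementation: the Euclidean distance, the function name `rank_labs`, the lab names, and the toy 2-D vectors are all assumptions made for the example, and a real system would obtain the embeddings from a trained metric-learning model.

```python
import math

def rank_labs(seq_embedding, lab_embeddings, top_k=10):
    """Return up to top_k lab names, nearest embedding first.

    Assumes Euclidean distance; the abstract does not state which
    distance or similarity the authors actually use.
    """
    ranked = sorted(
        lab_embeddings,
        key=lambda lab: math.dist(seq_embedding, lab_embeddings[lab]),
    )
    return ranked[:top_k]

# Hypothetical toy embeddings (made up for illustration only).
lab_embeddings = {
    "lab_A": [0.0, 0.0],
    "lab_B": [1.0, 1.0],
    "lab_C": [5.0, 5.0],
}
sequence_embedding = [0.9, 1.2]

ranking = rank_labs(sequence_embedding, lab_embeddings, top_k=2)
print(ranking)
```

Because sequences and labs share one embedding space in this sketch, the same vectors could also be passed to a clustering routine or used as input features for another model, as the abstract suggests.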
Related papers
- Nonparametric independence tests in high-dimensional settings, with applications to the genetics of complex disease [55.2480439325792]
We show how defining adequate premetric structures on the support spaces of the genetic data allows for novel approaches to such testing.
For each problem, we provide mathematical results, simulations and the application to real data.
arXiv Detail & Related papers (2024-07-29T01:00:53Z)
- A Benchmark Dataset for Multimodal Prediction of Enzymatic Function Coupling DNA Sequences and Natural Language [3.384797724820242]
Predicting gene function from its DNA sequence is a fundamental challenge in biology.
Deep learning models have been proposed to embed DNA sequences and predict their enzymatic function.
Much of the scientific community's knowledge of biological function is not represented in categorical labels.
arXiv Detail & Related papers (2024-07-21T19:27:43Z)
- BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions.
BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model.
It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z)
- BEND: Benchmarking DNA Language Models on biologically meaningful tasks [7.005668635562045]
We introduce BEND, a Benchmark for DNA language models, featuring a collection of realistic and biologically meaningful downstream tasks.
We find that embeddings from current DNA LMs can approach performance of expert methods on some tasks, but only capture limited information about long-range features.
arXiv Detail & Related papers (2023-11-21T12:34:00Z)
- Learning to Untangle Genome Assembly with Graph Convolutional Networks [17.227634756670835]
We introduce a new learning framework to train a graph convolutional network to resolve assembly graphs by finding a correct path through them.
Experimental results show that a model, trained on simulated graphs generated solely from a single chromosome, is able to remarkably resolve all other chromosomes.
arXiv Detail & Related papers (2022-06-01T04:14:25Z)
- GENEOnet: A new machine learning paradigm based on Group Equivariant Non-Expansive Operators. An application to protein pocket detection [97.5153823429076]
We introduce a new computational paradigm based on Group Equivariant Non-Expansive Operators.
We test our method, called GENEOnet, on a key problem in drug design: detecting pockets on the surface of proteins that can host.
arXiv Detail & Related papers (2022-01-31T11:14:51Z)
- Deep metric learning improves lab of origin prediction of genetically engineered plasmids [63.05016513788047]
Genetic engineering attribution (GEA) is the ability to make sequence-lab associations.
We propose a method, based on metric learning, that ranks the most likely labs-of-origin.
We are able to extract key signatures in plasmid sequences for particular labs, allowing for an interpretable examination of the model's outputs.
arXiv Detail & Related papers (2021-11-24T16:29:03Z)
- Efficient approximation of DNA hybridisation using deep learning [0.0]
We present the first comprehensive study of machine learning methods applied to the task of predicting DNA hybridisation.
We introduce a synthetic hybridisation dataset of over 2.5 million data points, enabling the use of a wide range of machine learning algorithms.
arXiv Detail & Related papers (2021-02-19T19:23:49Z)
- A deep learning classifier for local ancestry inference [63.8376359764052]
Local ancestry inference identifies the ancestry of each segment of an individual's genome.
We develop a new LAI tool using a deep convolutional neural network with an encoder-decoder architecture.
We show that our model is able to learn admixture as a zero-shot task, yielding ancestry assignments that are nearly as accurate as those from the existing gold standard tool, RFMix.
arXiv Detail & Related papers (2020-11-04T00:42:01Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)