Closed-Form Test Functions for Biophysical Sequence Optimization Algorithms
- URL: http://arxiv.org/abs/2407.00236v1
- Date: Fri, 28 Jun 2024 21:13:57 GMT
- Title: Closed-Form Test Functions for Biophysical Sequence Optimization Algorithms
- Authors: Samuel Stanton, Robert Alberstein, Nathan Frey, Andrew Watkins, Kyunghyun Cho
- Abstract summary: We propose a new class of closed-form test functions for biophysical sequence optimization, which we call Ehrlich functions.
We provide empirical results demonstrating these functions are interesting objects of study and can be non-trivial to solve with a standard genetic optimization baseline.
- Score: 35.95872277958525
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is a growing body of work seeking to replicate the success of machine learning (ML) on domains like computer vision (CV) and natural language processing (NLP) to applications involving biophysical data. One of the key ingredients of prior successes in CV and NLP was the broad acceptance of difficult benchmarks that distilled key subproblems into approachable tasks that any junior researcher could investigate, but good benchmarks for biophysical domains are rare. This scarcity is partially due to a narrow focus on benchmarks which simulate biophysical data; we propose instead to carefully abstract biophysical problems into simpler ones with key geometric similarities. In particular we propose a new class of closed-form test functions for biophysical sequence optimization, which we call Ehrlich functions. We provide empirical results demonstrating these functions are interesting objects of study and can be non-trivial to solve with a standard genetic optimization baseline.
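The abstract does not spell out the Ehrlich functions themselves, but the core idea, a cheap, exactly evaluable objective over discrete sequences that rewards geometric structure such as motifs appearing with particular spacings, can be illustrated with a small sketch. The objective `motif_objective`, the motif/spacing parameters, and the toy genetic baseline below are illustrative assumptions for intuition only, not the paper's actual Ehrlich construction or released code.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids

def motif_objective(seq, motifs):
    """Illustrative closed-form objective: the fraction of (motif, spacing)
    constraints satisfied by the sequence. NOT the exact Ehrlich function
    defined in the paper; this only captures the motif-placement flavor."""
    satisfied = 0
    for chars, spacing in motifs:
        # The motif's characters must appear in order, `spacing` positions apart,
        # starting from at least one position in the sequence.
        hit = any(
            all(
                start + k * spacing < len(seq) and seq[start + k * spacing] == c
                for k, c in enumerate(chars)
            )
            for start in range(len(seq))
        )
        satisfied += hit
    return satisfied / len(motifs)

def genetic_baseline(objective, seq_len=32, pop_size=64,
                     generations=200, mut_rate=0.05, seed=0):
    """Toy genetic search: tournament selection plus per-position point mutation.
    A stand-in for the 'standard genetic optimization baseline' mentioned above."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in range(seq_len))
           for _ in range(pop_size)]
    best = max(pop, key=objective)
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            parent = max(rng.sample(pop, 4), key=objective)  # tournament of 4
            child = [rng.choice(ALPHABET) if rng.random() < mut_rate else c
                     for c in parent]
            children.append("".join(child))
        pop = children
        gen_best = max(pop, key=objective)
        if objective(gen_best) > objective(best):
            best = gen_best
    return best, objective(best)

if __name__ == "__main__":
    # Hypothetical constraints: each tuple is (motif characters, required spacing).
    motifs = [("AC", 1), ("WY", 3), ("KLM", 2)]
    best_seq, score = genetic_baseline(lambda s: motif_objective(s, motifs))
    print(score, best_seq)
```

Because such an objective is closed-form and cheap to evaluate, it supports the kind of controlled, reproducible benchmarking the abstract argues for; the Ehrlich functions defined in the paper are their own, more carefully designed construction.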
Related papers
- NeuroSym-BioCAT: Leveraging Neuro-Symbolic Methods for Biomedical Scholarly Document Categorization and Question Answering [0.14999444543328289]
We introduce a novel approach that integrates an optimized topic modelling framework, OVB-LDA, with the BI-POP CMA-ES optimization technique for enhanced scholarly document abstract categorization.
We employ the distilled MiniLM model, fine-tuned on domain-specific data, for high-precision answer extraction.
arXiv Detail & Related papers (2024-10-29T14:45:12Z)
- Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models [4.762323642506732]
We seek to apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery.
We introduce a new system, $BMLP_{active}$, which efficiently explores the genomic hypothesis space by guiding informative experimentation.
$BMLP_{active}$ can successfully learn the interaction between a gene pair with fewer training examples than random experimentation.
arXiv Detail & Related papers (2024-05-10T09:51:06Z)
- Hyperdimensional computing: a fast, robust and interpretable paradigm for biological data [9.094234519404907]
New algorithms for processing diverse biological data sources have revolutionized bioinformatics.
Deep learning has substantially transformed bioinformatics, addressing sequence, structure, and functional analyses.
Hyperdimensional computing has emerged as an intriguing alternative.
arXiv Detail & Related papers (2024-02-27T15:09:20Z)
- An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z)
- Progress and Opportunities of Foundation Models in Bioinformatics [77.74411726471439]
Foundation models (FMs) have ushered in a new era in computational biology, especially in the realm of deep learning.
Central to our focus is the application of FMs to specific biological problems, aiming to guide the research community in choosing appropriate FMs for their research needs.
The review analyzes challenges and limitations faced by FMs in biology, such as data noise, model explainability, and potential biases.
arXiv Detail & Related papers (2024-02-06T02:29:17Z)
- ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab [67.24684071577211]
The challenge of replicating research results has posed a significant impediment to the field of molecular biology.
We first curate a comprehensive multimodal dataset, named ProBio, as an initial step towards this objective.
Next, we devise two challenging benchmarks, transparent solution tracking and multimodal action recognition, to emphasize the unique characteristics and difficulties associated with activity understanding in BioLab settings.
arXiv Detail & Related papers (2023-11-01T14:44:01Z)
- Deep Learning Methods for Protein Family Classification on PDB Sequencing Data [0.0]
We demonstrate and compare the performance of several deep learning frameworks, including novel bi-directional LSTM and convolutional models, on widely available sequencing data.
Our results show that our deep learning models deliver superior performance to classical machine learning methods, with the convolutional architecture providing the most impressive inference performance.
arXiv Detail & Related papers (2022-07-14T06:11:32Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel machine learning architecture that allows us to infuse a deep neural network with human-powered abstraction at the level of the data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)