Hyperdimensional computing: a fast, robust and interpretable paradigm
for biological data
- URL: http://arxiv.org/abs/2402.17572v1
- Date: Tue, 27 Feb 2024 15:09:20 GMT
- Authors: Michiel Stock, Dimitri Boeckaerts, Pieter Dewulf, Steff Taelman,
Maxime Van Haeverbeke, Wim Van Criekinge, Bernard De Baets
- Abstract summary: New algorithms for processing diverse biological data sources have revolutionized bioinformatics.
Deep learning has substantially transformed bioinformatics, addressing sequence, structure, and functional analyses.
Hyperdimensional computing has emerged as an intriguing alternative.
- Score: 9.094234519404907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advances in bioinformatics are primarily due to new algorithms for processing
diverse biological data sources. While sophisticated alignment algorithms have
been pivotal in analyzing biological sequences, deep learning has substantially
transformed bioinformatics, addressing sequence, structure, and functional
analyses. However, these methods are incredibly data-hungry, compute-intensive
and hard to interpret. Hyperdimensional computing (HDC) has recently emerged as
an intriguing alternative. The key idea is that random vectors of high
dimensionality can represent concepts such as sequence identity or phylogeny.
These vectors can then be combined using simple operators for learning,
reasoning or querying by exploiting the peculiar properties of high-dimensional
spaces. Our work reviews and explores the potential of HDC for bioinformatics,
emphasizing its efficiency, interpretability, and adeptness in handling
multimodal and structured data. HDC holds a lot of potential for various omics
data searching, biosignal analysis and health applications.
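The operators mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes bipolar hypervectors with entries in {-1, +1}, elementwise multiplication for binding, cyclic shifts to mark position, a sign-of-sum rule for bundling, and a 4-mer encoding of DNA. The dimensionality, the function names, and the example sequences are all hypothetical choices made for illustration.

```python
import random

DIM = 10_000  # assumed dimensionality; chosen here only for illustration


def random_hv(rng):
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Bind two hypervectors by elementwise multiplication."""
    return [x * y for x, y in zip(a, b)]


def shift(v, k):
    """Cyclically rotate a hypervector by k positions to mark position."""
    k %= len(v)
    return v[-k:] + v[:-k] if k else list(v)


def bundle(vectors):
    """Bundle hypervectors: elementwise sum, then sign (ties go to +1)."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*vectors)]


def similarity(a, b):
    """Cosine similarity; for bipolar vectors, the mean elementwise product."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def encode(seq, alphabet, k=4):
    """Encode a sequence as the bundle of its k-mer hypervectors."""
    kmers = []
    for i in range(len(seq) - k + 1):
        hv = alphabet[seq[i]]
        for j in range(1, k):
            # Shift each later symbol by its offset so that order matters.
            hv = bind(hv, shift(alphabet[seq[i + j]], j))
        kmers.append(hv)
    return bundle(kmers)


rng = random.Random(0)
alphabet = {base: random_hv(rng) for base in "ACGT"}

ref = "ACGTTGCAACGTAGCTAGGCTAACGTTAGC"
mutant = ref[:15] + "G" + ref[16:]  # single substitution (T -> G)
unrelated = "".join(rng.choice("ACGT") for _ in range(len(ref)))

ref_hv = encode(ref, alphabet)
sim_mutant = similarity(ref_hv, encode(mutant, alphabet))
sim_unrelated = similarity(ref_hv, encode(unrelated, alphabet))
print(f"mutant: {sim_mutant:.2f}, unrelated: {sim_unrelated:.2f}")
```

Under these assumptions, the single-substitution mutant should stay much closer to the reference than the unrelated sequence does, because the two share most of their 4-mers, while independently drawn hypervectors are nearly orthogonal in high dimensions.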
Related papers
- GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters.
The model adheres to the central dogma of molecular biology, accurately generating protein-coding sequences.
It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of promoter sequences.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
- BioAgents: Democratizing Bioinformatics Analysis with Multi-Agent Systems [6.668992155393883]
We propose a multi-agent system built on small language models, fine-tuned on bioinformatics data, and enhanced with retrieval-augmented generation (RAG).
Our system, BioAgents, enables local operation and personalization using proprietary data.
We observe performance comparable to human experts on conceptual genomics tasks, and suggest next steps to enhance code generation capabilities.
arXiv Detail & Related papers (2025-01-10T19:30:59Z)
- Biology Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models [51.316001071698224]
We introduce Biology-Instructions, the first large-scale multi-omics biological sequences-related instruction-tuning dataset.
This dataset can bridge the gap between large language models (LLMs) and complex biological sequences-related tasks.
We also develop a strong baseline called ChatMultiOmics with a novel three-stage training pipeline.
arXiv Detail & Related papers (2024-12-26T12:12:23Z)
- Semantically Rich Local Dataset Generation for Explainable AI in Genomics [0.716879432974126]
Black box deep learning models trained on genomic sequences excel at predicting the outcomes of different gene regulatory mechanisms.
We propose using Genetic Programming to generate datasets by evolving perturbations in sequences that contribute to their semantic diversity.
arXiv Detail & Related papers (2024-07-03T10:31:30Z)
- An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z)
- GeoTop: Advancing Image Classification with Geometric-Topological Analysis [0.0]
Topological Data Analysis and Lipschitz-Killing Curvatures are used as powerful tools for feature extraction and classification.
We investigate the potential of combining both methods to improve classification accuracy.
This approach has the potential to advance our understanding of complex biological processes in various biomedical applications.
arXiv Detail & Related papers (2023-11-08T23:38:32Z)
- Criticality Analysis: Bio-inspired Nonlinear Data Representation [0.0]
Criticality Analysis (CA) is a bio-inspired method of information representation within a controlled self-organised critical system.
The input can be reduced dimensionally to a projection output that retains the features of the overall data, yet has much simpler dynamic response.
The CA method allows for a biologically relevant encoding mechanism of arbitrary input to biosystems, creating a suitable model for information processing in varying complexity of organisms.
arXiv Detail & Related papers (2023-05-11T19:02:09Z)
- Classical-to-Quantum Sequence Encoding in Genomics [0.0]
We present several novel methods of performing classical-to-quantum data encoding inspired by various mathematical fields.
We introduce algorithms that draw inspiration from diverse fields such as Electrical and Electronic Engineering, Information Theory, Differential Geometry, and Neural Network architectures.
We propose a contemporary method for testing encoded DNA sequences using Quantum Boltzmann Machines.
arXiv Detail & Related papers (2023-04-21T07:35:49Z)
- RandomSCM: interpretable ensembles of sparse classifiers tailored for omics data [59.4141628321618]
We propose an ensemble learning algorithm based on conjunctions or disjunctions of decision rules.
The interpretability of the models makes them useful for biomarker discovery and patterns discovery in high dimensional data.
arXiv Detail & Related papers (2022-08-11T13:55:04Z)
- EBIC.JL -- an Efficient Implementation of Evolutionary Biclustering Algorithm in Julia [59.422301529692454]
We introduce EBIC.JL - an implementation of one of the most accurate biclustering algorithms in Julia.
We show that the new version maintains comparable accuracy to its predecessor EBIC while converging faster for the majority of the problems.
arXiv Detail & Related papers (2021-05-03T22:30:38Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.