Hyperdimensional computing: a fast, robust and interpretable paradigm
for biological data
- URL: http://arxiv.org/abs/2402.17572v1
- Date: Tue, 27 Feb 2024 15:09:20 GMT
- Authors: Michiel Stock, Dimitri Boeckaerts, Pieter Dewulf, Steff Taelman,
Maxime Van Haeverbeke, Wim Van Criekinge, Bernard De Baets
- Abstract summary: New algorithms for processing diverse biological data sources have revolutionized bioinformatics.
Deep learning has substantially transformed bioinformatics, addressing sequence, structure, and functional analyses.
Hyperdimensional computing has emerged as an intriguing alternative.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advances in bioinformatics are primarily due to new algorithms for processing
diverse biological data sources. While sophisticated alignment algorithms have
been pivotal in analyzing biological sequences, deep learning has substantially
transformed bioinformatics, addressing sequence, structure, and functional
analyses. However, these methods are incredibly data-hungry, compute-intensive
and hard to interpret. Hyperdimensional computing (HDC) has recently emerged as
an intriguing alternative. The key idea is that random vectors of high
dimensionality can represent concepts such as sequence identity or phylogeny.
These vectors can then be combined using simple operators for learning,
reasoning or querying by exploiting the peculiar properties of high-dimensional
spaces. Our work reviews and explores the potential of HDC for bioinformatics,
emphasizing its efficiency, interpretability, and adeptness in handling
multimodal and structured data. HDC holds a lot of potential for various omics
data searching, biosignal analysis and health applications.
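The key idea in the abstract (random high-dimensional vectors combined with simple operators) can be illustrated with a minimal sketch. Nothing below is taken from the paper: the dimensionality of 10,000, the bipolar {-1, +1} encoding, the trigram scheme, the toy DNA strings and all function names are assumptions chosen for illustration. Binding is taken here to be elementwise multiplication, bundling a sign-thresholded sum, and order is encoded with a cyclic shift, which is one common set of choices among several.

```python
import random

DIM = 10_000  # hypervector dimensionality (assumed, illustrative value)


def random_hv(rng):
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bind(a, b):
    """Bind two hypervectors by elementwise multiplication."""
    return [x * y for x, y in zip(a, b)]


def shift(a, k=1):
    """Cyclically shift a hypervector by k positions to encode order."""
    k %= len(a)
    return a[-k:] + a[:-k]


def bundle(vectors):
    """Bundle hypervectors: elementwise sum, then a sign threshold."""
    sums = [sum(col) for col in zip(*vectors)]
    return [1 if s >= 0 else -1 for s in sums]  # ties are broken towards +1


def similarity(a, b):
    """Normalized dot product; equals cosine similarity for bipolar vectors."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def encode_sequence(seq, alphabet, n=3):
    """Encode a sequence as the bundle of its n-gram hypervectors."""
    ngrams = []
    for i in range(len(seq) - n + 1):
        hv = [1] * DIM  # identity element for binding
        for j, symbol in enumerate(seq[i:i + n]):
            hv = bind(hv, shift(alphabet[symbol], j))
        ngrams.append(hv)
    return bundle(ngrams)


rng = random.Random(42)
alphabet = {base: random_hv(rng) for base in "ACGT"}  # one random vector per base

seq_a = "ACGTTGCATGCAAGCTTAGGCTAACGTAGCTAGGATCCAT"           # toy sequence
seq_b = seq_a[:20] + "T" + seq_a[21:]                          # one substitution
seq_c = "".join(rng.choice("ACGT") for _ in range(len(seq_a)))  # unrelated

hv_a, hv_b, hv_c = (encode_sequence(s, alphabet) for s in (seq_a, seq_b, seq_c))
sim_ab = similarity(hv_a, hv_b)
sim_ac = similarity(hv_a, hv_c)
print("similarity to near-identical sequence:", sim_ab)
print("similarity to unrelated sequence:", sim_ac)
```

In this sketch the sequence differing by a single substitution stays much closer to the original than the unrelated one, because the two share most of their trigrams. The unrelated sequence is not fully orthogonal, since short sequences over a four-letter alphabet share some trigrams by chance.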
Related papers
- Causal Representation Learning from Multimodal Biological Observations [57.00712157758845]
We aim to develop flexible identification conditions for multimodal data.
We establish identifiability guarantees for each latent component, extending the subspace identification results from prior work.
Our key theoretical ingredient is the structural sparsity of the causal connections among distinct modalities.
arXiv Detail & Related papers (2024-11-10T16:40:27Z)
- Learning to refine domain knowledge for biological network inference [2.209921757303168]
Perturbation experiments allow biologists to discover causal relationships between variables of interest.
The sparsity and high dimensionality of these data pose significant challenges for causal structure learning algorithms.
We propose an amortized algorithm for refining domain knowledge, based on data observations.
arXiv Detail & Related papers (2024-10-18T12:53:23Z)
- Semantically Rich Local Dataset Generation for Explainable AI in Genomics [0.716879432974126]
Black box deep learning models trained on genomic sequences excel at predicting the outcomes of different gene regulatory mechanisms.
We propose using Genetic Programming to generate datasets by evolving perturbations in sequences that contribute to their semantic diversity.
arXiv Detail & Related papers (2024-07-03T10:31:30Z)
- An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z)
- GeoTop: Advancing Image Classification with Geometric-Topological Analysis [0.0]
Topological Data Analysis and Lipschitz-Killing Curvatures are used as powerful tools for feature extraction and classification.
We investigate the potential of combining both methods to improve classification accuracy.
This approach has the potential to advance our understanding of complex biological processes in various biomedical applications.
arXiv Detail & Related papers (2023-11-08T23:38:32Z)
- Criticality Analysis: Bio-inspired Nonlinear Data Representation [0.0]
Criticality Analysis (CA) is a bio-inspired method of information representation within a controlled self-organised critical system.
The input can be dimensionally reduced to a projection output that retains the features of the overall data, yet has a much simpler dynamic response.
The CA method allows for a biologically relevant encoding mechanism of arbitrary input to biosystems, creating a suitable model for information processing in varying complexity of organisms.
arXiv Detail & Related papers (2023-05-11T19:02:09Z)
- Classical-to-Quantum Sequence Encoding in Genomics [0.0]
We present several novel methods of performing classical-to-quantum data encoding inspired by various mathematical fields.
We introduce algorithms that draw inspiration from diverse fields such as Electrical and Electronic Engineering, Information Theory, Differential Geometry, and Neural Network architectures.
We propose a contemporary method for testing encoded DNA sequences using Quantum Boltzmann Machines.
arXiv Detail & Related papers (2023-04-21T07:35:49Z)
- RandomSCM: interpretable ensembles of sparse classifiers tailored for omics data [59.4141628321618]
We propose an ensemble learning algorithm based on conjunctions or disjunctions of decision rules.
The interpretability of the models makes them useful for biomarker discovery and pattern discovery in high-dimensional data.
arXiv Detail & Related papers (2022-08-11T13:55:04Z)
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
- EBIC.JL -- an Efficient Implementation of Evolutionary Biclustering Algorithm in Julia [59.422301529692454]
We introduce EBIC.JL - an implementation of one of the most accurate biclustering algorithms in Julia.
We show that the new version maintains comparable accuracy to its predecessor EBIC while converging faster for the majority of the problems.
arXiv Detail & Related papers (2021-05-03T22:30:38Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.