Related papers: Neural Distance Embeddings for Biological Sequences

URL: http://arxiv.org/abs/2109.09740v1
Date: Mon, 20 Sep 2021 17:30:58 GMT
Title: Neural Distance Embeddings for Biological Sequences
Authors: Gabriele Corso, Rex Ying, Michal Pándy, Petar Veličković, Jure Leskovec, Pietro Liò
Abstract summary: We present NeuroSEED, a framework to embed sequences in geometric vector spaces. We show the effectiveness of the hyperbolic space that captures the hierarchical structure and provides an average 22% reduction in embedding RMSE. The proposed approaches display significant accuracy and/or runtime improvements on real-world datasets.
Score: 43.07977514121458
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The development of data-dependent heuristics and representations for biological sequences that reflect their evolutionary distance is critical for large-scale biological research. However, popular machine learning approaches, based on continuous Euclidean spaces, have struggled with the discrete combinatorial formulation of the edit distance that models evolution and the hierarchical relationship that characterises real-world datasets. We present Neural Distance Embeddings (NeuroSEED), a general framework to embed sequences in geometric vector spaces, and illustrate the effectiveness of the hyperbolic space that captures the hierarchical structure and provides an average 22% reduction in embedding RMSE against the best competing geometry. The capacity of the framework and the significance of these improvements are then demonstrated by devising supervised and unsupervised NeuroSEED approaches to multiple core tasks in bioinformatics. Benchmarked against common baselines, the proposed approaches display significant accuracy and/or runtime improvements on real-world datasets. As an example, for hierarchical clustering, the proposed pretrained and from-scratch methods match the quality of competing baselines with 30x and 15x runtime reductions, respectively.
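To make the "embedding RMSE" above concrete, here is a minimal sketch, assuming that the RMSE measures the gap between pairwise edit distances of sequences and pairwise distances of their embeddings. It computes the Levenshtein edit distance, the geodesic distance in the Poincaré ball (one common model of hyperbolic space), and the RMSE between the two. All function names and the toy 2-D points are hypothetical: the points are hand-picked rather than produced by a trained encoder, and the paper's actual encoders, distance normalization, and training loop are not reproduced here.

```python
import math
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def _sq_norm(x: Sequence[float]) -> float:
    return sum(t * t for t in x)


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points in the Poincaré ball (curvature -1)."""
    nu, nv = _sq_norm(u), _sq_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = _sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error between two equally long sequences of numbers."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


if __name__ == "__main__":
    seqs = ["ACGT", "AGT", "ACGGT"]
    # Hypothetical embeddings inside the unit disk; a real model would learn these.
    points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]
    pairs = [(i, j) for i in range(len(seqs)) for j in range(i + 1, len(seqs))]
    true_d = [edit_distance(seqs[i], seqs[j]) for i, j in pairs]
    emb_d = [poincare_distance(points[i], points[j]) for i, j in pairs]
    print("edit distances:", true_d)
    print("embedding distances:", [round(d, 3) for d in emb_d])
    print("embedding RMSE:", round(rmse(emb_d, true_d), 3))
```

The Poincaré distance grows without bound as points approach the boundary of the ball, which is the property that lets hyperbolic embeddings spread out tree-like (hierarchical) data with low distortion; swapping `poincare_distance` for a Euclidean distance in the same loop is one way to compare geometries on a toy example.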

Related papers

Hyperbolic Genome Embeddings [0.6656737591902598]
We develop a novel application of hyperbolic CNNs that exploits the evolutionarily-informed structure of biological systems. Our strategy circumvents the need for explicit phylogenetic mapping while discerning key properties of sequences. Our approach even surpasses state-of-the-art performance on seven GUE benchmark datasets.
arXiv Detail & Related papers (2025-07-29T10:06:17Z)
Structural Connectome Harmonization Using Deep Learning: The Strength of Graph Neural Networks [0.9663199711697325]
Small sample sizes in structural connectome (SC) studies limit the development of reliable biomarkers for neurological and psychiatric disorders. Large-scale multi-site studies do exist, but they have acquisition-related biases due to scanner heterogeneity. We propose a site-conditioned deep harmonization framework that harmonizes SCs across diverse acquisition sites without requiring metadata.
arXiv Detail & Related papers (2025-07-18T14:58:05Z)
HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing [4.285436540597423]
We introduce HUG-VAS, a NURBS Generative model for Vascular geometry Synthesis. It integrates NURBS surface parameterization with diffusion-based generative modeling to synthesize realistic, fine-grained aortic geometries. HUG-VAS generates anatomically faithful aortas with supra-aortic branches, yielding biomarker distributions that closely match those of the original dataset.
arXiv Detail & Related papers (2025-07-15T16:45:43Z)
Enhanced High-Dimensional Data Visualization through Adaptive Multi-Scale Manifold Embedding [0.7705234721762716]
We propose an Adaptive Multi-Scale Manifold Embedding (AMSME) algorithm. By introducing ordinal distance, we demonstrate that ordinal distance overcomes the constraints of the curse of dimensionality in high-dimensional spaces. Experimental results demonstrate that AMSME significantly preserves intra-cluster topological structures and improves inter-cluster separation on real-world datasets.
arXiv Detail & Related papers (2025-03-18T06:46:53Z)
RankByGene: Gene-Guided Histopathology Representation Learning Through Cross-Modal Ranking Consistency [11.813883157319381]
We propose a novel framework that aligns gene and image features using a ranking-based alignment loss. To further enhance the alignment's stability, we employ self-supervised knowledge distillation with a teacher-student network architecture.
arXiv Detail & Related papers (2024-11-22T17:08:28Z)
How to Bridge Spatial and Temporal Heterogeneity in Link Prediction? A Contrastive Method [11.719027225797037]
We propose a novel Contrastive Learning-based Link Prediction model, CLP. Our model consistently outperforms the state-of-the-art models, with average improvements of 10.10% in AUC and 13.44% in AP.
arXiv Detail & Related papers (2024-11-01T14:20:53Z)
PRAGA: Prototype-aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis [1.1619559582563954]
We propose a novel spatial multi-modal omics resolved framework, termed PRototype-Aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis (PRAGA). PRAGA constructs a dynamic graph to capture latent semantic relations and comprehensively integrate spatial information and feature semantics. The learnable graph structure can also denoise perturbations by learning cross-modal knowledge.
arXiv Detail & Related papers (2024-09-19T12:53:29Z)
Semantically Rich Local Dataset Generation for Explainable AI in Genomics [0.716879432974126]
Black box deep learning models trained on genomic sequences excel at predicting the outcomes of different gene regulatory mechanisms. We propose using Genetic Programming to generate datasets by evolving perturbations in sequences that contribute to their semantic diversity.
arXiv Detail & Related papers (2024-07-03T10:31:30Z)
GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models. GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies. We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
Injecting Hierarchical Biological Priors into Graph Neural Networks for Flow Cytometry Prediction [1.7709249262395883]
This work explores injecting hierarchical prior knowledge into graph neural networks (GNNs) for single-cell multi-class classification of cellular data. We propose a hierarchical plug-in method, FCHC-GNN, that can be applied to several GNN models and is designed to capture neighborhood information crucial for the single-cell flow cytometry (FC) domain.
arXiv Detail & Related papers (2024-05-28T18:24:16Z)
Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem. This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage. We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets. By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
Interpolation-based Correlation Reduction Network for Semi-Supervised Graph Learning [49.94816548023729]
We propose a novel graph contrastive learning method, termed Interpolation-based Correlation Reduction Network (ICRN). In our method, we improve the discriminative capability of the latent feature by enlarging the margin of decision boundaries. By combining the two settings, we extract rich supervision information from both the abundant unlabeled nodes and the rare yet valuable labeled nodes for discriminative representation learning.
arXiv Detail & Related papers (2022-06-06T14:26:34Z)
Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process. Our method significantly reduces the required number of interactions compared with random intervention targeting. We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z)
UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition [11.81043814295441]
We introduce UNIK, a novel skeleton-based action recognition method that is able to generalize across datasets. To study the cross-domain generalizability of action recognition in real-world videos, we re-evaluate state-of-the-art approaches as well as the proposed UNIK. Results show that the proposed UNIK, with pre-training on Posetics, generalizes well and outperforms state-of-the-art when transferred onto four target action classification datasets.
arXiv Detail & Related papers (2021-07-19T02:00:28Z)
Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal Clustering and Large-Scale Heterogeneous Environment Synthesis [76.46004354572956]
We introduce an unsupervised domain adaptation approach for person re-identification. Experimental results show that the proposed ktCUDA and SHRED approach achieves an average improvement of +5.7 mAP in re-identification performance.
arXiv Detail & Related papers (2020-01-14T17:43:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.