Related papers: NDAI-NeuroMAP: A Neuroscience-Specific Embedding Model for Domain-Specific Retrieval

URL: http://arxiv.org/abs/2507.03329v1
Date: Fri, 04 Jul 2025 06:28:53 GMT
Title: NDAI-NeuroMAP: A Neuroscience-Specific Embedding Model for Domain-Specific Retrieval
Authors: Devendra Patel, Aaditya Jain, Jayant Verma, Divyansh Rajput, Sunil Mahala, Ketki Suresh Khapare, Jayateja Kalla
Abstract summary: NDAI-NeuroMAP is the first neuroscience-domain-specific dense vector embedding model engineered for high-precision information retrieval tasks. We employ a sophisticated fine-tuning approach utilizing the FremyCompany/BioLORD-2023 foundation model. Comprehensive evaluation on a held-out test dataset comprising approximately 24,000 neuroscience-specific queries demonstrates substantial performance improvements over state-of-the-art general-purpose embedding models.
Score: 1.5705429611931057
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We present NDAI-NeuroMAP, the first neuroscience-domain-specific dense vector embedding model engineered for high-precision information retrieval tasks. Our methodology encompasses the curation of an extensive domain-specific training corpus comprising 500,000 carefully constructed triplets (query-positive-negative configurations), augmented with 250,000 neuroscience-specific definitional entries and 250,000 structured knowledge-graph triplets derived from authoritative neurological ontologies. We employ a sophisticated fine-tuning approach utilizing the FremyCompany/BioLORD-2023 foundation model, implementing a multi-objective optimization framework combining contrastive learning with triplet-based metric learning paradigms. Comprehensive evaluation on a held-out test dataset comprising approximately 24,000 neuroscience-specific queries demonstrates substantial performance improvements over state-of-the-art general-purpose and biomedical embedding models. These empirical findings underscore the critical importance of domain-specific embedding architectures for neuroscience-oriented RAG systems and related clinical natural language processing applications.
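The abstract names a multi-objective framework that combines contrastive learning with triplet-based metric learning, but it does not spell out the loss. Below is a minimal NumPy sketch of one common way to combine the two. It assumes an InfoNCE-style contrastive term, using in-batch negatives plus the explicit hard negative from each query-positive-negative triplet, and a cosine-distance triplet hinge, mixed by a weight `alpha`. The function names and the `temperature`, `margin`, and `alpha` values are illustrative assumptions, not the authors' settings.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def contrastive_loss(q, p, n, temperature=0.05):
    """InfoNCE over in-batch positives plus explicit hard negatives (assumed form).

    Row i of q should score highest against row i of p; every other positive
    in the batch and every hard negative in n acts as a distractor.
    """
    q = l2_normalize(q)
    cands = l2_normalize(np.concatenate([p, n], axis=0))  # shape (2B, d)
    logits = q @ cands.T / temperature                    # shape (B, 2B)
    logits -= logits.max(axis=1, keepdims=True)           # stabilise the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(q.shape[0])
    return float(-log_probs[idx, idx].mean())             # true pair sits at column i


def triplet_loss(q, p, n, margin=0.2):
    """Hinge on cosine distance: push d(q, p) + margin below d(q, n)."""
    q, p, n = l2_normalize(q), l2_normalize(p), l2_normalize(n)
    d_pos = 1.0 - np.sum(q * p, axis=1)
    d_neg = 1.0 - np.sum(q * n, axis=1)
    return float(np.maximum(0.0, d_pos - d_neg + margin).mean())


def combined_loss(q, p, n, alpha=0.5, temperature=0.05, margin=0.2):
    """Weighted sum of both objectives; alpha is a hypothetical mixing weight."""
    return (alpha * contrastive_loss(q, p, n, temperature)
            + (1.0 - alpha) * triplet_loss(q, p, n, margin))


if __name__ == "__main__":
    # Random stand-ins for query, positive, and negative embeddings (batch 8, dim 16).
    rng = np.random.default_rng(0)
    q, p, n = (rng.normal(size=(8, 16)) for _ in range(3))
    print(combined_loss(q, p, n))
```

Setting `alpha` to 1 or 0 recovers either objective alone. The contrastive term treats every other item in the batch as a distractor, while the triplet term only compares each query with its own paired negative.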

Related papers

Enhancing Omics Cohort Discovery for Research on Neurodegeneration through Ontology-Augmented Embedding Models [0.14999444543328289]
NeuroEmbed is an approach for the engineering of semantically accurate embedding spaces to represent cohorts and samples. The NeuroEmbed method comprises four stages: (1) extraction of cohorts from public repositories; (2) semi-automated normalization and augmentation of metadata of cohorts and samples using biomedical clustering and clustering on the embedding space; (3) automated generation of a natural language question-answering dataset for cohorts and samples based on randomized combinations of standardized metadata dimensions; and (4) fine-tuning of a domain-specific embedder to optimize queries.
arXiv (2025-06-16T13:27:10Z)
Towards a general-purpose foundation model for fMRI analysis [58.06455456423138]
We introduce NeuroSTORM, a framework that learns from 4D fMRI volumes and enables efficient knowledge transfer across diverse applications. NeuroSTORM is pre-trained on 28.65 million fMRI frames (>9,000 hours) from over 50,000 subjects across multiple centers and ages 5 to 100. It outperforms existing methods across five tasks: age/gender prediction, phenotype prediction, disease diagnosis, fMRI-to-image retrieval, and task-based fMRI.
arXiv (2025-06-11T23:51:01Z)
Biomedical Foundation Model: A Survey [84.26268124754792]
Foundation models are large-scale pre-trained models that learn from extensive unlabeled datasets. These models can be adapted to various applications such as question answering and visual understanding. This survey explores the potential of foundation models across diverse domains within biomedical fields.
arXiv (2025-03-03T22:42:00Z)
LoRKD: Low-Rank Knowledge Decomposition for Medical Foundation Models [59.961172635689664]
"Knowledge Decomposition" aims to improve performance on specific medical tasks. We propose a novel framework named Low-Rank Knowledge Decomposition (LoRKD). LoRKD explicitly separates gradients from different tasks by incorporating low-rank expert modules and efficient knowledge separation convolution.
arXiv (2024-09-29T03:56:21Z)
Machine Learning on Dynamic Functional Connectivity: Promise, Pitfalls, and Interpretations [7.013079422694949]
We seek to establish a well-founded empirical guideline for designing deep models for functional neuroimages. We put the spotlight on (1) what the current state-of-the-art (SOTA) performance is in cognitive task recognition and disease diagnosis using fMRI. We have conducted a comprehensive evaluation and statistical analysis, in various settings, to answer the outstanding questions above.
arXiv (2024-09-17T17:24:17Z)
GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models. GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies. We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv (2024-06-01T08:01:05Z)
Rethinking model prototyping through the MedMNIST+ dataset collection [0.11999555634662634]
This work introduces a comprehensive benchmark for the MedMNIST+ dataset collection. We reassess commonly used Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) architectures across distinct medical datasets. Our findings suggest that computationally efficient training schemes and modern foundation models offer viable alternatives to costly end-to-end training.
arXiv (2024-04-24T10:19:25Z)
RudolfV: A Foundation Model by Pathologists for Pathologists [13.17203220753175]
We present a novel approach to designing foundation models for computational pathology. Our model "RudolfV" surpasses existing state-of-the-art foundation models across different benchmarks.
arXiv (2024-01-08T18:31:38Z)
Predicting Infant Brain Connectivity with Federated Multi-Trajectory GNNs using Scarce Data [54.55126643084341]
Existing deep learning solutions suffer from three major limitations. We introduce FedGmTE-Net++, a federated graph-based multi-trajectory evolution network. Using the power of federation, we aggregate local learnings among diverse hospitals with limited datasets.
arXiv (2024-01-01T10:20:01Z)
Large-scale Long-tailed Disease Diagnosis on Radiology Images [51.453990034460304]
RadDiag is a foundational model supporting 2D and 3D inputs across various modalities and anatomies. Our dataset, RP3D-DiagDS, contains 40,936 cases with 195,010 scans covering 5,568 disorders.
arXiv (2023-12-26T18:20:48Z)
NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems [50.076028127394366]
We present NeuroBench: a benchmark framework for neuromorphic computing algorithms and systems. NeuroBench is a collaboratively-designed effort from an open community of researchers across industry and academia.
arXiv (2023-04-10T15:12:09Z)
Deeper Clinical Document Understanding Using Relation Extraction [0.0]
We propose a text mining framework comprising Named Entity Recognition (NER) and Relation Extraction (RE) models. We introduce two new RE model architectures: an accuracy-optimized one based on BioBERT and a speed-optimized one utilizing crafted features over a Fully Connected Neural Network (FCNN). We show two practical applications of this framework: building a biomedical knowledge graph and improving the accuracy of mapping entities to clinical codes.
arXiv (2021-12-25T17:14:13Z)
A Dynamic Deep Neural Network For Multimodal Clinical Data Analysis [12.02718865835448]
AdaptiveNet is a novel recurrent neural network architecture that can handle multiple lists of different events. We apply the architecture to the problem of disease progression prediction in rheumatoid arthritis using the Swiss Clinical Quality Management registry.
arXiv (2020-08-14T11:19:32Z)
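Stage (3) of the NeuroEmbed entry at the top of this list generates question-answering data from "randomized combinations of standardized metadata dimensions", but the summary gives no details. The following is a minimal sketch of that idea. It assumes flat per-sample metadata dictionaries with an `id` field and a single English template. The function name, template, and field names are hypothetical illustrations, not taken from the paper.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical sample record: standardized metadata dimensions plus an "id".
Sample = Dict[str, str]

TEMPLATE = "Find samples where {conditions}."  # hypothetical question template


def generate_qa_pairs(samples: List[Sample], dimensions: List[str],
                      n_questions: int, max_dims: int = 2,
                      seed: int = 0) -> List[Tuple[str, List[str]]]:
    """Build (question, matching sample ids) pairs from random metadata combinations."""
    rng = random.Random(seed)  # seeded so the generated dataset is reproducible
    pairs = []
    for _ in range(n_questions):
        # Draw values from a real sample so every question has at least one answer.
        anchor = rng.choice(samples)
        k = rng.randint(1, min(max_dims, len(dimensions)))
        dims = sorted(rng.sample(dimensions, k))
        criteria = {d: anchor[d] for d in dims}
        conditions = " and ".join(f"{d} is {v}" for d, v in criteria.items())
        question = TEMPLATE.format(conditions=conditions)
        # Ground-truth answers: every sample matching all chosen dimension values.
        answers = [s["id"] for s in samples
                   if all(s[d] == v for d, v in criteria.items())]
        pairs.append((question, answers))
    return pairs


if __name__ == "__main__":
    demo = [
        {"id": "S1", "disease": "Alzheimer's disease", "tissue": "cortex", "assay": "RNA-seq"},
        {"id": "S2", "disease": "Parkinson's disease", "tissue": "substantia nigra", "assay": "RNA-seq"},
        {"id": "S3", "disease": "Alzheimer's disease", "tissue": "hippocampus", "assay": "proteomics"},
    ]
    for q, ans in generate_qa_pairs(demo, ["disease", "tissue", "assay"], n_questions=3):
        print(q, ans)
```

Pairs like these could then serve as query/positive examples for fine-tuning an embedder, as in stage (4). A real pipeline would likely use varied phrasings rather than one fixed template.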

This list is automatically generated from the titles and abstracts of the papers on this site.