Training Data Governance for Brain Foundation Models
- URL: http://arxiv.org/abs/2602.02511v1
- Date: Fri, 23 Jan 2026 02:15:58 GMT
- Title: Training Data Governance for Brain Foundation Models
- Authors: Margot Hanley, Jiunn-Tyng Yeh, Ryan Rodriguez, Jack Pilkington, Nita Farahany,
- Abstract summary: This paper contends that training foundation models on neural data opens new normative territory.<n>We first describe brain foundation models' technical foundations and training-data ecosystem.<n>We then draw on AI ethics, neuroethics, and bioethics to organize concerns across privacy, consent, bias, benefit sharing, and governance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Brain foundation models bring the foundation model paradigm to the field of neuroscience. Like language and image foundation models, they are general-purpose AI systems pretrained on large-scale datasets that adapt readily to downstream tasks. Unlike text-and-image based models, however, they train on brain data: large-datasets of EEG, fMRI, and other neural data types historically collected within tightly governed clinical and research settings. This paper contends that training foundation models on neural data opens new normative territory. Neural data carry stronger expectations of, and claims to, protection than text or images, given their body-derived nature and historical governance within clinical and research settings. Yet the foundation model paradigm subjects them to practices of large-scale repurposing, cross-context stitching, and open-ended downstream application. Furthermore, these practices are now accessible to a much broader range of actors, including commercial developers, against a backdrop of fragmented and unclear governance. To map this territory, we first describe brain foundation models' technical foundations and training-data ecosystem. We then draw on AI ethics, neuroethics, and bioethics to organize concerns across privacy, consent, bias, benefit sharing, and governance. For each, we propose both agenda-setting questions and baseline safeguards as the field matures.
Related papers
- A New Strategy for Artificial Intelligence: Training Foundation Models Directly on Human Brain Data [0.0]
We explore a new strategy for artificial intelligence: moving beyond surface-level statistical regularities by training foundation models directly on human brain data.<n>In this paper, we classify the current limitations of foundation models, as well as the promising brain regions and cognitive processes that could be leveraged to address them.<n>We also discuss the potential implications for agents, artificial general intelligence, and artificial superintelligence, as well as the ethical, social, and technical challenges and opportunities.
arXiv Detail & Related papers (2026-01-17T13:38:51Z) - Lost in Distortion: Uncovering the Domain Gap Between Computer Vision and Brain Imaging - A Study on Pretraining for Age Prediction [3.6421420152854638]
We systematically explore the role of data quality level in pretraining and its impact on downstream tasks.<n>Specifically, we perform pretraining on datasets with different quality levels and perform fine-tuning for brain age prediction on external cohorts.<n>Our results show significant performance differences across quality levels, revealing both opportunities and limitations.
arXiv Detail & Related papers (2025-12-01T06:11:31Z) - A Survey on Generative Recommendation: Data, Model, and Tasks [55.36322811257545]
generative recommendation reconceptualizes recommendation as a generation task rather than discriminative scoring.<n>This survey provides a comprehensive examination through a unified tripartite framework spanning data, model, and task dimensions.<n>We identify five key advantages: world knowledge integration, natural language understanding, reasoning capabilities, scaling laws, and creative generation.
arXiv Detail & Related papers (2025-10-31T04:02:58Z) - Biomedical Foundation Model: A Survey [84.26268124754792]
Foundation models are large-scale pre-trained models that learn from extensive unlabeled datasets.<n>These models can be adapted to various applications such as question answering and visual understanding.<n>This survey explores the potential of foundation models across diverse domains within biomedical fields.
arXiv Detail & Related papers (2025-03-03T22:42:00Z) - Brain Foundation Models: A Survey on Advancements in Neural Signal Processing and Brain Discovery [20.558821847407895]
Brain foundation models (BFMs) have emerged as a transformative paradigm in computational neuroscience.<n>BFMs leverage large-scale pre-training techniques, allowing them to generalize effectively across multiple scenarios, tasks, and modalities.<n>In this survey, we define BFMs for the first time, providing a clear and concise framework for constructing and utilizing these models in various applications.
arXiv Detail & Related papers (2025-03-01T18:12:50Z) - Knowledge-Guided Prompt Learning for Lifespan Brain MR Image Segmentation [53.70131202548981]
We present a two-step segmentation framework employing Knowledge-Guided Prompt Learning (KGPL) for brain MRI.
Specifically, we first pre-train segmentation models on large-scale datasets with sub-optimal labels.
The introduction of knowledge-wise prompts captures semantic relationships between anatomical variability and biological processes.
arXiv Detail & Related papers (2024-07-31T04:32:43Z) - BrainSegFounder: Towards 3D Foundation Models for Neuroimage Segmentation [6.5388528484686885]
This study introduces a novel approach towards the creation of medical foundation models.
Our method involves a novel two-stage pretraining approach using vision transformers.
BrainFounder demonstrates a significant performance gain, surpassing the achievements of previous winning solutions.
arXiv Detail & Related papers (2024-06-14T19:49:45Z) - A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis [48.84443450990355]
Deep networks have achieved broad success in analyzing natural images, when applied to medical scans, they often fail in unexcepted situations.
We investigate this challenge and focus on model sensitivity to domain shifts, such as data sampled from different hospitals or data confounded by demographic variables such as sex, race, etc, in the context of chest X-rays and skin lesion images.
Taking inspiration from medical training, we propose giving deep networks a prior grounded in explicit medical knowledge communicated in natural language.
arXiv Detail & Related papers (2024-05-23T17:55:02Z) - MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals.
Currently, brain decoding is confined to a per-subject-per-model paradigm.
We present MindBridge, that achieves cross-subject brain decoding by employing only one model.
arXiv Detail & Related papers (2024-04-11T15:46:42Z) - OpenMEDLab: An Open-source Platform for Multi-modality Foundation Models
in Medicine [55.29668193415034]
We present OpenMEDLab, an open-source platform for multi-modality foundation models.
It encapsulates solutions of pioneering attempts in prompting and fine-tuning large language and vision models for frontline clinical and bioinformatic applications.
It opens access to a group of pre-trained foundation models for various medical image modalities, clinical text, protein engineering, etc.
arXiv Detail & Related papers (2024-02-28T03:51:02Z) - Foundational Models in Medical Imaging: A Comprehensive Survey and
Future Vision [6.2847894163744105]
Foundation models are large-scale, pre-trained deep-learning models adapted to a wide range of downstream tasks.
These models facilitate contextual reasoning, generalization, and prompt capabilities at test time.
Capitalizing on the advances in computer vision, medical imaging has also marked a growing interest in these models.
arXiv Detail & Related papers (2023-10-28T12:08:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.