Defining and Benchmarking a Data-Centric Design Space for Brain Graph Construction
- URL: http://arxiv.org/abs/2508.12533v1
- Date: Sun, 17 Aug 2025 23:53:29 GMT
- Title: Defining and Benchmarking a Data-Centric Design Space for Brain Graph Construction
- Authors: Qinwen Ge, Roza G. Bayrak, Anwar Said, Catie Chang, Xenofon Koutsoukos, Tyler Derr
- Abstract summary: Current practices often rely on rigid pipelines that overlook critical data-centric choices in how brain graphs are constructed. We adopt a Data-Centric AI perspective and systematically define and benchmark a data-centric design space for brain graph construction. Our contributions lie less in novel components and more in evaluating how combinations of existing and modified techniques influence downstream performance.
- Score: 7.876894803609822
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The construction of brain graphs from functional Magnetic Resonance Imaging (fMRI) data plays a crucial role in enabling graph machine learning for neuroimaging. However, current practices often rely on rigid pipelines that overlook critical data-centric choices in how brain graphs are constructed. In this work, we adopt a Data-Centric AI perspective and systematically define and benchmark a data-centric design space for brain graph construction, contrasting with primarily model-centric prior work. We organize this design space into three stages: temporal signal processing, topology extraction, and graph featurization. Our contributions lie less in novel components and more in evaluating how combinations of existing and modified techniques influence downstream performance. Specifically, we study high-amplitude BOLD signal filtering, sparsification and unification strategies for connectivity, alternative correlation metrics, and multi-view node and edge features, such as incorporating lagged dynamics. Experiments on the HCP1200 and ABIDE datasets show that thoughtful data-centric configurations consistently improve classification accuracy over standard pipelines. These findings highlight the critical role of upstream data decisions and underscore the importance of systematically exploring the data-centric design space for graph-based neuroimaging. Our code is available at https://github.com/GeQinwen/DataCentricBrainGraphs.
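The three stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_brain_graph`, the per-frame RMS amplitude rule, the `keep_frac` and `amp_quantile` parameters, and the use of correlation rows as node features are all assumptions chosen for illustration, and the input is synthetic random data rather than HCP1200 or ABIDE.

```python
import numpy as np

def build_brain_graph(ts, keep_frac=0.1, amp_quantile=None):
    """Hypothetical helper: ts is a (T, N) array of T frames for N regions."""
    ts = np.asarray(ts, dtype=float)

    # Stage 1, temporal signal processing (assumed rule): keep only frames
    # whose RMS amplitude across regions is at or above a chosen quantile.
    if amp_quantile is not None:
        z = (ts - ts.mean(axis=0)) / ts.std(axis=0)
        amp = np.sqrt((z ** 2).mean(axis=1))
        ts = ts[amp >= np.quantile(amp, amp_quantile)]

    # Stage 2, topology extraction: Pearson correlation between regions,
    # symmetrized to remove floating-point asymmetry, then sparsified by
    # keeping the strongest keep_frac of region pairs (by absolute value).
    corr = np.corrcoef(ts, rowvar=False)
    corr = (corr + corr.T) / 2.0
    np.fill_diagonal(corr, 0.0)
    n = corr.shape[0]
    weights = np.abs(corr[np.triu_indices(n, k=1)])
    k = max(1, int(round(keep_frac * weights.size)))
    thresh = np.sort(weights)[-k]
    adj = (np.abs(corr) >= thresh) & ~np.eye(n, dtype=bool)

    # Stage 3, graph featurization (assumed choice): each node's feature
    # vector is its row of the correlation matrix.
    x = corr.copy()
    edge_index = np.argwhere(np.triu(adj, k=1)).T  # shape (2, num_edges)
    return x, adj, edge_index

# Synthetic example: 200 frames, 10 regions.
rng = np.random.default_rng(0)
ts = rng.standard_normal((200, 10))
x, adj, edge_index = build_brain_graph(ts, keep_frac=0.2, amp_quantile=0.5)
```

Each stage here is a single swappable choice (frame selection rule, correlation metric and sparsity level, feature definition), which is the kind of configuration the abstract describes benchmarking in combination.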
Related papers
- Adapting HFMCA to Graph Data: Self-Supervised Learning for Generalizable fMRI Representations [57.054499278843856]
Functional magnetic resonance imaging (fMRI) analysis faces significant challenges due to limited dataset sizes and domain variability between studies. Traditional self-supervised learning methods inspired by computer vision often rely on positive and negative sample pairs. We propose adapting a recently developed Hierarchical Functional Maximal Correlation Algorithm (HFMCA) to graph-structured fMRI data.
arXiv Detail & Related papers (2025-10-05T12:35:01Z) - Dynamic Graph Structure Estimation for Learning Multivariate Point Process using Spiking Neural Networks [14.77536193242342]
Spiking Dynamic Graph Network is a novel framework that leverages the temporal processing capabilities of spiking neural networks (SNNs) and spike-timing-dependent plasticity (STDP). It adapts to any dataset by learning dynamic-temporal dependencies directly from event data, enhancing generalizability and modeling. Our evaluations, conducted on both synthetic and real-world datasets including NYC Taxi, 911, Reddit, and Stack Overflow, demonstrate superior accuracy while maintaining computational efficiency.
arXiv Detail & Related papers (2025-04-01T23:23:10Z) - Performance Heterogeneity in Graph Neural Networks: Lessons for Architecture Design and Preprocessing [1.1126342180866644]
Graph Neural Networks have emerged as the most popular architecture for graph-level learning. We show that good performance in practice requires careful model design. We propose a selective approach, which only targets graphs whose individual performance benefits from rewiring.
arXiv Detail & Related papers (2025-03-01T16:18:07Z) - Spectral Greedy Coresets for Graph Neural Networks [61.24300262316091]
The ubiquity of large-scale graphs in node-classification tasks hinders the real-world applications of Graph Neural Networks (GNNs).
This paper studies graph coresets for GNNs and avoids the interdependence issue by selecting ego-graphs based on their spectral embeddings.
Our spectral greedy graph coreset (SGGC) scales to graphs with millions of nodes, obviates the need for model pre-training, and applies to low-homophily graphs.
arXiv Detail & Related papers (2024-05-27T17:52:12Z) - Balanced Graph Structure Information for Brain Disease Detection [6.799894169098717]
We propose Bargrain, which models two graph structures, a filtered correlation matrix and an optimal sample graph, using graph convolution networks (GCNs).
Based on our extensive experiment, Bargrain outperforms state-of-the-art methods in classification tasks on brain disease datasets, as measured by average F1 scores.
arXiv Detail & Related papers (2023-12-30T06:50:52Z) - NeuroGraph: Benchmarks for Graph Machine Learning in Brain Connectomics [9.803179588247252]
We introduce NeuroGraph, a collection of graph-based neuroimaging datasets.
We demonstrate its utility for predicting multiple categories of behavioral and cognitive traits.
arXiv Detail & Related papers (2023-06-09T19:10:16Z) - Benchmarking Graph Neural Networks for FMRI analysis [0.0]
Graph Neural Networks (GNNs) have emerged as a powerful tool to learn from graph-structured data.
We study and evaluate the performance of five popular GNN architectures in diagnosing major depression disorder and autism spectrum disorder.
We highlight that creating optimal graph structures for functional brain data is a major bottleneck hindering the performance of GNNs.
arXiv Detail & Related papers (2022-11-16T14:16:54Z) - Graph Neural Networks with Trainable Adjacency Matrices for Fault Diagnosis on Multivariate Sensor Data [69.25738064847175]
It is necessary both to consider the behavior of the signals in each sensor separately and to take into account their correlation and hidden relationships with each other.
The graph nodes can be represented as data from the different sensors, and the edges can display the influence of these data on each other.
It is proposed to construct the graph during the training of the graph neural network. This allows models to be trained on data where the dependencies between the sensors are not known in advance.
arXiv Detail & Related papers (2022-10-20T11:03:21Z) - DynDepNet: Learning Time-Varying Dependency Structures from fMRI Data via Dynamic Graph Structure Learning [58.94034282469377]
We propose DynDepNet, a novel method for learning the optimal time-varying dependency structure of fMRI data induced by downstream prediction tasks.
Experiments on real-world fMRI datasets, for the task of sex classification, demonstrate that DynDepNet achieves state-of-the-art results.
arXiv Detail & Related papers (2022-09-27T16:32:11Z) - Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes.
We propose a metric that quantifies the ability of a graph to mix the current gradients.
Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z) - Self-Supervised Graph Representation Learning for Neuronal Morphologies [75.38832711445421]
We present GraphDINO, a data-driven approach to learn low-dimensional representations of 3D neuronal morphologies from unlabeled datasets.
We show, in two different species and across multiple brain areas, that this method yields morphological cell type clusterings on par with manual feature-based classification by experts.
Our method could potentially enable data-driven discovery of novel morphological features and cell types in large-scale datasets.
arXiv Detail & Related papers (2021-12-23T12:17:47Z)