Related papers: From Minutes to Days: Scaling Intracranial Speech Decoding with Supervised Pretraining

From Minutes to Days: Scaling Intracranial Speech Decoding with Supervised Pretraining

URL: http://arxiv.org/abs/2512.15830v1
Date: Wed, 17 Dec 2025 17:41:55 GMT
Title: From Minutes to Days: Scaling Intracranial Speech Decoding with Supervised Pretraining
Authors: Linnea Evanson, Mingfang, Zhang, Hubert Banville, Saarang Panchavati, Pierre Bourdillon, Jean-Rémi King,
Abstract summary: We introduce a framework to leverage week-long intracranial and audio recordings from patients undergoing clinical monitoring.<n>Our contrastive learning model substantially outperforms models trained solely on classic experimental data.
Score: 25.146772033032764
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Decoding speech from brain activity has typically relied on limited neural recordings collected during short and highly controlled experiments. Here, we introduce a framework to leverage week-long intracranial and audio recordings from patients undergoing clinical monitoring, effectively increasing the training dataset size by over two orders of magnitude. With this pretraining, our contrastive learning model substantially outperforms models trained solely on classic experimental data, with gains that scale log-linearly with dataset size. Analysis of the learned representations reveals that, while brain activity represents speech features, its global structure largely drifts across days, highlighting the need for models that explicitly account for cross-day variability. Overall, our approach opens a scalable path toward decoding and modeling brain representations in both real-life and controlled task settings.

Related papers

Simple Models, Rich Representations: Visual Decoding from Primate Intracortical Neural Signals [0.0]
We address the problem of decoding visual information from high-density intracortical recordings in primates.<n>We develop a modular generative decoding pipeline that combines low-resolution latent reconstruction with semantically conditioned diffusion.<n>This framework provides principles for brain-computer interfaces and semantic neural decoding.
arXiv Detail & Related papers (2026-01-16T09:10:31Z)
Learning Robust Spatial Representations from Binaural Audio through Feature Distillation [64.36563387033921]
We investigate the use of a pretraining stage based on feature distillation to learn a robust spatial representation of speech without the need for data labels.<n>Our experiments demonstrate that the pretrained models show improved performance in noisy and reverberant environments.
arXiv Detail & Related papers (2025-08-28T15:43:15Z)
Knowledge-Guided Prompt Learning for Lifespan Brain MR Image Segmentation [53.70131202548981]
We present a two-step segmentation framework employing Knowledge-Guided Prompt Learning (KGPL) for brain MRI. Specifically, we first pre-train segmentation models on large-scale datasets with sub-optimal labels. The introduction of knowledge-wise prompts captures semantic relationships between anatomical variability and biological processes.
arXiv Detail & Related papers (2024-07-31T04:32:43Z)
The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning [3.649801602551928]
We develop self-supervised objectives, together with an architecture, for learning from heterogeneous brain recordings.<n>Scaling to nearly 400 hours of MEG data and 900 subjects, our approach shows generalisation across participants, datasets, tasks, and even to novel subjects.<n>It achieves improvements of 15-27% over state-of-the-art models and matches surgical decoding performance with non-invasive data.
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation [65.65530016765615]
We propose a hierarchical predictive coding framework that captures multi-scale dependencies through three complementary learning objectives.<n> TokenUnify integrates random token prediction, next-token prediction, and next-all token prediction to create a comprehensive representational space.<n>We also introduce a large-scale EM dataset with 1.2 billion annotated voxels, offering ideal long-sequence visual data with spatial continuity.
arXiv Detail & Related papers (2024-05-27T05:45:51Z)
Aligning brain functions boosts the decoding of visual semantics in novel subjects [3.226564454654026]
We propose to boost brain decoding by aligning brain responses to videos and static images across subjects. Our method improves out-of-subject decoding performance by up to 75%. It also outperforms classical single-subject approaches when fewer than 100 minutes of data is available for the tested subject.
arXiv Detail & Related papers (2023-12-11T15:55:20Z)
Deep Learning for real-time neural decoding of grasp [0.0]
We present a Deep Learning-based approach to the decoding of neural signals for grasp type classification. The main goal of the presented approach is to improve over state-of-the-art decoding accuracy without relying on any prior neuroscience knowledge.
arXiv Detail & Related papers (2023-11-02T08:26:29Z)
A Unified, Scalable Framework for Neural Population Decoding [12.052847252465826]
We introduce a training framework and architecture designed to model the population dynamics of neural activity. We construct a large-scale multi-session model trained on large datasets from seven nonhuman primates.
arXiv Detail & Related papers (2023-10-24T17:58:26Z)
Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system. We show that these results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
Evaluating deep transfer learning for whole-brain cognitive decoding [11.898286908882561]
Transfer learning (TL) is well-suited to improve the performance of deep learning (DL) models in datasets with small numbers of samples. Here, we evaluate TL for the application of DL models to the decoding of cognitive states from whole-brain functional Magnetic Resonance Imaging (fMRI) data.
arXiv Detail & Related papers (2021-11-01T15:44:49Z)
On the Robustness of Pretraining and Self-Supervision for a Deep Learning-based Analysis of Diabetic Retinopathy [70.71457102672545]
We compare the impact of different training procedures for diabetic retinopathy grading. We investigate different aspects such as quantitative performance, statistics of the learned feature representations, interpretability and robustness to image distortions. Our results indicate that models from ImageNet pretraining report a significant increase in performance, generalization and robustness to image distortions.
arXiv Detail & Related papers (2021-06-25T08:32:45Z)
Deep Recurrent Encoder: A scalable end-to-end network to model brain signals [122.1055193683784]
We propose an end-to-end deep learning architecture trained to predict the brain responses of multiple subjects at once. We successfully test this approach on a large cohort of magnetoencephalography (MEG) recordings acquired during a one-hour reading task.
arXiv Detail & Related papers (2021-03-03T11:39:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.