Self-supervised pretraining of vision transformers for animal behavioral analysis and neural encoding
- URL: http://arxiv.org/abs/2507.09513v1
- Date: Sun, 13 Jul 2025 06:43:05 GMT
- Title: Self-supervised pretraining of vision transformers for animal behavioral analysis and neural encoding
- Authors: Yanchen Wang, Han Yu, Ari Blau, Yizi Zhang, The International Brain Laboratory, Liam Paninski, Cole Hurwitz, Matt Whiteway
- Abstract summary: BEAST (BEhavioral Analysis via Self-supervised pretraining of Transformers) is a novel framework that pretrains experiment-specific vision transformers for diverse neuro-behavior analyses. Our method establishes a powerful and versatile backbone model that accelerates behavioral analysis in scenarios where labeled data remains scarce.
- Score: 12.25140375320834
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The brain can only be fully understood through the lens of the behavior it generates -- a guiding principle in modern neuroscience research that nevertheless presents significant technical challenges. Many studies capture behavior with cameras, but video analysis approaches typically rely on specialized models requiring extensive labeled data. We address this limitation with BEAST (BEhavioral Analysis via Self-supervised pretraining of Transformers), a novel and scalable framework that pretrains experiment-specific vision transformers for diverse neuro-behavior analyses. BEAST combines masked autoencoding with temporal contrastive learning to effectively leverage unlabeled video data. Through comprehensive evaluation across multiple species, we demonstrate improved performance in three critical neuro-behavioral tasks: extracting behavioral features that correlate with neural activity, and pose estimation and action segmentation in both the single- and multi-animal settings. Our method establishes a powerful and versatile backbone model that accelerates behavioral analysis in scenarios where labeled data remains scarce.
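The abstract states that BEAST combines masked autoencoding with temporal contrastive learning. The NumPy code below is a minimal, hypothetical sketch of one generic way to compute such a combined objective. It is not the authors' implementation, and it only evaluates the loss without any training. Everything beyond the two named ingredients is an assumption made for illustration:

- frames are grayscale;
- `w_enc` and `w_dec` are toy linear maps standing in for a vision transformer encoder and decoder;
- masked patches are zeroed rather than dropped;
- adjacent frames serve as positives in an InfoNCE loss;
- `lam`, `mask_ratio`, `stride`, and `temperature` are illustrative values.

```python
import numpy as np


def patchify(frames, patch):
    """Split (T, H, W) grayscale frames into (T, N, patch*patch) flattened patches."""
    t, h, w = frames.shape
    assert h % patch == 0 and w % patch == 0, "frame size must be divisible by patch"
    x = frames.reshape(t, h // patch, patch, w // patch, patch)
    x = x.transpose(0, 1, 3, 2, 4)  # (T, rows, cols, patch, patch)
    return x.reshape(t, (h // patch) * (w // patch), patch * patch)


def masked_reconstruction_loss(patches, recon, mask):
    """Mean squared error over masked patches only (the masked-autoencoding target)."""
    if not mask.any():
        return 0.0
    err = ((patches - recon) ** 2).mean(axis=-1)  # (T, N) per-patch error
    return float(err[mask].mean())


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE: row i of `positives` is the positive for anchor i; other rows are negatives."""
    a = anchors / (np.linalg.norm(anchors, axis=1, keepdims=True) + 1e-8)
    p = positives / (np.linalg.norm(positives, axis=1, keepdims=True) + 1e-8)
    logits = a @ p.T / temperature                       # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def combined_loss(frames, w_enc, w_dec, mask_ratio=0.75, patch=8, lam=1.0, stride=4, rng=None):
    """Masked-reconstruction loss plus lam * temporal contrastive loss (hypothetical weighting)."""
    if rng is None:
        rng = np.random.default_rng(0)
    patches = patchify(frames, patch)           # (T, N, D)
    t, n, _ = patches.shape
    mask = rng.random((t, n)) < mask_ratio      # True = patch hidden from the encoder
    visible = patches * (~mask)[..., None]      # simplification: zero out masked patches
    tokens = visible @ w_enc                    # (T, N, E) toy linear "encoder"
    recon = tokens @ w_dec                      # (T, N, D) toy linear "decoder"
    l_mae = masked_reconstruction_loss(patches, recon, mask)
    emb = tokens.mean(axis=1)                   # (T, E) one embedding per frame
    # Positives: frame i and frame i+1. Anchors are spaced `stride` (>= 2) apart so
    # that no anchor frame reappears among its own negatives.
    idx = np.arange(0, t - 1, stride)
    l_con = info_nce(emb[idx], emb[idx + 1])
    return l_mae + lam * l_con, l_mae, l_con
```

For example, with 16 random 32x32 frames, `combined_loss(frames, rng.normal(size=(64, 32)) * 0.1, rng.normal(size=(32, 64)) * 0.1)` returns the total loss along with its two components. The design idea is that the reconstruction term pushes the encoder to capture spatial detail from partial views. The contrastive term pulls embeddings of temporally adjacent frames together relative to other frames.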
Related papers
- NOBLE -- Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models [68.89389652724378]
NOBLE is a neural operator framework that learns a mapping from a continuous frequency-modulated embedding of interpretable neuron features to the somatic voltage response induced by current injection. It predicts distributions of neural dynamics accounting for the intrinsic experimental variability. NOBLE is the first scaled-up deep learning framework validated on real experimental data.
(2025-06-05T01:01:18Z)
- Neural Encoding and Decoding at Scale [42.33285735011587]
We introduce a multimodal, multi-task model that enables simultaneous Neural Encoding and Decoding at Scale (NEDS). Central to our approach is a novel multi-task-masking strategy, which alternates between neural, behavioral, within-modality, and cross-modality masking. NEDS achieves state-of-the-art performance for both encoding and decoding when pretrained on multi-animal data and then fine-tuned on new animals.
(2025-04-11T02:06:20Z)
- In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention [52.159541540613915]
We study how multi-head softmax attention models are trained to perform in-context learning on linear data. Our results reveal that in-context learning ability emerges from the trained transformer as an aggregated effect of its architecture and the underlying data distribution.
(2025-03-17T02:00:49Z)
- QuantFormer: Learning to Quantize for Neural Activity Forecasting in Mouse Visual Cortex [26.499583552980248]
QuantFormer is a transformer-based model specifically designed for forecasting neural activity from two-photon calcium imaging data. QuantFormer sets a new benchmark in forecasting mouse visual cortex activity. It demonstrates robust performance and generalization across various stimuli and individuals.
(2024-12-10T07:44:35Z)
- SuperAnimal pretrained pose estimation models for behavioral analysis [42.206265576708255]
Quantification of behavior is critical in applications ranging from neuroscience to veterinary medicine and animal conservation.
We present a series of technical innovations that enable a new method, collectively called SuperAnimal, to develop unified foundation models.
(2022-03-14T18:46:57Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
(2022-02-01T17:11:13Z)
- Overcoming the Domain Gap in Neural Action Representations [60.47807856873544]
3D pose data can now be reliably extracted from multi-view video sequences without manual intervention.
We propose to use it to guide the encoding of neural action representations together with a set of neural and behavioral augmentations.
To reduce the domain gap, during training, we swap neural and behavioral data across animals that seem to be performing similar actions.
(2021-12-02T12:45:46Z)
- Overcoming the Domain Gap in Contrastive Learning of Neural Action Representations [60.47807856873544]
A fundamental goal in neuroscience is to understand the relationship between neural activity and behavior.
We generated a new multimodal dataset consisting of the spontaneous behaviors generated by fruit flies.
This dataset and our new set of augmentations promise to accelerate the application of self-supervised learning methods in neuroscience.
(2021-11-29T15:27:51Z)
- Neuronal Learning Analysis using Cycle-Consistent Adversarial Networks [4.874780144224057]
We use a variant of deep generative models called CycleGAN to learn the unknown mapping between pre- and post-learning neural activities.
We develop an end-to-end pipeline to preprocess calcium fluorescence signals and to train and evaluate models on them, along with a procedure to interpret the resulting deep learning models.
(2021-11-25T13:24:19Z)
- Multi-view Mouse Social Behaviour Recognition with Deep Graphical Model [124.26611454540813]
Social behaviour analysis of mice is an invaluable tool for assessing the efficacy of therapies for neurodegenerative diseases.
Because multi-view video recordings can yield rich descriptions of mouse social behaviours, their use for rodent observation is receiving increasing attention.
We propose a novel multiview latent-attention and dynamic discriminative model that jointly learns view-specific and view-shared sub-structures.
(2020-11-04T18:09:58Z)
- Investigating naturalistic hand movements by behavior mining in long-term video and neural recordings [1.7205106391379024]
We describe an automated approach for analyzing simultaneously recorded long-term, naturalistic electrocorticography (ECoG) and naturalistic behavior video data.
We show results from our approach applied to data collected from 12 human subjects over 7 to 9 days per subject.
Our pipeline discovers and annotates over 40,000 instances of naturalistic human upper-limb movement events in the behavioral videos.
(2020-01-23T02:41:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.