Large Transformers are Better EEG Learners
- URL: http://arxiv.org/abs/2308.11654v2
- Date: Sat, 13 Apr 2024 05:11:03 GMT
- Title: Large Transformers are Better EEG Learners
- Authors: Bingxin Wang, Xiaowen Fu, Yuan Lan, Luchan Zhang, Wei Zheng, Yang Xiang,
- Abstract summary: AdaCT, plug-and-play Adapters designed for Time series data into 2D pseudo-images or text forms.
AdaCTI transforms multi-channel or lengthy single-channel time series data into pseudo-images for fine-tuning pre-trained vision transformers.
AdaCT-T converts short single-channel data into text for fine-tuning pre-trained language transformers.
- Score: 8.930281191465088
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained large transformer models have achieved remarkable performance in the fields of natural language processing and computer vision. However, the limited availability of public electroencephalogram (EEG) data presents a unique challenge for extending the success of these models to EEG-based tasks. To address this gap, we propose AdaCT, plug-and-play Adapters designed for Converting Time series data into spatio-temporal 2D pseudo-images or text forms. Essentially, AdaCT-I transforms multi-channel or lengthy single-channel time series data into spatio-temporal 2D pseudo-images for fine-tuning pre-trained vision transformers, while AdaCT-T converts short single-channel data into text for fine-tuning pre-trained language transformers. The proposed approach allows for seamless integration of pre-trained vision models and language models in time series decoding tasks, particularly in EEG data analysis. Experimental results on diverse benchmark datasets, including Epileptic Seizure Recognition, Sleep-EDF, and UCI HAR, demonstrate the superiority of AdaCT over baseline methods. Overall, we provide a promising transfer learning framework for leveraging the capabilities of pre-trained vision and language models in EEG-based tasks, thereby advancing the field of time series decoding and enhancing interpretability in EEG data analysis. Our code will be available at https://github.com/wangbxj1234/AdaCE.
Related papers
- Demystifying the Communication Characteristics for Distributed Transformer Models [2.849208476795592]
This paper examines the communication behavior of transformer models.
We use GPT-based language models as a case study of the transformer architecture due to their ubiquity.
At a high level, our analysis reveals a need to optimize small message point-to-point communication further.
arXiv Detail & Related papers (2024-08-19T17:54:29Z) - Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI [6.926908480247951]
We propose a unified foundation model for EEG called Large Brain Model (LaBraM)
LaBraM enables cross-dataset learning by segmenting the EEG signals into EEG channel patches.
We then pre-train neural Transformers by predicting the original neural codes for the masked EEG channel patches.
arXiv Detail & Related papers (2024-05-29T05:08:16Z) - VL-GPT: A Generative Pre-trained Transformer for Vision and Language
Understanding and Generation [79.02357561313785]
We introduce Vision-Language Generative Pre-trained Transformer (VL-GPT), a transformer model proficient at concurrently perceiving and generating visual and linguistic data.
VL-GPT achieves a unified pre-training approach for both image and text modalities by employing a straightforward auto-regressive objective.
arXiv Detail & Related papers (2023-12-14T18:59:43Z) - ViT2EEG: Leveraging Hybrid Pretrained Vision Transformers for EEG Data [0.0]
We demonstrate the application of a hybrid Vision Transformer (ViT) model, pretrained on ImageNet, on an electroencephalogram (EEG) regression task.
This model shows a notable increase in performance compared to other models, including an identical architecture ViT trained without the ImageNet weights.
arXiv Detail & Related papers (2023-08-01T11:10:33Z) - DuETT: Dual Event Time Transformer for Electronic Health Records [14.520791492631114]
We introduce the DuETT architecture, an extension of Transformers designed to attend over both time and event type dimensions.
DuETT uses an aggregated input where sparse time series are transformed into a regular sequence with fixed length.
Our model outperforms state-of-the-art deep learning models on multiple downstream tasks from the MIMIC-IV and PhysioNet-2012 EHR datasets.
arXiv Detail & Related papers (2023-04-25T17:47:48Z) - XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems
to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by cross-modal encoders' success in visual-language tasks while we alter the learning objective to cater to the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z) - An Empirical Study of Training End-to-End Vision-and-Language
Transformers [50.23532518166621]
We present METER(textbfMultimodal textbfEnd-to-end textbfTransformtextbfER), through which we investigate how to design and pre-train a fully transformer-based VL model.
Specifically, we dissect the model designs along multiple dimensions: vision encoders (e.g., CLIP-ViT, Swin transformer), text encoders (e.g., RoBERTa, DeBERTa), multimodal fusion (e.g., merged attention vs. co-
arXiv Detail & Related papers (2021-11-03T17:55:36Z) - VLDeformer: Learning Visual-Semantic Embeddings by Vision-Language
Transformer Decomposing [7.890230091463883]
Vision-language transformers (VL transformers) have shown impressive accuracy in cross-modal retrieval.
We propose a novel Vision-language Transformer Decomposing (VLDeformer) to modify the VL transformer as an individual encoder for a single image or text.
arXiv Detail & Related papers (2021-10-20T09:00:51Z) - Long-Short Temporal Contrastive Learning of Video Transformers [62.71874976426988]
Self-supervised pretraining of video transformers on video-only datasets can lead to action recognition results on par or better than those obtained with supervised pretraining on large-scale image datasets.
Our approach, named Long-Short Temporal Contrastive Learning, enables video transformers to learn an effective clip-level representation by predicting temporal context captured from a longer temporal extent.
arXiv Detail & Related papers (2021-06-17T02:30:26Z) - Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from the Vision-friendly Transformer'
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z) - Spatiotemporal Transformer for Video-based Person Re-identification [102.58619642363958]
We show that, despite the strong learning ability, the vanilla Transformer suffers from an increased risk of over-fitting.
We propose a novel pipeline where the model is pre-trained on a set of synthesized video data and then transferred to the downstream domains.
The derived algorithm achieves significant accuracy gain on three popular video-based person re-identification benchmarks.
arXiv Detail & Related papers (2021-03-30T16:19:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.