Fusing Pretrained ViTs with TCNet for Enhanced EEG Regression
- URL: http://arxiv.org/abs/2404.15311v2
- Date: Wed, 7 Aug 2024 08:14:56 GMT
- Title: Fusing Pretrained ViTs with TCNet for Enhanced EEG Regression
- Authors: Eric Modesitt, Haicheng Yin, Williams Huang Wang, Brian Lu
- Abstract summary: This paper details the integration of pre-trained Vision Transformers (ViTs) with Temporal Convolutional Networks (TCNet) to enhance the precision of EEG regression.
Our results showcase a substantial improvement in regression accuracy, as evidenced by the reduction of Root Mean Square Error (RMSE) from 55.4 to 51.8.
Without sacrificing performance, we make this model up to 4.32x faster.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of Electroencephalogram (EEG) analysis is paramount to the development of Brain-Computer Interfaces (BCIs). However, progress toward robust, useful BCIs depends heavily on the speed and accuracy with which they can understand neural dynamics. In response, this paper details the integration of pre-trained Vision Transformers (ViTs) with Temporal Convolutional Networks (TCNet) to enhance the precision of EEG regression. The core of this approach lies in harnessing the sequential data processing strengths of ViTs along with the superior feature extraction capabilities of TCNet to significantly improve EEG analysis accuracy. In addition, we analyze how to construct optimal patches for the attention mechanism, balancing speed and accuracy tradeoffs. Our results showcase a substantial improvement in regression accuracy, as evidenced by the reduction of Root Mean Square Error (RMSE) from 55.4 to 51.8 on EEGEyeNet's Absolute Position Task, outperforming existing state-of-the-art models. Without sacrificing performance, we also make this model up to 4.32x faster. This result not only sets a new benchmark in EEG regression analysis but also opens new avenues for future research on integrating transformer architectures with specialized feature extraction methods for diverse EEG datasets.
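The abstract describes the fusion only at a high level; the sketch below illustrates one plausible reading of it, with a small temporal-convolutional front end whose feature map is cut into patches and fed to a transformer encoder with a regression head. Everything here is an assumption for illustration: the layer sizes, the patch length, the stand-in nn.TransformerEncoder (the paper uses a pretrained ViT), and the EEGEyeNet-style input of 129 channels x 500 samples.

```python
import torch
import torch.nn as nn

class TCNFrontEnd(nn.Module):
    """Temporal-convolutional feature extractor (illustrative, not the paper's exact TCNet)."""
    def __init__(self, in_ch=129, feat_ch=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, feat_ch, kernel_size=7, padding=3),
            nn.BatchNorm1d(feat_ch), nn.ELU(),
            nn.Conv1d(feat_ch, feat_ch, kernel_size=7, padding=6, dilation=2),
            nn.BatchNorm1d(feat_ch), nn.ELU(),
        )

    def forward(self, x):          # x: (batch, EEG channels, time)
        return self.net(x)         # (batch, feat_ch, time)

class ViTTCNetRegressor(nn.Module):
    """Cut the TCN feature map into temporal patches and feed them to a
    transformer encoder; in the paper a *pretrained* ViT plays this role,
    but a fresh nn.TransformerEncoder stands in to keep the sketch self-contained."""
    def __init__(self, feat_ch=128, patch_len=8, d_model=256, out_dim=2):
        super().__init__()
        self.frontend = TCNFrontEnd(feat_ch=feat_ch)
        self.patch_len = patch_len
        self.to_tokens = nn.Linear(feat_ch * patch_len, d_model)   # patch embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, out_dim)                    # (x, y) gaze position

    def forward(self, x):
        f = self.frontend(x)                                # (B, C, T)
        B, C, T = f.shape
        T = T - T % self.patch_len                          # drop remainder samples
        patches = f[:, :, :T].reshape(B, C, -1, self.patch_len)          # (B, C, N, P)
        tokens = self.to_tokens(patches.permute(0, 2, 1, 3).flatten(2))  # (B, N, d_model)
        return self.head(self.encoder(tokens).mean(dim=1))               # pooled regression

# EEGEyeNet-style input: 129 channels, 500 samples (1 s at 500 Hz)
pred = ViTTCNetRegressor()(torch.randn(4, 129, 500))
print(pred.shape)                                           # torch.Size([4, 2])
```

The patch length is the knob the paper's speed/accuracy analysis turns: longer patches mean fewer tokens for the attention mechanism (faster) at the cost of coarser temporal resolution.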
Related papers
- Advancing EEG-Based Gaze Prediction Using Depthwise Separable Convolution and Enhanced Pre-Processing [0.8192907805418583]
We introduce a novel method, the EEG Deeper Clustered Vision Transformer (EEG-DCViT), which combines depthwise separable convolutional neural networks (CNNs) with vision transformers.
The new approach demonstrates superior performance, establishing a new benchmark with a Root Mean Square Error (RMSE) of 51.6 mm.
arXiv Detail & Related papers (2024-08-06T23:43:03Z)
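The depthwise separable convolution that EEG-DCViT builds on factorizes a standard convolution into a per-channel (depthwise) filter followed by a 1x1 (pointwise) channel mix, cutting parameters and FLOPs. A minimal 1D PyTorch version; the shapes and kernel size are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv1d(nn.Module):
    """Depthwise (groups=in_ch) filtering followed by a 1x1 pointwise mix.
    Parameter count drops from in_ch*out_ch*k to in_ch*k + in_ch*out_ch."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(2, 129, 500)                  # (batch, EEG channels, time)
y = DepthwiseSeparableConv1d(129, 64, 15)(x)
print(y.shape)                                # torch.Size([2, 64, 500])
```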
- EEGMobile: Enhancing Speed and Accuracy in EEG-Based Gaze Prediction with Advanced Mobile Architectures [0.36832029288386137]
This study presents a model that leverages a pre-trained MobileViT alongside Knowledge Distillation (KD) for EEG regression tasks.
Our results showcase that this model performs at a level comparable to (only 3% lower than) the previous State-Of-The-Art (SOTA) on the EEGEyeNet Absolute Position Task.
arXiv Detail & Related papers (2024-08-06T21:02:27Z)
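EEGMobile's knowledge distillation is not spelled out in this summary; for regression, a common formulation blends a ground-truth loss with a loss against a frozen teacher's predictions. A hedged sketch, where the blending weight alpha and the MSE choices are assumptions rather than the paper's settings:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_pred, teacher_pred, target, alpha=0.5):
    """Blend ground-truth MSE with teacher-matching MSE (regression-KD sketch)."""
    hard = F.mse_loss(student_pred, target)                  # fit the labels
    soft = F.mse_loss(student_pred, teacher_pred.detach())   # mimic the frozen teacher
    return alpha * hard + (1.0 - alpha) * soft

# toy usage: 2-D gaze coordinates for a batch of 8
student_pred = torch.randn(8, 2, requires_grad=True)
teacher_pred = torch.randn(8, 2)
target = torch.randn(8, 2)
distillation_loss(student_pred, teacher_pred, target).backward()
```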
- An Efficient Approach to Regression Problems with Tensor Neural Networks [5.345144592056051]
This paper introduces a tensor neural network (TNN) to address nonparametric regression problems.
The TNN demonstrates superior performance compared to conventional Feed-Forward Networks (FFN) and Radial Basis Function Networks (RBN).
A significant innovation in our approach is the integration of statistical regression and numerical integration within the TNN framework.
arXiv Detail & Related papers (2024-06-14T03:38:40Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks [82.21746840893658]
This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
We show that while the ResNet-18 model trained on DWT spectrograms achieves a high recognition accuracy, attacking this model is relatively more costly for the adversary.
arXiv Detail & Related papers (2022-04-14T15:14:08Z)
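The study above feeds 2D spectrogram "images" to a CNN; for reference, here is a minimal way to produce one such representation, using an STFT spectrogram via SciPy (the paper's DWT variants would substitute a wavelet transform). The sample rate and window sizes are illustrative:

```python
import numpy as np
from scipy import signal

fs = 22050                                    # sample rate (illustrative)
x = np.random.randn(fs)                       # 1 s stand-in for an audio clip
f, t, Sxx = signal.spectrogram(x, fs=fs, nperseg=512, noverlap=256)
log_spec = 10 * np.log10(Sxx + 1e-10)         # log-power image for the CNN
print(log_spec.shape)                         # frequency-by-time "image", e.g. (257, 85)
```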
- Benchmarking Detection Transfer Learning with Vision Transformers [60.97703494764904]
The complexity of object detection methods can make benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.
We present training techniques that overcome these challenges, enabling the use of standard ViT models as the backbone of Mask R-CNN.
Our results show that recent masking-based unsupervised learning methods may, for the first time, provide convincing transfer learning improvements on COCO.
arXiv Detail & Related papers (2021-11-22T18:59:15Z)
- FasterPose: A Faster Simple Baseline for Human Pose Estimation [65.8413964785972]
We propose FasterPose, a design paradigm for a cost-effective network that uses a low-resolution (LR) representation for efficient pose estimation.
We study the training behavior of FasterPose, and formulate a novel regressive cross-entropy (RCE) loss function for accelerating the convergence.
Compared with the previously dominant pose estimation network, our method reduces FLOPs by 58% while improving accuracy by 1.3%.
arXiv Detail & Related papers (2021-07-07T13:39:08Z)
- When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations [111.44860506703307]
Vision Transformers (ViTs) and MLPs signal further efforts to replace hand-wired features and inductive biases with general-purpose neural architectures.
This paper investigates ViTs and MLP-Mixers through the lens of loss geometry, promoting smoothness with a sharpness-aware optimizer to improve the models' data efficiency at training and generalization at inference.
We show that the improved robustness is attributable to sparser active neurons in the first few layers.
The resultant ViTs outperform ResNets of similar size and throughput when trained from scratch on ImageNet without large-scale pretraining or strong data augmentations.
arXiv Detail & Related papers (2021-06-03T02:08:03Z)
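A minimal sketch of the sharpness-aware minimization (SAM) step referenced above: ascend to the worst-case weights within an L2 ball of radius rho, take the gradient there, then update the original weights with it. The rho value and the toy model are illustrative:

```python
import torch

def sam_step(model, loss_fn, optimizer, rho=0.05):
    """One sharpness-aware update (minimal sketch; assumes grads start at zero).
    1) climb to the worst-case nearby weights, 2) descend using that gradient."""
    loss_fn().backward()
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    norm = torch.norm(torch.stack([g.norm() for g in grads]))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None); continue
            e = rho * p.grad / (norm + 1e-12)
            p.add_(e); eps.append(e)          # perturb toward the sharp point
    optimizer.zero_grad()
    loss_fn().backward()                      # gradient at the perturbed weights
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)                     # restore the original weights
    optimizer.step()                          # descend with the SAM gradient
    optimizer.zero_grad()

# toy usage
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 4), torch.randn(16, 1)
sam_step(model, lambda: torch.nn.functional.mse_loss(model(x), y), opt)
```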
- EEG-Inception: An Accurate and Robust End-to-End Neural Network for EEG-based Motor Imagery Classification [123.93460670568554]
This paper proposes a novel convolutional neural network (CNN) architecture for accurate and robust EEG-based motor imagery (MI) classification.
The proposed CNN model, namely EEG-Inception, is built on the backbone of the Inception-Time network.
The proposed network performs end-to-end classification: it takes raw EEG signals as input and requires no complex EEG signal pre-processing.
arXiv Detail & Related papers (2021-01-24T19:03:10Z)
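EEG-Inception's Inception-Time backbone is built from blocks that run convolutions of several kernel sizes in parallel over the same input and concatenate the results. An illustrative 1D block; the branch widths and kernel sizes are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class InceptionBlock1d(nn.Module):
    """Parallel multi-scale convolutions plus a pooled branch, concatenated."""
    def __init__(self, in_ch, branch_ch=32, kernel_sizes=(9, 19, 39)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(in_ch, branch_ch, k, padding=k // 2, bias=False)
            for k in kernel_sizes
        ])
        self.pool_branch = nn.Sequential(
            nn.MaxPool1d(3, stride=1, padding=1),
            nn.Conv1d(in_ch, branch_ch, 1, bias=False),
        )
        self.bn = nn.BatchNorm1d(branch_ch * (len(kernel_sizes) + 1))
        self.act = nn.ReLU()

    def forward(self, x):                     # x: (batch, channels, time)
        outs = [b(x) for b in self.branches] + [self.pool_branch(x)]
        return self.act(self.bn(torch.cat(outs, dim=1)))

y = InceptionBlock1d(22)(torch.randn(4, 22, 1000))   # e.g. 22-channel MI-EEG
print(y.shape)                                       # torch.Size([4, 128, 1000])
```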
- EEG-TCNet: An Accurate Temporal Convolutional Network for Embedded Motor-Imagery Brain-Machine Interfaces [15.07343602952606]
We propose EEG-TCNet, a novel temporal convolutional network (TCN) that achieves outstanding accuracy while requiring few trainable parameters.
Its low memory footprint and low computational complexity for inference make it suitable for embedded classification on resource-limited devices at the edge.
arXiv Detail & Related papers (2020-05-31T21:45:45Z)
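The TCN unit underlying EEG-TCNet (and the TCNet fused in the main paper above) is a pair of dilated causal convolutions with a residual skip; stacking blocks with doubling dilation grows the receptive field exponentially at constant parameter cost. A minimal sketch with illustrative widths:

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Conv1d):
    """Left-pad so the output at time t sees only inputs up to t."""
    def forward(self, x):
        pad = self.dilation[0] * (self.kernel_size[0] - 1)
        return super().forward(nn.functional.pad(x, (pad, 0)))

class TCNBlock(nn.Module):
    """Two dilated causal convolutions with a residual skip (generic TCN unit)."""
    def __init__(self, in_ch, out_ch, kernel_size=4, dilation=1):
        super().__init__()
        self.conv1 = CausalConv1d(in_ch, out_ch, kernel_size, dilation=dilation)
        self.conv2 = CausalConv1d(out_ch, out_ch, kernel_size, dilation=dilation)
        self.act = nn.ELU()
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        h = self.act(self.conv1(x))
        h = self.act(self.conv2(h))
        return self.act(h + self.skip(x))      # residual connection

# dilations 1, 2, 4, ... double the receptive field with each block
net = nn.Sequential(TCNBlock(22, 32, dilation=1), TCNBlock(32, 32, dilation=2))
print(net(torch.randn(4, 22, 1000)).shape)     # torch.Size([4, 32, 1000])
```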