Scaling and Distilling Transformer Models for sEMG
- URL: http://arxiv.org/abs/2507.22094v1
- Date: Tue, 29 Jul 2025 13:41:59 GMT
- Title: Scaling and Distilling Transformer Models for sEMG
- Authors: Nicholas Mehlman, Jean-Christophe Gagnon-Audet, Michael Shvartsman, Kelvin Niu, Alexander H. Miller, Shagun Sodhani
- Abstract summary: Surface electromyography (sEMG) signals offer a promising avenue for developing innovative human-computer interfaces. The limited volume of training data and computational constraints during deployment have restricted the investigation of scaling up the model size for solving sEMG tasks. We demonstrate that vanilla transformer models can be effectively scaled up on sEMG data and yield improved cross-user performance up to 110M parameters.
- Score: 45.62920901482346
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Surface electromyography (sEMG) signals offer a promising avenue for developing innovative human-computer interfaces by providing insights into muscular activity. However, the limited volume of training data and computational constraints during deployment have restricted the investigation of scaling up the model size for solving sEMG tasks. In this paper, we demonstrate that vanilla transformer models can be effectively scaled up on sEMG data and yield improved cross-user performance up to 110M parameters, surpassing the model size regime investigated in other sEMG research (usually <10M parameters). We show that >100M-parameter models can be effectively distilled into models 50x smaller with minimal loss of performance (<1.5% absolute). This results in efficient and expressive models suitable for complex real-time sEMG tasks in real-world environments.
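The distillation result above (a >100M-parameter teacher compressed into a student roughly 50x smaller with <1.5% absolute performance loss) follows the standard teacher-student recipe: the student is trained against a blend of the teacher's softened outputs and the hard labels. The sketch below is a minimal, hypothetical PyTorch illustration of that recipe; the student architecture, dimensions, temperature, and loss weighting are assumptions, not the authors' configuration.

```python
# Minimal teacher-student distillation sketch (hypothetical; not the authors' exact setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target KL divergence (teacher) with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

class SmallSEMGTransformer(nn.Module):
    """Assumed compact student: a vanilla transformer encoder over windowed sEMG features."""
    def __init__(self, in_channels=16, d_model=128, n_layers=4, n_heads=4, n_classes=10):
        super().__init__()
        self.proj = nn.Linear(in_channels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                    # x: (batch, time, channels)
        h = self.encoder(self.proj(x))
        return self.head(h.mean(dim=1))      # pool over time, then classify

# One training step, assuming a frozen ~100M-parameter teacher:
#   with torch.no_grad():
#       teacher_logits = teacher(x)
#   loss = distillation_loss(student(x), teacher_logits, labels)
```

In practice the temperature T and mixing weight alpha would be tuned per task; the key point is that only the small student needs to run at deployment time.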
Related papers
- Characterization and Mitigation of Training Instabilities in Microscaling Formats [6.025438902954768]
Training large language models is an expensive, compute-bound process. Next-generation hardware accelerators increasingly support lower-precision arithmetic formats. We investigate the challenges and viability of block-scaled precision formats during model training.
arXiv Detail & Related papers (2025-06-25T18:25:08Z) - PIGPVAE: Physics-Informed Gaussian Process Variational Autoencoders [42.8983261737774]
We propose a novel generative model that learns from limited data by incorporating physical constraints to enhance performance. We extend the VAE architecture by incorporating physical models in the generative process, enabling it to capture underlying dynamics more effectively. We demonstrate that PIGPVAE can produce realistic samples beyond the observed distribution, highlighting its robustness and usefulness under distribution shifts.
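As a rough illustration of the "physical models in the generative process" idea, the hypothetical decoder below adds a learned correction on top of a simple analytic physics trajectory (an exponential decay, chosen purely for illustration). This is a sketch of the general pattern only; it omits the Gaussian process components and is not the PIGPVAE architecture.

```python
# Hedged sketch: a VAE decoder whose output is a physics-model trajectory plus a
# learned residual (illustrative assumption, not the PIGPVAE design).
import torch
import torch.nn as nn

class PhysicsInformedDecoder(nn.Module):
    def __init__(self, latent_dim=8, seq_len=100):
        super().__init__()
        self.seq_len = seq_len
        self.to_params = nn.Linear(latent_dim, 2)            # assumed: decay rate and initial value
        self.residual = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh(),
                                      nn.Linear(64, seq_len))

    def forward(self, z):                                    # z: (batch, latent_dim)
        rate, x0 = self.to_params(z).unbind(-1)              # each (batch,)
        t = torch.linspace(0.0, 1.0, self.seq_len, device=z.device)
        physics = x0[:, None] * torch.exp(-torch.relu(rate)[:, None] * t)  # analytic decay curve
        return physics + self.residual(z)                    # physics prior + learned correction
```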
arXiv Detail & Related papers (2025-05-25T21:12:01Z) - CEReBrO: Compact Encoder for Representations of Brain Oscillations Using Efficient Alternating Attention [53.539020807256904]
We introduce a Compact Encoder for Representations of Brain Oscillations using alternating attention (CEReBrO). Our tokenization scheme represents EEG signals as per-channel patches. We propose an alternating attention mechanism that jointly models intra-channel temporal dynamics and inter-channel spatial correlations, achieving a 2x speed improvement with 6x less memory than standard self-attention.
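The alternating-attention mechanism summarized above (attention over patches within each channel, alternated with attention across channels at each patch position) can be sketched roughly as below; the tensor layout, dimensions, and normalization placement are illustrative assumptions rather than the CEReBrO implementation.

```python
# Rough sketch of alternating intra-channel / inter-channel attention (illustrative only).
import torch
import torch.nn as nn

class AlternatingAttentionBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.temporal = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.spatial = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # x: (batch, channels, patches, d_model) -- per-channel patch embeddings
        b, c, p, d = x.shape
        # Intra-channel temporal attention: attend over patches within each channel.
        t = x.reshape(b * c, p, d)
        t = self.norm1(t + self.temporal(t, t, t, need_weights=False)[0])
        x = t.reshape(b, c, p, d)
        # Inter-channel spatial attention: attend over channels at each patch position.
        s = x.transpose(1, 2).reshape(b * p, c, d)
        s = self.norm2(s + self.spatial(s, s, s, need_weights=False)[0])
        return s.reshape(b, p, c, d).transpose(1, 2)
```

Attention within a channel scales with the number of patches squared and attention across channels with the number of channels squared, instead of full self-attention over all channel-patch tokens jointly, which is presumably where the reported speed and memory savings come from.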
arXiv Detail & Related papers (2025-01-18T21:44:38Z) - T-GMSI: A transformer-based generative model for spatial interpolation under sparse measurements [1.0931557410591526]
We propose a Transformer-based Generative Model for Spatial Interpolation (T-GMSI) using a vision transformer (ViT) architecture for digital elevation model (DEM) generation under sparse conditions. T-GMSI replaces traditional convolution-based methods with ViT for feature extraction and DEM generation, while incorporating a feature-aware loss function for enhanced accuracy. T-GMSI excels at producing high-quality elevation surfaces from datasets with over 70% sparsity and demonstrates strong transferability across diverse landscapes without fine-tuning.
arXiv Detail & Related papers (2024-12-13T06:01:39Z) - Graph Adapter of EEG Foundation Models for Parameter Efficient Fine Tuning [1.8946099300030472]
We propose EEG-GraphAdapter (EGA), a parameter-efficient fine-tuning (PEFT) approach for EEG foundation models. EGA is integrated into a pre-trained temporal backbone model as a GNN-based module, freezing the backbone and allowing only the adapter to be fine-tuned. Experimental evaluations on two healthcare-related downstream tasks, Major Depressive Disorder (MDD) classification and Abnormality Detection (TUAB), show that EGA improves performance by up to 16.1% in F1-score compared with the backbone BENDR model.
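The adapter pattern described above, a GNN-based module placed on top of a frozen pre-trained temporal backbone so that only the adapter (and a task head) is fine-tuned, can be illustrated as below; the dense graph convolution, the fully connected channel graph, and the backbone output shape are assumptions for the sketch, not the EGA specifics.

```python
# Hypothetical sketch of a GNN adapter over a frozen temporal backbone (not the EGA code).
import torch
import torch.nn as nn

class SimpleGraphConv(nn.Module):
    """Dense graph convolution over EEG channels: H' = ReLU(A_norm @ H @ W)."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # h: (batch, channels, dim); adj: (channels, channels), row-normalized.
        return torch.relu(adj @ self.lin(h))

class GraphAdapterModel(nn.Module):
    def __init__(self, backbone, feat_dim, n_channels, n_classes):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():      # freeze the pre-trained backbone
            p.requires_grad = False
        self.adapter = SimpleGraphConv(feat_dim)  # only these parameters are trained
        self.head = nn.Linear(feat_dim, n_classes)
        # Assumed fully connected inter-channel graph, row-normalized.
        self.register_buffer("adj", torch.ones(n_channels, n_channels) / n_channels)

    def forward(self, x):                         # x: (batch, channels, time)
        with torch.no_grad():
            h = self.backbone(x)                  # assumed shape: (batch, channels, feat_dim)
        h = self.adapter(h, self.adj)
        return self.head(h.mean(dim=1))           # pool over channels, then classify
```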
arXiv Detail & Related papers (2024-11-25T07:30:52Z) - Graph Convolutional Neural Networks as Surrogate Models for Climate Simulation [0.1884913108327873]
We leverage fully-connected neural networks (FCNNs) and graph convolutional neural networks (GCNNs) to enable rapid simulation and uncertainty quantification.
Our surrogate simulated 80 years in approximately 310 seconds on a single A100 GPU, compared to weeks for the ESM model.
arXiv Detail & Related papers (2024-09-19T14:41:15Z) - XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection [30.687511115573038]
XMoE is a novel MoE approach designed to enhance both the efficacy and efficiency of sparse MoE models.
XMoE improves model performance while decreasing the computation load at MoE layers by over 50%.
arXiv Detail & Related papers (2024-02-27T08:18:02Z) - Synthetic location trajectory generation using categorical diffusion
models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly evolved to become one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z) - Your Autoregressive Generative Model Can be Better If You Treat It as an
Energy-Based One [83.5162421521224]
We propose a unique method termed E-ARM for training autoregressive generative models.
E-ARM takes advantage of a well-designed energy-based learning objective.
We show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem.
arXiv Detail & Related papers (2022-06-26T10:58:41Z) - METRO: Efficient Denoising Pretraining of Large Scale Autoencoding
Language Models with Model Generated Signals [151.3601429216877]
We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model.
We propose a recipe, namely "Model generated dEnoising TRaining Objective" (METRO)
The resultant models, METRO-LM, consisting of up to 5.4 billion parameters, achieve new state-of-the-art on the GLUE, SuperGLUE, and SQuAD benchmarks.
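As a hedged sketch of the "training signals generated by an auxiliary model" idea, the snippet below shows the generic ELECTRA-style pattern: an auxiliary generator fills masked positions, and the main model is trained to detect which tokens were replaced. Function names and shapes are illustrative assumptions; this is the general pattern, not METRO's exact objective.

```python
# Illustrative denoising-pretraining signals produced by an auxiliary generator
# (generic replaced-token-detection pattern; not METRO's exact recipe).
import torch
import torch.nn.functional as F

def corrupt_inputs(input_ids, mask_positions, generator_logits):
    """Replace masked positions with tokens sampled from the auxiliary generator."""
    sampled = torch.distributions.Categorical(logits=generator_logits).sample()  # (batch, seq)
    corrupted = input_ids.clone()
    corrupted[mask_positions] = sampled[mask_positions]
    return corrupted

def discriminator_loss(disc_logits, corrupted_ids, original_ids):
    """Train the main model to flag which tokens the generator replaced."""
    is_replaced = (corrupted_ids != original_ids).float()                         # (batch, seq)
    return F.binary_cross_entropy_with_logits(disc_logits, is_replaced)
```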
arXiv Detail & Related papers (2022-04-13T21:39:15Z) - MoEfication: Conditional Computation of Transformer Models for Efficient
Inference [66.56994436947441]
Transformer-based pre-trained language models achieve superior performance on most NLP tasks thanks to their large parameter capacity, but they also incur a huge computation cost.
We explore accelerating large-model inference through conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
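The transformation above, splitting a dense feed-forward layer into experts of equal total size and activating only a few of them per token, can be sketched as follows; the even slicing of the FFN weights and the learned top-k router are simplified assumptions rather than the paper's exact construction.

```python
# Simplified sketch of a dense FFN viewed as a conditionally-activated mixture of experts
# (illustrative; not the paper's exact expert construction or routing).
import torch
import torch.nn as nn

class MoEfiedFFN(nn.Module):
    def __init__(self, d_model=768, d_ff=3072, n_experts=16, top_k=2):
        super().__init__()
        assert d_ff % n_experts == 0
        self.d_expert = d_ff // n_experts
        self.top_k = top_k
        # The dense FFN parameters, viewed as n_experts equal slices (total size unchanged).
        self.w_in = nn.Parameter(torch.randn(n_experts, d_model, self.d_expert) * 0.02)
        self.w_out = nn.Parameter(torch.randn(n_experts, self.d_expert, d_model) * 0.02)
        self.router = nn.Linear(d_model, n_experts)   # predicts which slices to activate

    def forward(self, x):                             # x: (tokens, d_model)
        top_idx = self.router(x).topk(self.top_k, dim=-1).indices   # (tokens, top_k)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                   # evaluate only the selected experts
            idx = top_idx[:, k]                       # (tokens,)
            h = torch.relu(torch.einsum("td,tdh->th", x, self.w_in[idx]))
            out = out + torch.einsum("th,thd->td", h, self.w_out[idx])
        return out
```

Because only top_k of the n_experts slices run for each token, the per-token FFN compute drops roughly in proportion to top_k / n_experts.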
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.