Knowledge Transfer For On-Device Speech Emotion Recognition with Neural
Structured Learning
- URL: http://arxiv.org/abs/2210.14977v3
- Date: Thu, 11 May 2023 13:54:45 GMT
- Title: Knowledge Transfer For On-Device Speech Emotion Recognition with Neural
Structured Learning
- Authors: Yi Chang, Zhao Ren, Thanh Tam Nguyen, Kun Qian, Bj\"orn W. Schuller
- Abstract summary: Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI)
We propose a neural structured learning (NSL) framework through building synthesized graphs.
Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER models, but also enhance the model performance.
- Score: 19.220263739291685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech emotion recognition (SER) has been a popular research topic in
human-computer interaction (HCI). As edge devices are rapidly springing up,
applying SER to edge devices is promising for a huge number of HCI
applications. Although deep learning has been investigated to improve the
performance of SER by training complex models, the memory space and
computational capability of edge devices represents a constraint for embedding
deep learning models. We propose a neural structured learning (NSL) framework
through building synthesized graphs. An SER model is trained on a source
dataset and used to build graphs on a target dataset. A relatively lightweight
model is then trained with the speech samples and graphs together as the input.
Our experiments demonstrate that training a lightweight SER model on the target
dataset with speech samples and graphs can not only produce small SER models,
but also enhance the model performance compared to models with speech samples
only and those using classic transfer learning strategies.
Related papers
- Battle of the Backbones: A Large-Scale Comparison of Pretrained Models
across Computer Vision Tasks [139.3768582233067]
Battle of the Backbones (BoB) is a benchmarking tool for neural network based computer vision systems.
We find that vision transformers (ViTs) and self-supervised learning (SSL) are increasingly popular.
In apples-to-apples comparisons on the same architectures and similarly sized pretraining datasets, we find that SSL backbones are highly competitive.
arXiv Detail & Related papers (2023-10-30T18:23:58Z) - Generative Pre-training for Speech with Flow Matching [81.59952572752248]
We pre-trained a generative model, named SpeechFlow, on 60k hours of untranscribed speech with Flow Matching and masked conditions.
Experiment results show the pre-trained generative model can be fine-tuned with task-specific data to match or surpass existing expert models on speech enhancement, separation, and synthesis.
arXiv Detail & Related papers (2023-10-25T03:40:50Z) - On the Steganographic Capacity of Selected Learning Models [1.0640226829362012]
We consider the question of the steganographic capacity of learning models.
For a wide range of models, we determine the number of low-order bits that can be overwritten.
Of the models tested, the steganographic capacity ranges from 7.04 KB for our LR experiments, to 44.74 MB for InceptionV3.
arXiv Detail & Related papers (2023-08-29T10:41:34Z) - LowDINO -- A Low Parameter Self Supervised Learning Model [0.0]
This research aims to explore the possibility of designing a neural network architecture that allows for small networks to adopt the properties of huge networks.
Previous studies have shown that using convolutional neural networks (ConvNets) can provide inherent inductive bias.
To reduce the number of parameters, attention mechanisms are utilized through the usage of MobileViT blocks.
arXiv Detail & Related papers (2023-05-28T18:34:59Z) - X-TIME: An in-memory engine for accelerating machine learning on tabular
data with CAMs [19.086291506702413]
Modern tree-based Machine Learning models shine in extracting relevant information from structured data.
In this work, we focus on an overall analog-digital architecture implementing a novel increased precision analog CAM.
Results evaluated in a single chip at 16nm technology show 119x lower latency at 9740x higher throughput compared with a state-of-the-art GPU.
arXiv Detail & Related papers (2023-04-03T18:20:31Z) - Incremental Online Learning Algorithms Comparison for Gesture and Visual
Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z) - Learning Multi-Object Dynamics with Compositional Neural Radiance Fields [63.424469458529906]
We present a method to learn compositional predictive models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks.
NeRFs have become a popular choice for representing scenes due to their strong 3D prior.
For planning, we utilize RRTs in the learned latent space, where we can exploit our model and the implicit object encoder to make sampling the latent space informative and more efficient.
arXiv Detail & Related papers (2022-02-24T01:31:29Z) - SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z) - STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model can achieve comparable performance while utilizing much less trainable parameters and achieve high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z) - ESAI: Efficient Split Artificial Intelligence via Early Exiting Using
Neural Architecture Search [6.316693022958222]
Deep neural networks have been outperforming conventional machine learning algorithms in many computer vision-related tasks.
The majority of devices are harnessing the cloud computing methodology in which outstanding deep learning models are responsible for analyzing the data on the server.
In this paper, a new framework for deploying on IoT devices has been proposed which can take advantage of both the cloud and the on-device models.
arXiv Detail & Related papers (2021-06-21T04:47:53Z) - BENDR: using transformers and a contrastive self-supervised learning
task to learn from massive amounts of EEG data [15.71234837305808]
We consider how to adapt techniques and architectures used for language modelling (LM) to encephalography modelling (EM)
We find that a single pre-trained model is capable of modelling completely novel raw EEG sequences recorded with differing hardware.
Both the internal representations of this model and the entire architecture can be fine-tuned to a variety of downstream BCI and EEG classification tasks.
arXiv Detail & Related papers (2021-01-28T14:54:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.