Ensemble Machine Learning Model for Inner Speech Recognition: A Subject-Specific Investigation
- URL: http://arxiv.org/abs/2412.17824v1
- Date: Mon, 09 Dec 2024 16:50:49 GMT
- Title: Ensemble Machine Learning Model for Inner Speech Recognition: A Subject-Specific Investigation
- Authors: Shahamat Mustavi Tasin, Muhammad E. H. Chowdhury, Shona Pedersen, Malek Chabbouh, Diala Bushnaq, Raghad Aljindi, Saidul Kabir, Anwarul Hasan
- Abstract summary: This study develops a Machine Learning technique to classify inner speech using 128-channel surface EEG signals. The performance of six ML algorithms is evaluated, and an ensemble model is proposed. The proposed framework shows promise in the classification of inner speech using surface EEG signals.
- Score: 0.22198209072577352
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Inner speech recognition has gained enormous interest in recent years due to its applications in rehabilitation, assistive technology, and cognitive assessment. However, because language and speech production is a complex process, identifying its components has remained a challenging task. Various approaches have been taken previously toward this goal, but new ones remain to be explored. A subject-oriented analysis is also necessary to understand the underlying brain dynamics during inner speech production, which can bring novel methods to neurological research. A publicly available dataset, the Thinking Out Loud dataset, was used to develop a Machine Learning (ML)-based technique to classify inner speech using 128-channel surface EEG signals. The dataset was collected from a Spanish cohort of ten subjects, each internally uttering four words (Arriba, Abajo, Derecha, and Izquierda). Statistical methods were employed to detect and remove motion artifacts from the Electroencephalography (EEG) signals. A large number (191 per channel) of time-, frequency-, and time-frequency-domain features were extracted. Eight feature selection algorithms were explored, and the best feature selection technique was selected for subsequent evaluations. The performance of six ML algorithms was evaluated, and an ensemble model is proposed. Deep Learning (DL) models were also explored, and their results were compared with the classical ML approach. The proposed ensemble model, built by stacking the five best logistic regression models, achieved an overall accuracy of 81.13% and an F1 score of 81.12% in the classification of four inner speech words from surface EEG signals. The proposed framework, with its ensemble of classical ML models, shows promise for classifying inner speech from surface EEG signals.
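The abstract does not include code, but the final stacking step it describes can be illustrated with a minimal scikit-learn sketch. Everything below is an illustrative assumption rather than the authors' implementation: the placeholder feature matrix and labels, the SelectKBest stand-in for the paper's feature-selection stage, and the choice to vary the five logistic regressions by regularization strength (the abstract does not say how the five "best" models differ).

```python
# A minimal sketch (not the authors' code) of a stacked ensemble of logistic
# regressions, using scikit-learn. Feature matrix, labels, feature-selection
# stand-in, and base-model hyperparameters are all illustrative assumptions.
import numpy as np
from sklearn.ensemble import StackingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 191))    # placeholder: trials x extracted features
y = rng.integers(0, 4, size=200)   # placeholder labels: the four inner-speech words

# Five logistic regression base learners. Here they differ only by
# regularization strength C, a hypothetical choice for illustration.
base_models = [
    (f"lr{i}", make_pipeline(StandardScaler(),
                             LogisticRegression(C=C, max_iter=5000)))
    for i, C in enumerate((0.01, 0.1, 1.0, 10.0, 100.0))
]

ensemble = make_pipeline(
    # Stand-in for the paper's feature-selection stage (8 algorithms compared).
    SelectKBest(f_classif, k=50),
    # Stack the base models under a logistic regression meta-learner.
    StackingClassifier(estimators=base_models,
                       final_estimator=LogisticRegression(max_iter=5000),
                       cv=5),
)

scores = cross_val_score(ensemble, X, y, cv=5, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

In the actual pipeline, X would hold the selected subset of the 191-per-channel EEG features for a single subject, reflecting the paper's subject-specific evaluation.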
Related papers
- Tracking Articulatory Dynamics in Speech with a Fixed-Weight BiLSTM-CNN Architecture [0.0]
This paper presents a novel approach for predicting the tongue and lip articulatory features involved in producing given speech acoustics.
The proposed network is trained with two datasets consisting of simultaneously recorded speech and Electromagnetic Articulography (EMA) data.
arXiv Detail & Related papers (2025-04-25T05:57:22Z) - Enhanced Speech Emotion Recognition with Efficient Channel Attention Guided Deep CNN-BiLSTM Framework [0.7864304771129751]
Speech emotion recognition (SER) is crucial for enhancing affective computing and enriching the domain of human-computer interaction. We propose a lightweight SER architecture that integrates attention-based local feature blocks (ALFBs) to capture high-level relevant feature vectors from speech signals. We also incorporate a global feature block (GFB) technique to capture sequential, global information and long-term dependencies in speech signals.
arXiv Detail & Related papers (2024-12-13T09:55:03Z) - Where are we in audio deepfake detection? A systematic analysis over generative and detection models [59.09338266364506]
SONAR is a synthetic AI-audio detection framework and benchmark.
It provides a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z) - AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension [95.8442896569132]
We introduce AIR-Bench, the first benchmark to evaluate the ability of Large Audio-Language Models (LALMs) to understand various types of audio signals and interact with humans in the textual format.
Results demonstrate a high level of consistency between GPT-4-based evaluation and human evaluation.
arXiv Detail & Related papers (2024-02-12T15:41:22Z) - ML-ASPA: A Contemplation of Machine Learning-based Acoustic Signal Processing Analysis for Sounds, & Strains Emerging Technology [0.0]
This inquiry explores recent advancements and transformative potential within the domain of acoustics, specifically focusing on machine learning (ML) and deep learning.
ML adopts a data-driven approach, unveiling intricate relationships between features and desired labels or actions, as well as among features themselves.
The application of ML to expansive sets of training data facilitates the discovery of models elucidating complex acoustic phenomena such as human speech and reverberation.
arXiv Detail & Related papers (2023-12-18T03:04:42Z) - EmoDiarize: Speaker Diarization and Emotion Identification from Speech Signals using Convolutional Neural Networks [0.0]
This research explores the integration of deep learning techniques in speech emotion recognition.
It introduces a framework that combines a pre-existing speaker diarization pipeline and an emotion identification model built on a Convolutional Neural Network (CNN).
The proposed model yields an unweighted accuracy of 63%, demonstrating remarkable efficiency in accurately identifying emotional states within speech signals.
arXiv Detail & Related papers (2023-10-19T16:02:53Z) - Evaluating raw waveforms with deep learning frameworks for speech emotion recognition [0.0]
We present a model that feeds raw audio files directly into deep neural networks without any feature extraction stage.
We use six different datasets: EMO-DB, RAVDESS, TESS, CREMA, SAVEE, and TESS+RAVDESS.
The proposed model achieves 90.34% accuracy for EMO-DB with the CNN model, 90.42% for RAVDESS, 99.48% for TESS with the LSTM model, 69.72% for CREMA with the CNN model, and 85.76% for SAVEE with the CNN model.
arXiv Detail & Related papers (2023-07-06T07:27:59Z) - Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
arXiv Detail & Related papers (2022-08-25T10:01:43Z) - Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
These results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in the human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z) - LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [67.88748572167309]
We present LDNet, a unified framework for mean opinion score (MOS) prediction.
We propose two inference methods that provide more stable results and efficient computation.
arXiv Detail & Related papers (2021-10-18T08:52:31Z) - Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects [82.81964713263483]
A popular approach to decompose the neural bases of language consists in correlating, across individuals, the brain responses to different stimuli.
Here, we show that a model-based approach can reach equivalent results within subjects exposed to natural stimuli.
arXiv Detail & Related papers (2021-10-12T15:30:21Z) - Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z) - Towards Relevance and Sequence Modeling in Language Recognition [39.547398348702025]
We propose a neural network framework utilizing short-sequence information in language recognition.
A new model is proposed for incorporating relevance into language recognition, where parts of the speech data are weighted more heavily based on their relevance to the language recognition task.
Experiments are performed using the language recognition task in NIST LRE 2017 Challenge using clean, noisy and multi-speaker speech data.
arXiv Detail & Related papers (2020-04-02T18:31:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.