EchoVest: Real-Time Sound Classification and Depth Perception Expressed
through Transcutaneous Electrical Nerve Stimulation
- URL: http://arxiv.org/abs/2307.04604v1
- Date: Mon, 10 Jul 2023 14:43:32 GMT
- Title: EchoVest: Real-Time Sound Classification and Depth Perception Expressed
through Transcutaneous Electrical Nerve Stimulation
- Authors: Jesse Choe, Siddhant Sood, Ryan Park
- Abstract summary: We have developed a new assistive device, EchoVest, for blind/deaf people to intuitively become more aware of their environment.
EchoVest transmits vibrations to the user's body by utilizing transcutaneous electrical nerve stimulation (TENS) based on the source of the sounds.
We aimed to outperform CNN-based models, the most commonly used machine learning models for classification tasks, in both accuracy and computational cost.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Over 1.5 billion people worldwide live with hearing impairment. Despite
various technologies that have been created for individuals with such
disabilities, most of these technologies are either extremely expensive or
inaccessible for everyday use in low- and middle-income countries. To combat
this issue, we have developed a new assistive device, EchoVest, for
blind/deaf people to intuitively become more aware of their environment.
EchoVest transmits vibrations to the user's body by utilizing transcutaneous
electrical nerve stimulation (TENS) based on the source of the sounds. EchoVest
also provides various features, including sound localization, sound
classification, noise reduction, and depth perception. We aimed to outperform
CNN-based models, the most commonly used machine learning models for
classification tasks, in both accuracy and computational cost. To do so,
we developed and employed a novel audio pipeline that adapts the Audio
Spectrogram Transformer (AST) model, an attention-based model, for our sound
classification purposes, and Fast Fourier Transforms for noise reduction. The
application of Otsu's method helped us find optimal thresholds for background
noise filtering and substantially improved accuracy. To calculate direction and
depth accurately, we applied Complex Time Difference of Arrival (TDOA)
algorithms and state-of-the-art localization techniques. Our last improvement
was to use blind
source separation to make our algorithms applicable to multiple microphone
inputs. The final pipeline achieved state-of-the-art results on several
benchmarks, including 95.7% accuracy on the ESC-50 dataset for environmental
sound classification.
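For classification, the abstract describes adapting the Audio Spectrogram
Transformer (AST). As a rough illustration only, the sketch below uses the
public Hugging Face transformers implementation of AST with an
AudioSet-finetuned checkpoint; the checkpoint, preprocessing, and input file
name are assumptions, not the authors' released code.

```python
# Hedged sketch: AST-style sound classification with Hugging Face transformers.
# The checkpoint and input file are illustrative, not EchoVest's own weights.
import torch
import torchaudio
from transformers import ASTFeatureExtractor, ASTForAudioClassification

CKPT = "MIT/ast-finetuned-audioset-10-10-0.4593"  # public AudioSet checkpoint
extractor = ASTFeatureExtractor.from_pretrained(CKPT)
model = ASTForAudioClassification.from_pretrained(CKPT).eval()

waveform, sr = torchaudio.load("clip.wav")        # hypothetical input file
mono = torchaudio.functional.resample(waveform, sr, 16_000).mean(dim=0)

# The feature extractor builds the log-mel spectrogram that AST expects.
inputs = extractor(mono.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[int(logits.argmax(-1))])  # top predicted class
```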
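The noise-reduction step pairs Fast Fourier Transforms with an Otsu-derived
threshold. Below is a minimal NumPy sketch of that idea, assuming a hard
spectral gate in the log-magnitude domain; the paper's exact filtering code is
not published.

```python
# Hedged sketch: FFT noise gating with a threshold chosen by Otsu's method.
# Bins whose log-magnitude falls below the threshold are treated as background.
import numpy as np

def otsu_threshold(values: np.ndarray, bins: int = 256) -> float:
    """Classic Otsu: pick the threshold maximizing between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w = hist.cumsum()                        # cumulative class-0 weight
    mu = (hist * centers).cumsum()           # cumulative class-0 mass
    w0, w1 = w[:-1], w[-1] - w[:-1]
    valid = (w0 > 0) & (w1 > 0)
    mu0 = mu[:-1][valid] / w0[valid]
    mu1 = (mu[-1] - mu[:-1][valid]) / w1[valid]
    var_between = w0[valid] * w1[valid] * (mu0 - mu1) ** 2
    return centers[:-1][valid][np.argmax(var_between)]

def denoise(signal: np.ndarray) -> np.ndarray:
    # A deployed system would gate per STFT frame; one FFT keeps the idea short.
    spectrum = np.fft.rfft(signal)
    log_mag = np.log1p(np.abs(spectrum))
    thresh = otsu_threshold(log_mag)
    spectrum = np.where(log_mag < thresh, 0.0, spectrum)  # zero "noise" bins
    return np.fft.irfft(spectrum, n=len(signal))
```

Hard gating can introduce musical-noise artifacts; a soft mask is a common
substitute, but the Otsu step plays the same role either way: it splits the
log-magnitude histogram into background and foreground bins.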
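For direction and depth, a standard building block for time difference of
arrival is the GCC-PHAT cross-correlation. The sketch below is a textbook
two-microphone formulation, assuming a far-field source, a mic spacing d, and
the speed of sound c; it is not the paper's "Complex TDOA" implementation.

```python
# Hedged sketch: GCC-PHAT delay estimate and bearing for a two-mic array.
import numpy as np

def gcc_phat(x: np.ndarray, y: np.ndarray, fs: int) -> float:
    """Delay of x relative to y in seconds (positive if x lags y)."""
    n = 2 * max(len(x), len(y))              # zero-pad to avoid wraparound
    X, Y = np.fft.rfft(x, n), np.fft.rfft(y, n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12                   # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n)
    cc = np.concatenate((cc[-n // 2:], cc[:n // 2]))  # center zero lag
    return (np.argmax(np.abs(cc)) - n // 2) / fs

def bearing_deg(tau: float, d: float = 0.2, c: float = 343.0) -> float:
    """Far-field bearing from delay; d (m) and c (m/s) are assumed constants."""
    return float(np.degrees(np.arcsin(np.clip(c * tau / d, -1.0, 1.0))))
```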
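The abstract also mentions blind source separation for multi-microphone input
without naming the algorithm. One illustrative choice is FastICA from
scikit-learn, sketched below under the assumption of an instantaneous
(non-reverberant) mixture.

```python
# Hedged sketch: blind source separation with FastICA (scikit-learn >= 1.1).
import numpy as np
from sklearn.decomposition import FastICA

def separate(mics: np.ndarray, n_sources: int) -> np.ndarray:
    """mics: (n_samples, n_channels) synchronized recordings."""
    ica = FastICA(n_components=n_sources, whiten="unit-variance",
                  random_state=0)
    return ica.fit_transform(mics)           # (n_samples, n_sources) estimates
```

Time-domain ICA only handles instantaneous mixtures; reverberant (convolutive)
mixtures are typically separated per frequency bin in the STFT domain instead.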
Related papers
- DeepSpeech models show Human-like Performance and Processing of Cochlear Implant Inputs [12.234206036041218]
We use the deep neural network (DNN) DeepSpeech2 as a paradigm to investigate how natural input and cochlear implant-based inputs are processed over time.
We generate naturalistic and cochlear implant-like inputs from spoken sentences and test the similarity of model performance to human performance.
We find that dynamics over time in each layer are affected by context as well as input type.
arXiv Detail & Related papers (2024-07-30T04:32:27Z) - What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z) - Utilizing synthetic training data for the supervised classification of rat ultrasonic vocalizations [0.0]
Murine rodents generate ultrasonic vocalizations (USVs) with frequencies that extend to around 120 kHz.
These calls are important in social behaviour, and so their analysis can provide insights into the function of vocal communication, and its dysfunction.
We compare the detection and classification performance of a trained human against two convolutional neural networks (CNNs), DeepSqueak and VocalMat, on audio containing rat USVs.
arXiv Detail & Related papers (2023-03-03T03:17:45Z) - Walking Noise: On Layer-Specific Robustness of Neural Architectures against Noisy Computations and Associated Characteristic Learning Dynamics [1.5184189132709105]
We discuss the implications of additive, multiplicative and mixed noise for different classification tasks and model architectures.
We propose a methodology called Walking Noise which injects layer-specific noise to measure the robustness.
We conclude with a discussion of the use of this methodology in practice, among others, discussing its use for tailored multi-execution in noisy environments.
arXiv Detail & Related papers (2022-12-20T17:09:08Z) - High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space trained in an end-to-end fashion.
We simplify and speed up the training by using a single multiscale spectrogram adversary.
arXiv Detail & Related papers (2022-10-24T17:52:02Z) - Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use the wav2vec pre-trained model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z) - Deep Feature Learning for Medical Acoustics [78.56998585396421]
The purpose of this paper is to compare different learnables in medical acoustics tasks.
A framework has been implemented to classify human respiratory sounds and heartbeats in two categories, i.e. healthy or affected by pathologies.
arXiv Detail & Related papers (2022-08-05T10:39:37Z) - End-to-End Binaural Speech Synthesis [71.1869877389535]
We present an end-to-end speech synthesis system that combines a low-bitrate audio codec with a powerful decoder.
We demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
arXiv Detail & Related papers (2022-07-08T05:18:36Z) - SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping [51.698273019061645]
SpecGrad adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram.
It is processed in the time-frequency domain to keep the computational cost almost the same as the conventional DDPM-based neural vocoders.
arXiv Detail & Related papers (2022-03-31T02:08:27Z) - Neural Architecture Search for Energy Efficient Always-on Audio Models [1.3846912186423144]
We present several changes to neural architecture search (NAS) that improve the chance of success in practical situations.
We benchmark the performance of our search on real hardware, but since running thousands of tests with real hardware is difficult, we use a random forest model to roughly predict the energy usage of a candidate network.
Our search, evaluated on a sound-event classification dataset based upon AudioSet, results in an order of magnitude less energy per inference and a much smaller memory footprint.
arXiv Detail & Related papers (2022-02-09T06:10:18Z) - Deep Neural Networks on EEG Signals to Predict Auditory Attention Score Using Gramian Angular Difference Field [1.9899603776429056]
An individual's auditory attention score indicates, in some sense, how well that person can focus on auditory tasks.
Recent advancements in deep learning and in non-invasive technologies for recording neural activity raise the question: can deep learning, combined with technologies such as electroencephalography (EEG), be used to predict an individual's auditory attention score?
In this paper, we focus on exactly this problem: estimating a person's auditory attention level from the brain's electrical activity captured with 14-channel EEG signals.
arXiv Detail & Related papers (2021-10-24T17:58:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.