In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction
Microphones for In-Ear Sensing Platforms
- URL: http://arxiv.org/abs/2309.02393v1
- Date: Tue, 5 Sep 2023 17:04:09 GMT
- Title: In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction
Microphones for In-Ear Sensing Platforms
- Authors: Philipp Schilk, Niccolò Polvani, Andrea Ronco, Milos Cernak, Michele
Magno
- Abstract summary: This paper presents the design and implementation of a custom research platform for low-power wireless earbuds based on novel, commercial, MEMS bone-conduction microphones.
Such microphones can record the wearer's speech with much greater isolation, enabling personalized voice activity detection and further audio enhancement applications.
- Score: 8.946335367620698
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent ubiquitous adoption of remote conferencing has been accompanied by
omnipresent frustration with distorted or otherwise unclear voice
communication. Audio enhancement can compensate for low-quality input signals
from, for example, small true wireless earbuds, by applying noise suppression
techniques. Such processing relies on voice activity detection (VAD) with low
latency and the added capability of discriminating the wearer's voice from
others - a task of significant computational complexity. The tight energy
budget of devices as small as modern earphones, however, requires any system
attempting to tackle this problem to do so with minimal power and processing
overhead, while not relying on speaker-specific voice samples and training due
to usability concerns.
This paper presents the design and implementation of a custom research
platform for low-power wireless earbuds based on novel, commercial, MEMS
bone-conduction microphones. Such microphones can record the wearer's speech
with much greater isolation, enabling personalized voice activity detection and
further audio enhancement applications. Furthermore, the paper accurately
evaluates a proposed low-power personalized speech detection algorithm based on
bone conduction data and a recurrent neural network running on the implemented
research platform. This algorithm is compared to an approach based on
traditional microphone input. The bone-conduction system achieves speech
detection within 12.8 ms at 95% accuracy. Different SoC choices are contrasted,
with the final implementation based on the cutting-edge Ambiq Apollo 4 Blue SoC
achieving 2.64 mW average power consumption at 14 µJ per inference, reaching
43 h of battery life on a miniature 32 mAh Li-ion cell without duty cycling.
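The abstract's power and battery figures can be sanity-checked with simple arithmetic. A minimal sketch, assuming a 3.7 V nominal Li-ion cell voltage and one inference per 12.8 ms frame (neither assumption is stated in the abstract):

```python
# Back-of-the-envelope check of the reported power and battery-life figures.
# Assumptions (not from the abstract): 3.7 V nominal cell voltage, and that
# one inference runs per 12.8 ms frame.

CELL_CAPACITY_MAH = 32.0        # miniature Li-ion cell (reported)
NOMINAL_VOLTAGE_V = 3.7         # assumed nominal Li-ion voltage
AVG_POWER_MW = 2.64             # reported average system power
ENERGY_PER_INFERENCE_UJ = 14.0  # reported energy per inference
FRAME_PERIOD_MS = 12.8          # reported detection latency, assumed frame period

# Ideal stored energy and resulting runtime at constant average draw.
energy_mwh = CELL_CAPACITY_MAH * NOMINAL_VOLTAGE_V   # 118.4 mWh
battery_life_h = energy_mwh / AVG_POWER_MW           # ~44.8 h

# Average power spent on inference alone, one inference per frame.
inference_power_mw = (ENERGY_PER_INFERENCE_UJ * 1e-6) / (FRAME_PERIOD_MS * 1e-3) * 1e3

print(f"ideal battery life: {battery_life_h:.1f} h")
print(f"inference-only power: {inference_power_mw:.2f} mW")
```

The ideal runtime comes out near 45 h, consistent with the reported 43 h once regulator and conversion losses are accounted for, and inference alone accounts for roughly 1.1 mW of the 2.64 mW average budget.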
Related papers
- Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege [26.3587130339825]
We propose a novel phoneme-based noise with the idea of informational masking, which can distract both machines and humans.
Our system can reduce the recognition accuracy of recordings to below 50% under all tested speech recognition systems.
arXiv Detail & Related papers (2024-01-28T16:56:56Z)
- EchoVest: Real-Time Sound Classification and Depth Perception Expressed through Transcutaneous Electrical Nerve Stimulation [0.0]
We have developed a new assistive device, EchoVest, for blind/deaf people to intuitively become more aware of their environment.
EchoVest transmits vibrations to the user's body by utilizing transcutaneous electric nerve stimulation (TENS) based on the source of the sounds.
We aimed to outperform CNN-based machine-learning models, the most commonly used machine learning model for classification tasks, in accuracy and computational costs.
arXiv Detail & Related papers (2023-07-10T14:43:32Z)
- Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition [57.15942628305797]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems for normal speech.
This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel acoustic-articulatory data of the 15-hour TORGO corpus in model training.
The model is then cross-domain adapted to the 102.7-hour UASpeech corpus to produce articulatory features.
arXiv Detail & Related papers (2022-03-19T08:47:18Z)
- Event Based Time-Vectors for auditory features extraction: a neuromorphic approach for low power audio recognition [4.206844212918807]
We present a neuromorphic architecture, capable of unsupervised auditory feature recognition.
We then validate the network on a subset of Google's Speech Commands dataset.
arXiv Detail & Related papers (2021-12-13T21:08:04Z)
- Reinforcement Learning for Minimizing Age of Information in Real-time Internet of Things Systems with Realistic Physical Dynamics [158.67956699843168]
This paper studies the problem of minimizing the weighted sum of age of information (AoI) and total energy consumption of Internet of Things (IoT) devices.
A distributed reinforcement learning approach is proposed to optimize the sampling policy.
Simulations with real data of PM 2.5 pollution show that the proposed algorithm can reduce the sum of AoI by up to 17.8% and 33.9%.
arXiv Detail & Related papers (2021-04-04T03:17:26Z)
- Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate.
We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique.
Our monolingual Turkish Conformer achieved a competitive result with 22.2% character error rate (CER) and 38.9% word error rate (WER).
arXiv Detail & Related papers (2021-03-12T10:10:13Z)
- Speech Enhancement for Wake-Up-Word detection in Voice Assistants [60.103753056973815]
Keywords spotting and in particular Wake-Up-Word (WUW) detection is a very important task for voice assistants.
This paper proposes a Speech Enhancement model adapted to the task of WUW detection.
It aims at increasing the recognition rate and reducing the false alarms in the presence of these types of noises.
arXiv Detail & Related papers (2021-01-29T18:44:05Z)
- TinySpeech: Attention Condensers for Deep Speech Recognition Neural Networks on Edge Devices [71.68436132514542]
We introduce the concept of attention condensers for building low-footprint, highly-efficient deep neural networks for on-device speech recognition on the edge.
To illustrate its efficacy, we introduce TinySpeech, low-precision deep neural networks tailored for on-device speech recognition.
arXiv Detail & Related papers (2020-08-10T16:34:52Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing system quality degradation on short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
- Compact recurrent neural networks for acoustic event detection on low-energy low-complexity platforms [10.04812789957562]
This paper addresses the application of sound event detection at the edge, by optimizing deep learning techniques on resource-constrained embedded platforms for the IoT.
A two-stage student-teacher approach is presented to make state-of-the-art neural networks for sound event detection fit on current microcontrollers.
Our embedded implementation achieves 68% recognition accuracy on UrbanSound8K, not far from state-of-the-art performance.
arXiv Detail & Related papers (2020-01-29T14:56:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.