Speaker Diaphragm Excursion Prediction: deep attention and online
adaptation
- URL: http://arxiv.org/abs/2305.06640v1
- Date: Thu, 11 May 2023 08:17:55 GMT
- Title: Speaker Diaphragm Excursion Prediction: deep attention and online
adaptation
- Authors: Yuwei Ren, Matt Zivney, Yin Huang, Eddie Choy, Chirag Patel and Hao Xu
- Abstract summary: This paper proposes efficient DL solutions to accurately model and predict the nonlinear excursion.
The proposed algorithm is verified on two speakers and three typical deployment scenarios, and the residual DC is less than 0.1 mm in $>$99% of cases.
- Score: 2.8349018797311314
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A speaker protection algorithm leverages the playback signal properties to
prevent over-excursion while maintaining maximum loudness, especially for
mobile phones with tiny loudspeakers. This paper proposes efficient DL solutions
to accurately model and predict the nonlinear excursion, which is challenging
for conventional solutions. Firstly, we build the experiment and pre-processing
pipeline, where the feedback current and voltage are sampled as input, and a
laser is employed to measure the excursion as ground truth. Secondly, an
FFTNet model is proposed to explore the dominant low-frequency and other
unknown harmonics, and is compared to a baseline ConvNet model. In addition, BN
re-estimation is designed to explore online adaptation, and INT8
quantization based on the AI Model Efficiency Toolkit (AIMET\footnote{AIMET is a
product of Qualcomm Innovation Center, Inc.}) is applied to further reduce the
complexity. The proposed algorithm is verified on two speakers and three typical
deployment scenarios, and the residual DC is less than 0.1 mm in $>$99\% of
cases, much better than traditional solutions.
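The BN re-estimation step for online adaptation can be sketched as follows. This is a minimal illustration that updates only the batch-norm running statistics from unlabeled deployment data while keeping the learned weights frozen; the class and parameter names (`OnlineBatchNorm1d`, `momentum`) are hypothetical and not the paper's actual implementation.

```python
import numpy as np

class OnlineBatchNorm1d:
    """Minimal batch-norm layer whose running statistics can be
    re-estimated online from deployment-domain data (a sketch of
    BN re-estimation; details are illustrative, not the paper's)."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.gamma = np.ones(num_features)    # learned scale (frozen online)
        self.beta = np.zeros(num_features)    # learned shift (frozen online)
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def re_estimate(self, batch):
        """Update only the running mean/var from a batch of new-domain
        features; no labels and no gradient updates are needed."""
        mean = batch.mean(axis=0)
        var = batch.var(axis=0)
        m = self.momentum
        self.running_mean = (1 - m) * self.running_mean + m * mean
        self.running_var = (1 - m) * self.running_var + m * var

    def forward(self, batch):
        """Inference using the (re-estimated) running statistics."""
        x_hat = (batch - self.running_mean) / np.sqrt(self.running_var + self.eps)
        return self.gamma * x_hat + self.beta
```

Because only first- and second-order statistics are updated, this kind of adaptation is cheap enough to run on-device as new playback data arrives.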
Related papers
- SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning [49.83621156017321]
SimBa is an architecture designed to scale up parameters in deep RL by injecting a simplicity bias.
By scaling up parameters with SimBa, the sample efficiency of various deep RL algorithms, including off-policy, on-policy, and unsupervised methods, is consistently improved.
arXiv Detail & Related papers (2024-10-13T07:20:53Z) - Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z) - Simple Pooling Front-ends For Efficient Audio Classification [56.59107110017436]
We show that eliminating the temporal redundancy in the input audio features could be an effective approach for efficient audio classification.
We propose a family of simple pooling front-ends (SimPFs) which use simple non-parametric pooling operations to reduce the redundant information.
SimPFs can achieve a reduction in more than half of the number of floating point operations for off-the-shelf audio neural networks.
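The non-parametric pooling idea behind SimPFs can be illustrated with a short sketch: average-pooling the time axis of the input features before they reach the network. This is an assumed, simplified example, not the authors' exact SimPF design.

```python
import numpy as np

def simple_pooling_frontend(spectrogram, pool_size=2):
    """Reduce temporal redundancy by average-pooling the time axis of a
    (time, frequency) feature matrix. Halving the frames (pool_size=2)
    roughly halves the FLOPs of any downstream network whose cost is
    linear in the number of frames. Illustrative sketch only."""
    t, f = spectrogram.shape
    t_trim = (t // pool_size) * pool_size           # drop ragged tail frames
    x = spectrogram[:t_trim].reshape(t_trim // pool_size, pool_size, f)
    return x.mean(axis=1)                           # non-parametric pooling
```

Since the pooling has no learnable parameters, it adds essentially no memory or training cost of its own.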
arXiv Detail & Related papers (2022-10-03T14:00:41Z) - Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z) - Neural Calibration for Scalable Beamforming in FDD Massive MIMO with
Implicit Channel Estimation [10.775558382613077]
Channel estimation and beamforming play critical roles in frequency-division duplexing (FDD) massive multiple-input multiple-output (MIMO) systems.
We propose a deep learning-based approach that directly optimizes the beamformers at the base station according to the received uplink pilots.
A neural calibration method is proposed to improve the scalability of the end-to-end design.
arXiv Detail & Related papers (2021-08-03T14:26:14Z) - On Minimum Word Error Rate Training of the Hybrid Autoregressive
Transducer [40.63693071222628]
We study the minimum word error rate (MWER) training of the Hybrid Autoregressive Transducer (HAT).
From experiments with around 30,000 hours of training data, we show that MWER training can improve the accuracy of HAT models.
arXiv Detail & Related papers (2020-10-23T21:16:30Z) - Two-stage Deep Reinforcement Learning for Inverter-based Volt-VAR
Control in Active Distribution Networks [3.260913246106564]
We propose a novel two-stage deep reinforcement learning (DRL) method to improve the voltage profile by regulating inverter-based energy resources.
In the offline stage, a highly efficient adversarial reinforcement learning algorithm is developed to train an offline agent robust to the model mismatch.
In the sequential online stage, we transfer the offline agent safely as the online agent to perform continuous learning and controlling online with significantly improved safety and efficiency.
arXiv Detail & Related papers (2020-05-20T08:02:13Z) - Deliberation Model Based Two-Pass End-to-End Speech Recognition [52.45841282906516]
A two-pass model has been proposed to rescore streamed hypotheses using the non-streaming Listen, Attend and Spell (LAS) model.
The model attends to acoustics to rescore hypotheses, as opposed to a class of neural correction models that use only first-pass text hypotheses.
A bidirectional encoder is used to extract context information from first-pass hypotheses.
arXiv Detail & Related papers (2020-03-17T22:01:12Z) - Deep Speaker Embeddings for Far-Field Speaker Recognition on Short
Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z) - NPLDA: A Deep Neural PLDA Model for Speaker Verification [40.842070706362534]
We propose a neural network approach for backend modeling in speaker recognition.
The proposed model, termed as neural PLDA (NPLDA), is optimized using the generative PLDA model parameters.
In experiments, the NPLDA model optimized using the proposed loss function improves significantly over the state-of-the-art PLDA based speaker verification system.
arXiv Detail & Related papers (2020-02-10T05:47:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.