Neurobench: DCASE 2020 Acoustic Scene Classification benchmark on XyloAudio 2
- URL: http://arxiv.org/abs/2410.23776v1
- Date: Thu, 31 Oct 2024 09:48:12 GMT
- Title: Neurobench: DCASE 2020 Acoustic Scene Classification benchmark on XyloAudio 2
- Authors: Weijie Ke, Mina Khoei, Dylan Muir
- Abstract summary: XyloAudio is a line of ultra-low-power audio inference chips.
It is designed for in- and near-microphone analysis of audio in real-time energy-constrained scenarios.
- Abstract: XyloAudio is a line of ultra-low-power audio inference chips, designed for in- and near-microphone analysis of audio in real-time energy-constrained scenarios. Xylo is designed around a highly efficient integer-logic processor which simulates parameter- and activity-sparse spiking neural networks (SNNs) using a leaky integrate-and-fire (LIF) neuron model. Neurons on Xylo are quantised integer devices operating in synchronous digital CMOS, with neuron and synapse state quantised to 16 bit, and weight parameters quantised to 8 bit. Xylo is tailored for real-time streaming operation, as opposed to the accelerated-time operation of an inference accelerator. XyloAudio includes a low-power audio encoding interface for direct connection to a microphone, designed for sparse encoding of incident audio for further processing by the inference core. In this report we present the results of the DCASE 2020 acoustic scene classification benchmark deployed to XyloAudio 2. We describe the benchmark dataset, the audio preprocessing approach, and the network architecture and training approach. We present the performance of the trained model, and the results of power and latency measurements performed on the XyloAudio 2 development kit. This benchmark was conducted as part of the Neurobench project.
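The abstract's core computational claim is a quantised LIF neuron simulated in integer logic: 16-bit neuron and synapse state, 8-bit weights, no floating point. Below is a minimal Python sketch of one such timestep, assuming a bit-shift leak and subtractive reset; the threshold, decay scheme, and all names are illustrative assumptions, not Xylo's documented internals.

```python
import numpy as np

# One timestep of a quantised leaky integrate-and-fire (LIF) update:
# 16-bit neuron state, 8-bit-range weights, integer arithmetic only.
V_MAX = np.int32(2**15 - 1)   # 16-bit signed state ceiling
THRESHOLD = np.int32(1024)    # hypothetical firing threshold
DECAY_SHIFT = 4               # leak as a right-shift: v -= v >> 4

def lif_step(v_mem, spikes_in, weights):
    """v_mem: (N,) int32 membrane state; spikes_in: (M,) 0/1 input spikes;
    weights: (N, M) int32, values within 8-bit signed range.
    Returns updated state and the (N,) 0/1 output spike vector."""
    # Integrate weighted input spikes (integer multiply-accumulate).
    v_mem = v_mem + weights @ spikes_in
    # Leak: subtract a power-of-two fraction of the state (shift, no multiply).
    v_mem = v_mem - (v_mem >> DECAY_SHIFT)
    # Clamp to the 16-bit state range.
    v_mem = np.clip(v_mem, -V_MAX - 1, V_MAX)
    # Fire where the threshold is crossed, then reset by subtraction.
    out = (v_mem >= THRESHOLD).astype(np.int32)
    v_mem = v_mem - out * THRESHOLD
    return v_mem, out

rng = np.random.default_rng(0)
w = rng.integers(-128, 128, size=(4, 3), dtype=np.int32)   # 8-bit weight range
v = np.zeros(4, dtype=np.int32)
v, out = lif_step(v, np.array([1, 0, 1], dtype=np.int32), w)
```

The shift-based leak avoids multipliers, the kind of choice that keeps an integer-logic core cheap; Xylo's actual state dynamics are specified in its documentation, not here.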
Related papers
- AudioInceptionNeXt: TCL AI LAB Submission to EPIC-SOUND Audio-Based-Interaction-Recognition Challenge 2023 (arXiv, 2023-07-14)
This report presents the technical details of our submission to the 2023 EPIC-Kitchens EPIC-SOUNDS Audio-Based Interaction Recognition Challenge.
The task is to learn the mapping from audio samples to their corresponding action labels.
Our approach achieved 55.43% top-1 accuracy on the challenge test set, ranking 1st on the public leaderboard.
- Large-scale unsupervised audio pre-training for video-to-speech synthesis (arXiv, 2023-06-27)
Video-to-speech synthesis is the task of reconstructing the speech signal from a silent video of a speaker.
In this paper we propose to train encoder-decoder models on more than 3,500 hours of audio data at 24kHz.
We then use the pre-trained decoders to initialize the audio decoders for the video-to-speech synthesis task.
- Audio Tagging on an Embedded Hardware Platform (arXiv, 2023-06-15)
We analyze how the performance of large-scale pretrained audio neural networks changes when deployed on hardware such as a Raspberry Pi.
Our experiments reveal that continuous CPU usage raises the device temperature, which can trigger an automated slowdown mechanism.
Microphone quality, particularly with affordable devices like the Google AIY Voice Kit, and audio signal volume also affect system performance, as the monitoring sketch below illustrates.
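The thermal throttling that study observes can be watched directly on a Raspberry Pi, whose SoC temperature and CPU clock are exposed through sysfs. The sketch below is an illustrative monitoring loop, not code from the paper; the sysfs paths are the standard ones on Raspberry Pi OS and may differ on other boards.

```python
import time

TEMP_PATH = "/sys/class/thermal/thermal_zone0/temp"                   # millidegrees C
FREQ_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"   # kHz

def read_int(path):
    with open(path) as f:
        return int(f.read().strip())

# Log temperature and clock once per second while the audio tagger runs;
# a falling clock at high temperature indicates throttling has kicked in.
for _ in range(600):  # observe for ten minutes
    temp_c = read_int(TEMP_PATH) / 1000
    freq_mhz = read_int(FREQ_PATH) / 1000
    print(f"temp={temp_c:.1f}C freq={freq_mhz:.0f}MHz")
    time.sleep(1)
```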
- Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models (arXiv, 2023-01-30)
Multimodal generative modeling has created milestones in text-to-image and text-to-video generation.
Its application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the complexity of modeling long continuous audio data.
We propose Make-An-Audio with a prompt-enhanced diffusion model that addresses these gaps.
- High Fidelity Neural Audio Compression (arXiv, 2022-10-24)
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion.
We simplify and speed up the training by using a single multiscale spectrogram adversary.
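The "quantized latent space" above is built from codebook lookups. As a rough illustration (not the paper's implementation, which stacks several such quantisers as residual vector quantisation), the sketch below maps each latent frame to its nearest entry in a random codebook; the codebook size, latent dimension, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.standard_normal((1024, 128))   # 1024 codes, 128-dim latents

def quantize(latents):
    """Map each latent frame to its nearest codebook entry.
    latents: (T, 128) encoder outputs.
    Returns code indices (T,) and the quantised latents (T, 128)."""
    # Squared distances between every frame and every code: (T, 1024).
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]

latents = rng.standard_normal((50, 128))      # stand-in for encoder output
idx, quantised = quantize(latents)
print(idx[:8], quantised.shape)               # transmit idx; decoder looks up codes
```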
- Fully Automated End-to-End Fake Audio Detection (arXiv, 2022-08-20)
This paper proposes a fully automated end-to-end fake audio detection method.
We first use a pre-trained wav2vec model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
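For a sense of the first stage described above, the sketch below extracts wav2vec 2.0 features using torchaudio's bundled checkpoint. The exact checkpoint used by the paper and its downstream light-DARTS classifier are not shown; the input filename is hypothetical.

```python
import torch
import torchaudio

# Pre-trained wav2vec 2.0 feature extractor (torchaudio's base bundle).
bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()

waveform, sr = torchaudio.load("utterance.wav")  # hypothetical input file
if sr != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.no_grad():
    # Returns per-layer feature tensors of shape (1, frames, channels).
    features, _ = model.extract_features(waveform)
print(features[-1].shape)  # final-layer representation fed to a detector head
```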
- FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis (arXiv, 2022-07-08)
We propose FastLTS, a non-autoregressive end-to-end model which can directly synthesize high-quality speech audio from unconstrained talking videos with low latency.
Experiments show that our model achieves a $19.76\times$ speedup for audio generation compared with the current autoregressive model on input sequences of 3 seconds.
- R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS (arXiv, 2022-06-30)
This paper introduces R-MelNet, a two-part autoregressive architecture with a backend WaveRNN-style audio decoder.
The model produces low-resolution mel-spectral features which are used by a WaveRNN decoder to produce an audio waveform.
- NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation (arXiv, 2022-03-05)
We propose a novel neural vocoder named NeuralDPS which can retain high speech quality while achieving high synthesis efficiency and noise controllability.
It generates waveforms at least 280 times faster than the WaveNet vocoder.
It is also 28% faster than WaveGAN on a single core.
- Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder (arXiv, 2020-08-16)
Flow-based neural vocoders have shown significant improvement in real-time speech generation tasks.
We propose audio dequantization methods in flow-based neural vocoders for high fidelity audio generation.
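Dequantization here means spreading discrete PCM sample values over continuous bins so a flow model can assign them a proper density. The sketch below shows the standard uniform-noise baseline, assuming 16-bit PCM; the paper's proposed variants go beyond this and are not reproduced here.

```python
import numpy as np

def dequantize_uniform(pcm, bits=16, rng=None):
    """pcm: integer audio samples; returns continuous values in [-1, 1).
    Adds one quantisation bin of uniform noise to each sample."""
    rng = rng or np.random.default_rng()
    u = rng.uniform(0.0, 1.0, size=pcm.shape)
    return (pcm.astype(np.float64) + u) / 2 ** (bits - 1)

pcm = np.array([0, -32768, 32767, 100], dtype=np.int16)  # toy 16-bit samples
print(dequantize_uniform(pcm))
```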