A Deep Learning Approach for Low-Latency Packet Loss Concealment of
Audio Signals in Networked Music Performance Applications
- URL: http://arxiv.org/abs/2007.07132v1
- Date: Tue, 14 Jul 2020 15:51:52 GMT
- Title: A Deep Learning Approach for Low-Latency Packet Loss Concealment of
Audio Signals in Networked Music Performance Applications
- Authors: Prateek Verma, Alessandro Ilic Mezza, Chris Chafe, Cristina Rottondi
- Abstract summary: Networked Music Performance (NMP) is envisioned as a potential game changer among Internet applications.
This article describes a technique for predicting lost packet content in real-time using a deep learning approach.
- Score: 66.56753488329096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Networked Music Performance (NMP) is envisioned as a potential game changer
among Internet applications: it aims at revolutionizing the traditional concept
of musical interaction by enabling remote musicians to interact and perform
together through a telecommunication network. Ensuring realistic conditions for
music performance, however, constitutes a significant engineering challenge due
to extremely strict requirements in terms of audio quality and, most
importantly, network delay. To minimize the end-to-end delay experienced by the
musicians, typical implementations of NMP applications use uncompressed,
bidirectional audio streams and leverage UDP as the transport protocol. Because
UDP is connectionless and unreliable, audio packets that are lost in transit
are not retransmitted and thus cause glitches in the receiver's audio playout.
This article describes a technique for predicting lost packet content in
real time using a deep learning approach. The ability to conceal
errors in real time can help mitigate audio impairments caused by packet
losses, thus improving the quality of audio playout in real-world scenarios.
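
As a rough illustration of where concealment sits in the receive path, the sketch below fills each lost packet with a predicted one before playout. It is not the paper's method: the deep learning predictor is replaced by a trivial baseline that replays the previous packet, and the names `conceal_stream`, `repeat_last`, and the `None`-marks-a-loss convention are assumptions made for this example only.

```python
from typing import Callable, List, Optional, Sequence

Packet = List[float]  # one packet = a short block of audio samples


def repeat_last(history: Sequence[Packet], packet_len: int) -> Packet:
    """Baseline stand-in for a learned predictor (assumption, not the paper's model):
    replay the most recent packet, or output silence if nothing was received yet."""
    if history:
        return list(history[-1])
    return [0.0] * packet_len


def conceal_stream(
    received: Sequence[Optional[Packet]],
    packet_len: int,
    predict: Callable[[Sequence[Packet], int], Packet] = repeat_last,
) -> List[float]:
    """Build the playout signal, substituting a prediction for every lost packet.

    `received` holds packets in playout order; None marks a packet lost in transit.
    Concealed packets are appended to the history, so they also serve as context
    for later predictions.
    """
    history: List[Packet] = []
    for pkt in received:
        if pkt is None:
            pkt = predict(history, packet_len)
        history.append(pkt)
    # Flatten the packets into one continuous sample sequence.
    return [sample for pkt in history for sample in pkt]


# Packets 2 and 4 were lost; each is concealed by replaying its predecessor.
stream = [[0.1, 0.2], None, [0.5, 0.6], None]
playout = conceal_stream(stream, packet_len=2)
```

A learned model would be passed in through the `predict` argument; the surrounding loop stays the same, which is why the prediction step has to finish within one packet interval to be usable in real time.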
Related papers
- FM Tone Transfer with Envelope Learning [8.771755521263811]
Tone Transfer is a novel technique for interfacing a sound source with a synthesizer, transforming the timbre of audio excerpts while keeping their musical form content.
It presents several shortcomings related to poor sound diversity, and limited transient and dynamic rendering, which we believe hinder its possibilities of articulation and phrasing in a real-time performance context.
arXiv Detail & Related papers (2023-10-07T14:03:25Z)
- Real-time Object Detection for Streaming Perception [84.2559631820007]
Streaming perception is proposed to jointly evaluate the latency and accuracy into a single metric for video online perception.
We build a simple and effective framework for streaming perception.
Our method achieves competitive performance on Argoverse-HD dataset and improves the AP by 4.9% compared to the strong baseline.
arXiv Detail & Related papers (2022-03-23T11:33:27Z)
- A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning [57.28467469709369]
We investigate designing a compact audio-visual wake word spotting (WWS) system by utilizing visual information.
We introduce a neural network pruning strategy via the lottery ticket hypothesis in an iterative fine-tuning manner (LTH-IF).
The proposed audio-visual system achieves significant performance improvements over the single-modality (audio-only or video-only) system under different noisy conditions.
arXiv Detail & Related papers (2022-02-17T08:26:25Z)
- End-to-End Neural Audio Coding for Real-Time Communications [22.699018098484707]
This paper proposes the TFNet, an end-to-end neural audio system with low latency for real-time communications (RTC).
An interleaved structure is proposed for temporal filtering to capture both short-term and long-term temporal dependencies.
With end-to-end optimization, the TFNet is jointly optimized with speech enhancement and packet loss concealment, yielding a one-for-all network for three tasks.
arXiv Detail & Related papers (2022-01-24T03:06:30Z)
- Accelerating Federated Edge Learning via Optimized Probabilistic Device Scheduling [57.271494741212166]
This paper formulates and solves the communication time minimization problem.
It is found that the optimized policy gradually turns its priority from suppressing the remaining communication rounds to reducing per-round latency as the training process evolves.
The effectiveness of the proposed scheme is demonstrated via a use case on collaborative 3D object detection in autonomous driving.
arXiv Detail & Related papers (2021-07-24T11:39:17Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
arXiv Detail & Related papers (2020-08-30T05:27:39Z)
- Dynamic Compression Ratio Selection for Edge Inference Systems with Hard Deadlines [9.585931043664363]
We propose a dynamic compression ratio selection scheme for edge inference systems with hard deadlines.
Information augmentation, which retransmits less compressed data for tasks with erroneous inference, is proposed to enhance accuracy.
Considering the wireless transmission errors, we further design a retransmission scheme to reduce performance degradation due to packet losses.
arXiv Detail & Related papers (2020-05-25T17:11:53Z)
- ConcealNet: An End-to-end Neural Network for Packet Loss Concealment in Deep Speech Emotion Recognition [0.0]
Packet loss is a common problem in data transmission, including speech data transmission.
In this paper, we present a concealment wrapper, which can be used with stacked recurrent neural cells.
The proposed ConcealNet model shows considerable improvement in both audio reconstruction and the corresponding emotion prediction.
arXiv Detail & Related papers (2020-05-15T20:43:02Z)