Playing Technique Detection by Fusing Note Onset Information in Guzheng
Performance
- URL: http://arxiv.org/abs/2209.08774v1
- Date: Mon, 19 Sep 2022 06:02:37 GMT
- Title: Playing Technique Detection by Fusing Note Onset Information in Guzheng
Performance
- Authors: Dichucheng Li, Yulun Wu, Qinyu Li, Jiahao Zhao, Yi Yu, Fan Xia, Wei Li
- Abstract summary: We propose an end-to-end Guzheng playing technique detection system using Fully Convolutional Networks.
Our approach achieves 87.97% frame-level accuracy and an 80.76% note-level F1-score, outperforming existing works by a large margin.
- Score: 10.755276589673434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Guzheng is a traditional Chinese instrument with diverse playing
techniques. Instrument playing techniques (IPTs) play an important role in
musical performance. However, most existing works on IPT detection handle
variable-length audio inefficiently and offer no guarantee of generalization,
as they rely on a single sound bank for both training and testing. In
this study, we propose an end-to-end Guzheng playing technique detection system
using Fully Convolutional Networks that can be applied to variable-length
audio. Because each Guzheng playing technique is applied to a note, a dedicated
onset detector is trained to segment the audio into notes, and its predictions
are fused with the frame-wise IPT predictions. During fusion, we sum the
frame-wise IPT predictions within each note and take the IPT with the highest
accumulated probability as the final output for that note. We
create a new dataset named GZ_IsoTech from multiple sound banks and real-world
recordings for Guzheng performance analysis. Our approach achieves 87.97%
frame-level accuracy and an 80.76% note-level F1-score, outperforming existing
works by a large margin, which indicates the effectiveness of our proposed
method in IPT detection.
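As a reading aid, here is a minimal sketch of the fusion step described in the abstract, assuming the FCN outputs per-frame IPT class probabilities and the onset detector yields frame indices of note onsets; the function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def fuse_onsets_with_ipt(frame_probs, onset_frames):
    """Fuse onset-based note segmentation with frame-wise IPT predictions.

    frame_probs  : (T, C) array of per-frame IPT class probabilities.
    onset_frames : sorted frame indices at which the onset detector fired;
                   consecutive onsets bound one note each.
    Returns a list of (start_frame, end_frame, ipt_class) triples.
    """
    num_frames = frame_probs.shape[0]
    # Each note spans from one onset to the next (or to the end of the clip).
    bounds = list(onset_frames) + [num_frames]
    notes = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        if end <= start:
            continue
        # Sum the frame-wise IPT probabilities inside the note ...
        accumulated = frame_probs[start:end].sum(axis=0)
        # ... and keep the technique with the highest accumulated probability.
        notes.append((start, end, int(accumulated.argmax())))
    return notes

# Toy usage: 10 frames, 3 IPT classes, onsets detected at frames 0 and 6.
rng = np.random.default_rng(0)
frame_probs = rng.dirichlet(np.ones(3), size=10)
print(fuse_onsets_with_ipt(frame_probs, [0, 6]))
```

Summing rather than averaging the per-frame probabilities makes no difference to the argmax within a single note, since the two differ only by the constant note length.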
Related papers
- TapToTab : Video-Based Guitar Tabs Generation using AI and Audio Analysis [0.0]
This paper introduces an advanced approach leveraging deep learning, specifically YOLO models for real-time fretboard detection.
Experimental results demonstrate substantial improvements in detection accuracy and robustness compared to traditional techniques.
This paper aims to revolutionize guitar instruction by automating the creation of guitar tabs from video recordings.
arXiv Detail & Related papers (2024-09-13T08:17:15Z)
- Toward a More Complete OMR Solution [49.74172035862698]
Optical music recognition aims to convert music notation into digital formats.
One approach to tackle OMR is through a multi-stage pipeline, where the system first detects visual music notation elements in the image.
First, we introduce a music object detector based on YOLOv8, which improves detection performance.
Second, we introduce a supervised training pipeline that completes the notation assembly stage based on detection output.
arXiv Detail & Related papers (2024-08-31T01:09:12Z)
- Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Learning [81.98675881423131]
This research addresses the challenge of developing a universal deepfake detector that can effectively identify unseen deepfake images.
Existing frequency-based paradigms have relied on frequency-level artifacts introduced during the up-sampling in GAN pipelines to detect forgeries.
We introduce a novel frequency-aware approach called FreqNet, centered around frequency domain learning, specifically designed to enhance the generalizability of deepfake detectors.
arXiv Detail & Related papers (2024-03-12T01:28:00Z)
- DITTO: Diffusion Inference-Time T-Optimization for Music Generation [49.90109850026932]
Diffusion Inference-Time T-Optimization (DITTO) is a frame-work for controlling pre-trained text-to-music diffusion models at inference-time.
We demonstrate a surprisingly wide-range of applications for music generation including inpainting, outpainting, and looping as well as intensity, melody, and musical structure control.
arXiv Detail & Related papers (2024-01-22T18:10:10Z)
- MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Task Finetuning [17.307289537499184]
We propose to apply a self-supervised learning model pre-trained on large-scale unlabeled music data and finetune it on IPT detection tasks.
Our method outperforms prior approaches in both frame-level and event-level metrics across multiple IPT benchmark datasets.
arXiv Detail & Related papers (2023-10-15T15:00:00Z)
- Frame-Level Multi-Label Playing Technique Detection Using Multi-Scale Network and Self-Attention Mechanism [6.2680838592065715]
We formulate a frame-level multi-label classification problem and apply it to Guzheng, a Chinese plucked string instrument.
Because different IPTs vary widely in length, we propose a new method to solve this problem using a multi-scale network and self-attention (a minimal sketch of such an architecture appears after this list).
Our approach outperforms existing works by a large margin, indicating its effectiveness in IPT detection.
arXiv Detail & Related papers (2023-03-23T13:52:42Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features [51.924340387119415]
Experimental results on the ASVspoof 2019 LA dataset show that our proposed system is very effective for the audio deepfake detection task, achieving an equal error rate (EER) of 0.43%, which surpasses almost all systems.
arXiv Detail & Related papers (2022-08-02T02:46:16Z)
- A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation [6.131772929312604]
We propose a lightweight neural network for musical instrument transcription.
Our model is trained to jointly predict frame-wise onsets, multipitch and note activations.
Benchmark results show that our system's note estimation is substantially better than a comparable baseline.
arXiv Detail & Related papers (2022-03-18T12:07:36Z)
- TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music [43.17623332544677]
TONet is a plug-and-play model that improves both tone and octave perceptions.
We present an improved input representation, the Tone-CFP, that explicitly groups harmonics.
We also propose a tone-octave fusion mechanism to improve the final salience feature map.
arXiv Detail & Related papers (2022-02-02T10:55:48Z)
- Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning [49.41766997393417]
The system we used for Task 6 (Automated Audio Captioning) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge combines three elements, namely, data augmentation, multi-task learning, and post-processing, for audio captioning.
The system received the highest evaluation scores, but which of the individual elements most fully contributed to its performance has not yet been clarified.
arXiv Detail & Related papers (2020-09-24T01:07:33Z)
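Referenced from the Frame-Level Multi-Label Playing Technique Detection entry above: a rough, hypothetical sketch of a frame-level multi-label tagger that combines multi-scale convolutions with self-attention. The class count, channel sizes, and kernel widths here are invented for illustration and do not reflect that paper's actual architecture.

```python
import torch
import torch.nn as nn

class MultiScaleSelfAttnTagger(nn.Module):
    """Frame-level multi-label IPT tagger: parallel convolutions with different
    kernel sizes capture techniques of different lengths, a self-attention layer
    shares context across frames, and a per-frame sigmoid head emits one
    probability per technique."""

    def __init__(self, n_bins=128, n_classes=7, channels=64, kernel_sizes=(3, 7, 15)):
        super().__init__()
        # One branch per temporal scale, all preserving the frame count.
        self.branches = nn.ModuleList(
            nn.Conv1d(n_bins, channels, k, padding=k // 2) for k in kernel_sizes
        )
        d_model = channels * len(kernel_sizes)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, spec):                 # spec: (batch, n_bins, frames)
        multi_scale = torch.cat([torch.relu(b(spec)) for b in self.branches], dim=1)
        x = multi_scale.transpose(1, 2)      # (batch, frames, d_model)
        x, _ = self.attn(x, x, x)            # self-attention across frames
        return torch.sigmoid(self.head(x))   # (batch, frames, n_classes)

# Toy usage: batch of 2 spectrogram excerpts, 128 bins, 100 frames.
model = MultiScaleSelfAttnTagger()
probs = model(torch.randn(2, 128, 100))
print(probs.shape)  # torch.Size([2, 100, 7])
```

Each frame receives an independent sigmoid score per technique, so co-occurring techniques can be tagged simultaneously, unlike a single-label softmax head.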
This list is automatically generated from the titles and abstracts of the papers on this site.