Playing Technique Detection by Fusing Note Onset Information in Guzheng
Performance
- URL: http://arxiv.org/abs/2209.08774v1
- Date: Mon, 19 Sep 2022 06:02:37 GMT
- Title: Playing Technique Detection by Fusing Note Onset Information in Guzheng
Performance
- Authors: Dichucheng Li, Yulun Wu, Qinyu Li, Jiahao Zhao, Yi Yu, Fan Xia, Wei Li
- Abstract summary: We propose an end-to-end Guzheng playing technique detection system using Fully Convolutional Networks.
Our approach achieves 87.97% frame-level accuracy and an 80.76% note-level F1-score, outperforming existing works by a large margin.
- Score: 10.755276589673434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Guzheng is a traditional Chinese instrument with diverse playing
techniques. Instrument playing techniques (IPTs) play an important role in
musical performance. However, most existing works on IPT detection handle
variable-length audio inefficiently and offer no guarantee of generalization,
as they rely on a single sound bank for both training and testing. In
this study, we propose an end-to-end Guzheng playing technique detection system
using Fully Convolutional Networks that can be applied to variable-length
audio. Because each Guzheng playing technique is applied to a note, a dedicated
onset detector is trained to segment the audio into notes, and its predictions
are fused with the frame-wise IPT predictions. During fusion, we sum the
frame-wise IPT predictions within each note and take the IPT with the highest
accumulated probability as the final output for that note. We
create a new dataset named GZ_IsoTech from multiple sound banks and real-world
recordings for Guzheng performance analysis. Our approach achieves 87.97%
frame-level accuracy and an 80.76% note-level F1-score, outperforming existing
works by a large margin, which indicates the effectiveness of our proposed
method in IPT detection.
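As a reading aid, here is a minimal sketch of the fusion step described in the abstract, assuming the FCN outputs per-frame IPT class probabilities and the onset detector yields frame indices of note onsets; the function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def fuse_onsets_with_ipt(frame_probs, onset_frames):
    """Fuse onset-based note segmentation with frame-wise IPT predictions.

    frame_probs  : (T, C) array of per-frame IPT class probabilities.
    onset_frames : sorted frame indices at which the onset detector fired;
                   consecutive onsets bound one note each.
    Returns a list of (start_frame, end_frame, ipt_class) triples.
    """
    num_frames = frame_probs.shape[0]
    # Each note spans from one onset to the next (or to the end of the clip).
    bounds = list(onset_frames) + [num_frames]
    notes = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        if end <= start:
            continue
        # Sum the frame-wise IPT probabilities inside the note ...
        accumulated = frame_probs[start:end].sum(axis=0)
        # ... and keep the technique with the highest accumulated probability.
        notes.append((start, end, int(accumulated.argmax())))
    return notes

# Toy usage: 10 frames, 3 IPT classes, onsets detected at frames 0 and 6.
rng = np.random.default_rng(0)
frame_probs = rng.dirichlet(np.ones(3), size=10)
print(fuse_onsets_with_ipt(frame_probs, [0, 6]))
```

Summing rather than averaging the per-frame probabilities makes no difference to the argmax within a single note, since the two differ only by the constant note length.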
Related papers
- TapToTab : Video-Based Guitar Tabs Generation using AI and Audio Analysis [0.0]
This paper introduces an advanced approach leveraging deep learning, specifically YOLO models for real-time fretboard detection.
Experimental results demonstrate substantial improvements in detection accuracy and robustness compared to traditional techniques.
This paper aims to revolutionize guitar instruction by automating the creation of guitar tabs from video recordings.
arXiv Detail & Related papers (2024-09-13T08:17:15Z)
- Toward a More Complete OMR Solution [49.74172035862698]
Optical music recognition aims to convert music notation into digital formats.
One approach to tackle OMR is through a multi-stage pipeline, where the system first detects visual music notation elements in the image.
First, we introduce a music object detector based on YOLOv8, which improves detection performance.
Second, we introduce a supervised training pipeline that completes the notation assembly stage based on detection output.
arXiv Detail & Related papers (2024-08-31T01:09:12Z)
- Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Learning [81.98675881423131]
This research addresses the challenge of developing a universal deepfake detector that can effectively identify unseen deepfake images.
Existing frequency-based paradigms have relied on frequency-level artifacts introduced during the up-sampling in GAN pipelines to detect forgeries.
We introduce a novel frequency-aware approach called FreqNet, centered around frequency domain learning, specifically designed to enhance the generalizability of deepfake detectors.
arXiv Detail & Related papers (2024-03-12T01:28:00Z)
- DITTO: Diffusion Inference-Time T-Optimization for Music Generation [49.90109850026932]
Diffusion Inference-Time T-Optimization (DITTO) is a frame-work for controlling pre-trained text-to-music diffusion models at inference-time.
We demonstrate a surprisingly wide-range of applications for music generation including inpainting, outpainting, and looping as well as intensity, melody, and musical structure control.
arXiv Detail & Related papers (2024-01-22T18:10:10Z)
- MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Task Finetuning [17.307289537499184]
We propose to apply a self-supervised learning model pre-trained on large-scale unlabeled music data and finetune it on IPT detection tasks.
Our method outperforms prior approaches in both frame-level and event-level metrics across multiple IPT benchmark datasets.
arXiv Detail & Related papers (2023-10-15T15:00:00Z)
- Frame-Level Multi-Label Playing Technique Detection Using Multi-Scale Network and Self-Attention Mechanism [6.2680838592065715]
We formulate a frame-level multi-label classification problem and apply it to Guzheng, a Chinese plucked string instrument.
Because different IPTs vary widely in length, we propose a new method to solve this problem using a multi-scale network and self-attention (a minimal sketch of such an architecture appears after this list).
Our approach outperforms existing works by a large margin, indicating its effectiveness in IPT detection.
arXiv Detail & Related papers (2023-03-23T13:52:42Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features [51.924340387119415]
Experimental results on the ASVspoof 2019 LA dataset show that our proposed system is very effective for the audio deepfake detection task, achieving an equal error rate (EER) of 0.43%, which surpasses almost all systems.
arXiv Detail & Related papers (2022-08-02T02:46:16Z)
- A Lightweight Instrument-Agnostic Model for Polyphonic Note Transcription and Multipitch Estimation [6.131772929312604]
We propose a lightweight neural network for musical instrument transcription.
Our model is trained to jointly predict frame-wise onsets, multipitch and note activations.
Benchmark results show that our system's note estimation is substantially better than a comparable baseline.
arXiv Detail & Related papers (2022-03-18T12:07:36Z)
- TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music [43.17623332544677]
TONet is a plug-and-play model that improves both tone and octave perceptions.
We present an improved input representation, the Tone-CFP, that explicitly groups harmonics.
We also propose a tone-octave fusion mechanism to improve the final salience feature map.
arXiv Detail & Related papers (2022-02-02T10:55:48Z)
- Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning [49.41766997393417]
The system we used for Task 6 (Automated Audio Captioning) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge combines three elements, namely, data augmentation, multi-task learning, and post-processing, for audio captioning.
The system received the highest evaluation scores, but which of the individual elements most fully contributed to its performance has not yet been clarified.
arXiv Detail & Related papers (2020-09-24T01:07:33Z)
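Referenced from the Frame-Level Multi-Label Playing Technique Detection entry above: a rough, hypothetical sketch of a frame-level multi-label tagger that combines multi-scale convolutions with self-attention. The class count, channel sizes, and kernel widths here are invented for illustration and do not reflect that paper's actual architecture.

```python
import torch
import torch.nn as nn

class MultiScaleSelfAttnTagger(nn.Module):
    """Frame-level multi-label IPT tagger: parallel convolutions with different
    kernel sizes capture techniques of different lengths, a self-attention layer
    shares context across frames, and a per-frame sigmoid head emits one
    probability per technique."""

    def __init__(self, n_bins=128, n_classes=7, channels=64, kernel_sizes=(3, 7, 15)):
        super().__init__()
        # One branch per temporal scale, all preserving the frame count.
        self.branches = nn.ModuleList(
            nn.Conv1d(n_bins, channels, k, padding=k // 2) for k in kernel_sizes
        )
        d_model = channels * len(kernel_sizes)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, spec):                 # spec: (batch, n_bins, frames)
        multi_scale = torch.cat([torch.relu(b(spec)) for b in self.branches], dim=1)
        x = multi_scale.transpose(1, 2)      # (batch, frames, d_model)
        x, _ = self.attn(x, x, x)            # self-attention across frames
        return torch.sigmoid(self.head(x))   # (batch, frames, n_classes)

# Toy usage: batch of 2 spectrogram excerpts, 128 bins, 100 frames.
model = MultiScaleSelfAttnTagger()
probs = model(torch.randn(2, 128, 100))
print(probs.shape)  # torch.Size([2, 100, 7])
```

Each frame receives an independent sigmoid score per technique, so co-occurring techniques can be tagged simultaneously, unlike a single-label softmax head.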
This list is automatically generated from the titles and abstracts of the papers on this site.