Jointist: Joint Learning for Multi-instrument Transcription and Its
Applications
- URL: http://arxiv.org/abs/2206.10805v1
- Date: Wed, 22 Jun 2022 02:03:01 GMT
- Title: Jointist: Joint Learning for Multi-instrument Transcription and Its
Applications
- Authors: Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Amy
Hung, Ju-Chiang Wang, Dorien Herremans
- Abstract summary: Jointist is an instrument-aware multi-instrument framework capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip.
Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results.
- Score: 15.921536323391226
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce Jointist, an instrument-aware multi-instrument
framework capable of transcribing, recognizing, and separating multiple
musical instruments from an audio clip. Jointist consists of an instrument
recognition module that conditions the other two modules: a transcription module
that outputs instrument-specific piano rolls, and a source separation module
that utilizes instrument information and transcription results.
The instrument conditioning is designed for explicit multi-instrument
functionality, while the connection between the transcription and source
separation modules is intended to improve transcription performance. This
challenging problem formulation makes the model highly useful in the real world,
given that modern popular music typically consists of multiple instruments.
Its novelty, however, necessitates a new perspective on how to evaluate such a
model. In our experiments, we assess the model from various aspects, providing
a new evaluation perspective for multi-instrument transcription. We also argue
that transcription models can be utilized as a preprocessing module for other
music analysis tasks. In experiments on several downstream tasks, the symbolic
representation provided by our transcription model turned out to be a helpful
complement to spectrograms for downbeat detection, chord recognition, and key
estimation.
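
The abstract describes a conditioning architecture rather than implementation details. The PyTorch sketch below is a minimal illustration of that wiring only: module sizes (N_MELS, hidden widths), the one-hot conditioning scheme, and all module names are invented assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

N_INSTRUMENTS = 39   # illustrative instrument vocabulary size
N_MELS = 229         # illustrative mel-spectrogram height
N_PITCHES = 88

class RecognitionModule(nn.Module):
    """Predicts which instruments are present (multi-label)."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_MELS, 128, batch_first=True, bidirectional=True)
        self.head = nn.Linear(256, N_INSTRUMENTS)

    def forward(self, spec):                   # spec: (B, T, N_MELS)
        h, _ = self.rnn(spec)
        return torch.sigmoid(self.head(h.mean(dim=1)))  # (B, N_INSTRUMENTS)

class TranscriptionModule(nn.Module):
    """Outputs a piano roll for one instrument, conditioned on its id."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_MELS + N_INSTRUMENTS, 128,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(256, N_PITCHES)

    def forward(self, spec, inst_onehot):      # inst_onehot: (B, N_INSTRUMENTS)
        cond = inst_onehot.unsqueeze(1).expand(-1, spec.shape[1], -1)
        h, _ = self.rnn(torch.cat([spec, cond], dim=-1))
        return torch.sigmoid(self.head(h))     # (B, T, N_PITCHES)

class SeparationModule(nn.Module):
    """Masks the mixture using the instrument id and the transcribed roll."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_MELS + N_INSTRUMENTS + N_PITCHES, 128,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(256, N_MELS)

    def forward(self, spec, inst_onehot, roll):
        cond = inst_onehot.unsqueeze(1).expand(-1, spec.shape[1], -1)
        h, _ = self.rnn(torch.cat([spec, cond, roll], dim=-1))
        return spec * torch.sigmoid(self.head(h))   # masked spectrogram

# Wiring: recognition conditions transcription, whose output feeds separation.
spec = torch.randn(2, 100, N_MELS)
probs = RecognitionModule()(spec)
# Pick the single top instrument just for this demo.
inst = nn.functional.one_hot(probs.argmax(dim=-1), N_INSTRUMENTS).float()
roll = TranscriptionModule()(spec, inst)
sep = SeparationModule()(spec, inst, roll)
```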
Related papers
- Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music
Transcription [19.228155694144995]
Timbre-Trap is a novel framework which unifies music transcription and audio reconstruction.
We train a single autoencoder to simultaneously estimate pitch salience and reconstruct complex spectral coefficients.
We demonstrate that the framework leads to performance comparable to state-of-the-art instrument-agnostic transcription methods.
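
As a rough illustration of the shared-autoencoder idea, the sketch below decodes both a pitch-salience map and complex spectral coefficients from one latent. The two-head layout and all dimensions are our assumptions for illustration, not Timbre-Trap's actual design.

```python
import torch
import torch.nn as nn

class TimbreTrapStyleAE(nn.Module):
    """One encoder and decoder trunk with two output heads:
    pitch salience and (real, imag) spectral reconstruction."""
    def __init__(self, n_bins=256, n_pitch=128, latent=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(2 * n_bins, 256), nn.ReLU(),
                                     nn.Linear(256, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU())
        self.salience_head = nn.Linear(256, n_pitch)     # pitch activations
        self.spectrum_head = nn.Linear(256, 2 * n_bins)  # real+imag coefficients

    def forward(self, frames):                 # (B, T, 2*n_bins): real||imag
        h = self.decoder(self.encoder(frames))
        return torch.sigmoid(self.salience_head(h)), self.spectrum_head(h)

model = TimbreTrapStyleAE()
x = torch.randn(4, 50, 512)                    # fake complex spectral frames
salience, recon = model(x)
loss = nn.functional.mse_loss(recon, x)        # reconstruction term, as a demo
```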
arXiv Detail & Related papers (2023-09-27T15:19:05Z)
- Multitrack Music Transcription with a Time-Frequency Perceiver [6.617487928813374]
Multitrack music transcription aims to transcribe a music audio input into the musical notes of multiple instruments simultaneously.
We propose a novel deep neural network architecture, Perceiver TF, to model the time-frequency representation of audio input for multitrack transcription.
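
A minimal Perceiver-style sketch of the idea follows, assuming a small latent array that cross-attends to time-frequency frames; layer sizes and the pitch head are illustrative stand-ins, not the Perceiver TF architecture itself.

```python
import torch
import torch.nn as nn

class TinyPerceiverTF(nn.Module):
    """A latent array cross-attends to time-frequency input frames, then
    self-attends; loosely Perceiver-style, not the paper's model."""
    def __init__(self, n_freq=128, d_model=64, n_latents=32):
        super().__init__()
        self.proj = nn.Linear(n_freq, d_model)
        self.latents = nn.Parameter(torch.randn(n_latents, d_model))
        self.cross = nn.MultiheadAttention(d_model, 4, batch_first=True)
        self.self_attn = nn.TransformerEncoderLayer(d_model, 4, 128,
                                                    batch_first=True)
        self.head = nn.Linear(d_model, 88)      # e.g., piano-roll pitches

    def forward(self, spec):                    # (B, T, n_freq)
        x = self.proj(spec)
        q = self.latents.unsqueeze(0).expand(spec.shape[0], -1, -1)
        z, _ = self.cross(q, x, x)              # latents attend to T-F input
        return self.head(self.self_attn(z))     # (B, n_latents, 88)

out = TinyPerceiverTF()(torch.randn(2, 400, 128))
```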
arXiv Detail & Related papers (2023-06-19T08:58:26Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
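
The token-interleaving idea can be shown with a toy "delay" pattern over K codebook streams, in the spirit of the description above; this is a sketch, not MusicGen's reference implementation.

```python
# Stream k is shifted by k steps so one transformer pass per time step can
# predict all codebooks while respecting their dependency order.
PAD = -1

def delay_interleave(streams):
    """streams: list of K equal-length token lists -> list of length T+K-1
    where column t holds [s0[t], s1[t-1], ..., sK-1[t-K+1]] (PAD outside)."""
    K, T = len(streams), len(streams[0])
    return [[streams[k][t - k] if 0 <= t - k < T else PAD
             for k in range(K)]
            for t in range(T + K - 1)]

tokens = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
for col in delay_interleave(tokens):
    print(col)
# [10, -1, -1], [11, 20, -1], [12, 21, 30], [-1, 22, 31], [-1, -1, 32]
```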
arXiv Detail & Related papers (2023-04-30T08:37:41Z)
- Transfer of knowledge among instruments in automatic music transcription [2.0305676256390934]
This work shows how to employ easily generated synthetic audio data, produced by software synthesizers, to train a universal transcription model.
Such a model is a good base for further transfer learning that quickly adapts it to other instruments.
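
A minimal sketch of one such transfer recipe, assuming a frozen shared encoder and a freshly trained output head; the model shapes and the freezing scheme are invented for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn

class Transcriber(nn.Module):
    def __init__(self, n_pitches=88):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(229, 256), nn.ReLU(),
                                     nn.Linear(256, 256), nn.ReLU())
        self.head = nn.Linear(256, n_pitches)

    def forward(self, spec):
        return self.head(self.encoder(spec))

model = Transcriber()
# ... pretrain on cheap synthesized audio here ...

for p in model.encoder.parameters():    # freeze the universal backbone
    p.requires_grad = False
model.head = nn.Linear(256, 88)         # fresh head for the target instrument
optim = torch.optim.Adam(model.head.parameters(), lr=1e-3)
```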
arXiv Detail & Related papers (2023-02-22T18:11:25Z)
- Modular Deep Learning [120.36599591042908]
Transfer learning has recently become the dominant paradigm of machine learning.
It remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference.
Modular deep learning has emerged as a promising solution to these challenges.
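
One common instantiation of such modularity is the bottleneck adapter: each task trains only a small residual module on top of a frozen backbone, so tasks cannot interfere with each other's weights. The sketch below is a generic illustration, not tied to any specific model from the survey.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: a tiny residual MLP inserted into a frozen
    backbone; each task owns and trains only its adapter weights."""
    def __init__(self, d_model=256, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))   # residual update

# One frozen backbone, one adapter per task -> no cross-task interference.
backbone = nn.Linear(256, 256)
for p in backbone.parameters():
    p.requires_grad = False
adapters = nn.ModuleDict({"chords": Adapter(), "beats": Adapter()})
x = torch.randn(8, 256)
y_chords = adapters["chords"](backbone(x))
```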
arXiv Detail & Related papers (2023-02-01T07:35:02Z)
- Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training [18.391476887027583]
Jointist is an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip.
Jointist consists of an instrument recognition module that conditions the other two modules: a transcription module that outputs instrument-specific piano rolls, and a source separation module that utilizes instrument information and transcription results.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
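
One standard way to turn Levenshtein edit distance into a similarity score in [0, 1] is shown below; this is our reading of "edit distance similarity", not necessarily the paper's exact metric.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance over token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]

def edit_similarity(a, b):
    """1.0 for identical sequences, 0.0 for maximally different ones."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)

print(edit_similarity("C G Am F".split(), "C G F Am".split()))
```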
arXiv Detail & Related papers (2022-05-10T13:08:49Z)
- Symphony Generation with Permutation Invariant Language Model [57.75739773758614]
We present a symbolic symphony music generation solution, SymphonyNet, based on a permutation invariant language model.
A novel transformer decoder architecture is introduced as backbone for modeling extra-long sequences of symphony tokens.
Our empirical results show that our proposed approach can generate coherent, novel, complex, and harmonious symphonies comparable to human compositions.
arXiv Detail & Related papers (2022-05-10T13:08:49Z)
- Towards Automatic Instrumentation by Learning to Separate Parts in Symbolic Multitrack Music [33.679951600368405]
We study the feasibility of automatic instrumentation -- dynamically assigning instruments to notes in solo music during performance.
In addition to the online, real-time-capable setting for performative use cases, automatic instrumentation can also find applications in assistive composing tools in an offline setting.
We frame the task of part separation as a sequential multi-class classification problem and adopt machine learning to map sequences of notes into sequences of part labels.
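
A minimal sketch of that framing: a sequence of per-note features is tagged with per-note part labels. The feature set, part count, and the offline bidirectional tagger are our assumptions (the paper also targets an online, real-time-capable setting).

```python
import torch
import torch.nn as nn

class PartTagger(nn.Module):
    """Sequential multi-class classification: notes in, part labels out."""
    def __init__(self, n_note_features=4, n_parts=5, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(n_note_features, hidden, batch_first=True,
                           bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_parts)

    def forward(self, notes):                  # (B, N, n_note_features)
        h, _ = self.rnn(notes)
        return self.head(h)                    # (B, N, n_parts) logits

notes = torch.randn(2, 30, 4)    # e.g., pitch, onset, duration, velocity
labels = torch.randint(0, 5, (2, 30))
logits = PartTagger()(notes)
loss = nn.functional.cross_entropy(logits.transpose(1, 2), labels)
```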
arXiv Detail & Related papers (2021-07-13T08:34:44Z)
- Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method combines an acoustic model, trained for the task of automatic speech recognition, with melody-derived features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z)
- Vector-Quantized Timbre Representation [53.828476137089325]
This paper targets a more flexible synthesis of an individual timbre by learning an approximate decomposition of its spectral properties with a set of generative features.
We introduce an auto-encoder with a discrete latent space that is disentangled from loudness in order to learn a quantized representation of a given timbre distribution.
We detail results for translating audio between orchestral instruments and singing voice, as well as transfers from vocal imitations to instruments.
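
The discrete latent space rests on vector quantization. Below is a generic VQ layer with a straight-through gradient and commitment loss, illustrating the building block rather than the paper's full model.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Minimal VQ layer: snap each latent to its nearest codebook entry,
    passing gradients straight through the non-differentiable lookup."""
    def __init__(self, n_codes=64, dim=16):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, dim)

    def forward(self, z):                       # z: (B, dim)
        d = torch.cdist(z, self.codebook.weight)    # distances to all codes
        idx = d.argmin(dim=-1)                      # nearest code index
        q = self.codebook(idx)
        q_st = z + (q - z).detach()                 # straight-through estimator
        commit = torch.mean((z - q.detach()) ** 2)  # commitment loss
        return q_st, idx, commit

vq = VectorQuantizer()
q, idx, commit = vq(torch.randn(8, 16))
```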
arXiv Detail & Related papers (2020-07-13T12:35:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.