Show Me the Instruments: Musical Instrument Retrieval from Mixture Audio
- URL: http://arxiv.org/abs/2211.07951v1
- Date: Tue, 15 Nov 2022 07:32:39 GMT
- Title: Show Me the Instruments: Musical Instrument Retrieval from Mixture Audio
- Authors: Kyungsu Kim, Minju Park, Haesun Joung, Yunkee Chae, Yeongbeom Hong,
Seonghyeon Go and Kyogu Lee
- Abstract summary: We call this task Musical Instrument Retrieval.
We propose a method for retrieving desired musical instruments using a reference music mixture as a query.
The proposed model consists of the Single-Instrument Encoder and the Multi-Instrument Encoder, both based on convolutional neural networks.
- Score: 11.941510958668557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As digital music production has become mainstream, the selection of
appropriate virtual instruments plays a crucial role in determining the quality
of music. To find the musical instrument samples or virtual instruments that
produce one's desired sound, music producers listen to and compare each
instrument sample in their collection by ear, which is time-consuming and
inefficient. In this paper, we call this task Musical Instrument Retrieval and
propose a method for retrieving desired musical instruments using a reference
music mixture as a query. The proposed model consists of the Single-Instrument
Encoder and the Multi-Instrument Encoder, both based on convolutional neural
networks. The Single-Instrument Encoder is trained to classify the instruments
used in single-track audio, and we take its penultimate layer's activation as
the instrument embedding. The Multi-Instrument Encoder is trained to estimate
multiple instrument embeddings using the instrument embeddings computed by the
Single-Instrument Encoder as a set of target embeddings. For more generalized
training and realistic evaluation, we also propose a new dataset called Nlakh.
Experimental results showed that the Single-Instrument Encoder was able to
learn the mapping from the audio signal of unseen instruments to the instrument
embedding space, and that the Multi-Instrument Encoder was able to extract multiple
embeddings from the mixture of music and retrieve the desired instruments
successfully. The code used for the experiment and audio samples are available
at: https://github.com/minju0821/musical_instrument_retrieval
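To make the two-encoder design in the abstract concrete, here is a minimal PyTorch-style sketch. It is a sketch under stated assumptions: the embedding dimension, class count, maximum track count, layer sizes, and the cosine-similarity retrieval step are illustrative choices, not the authors' exact implementation (see the linked repository for that).

```python
# Hypothetical sketch of the two encoders described in the abstract.
# All hyperparameters below (EMB_DIM, NUM_CLASSES, MAX_TRACKS, layer sizes)
# are illustrative assumptions, not values taken from the paper.
import torch
import torch.nn as nn

EMB_DIM = 128      # assumed instrument-embedding dimensionality
NUM_CLASSES = 500  # assumed number of training instruments
MAX_TRACKS = 8     # assumed maximum number of instruments per mixture


class SingleInstrumentEncoder(nn.Module):
    """CNN classifier over single-track spectrograms; the penultimate
    activation serves as the instrument embedding."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.embed = nn.Linear(64, EMB_DIM)        # penultimate layer
        self.classify = nn.Linear(EMB_DIM, NUM_CLASSES)

    def forward(self, spec):                       # spec: (B, 1, freq, time)
        emb = self.embed(self.backbone(spec))      # instrument embedding
        return emb, self.classify(emb)             # logits for cross-entropy training


class MultiInstrumentEncoder(nn.Module):
    """CNN over mixture spectrograms that predicts a fixed-size set of
    embeddings, trained to match the Single-Instrument Encoder's embeddings
    of the individual tracks (the set of target embeddings)."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.heads = nn.Linear(128, MAX_TRACKS * EMB_DIM)

    def forward(self, mix_spec):                   # mix_spec: (B, 1, freq, time)
        h = self.backbone(mix_spec)
        return self.heads(h).view(-1, MAX_TRACKS, EMB_DIM)


def retrieve(query_embs, library_embs, top_k=5):
    """Rank library instruments by cosine similarity to each predicted
    embedding; one plausible way to perform the final retrieval step."""
    q = nn.functional.normalize(query_embs, dim=-1)         # (N, EMB_DIM)
    lib = nn.functional.normalize(library_embs, dim=-1)     # (M, EMB_DIM)
    return torch.topk(q @ lib.T, k=top_k, dim=-1).indices   # (N, top_k)
```

Training the Multi-Instrument Encoder as set prediction implies some permutation-invariant matching between predicted and target embeddings (e.g. a Hungarian or Chamfer-style assignment); the abstract does not specify the loss, so treat that as one plausible choice rather than the paper's method.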
Related papers
- Toward a More Complete OMR Solution [49.74172035862698]
Optical music recognition aims to convert music notation into digital formats.
One approach to tackle OMR is through a multi-stage pipeline, where the system first detects visual music notation elements in the image.
First, we introduce a music object detector based on YOLOv8, which improves detection performance.
Second, we introduce a supervised training pipeline that completes the notation assembly stage based on the detection output.
arXiv Detail & Related papers (2024-08-31T01:09:12Z)
- Toward Deep Drum Source Separation [52.01259769265708]
We introduce StemGMD, a large-scale audio dataset of isolated single-instrument drum stems.
Totaling 1224 hours, StemGMD is the largest audio dataset of drums to date.
We leverage StemGMD to develop LarsNet, a novel deep drum source separation model.
arXiv Detail & Related papers (2023-12-15T10:23:07Z)
- InstrumentGen: Generating Sample-Based Musical Instruments From Text [3.4447129363520337]
We introduce the text-to-instrument task, which aims at generating sample-based musical instruments based on textual prompts.
We propose InstrumentGen, a model that extends a text-prompted generative audio framework to condition on instrument family, source type, pitch (across an 88-key spectrum), velocity, and a joint text/audio embedding.
arXiv Detail & Related papers (2023-11-07T20:45:59Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - iQuery: Instruments as Queries for Audio-Visual Sound Separation [21.327023637480284]
Current audio-visual separation methods share a standard architecture design where an audio encoder-decoder network is fused with visual encoding features at the encoder bottleneck.
We re-formulate the visual-sound separation task and propose Instrument as Query (iQuery) with a flexible query expansion mechanism.
Our approach ensures cross-modal consistency and cross-instrument disentanglement.
arXiv Detail & Related papers (2022-12-07T17:55:06Z) - Symphony Generation with Permutation Invariant Language Model [57.75739773758614]
We present a symbolic symphony music generation solution, SymphonyNet, based on a permutation invariant language model.
A novel transformer decoder architecture is introduced as backbone for modeling extra-long sequences of symphony tokens.
Our empirical results show that our proposed approach can generate coherent, novel, complex, and harmonious symphonies compared to human compositions.
arXiv Detail & Related papers (2022-05-10T13:08:49Z) - Towards Automatic Instrumentation by Learning to Separate Parts in
Symbolic Multitrack Music [33.679951600368405]
We study the feasibility of automatic instrumentation -- dynamically assigning instruments to notes in solo music during performance.
In addition to the online, real-time-capable setting for performative use cases, automatic instrumentation can also find applications in assistive composing tools in an offline setting.
We frame the task of part separation as a sequential multi-class classification problem and adopt machine learning to map sequences of notes into sequences of part labels (a minimal sketch of this framing follows the entry).
arXiv Detail & Related papers (2021-07-13T08:34:44Z) - Deep Neural Network for Musical Instrument Recognition using MFCCs [0.6445605125467573]
- Deep Neural Network for Musical Instrument Recognition using MFCCs [0.6445605125467573]
Musical instrument recognition is the task of identifying an instrument from its audio.
In this paper, we use an artificial neural network (ANN) model that was trained to perform classification on twenty different classes of musical instruments.
arXiv Detail & Related papers (2021-05-03T15:10:34Z)
- Multi-Instrumentalist Net: Unsupervised Generation of Music from Body Movements [20.627164135805852]
We propose a novel system that takes as input the body movements of a musician playing a musical instrument and generates music in an unsupervised setting.
We build a pipeline named 'Multi-instrumentalistNet' that learns a discrete latent representation of various instruments' music from log-spectrograms.
We show that MIDI can further condition the latent space such that the pipeline will generate the exact content of the music being played by the instrument in the video.
arXiv Detail & Related papers (2020-12-07T06:54:10Z)
- Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy for estimating the separation performance of state-of-the-art deep learning approaches (a minimal oracle sketch follows the entry).
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
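For reference, the ideal-ratio-mask oracle mentioned in the entry above can be sketched as follows with NumPy/SciPy; the paper's exact mask definition, STFT parameters, and evaluation metric may differ from this minimal version.

```python
# Minimal ideal-ratio-mask (IRM) oracle sketch: mask the mixture STFT with
# each stem's ratio mask and invert, then score the estimates (e.g. with SDR)
# to obtain an upper-bound proxy for learned separation performance.
import numpy as np
from scipy.signal import stft, istft


def irm_oracle_estimates(sources, n_fft=2048, hop=512, eps=1e-8):
    """sources: list of equal-length 1-D arrays (isolated stems).
    Returns oracle estimates of each stem recovered from the mixture."""
    mixture = np.sum(sources, axis=0)
    _, _, mix_spec = stft(mixture, nperseg=n_fft, noverlap=n_fft - hop)
    mags = [np.abs(stft(s, nperseg=n_fft, noverlap=n_fft - hop)[2])
            for s in sources]
    total = np.sum(mags, axis=0) + eps
    estimates = []
    for mag in mags:
        mask = mag / total                                   # IRM in [0, 1]
        _, est = istft(mask * mix_spec, nperseg=n_fft, noverlap=n_fft - hop)
        estimates.append(est)
    return estimates
```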
- Vector-Quantized Timbre Representation [53.828476137089325]
This paper targets a more flexible synthesis of an individual timbre by learning an approximate decomposition of its spectral properties with a set of generative features.
We introduce an auto-encoder with a discrete latent space that is disentangled from loudness in order to learn a quantized representation of a given timbre distribution.
We detail results for translating audio between orchestral instruments and singing voice, as well as transfers from vocal imitations to instruments.
arXiv Detail & Related papers (2020-07-13T12:35:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.