The Spheres Dataset: Multitrack Orchestral Recordings for Music Source Separation and Information Retrieval
- URL: http://arxiv.org/abs/2511.21247v1
- Date: Wed, 26 Nov 2025 10:23:15 GMT
- Title: The Spheres Dataset: Multitrack Orchestral Recordings for Music Source Separation and Information Retrieval
- Authors: Jaime Garcia-Martinez, David Diaz-Guerra, John Anderson, Ricardo Falcon-Perez, Pablo Cabañas-Molero, Tuomas Virtanen, Julio J. Carabias-Orti, Pedro Vera-Candeas
- Abstract summary: The dataset is composed of over one hour of recordings of musical pieces performed by the Colibrì Ensemble. The recording setup employed 23 microphones, including close spot, main, and ambient microphones. Room impulse responses were estimated for each instrument position, offering valuable acoustic characterization of the recording space.
- Score: 6.642819140716501
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces The Spheres dataset, multitrack orchestral recordings designed to advance machine learning research in music source separation and related MIR tasks within the classical music domain. The dataset is composed of over one hour of recordings of musical pieces performed by the Colibrì Ensemble at The Spheres recording studio, capturing two canonical works - Tchaikovsky's Romeo and Juliet and Mozart's Symphony No. 40 - along with chromatic scales and solo excerpts for each instrument. The recording setup employed 23 microphones, including close spot, main, and ambient microphones, enabling the creation of realistic stereo mixes with controlled bleeding and providing isolated stems for supervised training of source separation models. In addition, room impulse responses were estimated for each instrument position, offering valuable acoustic characterization of the recording space. We present the dataset structure, acoustic analysis, and baseline evaluations using X-UMX based models for orchestral family separation and microphone debleeding. Results highlight both the potential and the challenges of source separation in complex orchestral scenarios, underscoring the dataset's value for benchmarking and for exploring new approaches to separation, localization, dereverberation, and immersive rendering of classical music.
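The abstract notes that room impulse responses (RIRs) were estimated for each instrument position. A common use of such RIRs is to render a "wet" signal by convolving each dry stem with the RIR for its position and summing the results. The sketch below illustrates this with synthetic data; `render_wet_mix` is a hypothetical helper, not part of the dataset's tooling, and the sample rate and RIR lengths are placeholder assumptions.

```python
import numpy as np

def render_wet_mix(stems, rirs):
    """Convolve each dry stem with the room impulse response estimated
    for its instrument position and sum into a single wet signal.
    Hypothetical helper; the dataset's actual file layout and tooling
    are not assumed here."""
    wet = [np.convolve(s, h, mode="full") for s, h in zip(stems, rirs)]
    n = max(len(w) for w in wet)
    mix = np.zeros(n)
    for w in wet:
        mix[: len(w)] += w
    return mix

# Toy example: two 1-second synthetic "stems" at an assumed 48 kHz,
# each paired with a short exponentially decaying synthetic RIR.
sr = 48_000
rir_len = 2_400
rng = np.random.default_rng(0)
stems = [rng.standard_normal(sr) * 0.1 for _ in range(2)]
rirs = [np.exp(-np.arange(rir_len) / 800.0)
        * rng.standard_normal(rir_len) * 0.05
        for _ in range(2)]
mix = render_wet_mix(stems, rirs)
print(mix.shape)  # full convolution: sr + rir_len - 1 samples
```

In practice one would load the dataset's measured RIRs and dry close-spot stems instead of synthetic signals; the convolution-and-sum structure stays the same.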
Related papers
- SAM Audio: Segment Anything in Audio [55.50609519820557]
General audio source separation is a key capability for multimodal AI systems. We present SAM Audio, a foundation model for general audio separation. It unifies text, visual, and temporal span prompting within a single framework.
arXiv Detail & Related papers (2025-12-19T22:14:23Z) - PianoVAM: A Multimodal Piano Performance Dataset [56.318475235705954]
PianoVAM is a comprehensive piano performance dataset that includes videos, audio, MIDI, hand landmarks, fingering labels, and rich metadata. The dataset was recorded using a Disklavier piano, capturing audio and MIDI from amateur pianists during their daily practice sessions. Hand landmarks and fingering labels were extracted using a pretrained hand pose estimation model and a semi-automated fingering annotation algorithm.
arXiv Detail & Related papers (2025-09-10T17:35:58Z) - Recognizing Ornaments in Vocal Indian Art Music with Active Annotation [2.9219095364935885]
We introduce Rāga Ornamentation Detection (ROD), a novel dataset comprising Indian classical music recordings curated by expert musicians. The dataset is annotated using a custom Human-in-the-Loop tool for six vocal ornaments marked as event-based labels. We develop an ornamentation detection model based on deep time-series analysis, preserving ornament boundaries during the chunking of long audio recordings.
arXiv Detail & Related papers (2025-05-07T13:52:50Z) - Unleashing the Power of Natural Audio Featuring Multiple Sound Sources [54.38251699625379]
Universal sound separation aims to extract clean audio tracks corresponding to distinct events from mixed audio. We propose ClearSep, a framework that employs a data engine to decompose complex naturally mixed audio into multiple independent tracks. In experiments, ClearSep achieves state-of-the-art performance across multiple sound separation tasks.
arXiv Detail & Related papers (2025-04-24T17:58:21Z) - Score-informed Music Source Separation: Improving Synthetic-to-real Generalization in Classical Music [8.468436398420764]
Music source separation is the task of separating a mixture of instruments into constituent tracks. We propose two ways of using musical scores to aid music source separation: a score-informed model and a score-only model. The score-informed model improves separation results compared to a baseline approach, but struggles to generalize from synthetic to real data.
arXiv Detail & Related papers (2025-03-10T14:08:31Z) - Separate This, and All of these Things Around It: Music Source Separation via Hyperellipsoidal Queries [53.30852012059025]
Music source separation is an audio-to-audio retrieval task. Recent work in music source separation has begun to challenge the fixed-stem paradigm. We propose the use of hyperellipsoidal regions as queries to allow for an intuitive yet easily parametrizable approach to specifying both the target (location) and its spread.
arXiv Detail & Related papers (2025-01-27T16:13:50Z) - SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation [7.428668206443388]
We introduce a novel multitrack dataset called SynthSOD, developed using a set of simulation techniques to create a realistic training set. We demonstrate the application of a widely used baseline music separation model trained on our synthesized dataset, evaluated against the well-known EnsembleSet.
arXiv Detail & Related papers (2024-09-17T08:58:33Z) - COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations [17.218899140175697]
COCOLA is a contrastive learning method for musical audio representations that captures the harmonic and rhythmic coherence between samples. Our method operates at the level of the stems composing music tracks and can input features obtained via Harmonic-Percussive Separation (HPS).
arXiv Detail & Related papers (2024-04-25T18:42:25Z) - MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z) - Benchmarks and leaderboards for sound demixing tasks [44.99833362998488]
We introduce two new benchmarks for the sound source separation tasks.
We compare popular models for sound demixing, as well as their ensembles, on these benchmarks.
We also develop a novel approach for audio separation, based on the ensembling of different models that are suited best for the particular stem.
arXiv Detail & Related papers (2023-05-12T14:00:26Z) - MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation [10.456845656569444]
Separation of multiple singing voices into each voice is rarely studied in music source separation research.
We introduce MedleyVox, an evaluation dataset for multiple singing voices separation.
We present a strategy for construction of multiple singing mixtures using various single-singing datasets.
arXiv Detail & Related papers (2022-11-14T12:27:35Z) - Towards Automatic Instrumentation by Learning to Separate Parts in Symbolic Multitrack Music [33.679951600368405]
We study the feasibility of automatic instrumentation -- dynamically assigning instruments to notes in solo music during performance.
In addition to the online, real-time-capable setting for performative use cases, automatic instrumentation can also find applications in assistive composing tools in an offline setting.
We frame the task of part separation as a sequential multi-class classification problem and adopt machine learning to map sequences of notes into sequences of part labels.
arXiv Detail & Related papers (2021-07-13T08:34:44Z) - Visually Informed Binaural Audio Generation without Binaural Audios [130.80178993441413]
We propose PseudoBinaural, an effective pipeline that is free of binaural recordings.
We leverage spherical harmonic decomposition and head-related impulse response (HRIR) to identify the relationship between spatial locations and received audios.
Our recording-free pipeline shows great stability in cross-dataset evaluation and achieves comparable performance under subjective preference.
arXiv Detail & Related papers (2021-04-13T13:07:33Z)