Fugu-MT paper translation (abstract): Music4All A+A: A Multimodal Dataset for Music Information Retrieval Tasks

Paper summary: Music4All A+A: A Multimodal Dataset for Music Information Retrieval Tasks

arxiv url: http://arxiv.org/abs/2509.14891v1
Date: Thu, 18 Sep 2025 12:10:58 GMT
Status: translation complete
Last updated in system: 2025-09-19 17:26:53.205519
Title: Music4All A+A: A Multimodal Dataset for Music Information Retrieval Tasks
Title (reference translation): Music4All A+A: A Multimodal Dataset for Music Information Retrieval Tasks
Authors: Jonas Geiger, Marta Moscati, Shah Nawaz, Markus Schedl
Abstract summary: Music can be described at different levels of granularity. Music4All A+A is a dataset for multimodal MIR tasks based on music artists and albums.
Reference score (independently computed attention score): 10.492889207034459
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Music is characterized by aspects related to different modalities, such as the audio signal, the lyrics, or the music video clips. This has motivated the development of multimodal datasets and methods for Music Information Retrieval (MIR) tasks such as genre classification or autotagging. Music can be described at different levels of granularity, for instance defining genres at the level of artists or music albums. However, most datasets for multimodal MIR neglect this aspect and provide data at the level of individual music tracks. We aim to fill this gap by providing Music4All Artist and Album (Music4All A+A), a dataset for multimodal MIR tasks based on music artists and albums. Music4All A+A is built on top of the Music4All-Onion dataset, an existing track-level dataset for MIR tasks. Music4All A+A provides metadata, genre labels, image representations, and textual descriptors for 6,741 artists and 19,511 albums. Furthermore, since Music4All A+A is built on top of Music4All-Onion, it allows access to other multimodal data at the track level, including user-item interaction data. This renders Music4All A+A suitable for a broad range of MIR tasks, including multimodal music recommendation, at several levels of granularity. To showcase the use of Music4All A+A, we carry out experiments on multimodal genre classification of artists and albums, including an analysis in missing-modality scenarios, and a quantitative comparison with genre classification in the movie domain. Our experiments show that images are more informative for classifying the genres of artists and albums, and that several multimodal models for genre classification struggle in generalizing across domains. We provide the code to reproduce our experiments at https://github.com/hcai-mms/Music4All-A-A; the dataset is linked in the repository and provided open-source under a CC BY-NC-SA 4.0 license.
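Abstract (reference translation): Music is characterized by aspects tied to different modalities, such as the audio signal, the lyrics, and music video clips. This has motivated the development of multimodal datasets and methods for Music Information Retrieval (MIR) tasks such as genre classification and autotagging. Music can be described at different levels of granularity, for example by defining genres at the level of artists or music albums. However, most datasets for multimodal MIR ignore this aspect and provide data at the level of individual music tracks. We aim to fill this gap by providing Music4All Artist and Album (Music4All A+A), a dataset for multimodal MIR tasks based on music artists and albums. Music4All A+A is built on top of the Music4All-Onion dataset, an existing track-level dataset for MIR tasks. Music4All A+A provides metadata, genre labels, image representations, and textual descriptors for 6,741 artists and 19,511 albums. Furthermore, because Music4All A+A is built on top of Music4All-Onion, it gives access to other multimodal data at the track level, including user-item interaction data. This makes Music4All A+A suitable for a broad range of MIR tasks, including multimodal music recommendation, at several levels of granularity. To demonstrate the use of Music4All A+A, we carry out experiments on multimodal genre classification of artists and albums, including an analysis of missing-modality scenarios and a comparison with genre classification in the movie domain. The experiments show that images are more informative for classifying the genres of artists and albums, and that several multimodal models for genre classification struggle to generalize across domains. We provide the code to reproduce the experiments at https://github.com/hcai-mms/Music4All-A-A; the dataset is linked in the repository and provided as open source under a CC BY-NC-SA 4.0 license.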

