Fugu-MT paper translation (abstract): Adopting State-of-the-Art Pretrained Audio Representations for Music Recommender Systems

Paper summary: Adopting State-of-the-Art Pretrained Audio Representations for Music Recommender Systems

arxiv url: http://arxiv.org/abs/2604.23077v1
Date: Sat, 25 Apr 2026 00:09:58 GMT
Status: translation complete
Last updated in system: 2026-04-28 17:12:07.141735
Title: Adopting State-of-the-Art Pretrained Audio Representations for Music Recommender Systems
Title (reference translation): Adopting Pretrained Audio Representations for Music Recommender Systems
Authors: Yan-Martin Tamm, Anna Aljanaki
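Abstract summary: The Music Information Retrieval (MIR) research community has released various models pretrained on large amounts of music data. The paper examines MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, MusiCNN, MULE, MuQ, and MuQ-MuLan. It finds that pretrained audio representations show a significant performance disparity between traditional MIR tasks and both hot and cold music recommendation.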
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Over the years, the Music Information Retrieval (MIR) research community has released various models pretrained on large amounts of music data. Transfer learning showcases the proven effectiveness of pretrained backend models for a broad spectrum of downstream tasks, including auto-tagging and genre classification. However, MIR papers generally do not explore the efficiency of pretrained models for Music Recommender Systems (MRS). In addition, the Recommender Systems community tends to favour traditional end-to-end neural network training. Our research addresses this gap and evaluates the performance of nine pretrained backend models (MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, MusiCNN, MULE, MuQ and MuQ-MuLan) in the context of MRS. We assess them using five recommendation approaches: K-Nearest Neighbours (KNN), Shallow Neural Network, Contrastive Multi-Modal projection, a Hybrid model, and BERT4Rec, in both hot and cold-start scenarios. Our findings suggest that pretrained audio representations exhibit significant performance disparity between traditional MIR tasks and both hot and cold music recommendations, indicating that valuable aspects of musical information captured by backend models may differ depending on the task. This study establishes a foundation for further exploration of pretrained audio representations to enhance music recommendation systems.
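Abstract (reference translation): Over the years, the Music Information Retrieval (MIR) research community has released various models pretrained on large amounts of music data. Transfer learning has demonstrated the effectiveness of pretrained backend models across a broad range of downstream tasks, including auto-tagging and genre classification. However, MIR papers generally do not explore the efficiency of pretrained models for Music Recommender Systems (MRS). In addition, the Recommender Systems community tends to favour traditional end-to-end neural network training. This work addresses the gap and evaluates the performance of nine pretrained backend models (MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, MusiCNN, MULE, MuQ, MuQ-MuLan) in the context of MRS. The results suggest that pretrained audio representations show a significant performance disparity between traditional MIR tasks and both hot and cold music recommendation. This study establishes a foundation for further exploration of pretrained audio representations to enhance music recommendation systems.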

Related papers