Fugu-MT Paper Translation (Summary): Revisiting Content-Based Music Recommendation: Efficient Feature Aggregation from Large-Scale Music Models

Paper overview: Revisiting Content-Based Music Recommendation: Efficient Feature Aggregation from Large-Scale Music Models

arxiv url: http://arxiv.org/abs/2604.20847v1
Date: Tue, 10 Feb 2026 15:24:41 GMT
Status: Translation complete
Last updated in system: 2026-05-04 02:32:14.10178
Title: Revisiting Content-Based Music Recommendation: Efficient Feature Aggregation from Large-Scale Music Models
Authors: Yizhi Zhou, Jia-Qi Yang, De-Chuan Zhan, Da-Wei Zhou
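Abstract summary: Music Recommendation Systems (MRSs) are a cornerstone of modern streaming platforms. We propose TASTE, a comprehensive dataset and benchmarking framework, to highlight the role of multimodal information in music recommendation. By leveraging recent large-scale self-supervised music encoders, we demonstrate the value of the extracted audio representations across recommendation tasks.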
Reference score (independently computed attention score): 54.4270504928356
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Music Recommendation Systems (MRSs) are a cornerstone of modern streaming platforms. Existing recommendation models, spanning both recall and ranking stages, predominantly rely on collaborative filtering, which fails to exploit the intrinsic characteristics of audio and consequently leads to suboptimal performance, particularly in cold-start scenarios. However, existing music recommendation datasets often lack rich multimodal information, such as raw audio signals and descriptive textual metadata. Moreover, current recommender system evaluation frameworks remain inadequate, as they neither fully leverage multimodal information nor support a diverse range of algorithms, especially multimodal methods. To address these limitations, we propose TASTE, a comprehensive dataset and benchmarking framework designed to highlight the role of multimodal information in music recommendation. Our dataset integrates both audio and textual modalities. By leveraging recent large-scale self-supervised music encoders, we demonstrate the substantial value of the extracted audio representations across recommendation tasks, including candidate recall and click-through rate (CTR) prediction. In addition, we introduce the MuQ-token method, which enables more efficient integration of multi-layer audio features. This method consistently outperforms other feature integration techniques across various settings. Overall, our results not only validate the effectiveness of content-driven approaches but also provide a highly effective and reusable multimodal foundation for future research. Code is available at https://github.com/zreach/TASTE
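The abstract does not describe how MuQ-token combines encoder layers, so the sketch below does not implement it. As a point of reference only, it shows one common generic way to integrate multi-layer audio features: a softmax-weighted sum of per-layer embeddings (similar to the learned scalar mix used in ELMo), followed by the dot-product scoring typical of two-tower recall models. It also illustrates the cold-start argument: a track with no interaction history still gets an item vector from its audio alone. The function names, the toy numbers, and the assumption that the encoder yields one pooled vector per layer are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over per-layer logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layers: Sequence[Sequence[float]],
                       layer_logits: Sequence[float]) -> Vector:
    """Combine per-layer clip embeddings into one item vector.

    Assumption (hypothetical): the encoder provides one pooled vector per
    layer, all with the same dimension. Generic baseline, not MuQ-token.
    """
    if len(layers) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)
    dim = len(layers[0])
    out = [0.0] * dim
    for w, layer in zip(weights, layers):
        if len(layer) != dim:
            raise ValueError("all layers must share the same dimension")
        for i, x in enumerate(layer):
            out[i] += w * x
    return out


def score(user_vec: Sequence[float], item_vec: Sequence[float]) -> float:
    """Dot-product relevance score, as in two-tower recall."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


if __name__ == "__main__":
    # Toy per-layer embeddings for a new (cold-start) track: 3 layers x 4 dims.
    track_layers = [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0],
    ]
    # All-zero logits give equal weights, i.e. plain layer averaging.
    item = weighted_layer_sum(track_layers, [0.0, 0.0, 0.0])
    user = [0.5, 0.5, 0.0, 1.0]  # hypothetical user embedding
    print(item)
    print(score(user, item))
```

With all-zero logits the mix reduces to simple layer averaging; in a trained model the logits would be learned so that more useful layers receive larger weights.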


List of related papers