Related papers: ChatMusician: Understanding and Generating Music Intrinsically with LLM

ChatMusician: Understanding and Generating Music Intrinsically with LLM

URL: http://arxiv.org/abs/2402.16153v1
Date: Sun, 25 Feb 2024 17:19:41 GMT
Title: ChatMusician: Understanding and Generating Music Intrinsically with LLM
Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, Tao Jiang, Wenhao Huang, Wenhu Chen, Emmanouil Benetos, Jie Fu, Gus Xia, Roger Dannenberg, Wei Xue, Shiyin Kang, Yike Guo
Abstract summary: ChatMusician is an open-source Large Language Models (LLMs) that integrates intrinsic musical abilities. It can understand and generate music with a pure text tokenizer without any external multi-modal neural structures or tokenizers. Our model is capable of composing well-structured, full-length music, conditioned on texts, chords, melodies, motifs, musical forms, etc.
Score: 81.48629006702409
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the music is treated as a second language. ChatMusician can understand and generate music with a pure text tokenizer without any external multi-modal neural structures or tokenizers. Interestingly, endowing musical abilities does not harm language abilities, even achieving a slightly higher MMLU score. Our model is capable of composing well-structured, full-length music, conditioned on texts, chords, melodies, motifs, musical forms, etc, surpassing GPT-4 baseline. On our meticulously curated college-level music understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and GPT-3.5 on zero-shot setting by a noticeable margin. Our work reveals that LLMs can be an excellent compressor for music, but there remains significant territory to be conquered. We release our 4B token music-language corpora MusicPile, the collected MusicTheoryBench, code, model and demo in GitHub.

Related papers

Large Language Models' Internal Perception of Symbolic Music [3.9901365062418317]
Large language models (LLMs) excel at modeling relationships between strings in natural language.<n>This paper investigates how LLMs represent musical concepts by generating symbolic music data from textual prompts.
arXiv Detail & Related papers (2025-07-17T05:48:45Z)
CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models [51.03510073676228]
CLaMP 2 is a system compatible with 101 languages for music information retrieval. By leveraging large language models, we obtain refined and consistent multilingual descriptions at scale. CLaMP 2 achieves state-of-the-art results in both multilingual semantic search and music classification across modalities.
arXiv Detail & Related papers (2024-10-17T06:43:54Z)
Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation [31.825105824490464]
Symbolic Music, akin to language, can be encoded in discrete symbols. Recent research has extended the application of large language models (LLMs) to the symbolic music domain. This study conducts a thorough investigation of LLMs' capability and limitations in symbolic music processing.
arXiv Detail & Related papers (2024-07-31T11:29:46Z)
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations. We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music. Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
ComposerX: Multi-Agent Symbolic Music Composition with LLMs [51.68908082829048]
Music composition is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. Current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and Chain-of-Thoughts. We propose ComposerX, an agent-based symbolic music generation framework.
arXiv Detail & Related papers (2024-04-28T06:17:42Z)
SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation [88.33522730306674]
SongComposer could understand and generate melodies and lyrics in symbolic song representations. We resort to symbolic song representation, the mature and efficient way humans designed for music. With extensive experiments, SongComposer demonstrates superior performance in lyric-to-melody generation, melody-to-lyric generation, song continuation, and text-to-song creation.
arXiv Detail & Related papers (2024-02-27T16:15:28Z)
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response [42.73982391253872]
MusiLingo is a novel system for music caption generation and music-related query responses. We train it on an extensive music caption dataset and fine-tune it with instructional data. Empirical evaluations demonstrate its competitive performance in generating music captions and composing music-related Q&A pairs.
arXiv Detail & Related papers (2023-09-15T19:31:40Z)
Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training [97.91071692716406]
Symbolic music understanding refers to the understanding of music from the symbolic data. MusicBERT is a large-scale pre-trained model for music understanding.
arXiv Detail & Related papers (2021-06-10T10:13:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.