ChatMusician: Understanding and Generating Music Intrinsically with LLM
- URL: http://arxiv.org/abs/2402.16153v1
- Date: Sun, 25 Feb 2024 17:19:41 GMT
- Title: ChatMusician: Understanding and Generating Music Intrinsically with LLM
- Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao
Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu
Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi,
Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, Tao
Jiang, Wenhao Huang, Wenhu Chen, Emmanouil Benetos, Jie Fu, Gus Xia, Roger
Dannenberg, Wei Xue, Shiyin Kang, Yike Guo
- Abstract summary: ChatMusician is an open-source Large Language Models (LLMs) that integrates intrinsic musical abilities.
It can understand and generate music with a pure text tokenizer without any external multi-modal neural structures or tokenizers.
Our model is capable of composing well-structured, full-length music, conditioned on texts, chords, melodies, motifs, musical forms, etc.
- Score: 81.48629006702409
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in
text generation, we find that their ability has yet to be generalized to music,
humanity's creative language. We introduce ChatMusician, an open-source LLM
that integrates intrinsic musical abilities. It is based on continual
pre-training and finetuning LLaMA2 on a text-compatible music representation,
ABC notation, and the music is treated as a second language. ChatMusician can
understand and generate music with a pure text tokenizer without any external
multi-modal neural structures or tokenizers. Interestingly, endowing musical
abilities does not harm language abilities, even achieving a slightly higher
MMLU score. Our model is capable of composing well-structured, full-length
music, conditioned on texts, chords, melodies, motifs, musical forms, etc,
surpassing GPT-4 baseline. On our meticulously curated college-level music
understanding benchmark, MusicTheoryBench, ChatMusician surpasses LLaMA2 and
GPT-3.5 on zero-shot setting by a noticeable margin. Our work reveals that LLMs
can be an excellent compressor for music, but there remains significant
territory to be conquered. We release our 4B token music-language corpora
MusicPile, the collected MusicTheoryBench, code, model and demo in GitHub.
Related papers
- CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models [51.03510073676228]
CLaMP 2 is a system compatible with 101 languages for music information retrieval.
By leveraging large language models, we obtain refined and consistent multilingual descriptions at scale.
CLaMP 2 achieves state-of-the-art results in both multilingual semantic search and music classification across modalities.
arXiv Detail & Related papers (2024-10-17T06:43:54Z) - Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation [31.825105824490464]
Symbolic Music, akin to language, can be encoded in discrete symbols.
Recent research has extended the application of large language models (LLMs) to the symbolic music domain.
This study conducts a thorough investigation of LLMs' capability and limitations in symbolic music processing.
arXiv Detail & Related papers (2024-07-31T11:29:46Z) - MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z) - ComposerX: Multi-Agent Symbolic Music Composition with LLMs [51.68908082829048]
Music composition is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints.
Current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and Chain-of-Thoughts.
We propose ComposerX, an agent-based symbolic music generation framework.
arXiv Detail & Related papers (2024-04-28T06:17:42Z) - SongComposer: A Large Language Model for Lyric and Melody Composition in
Song Generation [88.33522730306674]
SongComposer could understand and generate melodies and lyrics in symbolic song representations.
We resort to symbolic song representation, the mature and efficient way humans designed for music.
With extensive experiments, SongComposer demonstrates superior performance in lyric-to-melody generation, melody-to-lyric generation, song continuation, and text-to-song creation.
arXiv Detail & Related papers (2024-02-27T16:15:28Z) - MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response [42.73982391253872]
MusiLingo is a novel system for music caption generation and music-related query responses.
We train it on an extensive music caption dataset and fine-tune it with instructional data.
Empirical evaluations demonstrate its competitive performance in generating music captions and composing music-related Q&A pairs.
arXiv Detail & Related papers (2023-09-15T19:31:40Z) - MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training [97.91071692716406]
Symbolic music understanding refers to the understanding of music from the symbolic data.
MusicBERT is a large-scale pre-trained model for music understanding.
arXiv Detail & Related papers (2021-06-10T10:13:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.