The Renaissance of Expert Systems: Optical Recognition of Printed Chinese Jianpu Musical Scores with Lyrics
- URL: http://arxiv.org/abs/2512.14758v1
- Date: Mon, 15 Dec 2025 15:04:57 GMT
- Title: The Renaissance of Expert Systems: Optical Recognition of Printed Chinese Jianpu Musical Scores with Lyrics
- Authors: Fan Bu, Rongfeng Li, Zijin Li, Ya Li, Linfeng Fan, Pei Huang,
- Abstract summary: We present a modular expert-system pipeline that converts printed Jianpu scores with lyrics into machine-readable MusicXML and MIDI. The system achieves high-precision recognition on both melody (note-wise F1 = 0.951) and aligned lyrics.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large-scale optical music recognition (OMR) research has focused mainly on Western staff notation, leaving Chinese Jianpu (numbered notation) and its rich lyric resources underexplored. We present a modular expert-system pipeline that converts printed Jianpu scores with lyrics into machine-readable MusicXML and MIDI, without requiring massive annotated training data. Our approach adopts a top-down expert-system design, leveraging traditional computer-vision techniques (e.g., phrase correlation, skeleton analysis) to capitalize on prior knowledge, while integrating unsupervised deep-learning modules for image feature embeddings. This hybrid strategy strikes a balance between interpretability and accuracy. Evaluated on The Anthology of Chinese Folk Songs, our system massively digitizes (i) a melody-only collection of more than 5,000 songs (> 300,000 notes) and (ii) a curated subset with lyrics comprising over 1,400 songs (> 100,000 notes). The system achieves high-precision recognition on both melody (note-wise F1 = 0.951) and aligned lyrics (character-wise F1 = 0.931).
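The abstract names skeleton analysis among the classical computer-vision techniques in the pipeline. As a minimal sketch of what such analysis can look like (not the authors' implementation), a binarized glyph reduced to a one-pixel-wide skeleton can be characterized by its endpoints (pixels with exactly one 8-connected neighbour) and junctions (three or more neighbours); structural features like these can help distinguish Jianpu symbols such as digits, octave dots, and duration dashes. The glyph below is a hypothetical synthetic example.

```python
# Hedged sketch of skeleton analysis on a binarized glyph skeleton,
# represented as a set of (row, col) pixel coordinates.

def neighbours(p, skeleton):
    """Return the 8-connected neighbours of pixel p that lie on the skeleton."""
    r, c = p
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and (r + dr, c + dc) in skeleton]

def endpoints_and_junctions(skeleton):
    """Classify skeleton pixels: 1 neighbour -> endpoint, >= 3 -> junction."""
    ends, joints = [], []
    for p in skeleton:
        n = len(neighbours(p, skeleton))
        if n == 1:
            ends.append(p)
        elif n >= 3:
            joints.append(p)
    return ends, joints

# A horizontal 5-pixel stroke (like a Jianpu duration dash):
dash = {(0, c) for c in range(5)}
ends, joints = endpoints_and_junctions(dash)
print(len(ends), len(joints))  # -> 2 0 (two endpoints, no junctions)
```

A real system would obtain the skeleton from the binarized image first (e.g. via morphological thinning) and combine such structural cues with the learned feature embeddings the abstract mentions; this fragment only illustrates the endpoint/junction counting step.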
Related papers
- SongSage: A Large Musical Language Model with Lyric Generative Pre-training [69.52790104805794]
SongSage is a large musical language model equipped with diverse lyric-centric intelligence through lyric generative pretraining. SongSage exhibits a strong understanding of lyric-centric knowledge, excels in rewriting user queries for zero-shot playlist recommendations, generates and continues lyrics effectively, and performs proficiently across seven additional capabilities.
arXiv Detail & Related papers (2026-01-03T10:54:37Z) - Musical Score Understanding Benchmark: Evaluating Large Language Models' Comprehension of Complete Musical Scores [32.722200962820125]
We introduce Musical Score Understanding Benchmark (MSU-Bench), the first large-scale, human-curated benchmark for evaluating score-level musical understanding. MSU-Bench comprises 1,800 generative question-answer (QA) pairs drawn from works spanning Bach, Beethoven, Chopin, Debussy, and others. We reveal sharp modality gaps, fragile level-wise success rates, and the difficulty of sustaining multilevel correctness.
arXiv Detail & Related papers (2025-11-24T06:40:38Z) - Discovering "Words" in Music: Unsupervised Learning of Compositional Sparse Code for Symbolic Music [50.87225308217594]
This paper presents an unsupervised machine learning algorithm that identifies recurring patterns -- referred to as "music-words" -- from symbolic music data. We formulate the task of music-word discovery as a statistical optimization problem and propose a two-stage Expectation-Maximization (EM)-based learning framework.
arXiv Detail & Related papers (2025-09-29T11:10:57Z) - NOTA: Multimodal Music Notation Understanding for Visual Large Language Model [38.26693446133213]
We propose NOTA, the first large-scale comprehensive multimodal music notation dataset. It consists of 1,019,237 records, from 3 regions of the world, and contains 3 tasks. Based on the dataset, we trained NotaGPT, a music notation visual large language model.
arXiv Detail & Related papers (2025-02-17T16:39:19Z) - CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models [51.03510073676228]
CLaMP 2 is a system compatible with 101 languages for music information retrieval. By leveraging large language models, we obtain refined and consistent multilingual descriptions at scale. CLaMP 2 achieves state-of-the-art results in both multilingual semantic search and music classification across modalities.
arXiv Detail & Related papers (2024-10-17T06:43:54Z) - Quantifying the Corpus Bias Problem in Automatic Music Transcription Systems [3.5570874721859016]
Automatic Music Transcription (AMT) is the task of recognizing notes in audio recordings of music.
We identify two primary sources of distribution shift: the music, and the sound.
We evaluate the performance of several SotA AMT systems on two new experimental test sets.
arXiv Detail & Related papers (2024-08-08T19:40:28Z) - SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition [82.38021790213752]
SongComposer is a music-specialized large language model (LLM). It integrates the capability of simultaneously composing melodies into LLMs by leveraging three key innovations. It outperforms advanced LLMs in tasks such as lyric-to-melody generation, melody-to-lyric generation, song continuation, and text-to-song creation. We will release SongCompose, a large-scale dataset for training, containing paired lyrics and melodies in Chinese and English.
arXiv Detail & Related papers (2024-02-27T16:15:28Z) - Multimodal Lyrics-Rhythm Matching [0.0]
We propose a novel multimodal lyrics-rhythm matching approach that specifically matches key components of lyrics and music with each other.
We use audio instead of sheet music with readily available metadata, which creates more challenges yet increases the application flexibility of our method.
Our experimental results reveal a 0.81 probability of matching on average, and around 30% of the songs have a probability of 0.9 or higher of keywords landing on strong beats.
arXiv Detail & Related papers (2023-01-06T22:24:53Z) - MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training [97.91071692716406]
Symbolic music understanding refers to the understanding of music from the symbolic data.
MusicBERT is a large-scale pre-trained model for music understanding.
arXiv Detail & Related papers (2021-06-10T10:13:05Z) - Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN).
We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
arXiv Detail & Related papers (2020-10-28T02:35:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.