Related papers: Residual Recurrent CRNN for End-to-End Optical Music Recognition on Monophonic Scores

URL: http://arxiv.org/abs/2010.13418v2
Date: Wed, 4 Aug 2021 13:18:13 GMT
Title: Residual Recurrent CRNN for End-to-End Optical Music Recognition on Monophonic Scores
Authors: Aozhi Liu, Lipei Zhang, Yaqi Mei, Baoqiang Han, Zifeng Cai, Zhaohua Zhu, Jing Xiao
Abstract summary: We propose an innovative framework that combines a block of Residual Recurrent Convolutional Neural Network with a recurrent Encoder-Decoder network. The experimental results are benchmarked against a publicly available dataset called CAMERA-PRIMUS.
Score: 8.829800916216275
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: One of the challenges of the Optical Music Recognition task is to transcribe the symbols of camera-captured images into digital music notation. The previous end-to-end model, developed as a Convolutional Recurrent Neural Network, does not explore sufficient contextual information from full scales, and there is still considerable room for improvement. We propose an innovative framework that combines a block of Residual Recurrent Convolutional Neural Network with a recurrent Encoder-Decoder network to map a sequence of monophonic music symbols corresponding to the notations present in the image. The Residual Recurrent Convolutional block can improve the ability of the model to enrich the context information. The experimental results are benchmarked against a publicly available dataset called CAMERA-PRIMUS and demonstrate that our approach surpasses the state-of-the-art end-to-end method using a Convolutional Recurrent Neural Network.

Related papers

$ε$-VAE: Denoising as Visual Decoding [61.29255979767292]
In generative modeling, tokenization simplifies complex data into compact, structured representations, creating a more efficient, learnable space. Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input. We propose denoising as decoding, shifting from single-step reconstruction to iterative refinement. Specifically, we replace the decoder with a diffusion process that iteratively refines noise to recover the original image, guided by the latents provided by the encoder. We evaluate our approach by assessing both reconstruction (rFID) and generation quality (
arXiv Detail & Related papers (2024-10-05T08:27:53Z)
An Image captioning algorithm based on the Hybrid Deep Learning Technique (CNN+GRU) [0.0]
We present a CNN-GRU encoder-decoder framework for a caption-to-image reconstructor. It takes the semantic context into consideration as well as the time complexity. The suggested model outperforms the state-of-the-art LSTM-A5 model for picture captioning in terms of time complexity and accuracy.
arXiv Detail & Related papers (2023-01-06T10:00:06Z)
Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents an architecture with the holistic goal of maintaining spatially-precise high-resolution representations through the entire network. We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details. Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image. The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z)
Over-and-Under Complete Convolutional RNN for MRI Reconstruction [57.95363471940937]
Recent deep learning-based methods for MR image reconstruction usually leverage a generic auto-encoder architecture. We propose an Over-and-Under Complete Convolutional Recurrent Neural Network (OUCR), which consists of an overcomplete and an undercomplete Convolutional Recurrent Neural Network (CRNN). The proposed method achieves significant improvements over the compressed sensing and popular deep learning-based methods with fewer trainable parameters.
arXiv Detail & Related papers (2021-06-16T15:56:34Z)
Analysis of Convolutional Decoder for Image Caption Generation [1.2183405753834562]
Convolutional Neural Networks have been proposed for Sequence Modelling tasks such as Image Caption Generation. Unlike Recurrent Neural Network based Decoders, Convolutional Decoders for Image Captioning do not generally benefit from an increase in network depth. We observe that Convolutional Decoders show performance comparable with Recurrent Decoders only when trained using shorter sentences that contain up to 15 words.
arXiv Detail & Related papers (2021-03-08T17:25:31Z)
Comparative evaluation of CNN architectures for Image Caption Generation [1.2183405753834562]
We have evaluated 17 different Convolutional Neural Networks on two popular Image Caption Generation frameworks. We observe that the model complexity of a Convolutional Neural Network, as measured by the number of parameters, and the accuracy of the model on the Object Recognition task do not necessarily correlate with its efficacy on feature extraction for the Image Caption Generation task.
arXiv Detail & Related papers (2021-02-23T05:43:54Z)
Learning Deep Interleaved Networks with Asymmetric Co-Attention for Image Restoration [65.11022516031463]
We present a deep interleaved network (DIN) that learns how information at different states should be combined for high-quality (HQ) image reconstruction. In this paper, we propose asymmetric co-attention (AsyCA), which is attached at each interleaved node to model the feature dependencies. Our presented DIN can be trained end-to-end and applied to various image restoration tasks.
arXiv Detail & Related papers (2020-10-29T15:32:00Z)
Identity Enhanced Residual Image Denoising [61.75610647978973]
We learn a fully-convolutional network model that consists of a Chain of Identity Mapping Modules and residual on the residual architecture for image denoising. The proposed network produces remarkably higher numerical accuracy and better visual image quality than the classical state-of-the-art and CNN algorithms.
arXiv Detail & Related papers (2020-04-26T04:52:22Z)
Deblurring using Analysis-Synthesis Networks Pair [6.09170287691728]
Blind image deblurring remains a challenging problem for modern artificial neural networks. We propose a new architecture which breaks the deblurring network into an analysis network which estimates the blur, and a synthesis network that uses this kernel to deblur the image.
arXiv Detail & Related papers (2020-04-06T19:32:51Z)
Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks. We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network. Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.