Real-Time Pitch/F0 Detection Using Spectrogram Images and Convolutional Neural Networks
- URL: http://arxiv.org/abs/2504.06165v1
- Date: Tue, 08 Apr 2025 16:01:25 GMT
- Title: Real-Time Pitch/F0 Detection Using Spectrogram Images and Convolutional Neural Networks
- Authors: Xufang Zhao, Omer Tsimhoni
- Abstract summary: This paper presents a novel approach to detect F0 through Convolutional Neural Networks and image processing techniques. Our new approach demonstrates a very good detection accuracy; a total of 92% of predicted pitch contours have strong or moderate correlations to the true pitch contours.
- Score: 0.7366405857677227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel approach to detect F0 through Convolutional Neural Networks and image processing techniques to directly estimate pitch from spectrogram images. Our new approach demonstrates a very good detection accuracy; a total of 92% of predicted pitch contours have strong or moderate correlations to the true pitch contours. Furthermore, the experimental comparison between our new approach and other state-of-the-art CNN methods reveals that our approach can enhance the detection rate by approximately 5% across various Signal-to-Noise Ratio conditions.
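The headline metric counts predicted pitch contours whose correlation with the true contour is "strong" or "moderate". The sketch below shows that bookkeeping under stated assumptions: Pearson's r over frame-aligned contours, and hypothetical cut-offs of 0.7 (strong) and 0.4 (moderate). The abstract names neither the coefficient nor the thresholds, and the helper names are illustrative only.

```python
import math
from typing import Iterable, Sequence, Tuple

# Hypothetical cut-offs: the abstract does not state the thresholds it used.
STRONG = 0.7
MODERATE = 0.4


def pearson_r(pred: Sequence[float], true: Sequence[float]) -> float:
    """Pearson correlation between two frame-aligned F0 contours (e.g. in Hz)."""
    if len(pred) != len(true) or len(pred) < 2:
        raise ValueError("contours must be frame-aligned and have at least 2 frames")
    n = len(pred)
    mp, mt = sum(pred) / n, sum(true) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in true))
    if sp == 0 or st == 0:
        return 0.0  # a flat contour has undefined correlation; treat it as none
    return cov / (sp * st)


def band(r: float) -> str:
    """Map r to a band; negative r (contour moving the wrong way) counts as weak."""
    if r >= STRONG:
        return "strong"
    if r >= MODERATE:
        return "moderate"
    return "weak"


def share_strong_or_moderate(
    pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Fraction of (predicted, true) contour pairs in the strong or moderate band."""
    bands = [band(pearson_r(p, t)) for p, t in pairs]
    if not bands:
        raise ValueError("no contour pairs given")
    return sum(b != "weak" for b in bands) / len(bands)
```

Correlation rewards a prediction that follows the shape of the true contour even if it is offset by a constant number of hertz, so it is best read alongside an absolute-error measure.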
Related papers
- Phase-OTDR Event Detection Using Image-Based Data Transformation and Deep Learning [0.8749675983608171]
This study focuses on event detection in optical fibers, specifically classifying six events using the Phase-OTDR system. A novel approach is introduced to enhance Phase-OTDR data analysis by transforming 1D data into grayscale images. The proposed methodology achieves high classification accuracies of 98.84% and 98.24% with the EfficientNetB0 and DenseNet121 models.
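The abstract does not say how the 1D Phase-OTDR data become grayscale images. One simple scheme, shown here purely as an assumption, min-max scales the samples to 0..255 and folds them row by row into an image of a chosen width; `to_grayscale_image` and `width` are hypothetical names.

```python
from typing import List, Sequence


def to_grayscale_image(signal: Sequence[float], width: int) -> List[List[int]]:
    """Min-max scale a 1D signal to 0..255 and fold it row by row into a 2D image.

    Trailing samples that do not fill a complete row are dropped.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not signal:
        raise ValueError("signal is empty")
    lo, hi = min(signal), max(signal)
    span = hi - lo
    # A constant signal maps to an all-zero image instead of dividing by zero.
    pixels = [0 if span == 0 else round(255 * (s - lo) / span) for s in signal]
    rows = len(pixels) // width
    return [pixels[r * width:(r + 1) * width] for r in range(rows)]
```

For example, `to_grayscale_image([0, 1, 2, 3, 4, 5], 3)` yields `[[0, 51, 102], [153, 204, 255]]`. With real Phase-OTDR recordings a natural alternative is to stack successive traces as image rows, which keeps fiber position on one axis and time on the other.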
(2025-12-05T15:52:40Z)
- GPR-OdomNet: Difference and Similarity-Driven Odometry Estimation Network for Ground Penetrating Radar-Based Localization [10.95813657337033]
This study introduces a new neural network-based odometry method for precise estimation of the Euclidean distances traveled between B-scan images. The experimental results show that our method consistently outperforms state-of-the-art counterparts in all tests.
(2025-11-21T17:59:17Z)
- Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models. In this paper, we investigate how detection performance varies across model backbones, types, and datasets. We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
(2024-11-28T13:04:45Z)
- Enhanced Wavelet Scattering Network for image inpainting detection [0.0]
This paper proposes several innovative ideas for detecting inpainting forgeries based on low-level noise analysis.
It combines Dual-Tree Complex Wavelet Transform (DT-CWT) for feature extraction with convolutional neural networks (CNN) for forged area detection and localization.
Our approach was benchmarked against state-of-the-art methods and demonstrated superior performance over all cited alternatives.
(2024-09-25T15:27:05Z)
- Verification of Geometric Robustness of Neural Networks via Piecewise Linear Approximation and Lipschitz Optimisation [57.10353686244835]
We address the problem of verifying neural networks against geometric transformations of the input image, including rotation, scaling, shearing, and translation.
The proposed method computes provably sound piecewise linear constraints for the pixel values by using sampling and linear approximations in combination with branch-and-bound Lipschitz.
We show that our proposed implementation resolves up to 32% more verification cases than present approaches.
(2024-08-23T15:02:09Z)
- Direct Zernike Coefficient Prediction from Point Spread Functions and Extended Images using Deep Learning [36.136619420474766]
Existing adaptive optics systems rely on iterative search algorithms to correct for aberrations and improve images.
This study demonstrates the application of convolutional neural networks to characterise the optical aberration.
(2024-04-23T17:03:53Z)
- ReNoise: Real Image Inversion Through Iterative Noising [62.96073631599749]
We introduce an inversion method with a high quality-to-operation ratio, enhancing reconstruction accuracy without increasing the number of operations.
We evaluate the performance of our ReNoise technique using various sampling algorithms and models, including recent accelerated diffusion models.
(2024-03-21T17:52:08Z)
- LUT-GCE: Lookup Table Global Curve Estimation for Fast Low-light Image Enhancement [62.17015413594777]
We present an effective and efficient approach for low-light image enhancement, named LUT-GCE.
We estimate a global curve for the entire image that allows corrections for both under- and over-exposure.
Our approach outperforms the state of the art in terms of inference speed, especially on high-definition images (e.g., 1080p and 4K).
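A single global curve is cheap at high resolution because, once it is tabulated for all 256 input levels, each pixel costs one table lookup regardless of image size. The sketch below illustrates only that lookup-table mechanism. It swaps in the quadratic light-enhancement curve x + α·x·(1 − x) popularized by Zero-DCE as a stand-in, since the abstract does not give LUT-GCE's curve form; `build_lut` and `apply_lut` are hypothetical helpers. Positive α brightens (under-exposure), negative α darkens (over-exposure).

```python
from typing import List


def build_lut(alpha: float, iterations: int = 1) -> List[int]:
    """256-entry LUT for the stand-in curve x + alpha * x * (1 - x) on [0, 1]."""
    # For |alpha| <= 1 the curve is monotonic and maps [0, 1] onto [0, 1].
    if not -1.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [-1, 1] to keep the curve monotonic")
    lut = []
    for v in range(256):
        x = v / 255.0
        for _ in range(iterations):  # repeated application bends the curve further
            x = x + alpha * x * (1.0 - x)
        lut.append(min(255, max(0, round(x * 255.0))))
    return lut


def apply_lut(image: List[List[int]], lut: List[int]) -> List[List[int]]:
    """Apply one global curve to every 8-bit pixel: a single table lookup each."""
    return [[lut[p] for p in row] for row in image]
```

The per-image cost is dominated by `apply_lut`, which does no arithmetic at all; this is the property a global-curve method exploits on 1080p and 4K inputs.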
(2023-06-12T12:53:06Z)
- Explicit Correspondence Matching for Generalizable Neural Radiance Fields [49.49773108695526]
We present a new NeRF method that is able to generalize to new unseen scenarios and perform novel view synthesis with as few as two source views.
The explicit correspondence matching is quantified with the cosine similarity between image features sampled at the 2D projections of a 3D point on different views.
Our method achieves state-of-the-art results on different evaluation settings, with the experiments showing a strong correlation between our learned cosine feature similarity and volume density.
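The cosine-similarity matching can be sketched for one 3D point and two views. This assumes a pinhole camera (focal length `f`, principal point `cx`, `cy`, world-to-camera rotation `R` and translation `t`) and nearest-neighbour feature sampling; the paper's feature extractor and interpolation scheme are not described in the snippet, and every name below is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]
FeatureMap = List[List[List[float]]]  # indexed as [row][col][channel]


def project(point: Vec3, f: float, cx: float, cy: float, R: Mat3, t: Vec3) -> Tuple[float, float]:
    """Pinhole projection of a world point to pixel coordinates (u, v)."""
    # Camera coordinates: X_c = R @ X_w + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    return f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy


def sample_nearest(fmap: FeatureMap, u: float, v: float) -> List[float]:
    """Nearest-neighbour feature lookup, clamped to the map borders."""
    row = min(max(int(round(v)), 0), len(fmap) - 1)
    col = min(max(int(round(u)), 0), len(fmap[0]) - 1)
    return fmap[row][col]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def correspondence_score(point: Vec3, view_a, view_b) -> float:
    """Cosine similarity of the features two views see at the point's projections.

    Each view is a tuple (fmap, f, cx, cy, R, t).
    """
    feats = []
    for fmap, f, cx, cy, R, t in (view_a, view_b):
        u, v = project(point, f, cx, cy, R, t)
        feats.append(sample_nearest(fmap, u, v))
    return cosine(feats[0], feats[1])
```

A point on a real surface should project to matching image content in both views, giving a score near 1; a point in empty space typically lands on unrelated content, which is why such a score can correlate with volume density.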
(2023-04-24T17:46:01Z)
- Decoupled Mixup for Generalized Visual Recognition [71.13734761715472]
We propose a novel "Decoupled-Mixup" method to train CNN models for visual recognition.
Our method decouples each image into discriminative and noise-prone regions, and then heterogeneously combines these regions to train CNN models.
Experiment results show the high generalization performance of our method on testing data that are composed of unseen contexts.
(2022-10-26T15:21:39Z)
- Understanding of the properties of neural network approaches for transient light curve approximations [37.91290708320157]
This paper presents a search for the best-performing methods to approximate the observed light curves over time and wavelength.
Test datasets include simulated PLAsTiCC and real Zwicky Transient Facility Bright Transient Survey light curves of transients.
(2022-09-15T18:00:08Z)
- Edge Detection and Deep Learning Based SETI Signal Classification Method [0.0]
Scientists at the Berkeley SETI Research Center are engaged in the Search for Extraterrestrial Intelligence (SETI).
A new signal detection method converts radio signals into spectrograms through Fourier transforms and classifies signals represented as two-dimensional time-frequency spectra.
In view of the negative impact of background noise on the accuracy of spectrogram classification, a new method is introduced in this paper.
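The Fourier-transform step that turns a 1D signal into a spectrogram can be sketched as a magnitude short-time Fourier transform. The frame length, hop size and Hann window are assumptions rather than values from the paper, and the naive O(N²) DFT is used only to keep the example dependency-free; a real pipeline would use an FFT.

```python
import cmath
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Magnitude STFT: one row per frame, one column per non-negative frequency bin."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):  # bins 0 .. Nyquist
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        frames.append(row)
    return frames
```

Bin k corresponds to k · fs / frame_len Hz for sample rate fs, so a pure tone shows up as a horizontal line at a fixed bin across frames; that time-frequency picture is what an image classifier then receives.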
(2022-03-29T04:31:48Z)
- Conditional Variational Autoencoder for Learned Image Reconstruction [5.487951901731039]
We develop a novel framework that approximates the posterior distribution of the unknown image at each query observation.
It handles implicit noise models and priors, incorporates the data formation process (i.e., the forward operator), and its learned reconstructive properties are transferable between different datasets.
(2021-10-22T10:02:48Z)
- Lightweight Convolutional Neural Network with Gaussian-based Grasping Representation for Robotic Grasping Detection [4.683939045230724]
Current object detectors struggle to strike a balance between high accuracy and fast inference speed.
We present an efficient and robust fully convolutional neural network model to perform robotic grasping pose estimation.
The network is an order of magnitude smaller than other leading algorithms.
(2021-01-25T16:36:53Z)
- Deep Networks for Direction-of-Arrival Estimation in Low SNR [89.45026632977456]
We introduce a Convolutional Neural Network (CNN) that is trained from multi-channel data of the true array manifold matrix.
We train a CNN in the low-SNR regime to predict DoAs across all SNRs.
Our robust solution can be applied in several fields, ranging from wireless array sensors to acoustic microphones or sonars.
(2020-11-17T12:52:18Z)