SoundPlot: An Open-Source Framework for Birdsong Acoustic Analysis and Neural Synthesis with Interactive 3D Visualization
- URL: http://arxiv.org/abs/2601.12752v1
- Date: Mon, 19 Jan 2026 06:17:26 GMT
- Title: SoundPlot: An Open-Source Framework for Birdsong Acoustic Analysis and Neural Synthesis with Interactive 3D Visualization
- Authors: Naqcho Ali Mehdi, Mohammad Adeel, Aizaz Ali Larik
- Abstract summary: We present SoundPlot, an open-source framework for analyzing avian vocalizations. The system transforms audio signals into a multi-dimensional acoustic feature space. SoundPlot is released under the MIT License to facilitate research in bioacoustics, audio signal processing, and computational ethology.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present SoundPlot, an open-source framework for analyzing avian vocalizations through acoustic feature extraction, dimensionality reduction, and neural audio synthesis. The system transforms audio signals into a multi-dimensional acoustic feature space, enabling real-time visualization of temporal dynamics in 3D using web-based interactive graphics. Our framework implements a complete analysis-synthesis pipeline that extracts spectral features (centroid, bandwidth, contrast), pitch contours via probabilistic YIN (pYIN), and mel-frequency cepstral coefficients (MFCCs), mapping them to a unified timbre space for visualization. Audio reconstruction employs the Griffin-Lim phase estimation algorithm applied to mel spectrograms. The accompanying Three.js-based interface provides dual-viewport visualization comparing original and synthesized audio trajectories with independent playback controls. We demonstrate the framework's capabilities through comprehensive waveform analysis, spectrogram comparisons, and feature space evaluation using Principal Component Analysis (PCA). Quantitative evaluation shows mel spectrogram correlation scores exceeding 0.92, indicating high-fidelity preservation of perceptual acoustic structure. SoundPlot is released under the MIT License to facilitate research in bioacoustics, audio signal processing, and computational ethology.
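The abstract names spectral centroid and bandwidth among the extracted features and reports a mel spectrogram correlation score as its fidelity metric, but gives no formulas. The sketch below shows one common way to compute them, assuming the centroid is the magnitude-weighted mean frequency, the bandwidth is the weighted standard deviation around it, and the correlation is a Pearson coefficient over flattened spectrograms. These definitions, the function names, and the naive DFT are illustrative assumptions and are not taken from the SoundPlot code. A real pipeline would normally rely on an audio library such as librosa instead.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 of one real-valued frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectral_centroid_bandwidth(frame, sample_rate):
    """Return (centroid, bandwidth) in Hz for one frame.

    Assumed definitions: centroid = magnitude-weighted mean frequency,
    bandwidth = magnitude-weighted standard deviation around the centroid.
    """
    mags = magnitude_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags)
    if total == 0.0:
        return 0.0, 0.0  # silent frame: no meaningful centroid
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    variance = sum(((f - centroid) ** 2) * m
                   for f, m in zip(freqs, mags)) / total
    return centroid, math.sqrt(variance)


def pearson_correlation(spec_a, spec_b):
    """Pearson correlation between two equally shaped spectrograms.

    Each spectrogram is a list of rows (for example, mel bands x frames).
    """
    a = [v for row in spec_a for v in row]
    b = [v for row in spec_b for v in row]
    if len(a) != len(b):
        raise ValueError("spectrograms must have the same shape")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant spectrogram")
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Hypothetical input: a 1 kHz sine sampled at 8 kHz, 64 samples long.
    sr = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(64)]
    print(spectral_centroid_bandwidth(tone, sr))
```

Under these assumptions, a score above 0.92 would mean that the original and reconstructed mel spectrograms vary together closely bin by bin. Because Pearson correlation is invariant to gain and offset, it says nothing about absolute level differences between the two signals.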
Related papers
- AV-Surf: Surface-Enhanced Geometry-Aware Novel-View Acoustic Synthesis [4.751910547396398]
Accurately modeling sound propagation with complex real-world environments is essential for Novel View Acoustic Synthesis (NVAS). We propose a surface-enhanced geometry-aware approach for NVAS to improve spatial acoustic modeling. We introduce a dual cross-attention-based transformer integrating geometrical constraints into frequency query to understand the surroundings of the emitter.
arXiv Detail & Related papers (2025-03-17T04:22:53Z) - Machine Learning Framework for Audio-Based Content Evaluation using MFCC, Chroma, Spectral Contrast, and Temporal Feature Engineering [0.0]
We construct a dataset containing audio samples from music covers on YouTube along with the audio of the original song, and sentiment scores derived from user comments.
Our approach involves extensive pre-processing, segmenting audio signals into 30-second windows, and extracting high-dimensional feature representations.
We train regression models to predict sentiment scores on a 0-100 scale, achieving root mean square error (RMSE) values of 3.420, 5.482, 2.783, and 4.212, respectively.
arXiv Detail & Related papers (2024-10-31T20:26:26Z) - AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
Novel view acoustic synthesis aims to render audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene. Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio. We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment. Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
arXiv Detail & Related papers (2024-06-13T08:34:12Z) - Improving Audio-Visual Segmentation with Bidirectional Generation [40.78395709407226]
We introduce a bidirectional generation framework for audio-visual segmentation.
This framework establishes robust correlations between an object's visual characteristics and its associated sound.
We also introduce an implicit volumetric motion estimation module to handle temporal dynamics.
arXiv Detail & Related papers (2023-08-16T11:20:23Z) - Visually-Guided Sound Source Separation with Audio-Visual Predictive Coding [57.08832099075793]
Visually-guided sound source separation consists of three parts: visual feature extraction, multimodal feature fusion, and sound signal processing.
This paper presents audio-visual predictive coding (AVPC) to tackle this task in a parameter-harmonizing and more effective manner.
In addition, we develop a valid self-supervised learning strategy for AVPC via co-predicting two audio-visual representations of the same sound source.
arXiv Detail & Related papers (2023-06-19T03:10:57Z) - Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes [69.03289331433874]
We present an end-to-end audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications.
We propose a novel neural-network-based sound propagation method to generate acoustic effects for 3D models of real environments.
arXiv Detail & Related papers (2023-02-02T04:09:23Z) - Novel-View Acoustic Synthesis [140.1107768313269]
We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint?
We propose a neural rendering approach: Visually-Guided Acoustic Synthesis (ViGAS) network that learns to synthesize the sound of an arbitrary point in space.
arXiv Detail & Related papers (2023-01-20T18:49:58Z) - Visual Scene Graphs for Audio Source Separation [65.47212419514761]
State-of-the-art approaches for visually-guided audio source separation typically assume sources that have characteristic sounds, such as musical instruments.
We propose Audio Visual Scene Graph Segmenter (AVSGS), a novel deep learning model that embeds the visual structure of the scene as a graph and segments this graph into subgraphs.
Our pipeline is trained end-to-end via a self-supervised task consisting of separating audio sources using the visual graph from artificially mixed sounds.
arXiv Detail & Related papers (2021-09-24T13:40:51Z) - Vector-Quantized Timbre Representation [53.828476137089325]
This paper targets a more flexible synthesis of an individual timbre by learning an approximate decomposition of its spectral properties with a set of generative features.
We introduce an auto-encoder with a discrete latent space that is disentangled from loudness in order to learn a quantized representation of a given timbre distribution.
We detail results for translating audio between orchestral instruments and singing voice, as well as transfers from vocal imitations to instruments.
arXiv Detail & Related papers (2020-07-13T12:35:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.