Using growth transform dynamical systems for spatio-temporal data
sonification
- URL: http://arxiv.org/abs/2108.09537v1
- Date: Sat, 21 Aug 2021 16:25:59 GMT
- Title: Using growth transform dynamical systems for spatio-temporal data
sonification
- Authors: Oindrila Chatterjee, Shantanu Chakrabartty
- Abstract summary: Sonification, or encoding information in meaningful audio signatures, has several advantages in augmenting or replacing traditional visualization methods for human-in-the-loop decision-making.
This paper presents a novel framework for sonifying high-dimensional data using a complex growth transform dynamical system model.
Our algorithm takes as input the data and optimization parameters underlying the learning or prediction task and combines them with the psychoacoustic parameters defined by the user.
- Score: 9.721342507747158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sonification, or encoding information in meaningful audio signatures, has
several advantages in augmenting or replacing traditional visualization methods
for human-in-the-loop decision-making. Standard sonification methods reported
in the literature involve either (i) using only a subset of the variables, or
(ii) first solving a learning task on the data and then mapping the output to
an audio waveform, which is utilized by the end-user to make a decision. This
paper presents a novel framework for sonifying high-dimensional data using a
complex growth transform dynamical system model where both the learning (or,
more generally, optimization) and the sonification processes are integrated
together. Our algorithm takes as input the data and optimization parameters
underlying the learning or prediction task and combines them with the
psychoacoustic parameters defined by the user. As a result, the proposed
framework outputs binaural audio signatures that not only encode some
statistical properties of the high-dimensional data but also reveal the
underlying complexity of the optimization/learning process. Along with
extensive experiments using synthetic datasets, we demonstrate the framework on
sonifying electroencephalogram (EEG) data with the potential for detecting
epileptic seizures in pediatric patients.
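Background on the growth transform referenced above: it is a multiplicative, Baum-Eagon-style update that maximizes an objective H(p) over the probability simplex via p_i <- p_i (dH/dp_i + lambda) / sum_j p_j (dH/dp_j + lambda), with lambda chosen large enough to keep every factor positive. The sketch below is only a toy illustration of the coupling between optimization and sonification, not the authors' complex-domain model: growth-transform updates optimize a simple quadratic objective, and each state variable drives the amplitude of a panned sinusoid. The carrier frequencies, pan positions, and step length are hypothetical stand-ins for the user-defined psychoacoustic parameters.

```python
# Toy sketch (not the paper's implementation): a Baum-Eagon-style growth
# transform optimizes over the probability simplex while each variable
# drives a panned sinusoid, so the audio tracks the optimization.
import numpy as np

def growth_transform_step(p, grad, lam):
    """One multiplicative update: p_i <- p_i*(grad_i + lam) / sum_j p_j*(grad_j + lam).
    lam must be large enough that every factor grad_i + lam is positive."""
    w = p * (grad + lam)
    return w / w.sum()

# Toy objective to maximize: H(p) = -||p - t||^2, so grad H = -2*(p - t);
# its maximizer on the simplex is the target distribution t itself.
t = np.array([0.6, 0.3, 0.1])
p = np.ones(3) / 3.0
lam = 10.0  # |grad| <= 2 here, so grad + lam stays positive

# Hypothetical "psychoacoustic parameters": one carrier tone per variable,
# a left/right pan position, and 20 ms of audio per optimization step.
fs = 8000
freqs = np.array([220.0, 330.0, 440.0])
pans = np.array([0.1, 0.5, 0.9])
step_len = fs // 50

left, right = [], []
for k in range(100):
    grad = -2.0 * (p - t)
    p = growth_transform_step(p, grad, lam)
    # Sonify this step: each tone's amplitude tracks its variable p_i,
    # so the sound evolves exactly as the optimization trajectory does.
    n = np.arange(k * step_len, (k + 1) * step_len)
    tones = p[:, None] * np.sin(2 * np.pi * freqs[:, None] * n / fs)
    left.append(((1.0 - pans)[:, None] * tones).sum(axis=0))
    right.append((pans[:, None] * tones).sum(axis=0))

stereo = np.stack([np.concatenate(left), np.concatenate(right)], axis=1)
print(p.round(3), stereo.shape)  # p -> t; stereo is (num_samples, 2)
```

In this toy version the stereo texture fluctuates while the state is far from the fixed point and settles into a steady timbre as p converges to t, loosely mirroring the paper's claim that the audio signature reveals the complexity of the underlying optimization process.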
Related papers
- SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a framework and benchmark for detecting synthetic, AI-generated audio.
It aims to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content from genuine recordings.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
- Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data [69.7174072745851]
We present Synthio, a novel approach for augmenting small-scale audio classification datasets with synthetic data.
We align the generations of the T2A model with the small-scale dataset using preference optimization.
We also propose a novel caption generation technique that leverages the reasoning capabilities of Large Language Models.
arXiv Detail & Related papers (2024-10-02T22:05:36Z) - Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy? [12.662031101992968]
We investigate the effects of multiple modalities on recognition accuracy on both synthetic and real-world datasets.
Images as a supplementary modality for speech recognition provide the greatest benefit at moderate noise levels.
Performance improves on both synthetic and real-world datasets when the most relevant visual information is filtered as a preprocessing step.
arXiv Detail & Related papers (2024-09-13T22:18:45Z) - Contrastive Learning from Synthetic Audio Doppelgangers [1.3754952818114714]
We propose a solution to both the data scale and transformation limitations, leveraging synthetic audio.
By randomly perturbing the parameters of a sound synthesizer, we generate audio doppelgängers: synthetic positive pairs with causally manipulated variations in timbre, pitch, and temporal envelopes.
Despite the shift to randomly generated synthetic data, our method produces strong representations, competitive with real data on standard audio classification benchmarks.
arXiv Detail & Related papers (2024-06-09T21:44:06Z) - Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z) - Learning Latent Dynamics via Invariant Decomposition and
(Spatio-)Temporal Transformers [0.6767885381740952]
We propose a method for learning dynamical systems from high-dimensional empirical data.
We focus on the setting in which data are available from multiple different instances of a system.
We study behaviour through simple theoretical analyses and extensive experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2023-06-21T07:52:07Z) - Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion [85.54515118077825]
This paper proposes a linear diffusion model (LinDiff) based on an ordinary differential equation to simultaneously reach fast inference and high sample quality.
To reduce computational complexity, LinDiff employs a patch-based processing approach that partitions the input signal into small patches.
Our model can synthesize speech of a quality comparable to that of autoregressive models with faster synthesis speed.
arXiv Detail & Related papers (2023-06-09T07:02:43Z) - Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation [72.7915031238824]
Large diffusion models have been successful in text-to-audio (T2A) synthesis tasks, but they often suffer from common issues such as semantic misalignment and poor temporal consistency.
We propose Make-an-Audio 2, a latent diffusion-based T2A method that builds on the success of Make-an-Audio.
arXiv Detail & Related papers (2023-05-29T10:41:28Z) - Discretization and Re-synthesis: an alternative method to solve the
Cocktail Party Problem [65.25725367771075]
This study demonstrates, for the first time, that the synthesis-based approach can also perform well on this problem.
Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols.
After the discrete symbol sequence is predicted, each target speech signal can be re-synthesized by feeding the symbols into the synthesis model.
arXiv Detail & Related papers (2021-12-17T08:35:40Z) - Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition [18.924716098922683]
Machine learning with synthetic data is not trivial due to the gap between the synthetic and the real data distributions.
We propose two novel techniques during training to mitigate the problems due to the distribution gap.
We show that these methods significantly improve the training of speech recognition models using synthetic data.
arXiv Detail & Related papers (2021-10-21T21:11:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.