A Novel mapping for visual to auditory sensory substitution
- URL: http://arxiv.org/abs/2106.07448v1
- Date: Mon, 14 Jun 2021 14:14:50 GMT
- Title: A Novel mapping for visual to auditory sensory substitution
- Authors: Ezsan Mehrbani, Sezedeh Fatemeh Mirhoseini, Noushin Riahi
- Abstract summary: Visual information can be converted into an audio stream via sensory substitution devices.
Blind object recognition of real objects achieved 88.05 on average.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual information can be converted into an audio stream via sensory
substitution devices, giving visually impaired people the chance to perceive
their surroundings easily while performing everyday tasks. In this study,
visual environmental features, namely the coordinates, type, and size of
objects, are assigned to audio features of music tones such as frequency, time
duration, and note permutations. Results demonstrate that this new method is
more training-time efficient than our previous method, VBTones, in which
sinusoidal tones were applied. Moreover, blind object recognition of real
objects achieved 88.05 on average.
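The abstract only names which visual features feed which audio features, not the exact assignment. The sketch below is a minimal illustration of such a mapping; every range, scale, note choice, and parameter name is an assumption made for demonstration, not the authors' implementation.

```python
# Illustrative sketch only: ranges, scales, and note choices are assumed.
from dataclasses import dataclass

# A small set of musical notes (C4..B4) used to build per-class note permutations.
NOTES_HZ = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88]

@dataclass
class DetectedObject:
    x: float        # horizontal position, normalized to [0, 1]
    y: float        # vertical position, normalized to [0, 1]
    size: float     # relative size in the frame, normalized to [0, 1]
    type_id: int    # index of the recognized object class

def object_to_tone(obj: DetectedObject, scan_duration: float = 2.0):
    """Map one detected object to (onset_time, frequency, duration, note_sequence)."""
    # Horizontal coordinate -> onset time within an assumed left-to-right scan.
    onset_time = obj.x * scan_duration

    # Vertical coordinate -> pitch: higher objects get higher frequencies
    # (an assumed 200-1000 Hz range).
    frequency = 200.0 + 800.0 * (1.0 - obj.y)

    # Object size -> tone duration: larger objects sound longer
    # (an assumed 0.1-1.0 s range).
    duration = 0.1 + 0.9 * obj.size

    # Object type -> a short permutation of notes, giving each class
    # a recognizable melodic signature.
    start = obj.type_id % len(NOTES_HZ)
    note_sequence = [NOTES_HZ[(start + i) % len(NOTES_HZ)] for i in range(3)]

    return onset_time, frequency, duration, note_sequence

# Example: a medium-sized object of class 2 near the top-left of the frame.
print(object_to_tone(DetectedObject(x=0.2, y=0.1, size=0.5, type_id=2)))
```

Any audio synthesis back end could then render the returned tuples as tones in the output stream.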
Related papers
- The Effect of Perceptual Metrics on Music Representation Learning for Genre Classification [42.14708549155406]
We show that models trained with perceptual metrics as loss functions can capture perceptually meaningful features.
We demonstrate that using features extracted from autoencoders trained with perceptual losses can improve performance on music understanding tasks.
arXiv Detail & Related papers (2024-09-25T16:29:21Z) - Egocentric Audio-Visual Object Localization [51.434212424829525]
We propose a geometry-aware temporal aggregation module to handle the egomotion explicitly.
The effect of egomotion is mitigated by estimating the temporal geometry transformation and exploiting it to update visual representations.
It improves cross-modal localization robustness by disentangling visually-indicated audio representation.
arXiv Detail & Related papers (2023-03-23T17:43:11Z) - EEG2Mel: Reconstructing Sound from Brain Responses to Music [0.0]
We improve on previous methods by reconstructing music stimuli well enough to be perceived and identified independently.
Deep learning models were trained on time-aligned music stimuli spectrum for each corresponding one-second window of EEG recording.
Reconstructions of auditory music stimuli were discriminated by listeners at an 85% success rate (50% chance) in a two-alternative match-to-sample task.
arXiv Detail & Related papers (2022-07-28T01:06:51Z) - Binaural SoundNet: Predicting Semantics, Depth and Motion with Binaural
Sounds [118.54908665440826]
Humans can robustly recognize and localize objects by using visual and/or auditory cues.
This work develops an approach for scene understanding purely based on sounds.
The co-existence of visual and audio cues is leveraged for supervision transfer.
arXiv Detail & Related papers (2021-09-06T22:24:00Z) - Learning Audio-Visual Dereverberation [87.52880019747435]
Reverberation from audio reflecting off surfaces and objects in the environment not only degrades the quality of speech for human perception, but also severely impacts the accuracy of automatic speech recognition.
Our idea is to learn to dereverberate speech from audio-visual observations.
We introduce Visually-Informed Dereverberation of Audio (VIDA), an end-to-end approach that learns to remove reverberation based on both the observed sounds and visual scene.
arXiv Detail & Related papers (2021-06-14T20:01:24Z) - Audiovisual transfer learning for audio tagging and sound event
detection [21.574781022415372]
We study the merit of transfer learning for two sound recognition problems, i.e., audio tagging and sound event detection.
We adapt a baseline system utilizing only spectral acoustic inputs to make use of pretrained auditory and visual features.
We perform experiments with these modified models on an audiovisual multi-label data set.
arXiv Detail & Related papers (2021-06-09T21:55:05Z) - Deep Sensory Substitution: Noninvasively Enabling Biological Neural
Networks to Receive Input from Artificial Neural Networks [5.478764356647437]
This work describes a novel technique for leveraging machine-learned feature embeddings to sonify visual information into a perceptual audio domain.
A generative adversarial network (GAN) is then used to find a distance preserving map from this metric space of feature vectors into the metric space defined by a target audio dataset.
In human subject tests, users were able to accurately classify audio sonifications of faces.
arXiv Detail & Related papers (2020-05-27T11:41:48Z) - Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition [61.54648991466747]
We explore an audiovisual aerial scene recognition task using both images and sounds as input.
We show the benefit of exploiting the audio information for the aerial scene recognition.
arXiv Detail & Related papers (2020-05-18T04:14:16Z) - Audio Impairment Recognition Using a Correlation-Based Feature
Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
arXiv Detail & Related papers (2020-03-22T13:34:37Z) - Deep Audio-Visual Learning: A Survey [53.487938108404244]
We divide the current audio-visual learning tasks into four different subfields.
We discuss state-of-the-art methods as well as the remaining challenges of each subfield.
We summarize the commonly used datasets and performance metrics.
arXiv Detail & Related papers (2020-01-14T13:11:21Z)