SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context
- URL: http://arxiv.org/abs/2009.05188v1
- Date: Fri, 11 Sep 2020 01:19:12 GMT
- Title: SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context
- Authors: Mark Cartwright, Jason Cramer, Ana Elisa Mendez Mendez, Yu Wang,
Ho-Hsiang Wu, Vincent Lostanlen, Magdalena Fuentes, Graham Dove, Charlie
Mydlarz, Justin Salamon, Oded Nov, and Juan Pablo Bello
- Abstract summary: We present a dataset for urban sound tagging with spatiotemporal information.
This dataset provides the opportunity to investigate how spatiotemporal metadata can aid in the prediction of urban sound tags.
- Score: 32.84541094143274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present SONYC-UST-V2, a dataset for urban sound tagging with
spatiotemporal information. This dataset is aimed at the development and
evaluation of machine listening systems for real-world urban noise monitoring.
While datasets of urban recordings are available, this dataset provides the
opportunity to investigate how spatiotemporal metadata can aid in the
prediction of urban sound tags. SONYC-UST-V2 consists of 18,510 audio recordings
from the "Sounds of New York City" (SONYC) acoustic sensor network, including
the timestamp of audio acquisition and location of the sensor. The dataset
contains annotations by volunteers from the Zooniverse citizen science
platform, as well as a two-stage verification with our team. In this article,
we describe our data collection procedure and propose evaluation metrics for
multilabel classification of urban sound tags. We report the results of a
simple baseline model that exploits spatiotemporal information.
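The paper defines its own evaluation metrics; as a rough, hypothetical illustration of multilabel tag evaluation, the sketch below computes micro- and macro-averaged AUPRC (area under the precision-recall curve) with scikit-learn on made-up predictions. The tag set, scores, and averaging choices here are assumptions for illustration, not the paper's exact specification.

```python
# A minimal sketch of multilabel urban sound tag evaluation using
# micro- and macro-averaged AUPRC. All data below are hypothetical.
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical data: 4 recordings, 3 urban sound tags (multilabel).
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_score = np.array([[0.9, 0.2, 0.7],
                    [0.1, 0.8, 0.3],
                    [0.6, 0.7, 0.2],
                    [0.2, 0.1, 0.9]])

# Micro-average: pool every (recording, tag) decision into one PR curve.
micro_auprc = average_precision_score(y_true, y_score, average="micro")
# Macro-average: one PR curve per tag, then average across tags.
macro_auprc = average_precision_score(y_true, y_score, average="macro")

print(f"micro-AUPRC: {micro_auprc:.3f}, macro-AUPRC: {macro_auprc:.3f}")
```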
Related papers
- Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data [69.7174072745851]
We present Synthio, a novel approach for augmenting small-scale audio classification datasets with synthetic data.
To overcome the first challenge, we align the generations of the T2A model with the small-scale dataset using preference optimization (a generic sketch of such a preference loss appears below).
To address the second challenge, we propose a novel caption generation technique that leverages the reasoning capabilities of Large Language Models.
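As context for the preference-optimization step mentioned above, here is a generic, hypothetical sketch of a DPO-style preference loss, one common way to implement preference optimization for a generative model. This is an illustration of the general technique, not Synthio's actual formulation.

```python
# A generic, hypothetical DPO-style preference loss: the policy should
# prefer the "preferred" sample over the "rejected" one by a wider
# margin than a frozen reference model does.
import torch
import torch.nn.functional as F

def dpo_loss(logp_preferred, logp_rejected,
             ref_logp_preferred, ref_logp_rejected, beta=0.1):
    # Margin between how the policy and the reference model score the pair.
    margin = ((logp_preferred - ref_logp_preferred)
              - (logp_rejected - ref_logp_rejected))
    return -F.logsigmoid(beta * margin).mean()

# Stand-in log-probability values for a batch of 4 preference pairs.
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
```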
arXiv Detail & Related papers (2024-10-02T22:05:36Z)
- PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping [7.076417856575795]
A soundscape is defined by the acoustic environment a person perceives at a location.
We propose a framework for mapping soundscapes across the Earth.
We represent locations with multi-scale satellite imagery and learn a joint representation among this imagery, audio, and text (a generic sketch of such joint-embedding training appears below).
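As a generic illustration of joint-embedding training across modalities, the sketch below implements a symmetric contrastive (InfoNCE/CLIP-style) objective between paired image and audio embeddings. The loss, dimensions, and batch are assumptions, not PSM's actual model.

```python
# A minimal, hypothetical sketch of learning a joint embedding between
# two modalities (e.g., satellite imagery and audio) with a symmetric
# contrastive objective; matching pairs lie on the diagonal of the
# similarity matrix.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, audio_emb, temperature=0.07):
    # Normalize so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    audio_emb = F.normalize(audio_emb, dim=-1)
    logits = image_emb @ audio_emb.t() / temperature
    targets = torch.arange(logits.size(0))  # i-th image matches i-th audio
    loss_i2a = F.cross_entropy(logits, targets)
    loss_a2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2a + loss_a2i) / 2

# Hypothetical batch of 8 paired embeddings with dimension 128.
loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```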
arXiv Detail & Related papers (2024-08-13T17:37:40Z)
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
- VALERIE22 -- A photorealistic, richly metadata annotated dataset of urban environments [5.439020425819001]
The VALERIE tool pipeline is a synthetic data generator developed to contribute to the understanding of domain-specific factors.
The VALERIE22 dataset was generated with the VALERIE procedural tools pipeline providing a photorealistic sensor simulation.
The dataset provides a uniquely rich set of metadata, allowing extraction of specific scene and semantic features.
arXiv Detail & Related papers (2023-08-18T15:44:45Z)
- Self-Supervised Visual Acoustic Matching [63.492168778869726]
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment.
We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio.
Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment, via a conditional GAN framework and a novel metric.
arXiv Detail & Related papers (2023-07-27T17:59:59Z)
- Novel-View Acoustic Synthesis [140.1107768313269]
We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint?
We propose a neural rendering approach: Visually-Guided Acoustic Synthesis (ViGAS) network that learns to synthesize the sound of an arbitrary point in space.
arXiv Detail & Related papers (2023-01-20T18:49:58Z)
- Urban Rhapsody: Large-scale exploration of urban soundscapes [12.997538969557649]
Noise is one of the primary quality-of-life issues in urban environments.
Low-cost sensors can be deployed to monitor ambient noise levels at high temporal resolutions.
The amount of data they produce and the complexity of these data pose significant analytical challenges.
We propose Urban Rhapsody, a framework that combines state-of-the-art audio representation, machine learning, and visual analytics.
arXiv Detail & Related papers (2022-05-25T22:02:36Z)
- Urban Space Insights Extraction using Acoustic Histogram Information [13.808053718325628]
We study the implementation of low-cost analogue sound sensors to detect outdoor activities and estimate rainy periods in an urban residential area.
Data from the analogue sound sensors are transmitted to the cloud every 5 minutes in histogram format, consisting of sound levels sampled every 100 ms (10 Hz); a minimal sketch of this aggregation appears below.
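Under assumed details (hypothetical dB bin edges and window handling), a minimal sketch of that aggregation:

```python
# A minimal sketch of the reporting scheme described above: sound levels
# sampled at 10 Hz are collapsed into one histogram per 5-minute window
# before upload. Bin edges are hypothetical.
import numpy as np

SAMPLE_RATE_HZ = 10                 # one sample every 100 ms
WINDOW_S = 5 * 60                   # 5-minute reporting window
BIN_EDGES = np.arange(30, 101, 5)   # hypothetical bins: 30-100 dB in 5 dB steps

def window_to_histogram(levels_db):
    """Collapse one window of 10 Hz level samples into bin counts."""
    assert len(levels_db) == SAMPLE_RATE_HZ * WINDOW_S  # 3000 samples
    counts, _ = np.histogram(levels_db, bins=BIN_EDGES)
    return counts

# Simulated 5-minute window of sound levels around 55 dB.
window = np.random.normal(loc=55, scale=8, size=SAMPLE_RATE_HZ * WINDOW_S)
histogram = window_to_histogram(window)  # compact summary sent to the cloud
```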
arXiv Detail & Related papers (2020-12-10T07:21:34Z)
- Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions [64.43064637421007]
We introduce a novel task of audiovisual crowd counting, in which visual and auditory information are integrated for counting purposes.
We collect a large-scale benchmark, named auDiovISual Crowd cOunting dataset.
We make use of a linear feature-wise fusion module that carries out an affine transformation on visual and auditory features (a generic sketch of such affine fusion follows this entry).
arXiv Detail & Related papers (2020-05-14T16:05:47Z)
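As a generic illustration of the affine feature-wise fusion mentioned in the entry above, the sketch below modulates visual feature maps with a scale and shift predicted from audio features (FiLM-style conditioning). All names, shapes, and layer sizes are hypothetical, not the paper's actual architecture.

```python
# A minimal, hypothetical sketch of linear feature-wise fusion: audio
# features predict per-channel scale and shift terms that are applied
# as an affine transformation to visual feature maps.
import torch
import torch.nn as nn

class AffineFusion(nn.Module):
    """Modulate visual features with scale and shift predicted from audio."""
    def __init__(self, audio_dim, visual_channels):
        super().__init__()
        self.to_scale = nn.Linear(audio_dim, visual_channels)
        self.to_shift = nn.Linear(audio_dim, visual_channels)

    def forward(self, visual_feats, audio_feats):
        # visual_feats: (batch, channels, H, W); audio_feats: (batch, audio_dim)
        scale = self.to_scale(audio_feats)[:, :, None, None]
        shift = self.to_shift(audio_feats)[:, :, None, None]
        return scale * visual_feats + shift

fusion = AffineFusion(audio_dim=64, visual_channels=256)
fused = fusion(torch.randn(2, 256, 32, 32), torch.randn(2, 64))
```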
This list is automatically generated from the titles and abstracts of the papers on this site.