AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
- URL: http://arxiv.org/abs/2308.15726v1
- Date: Wed, 30 Aug 2023 03:03:47 GMT
- Title: AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
- Authors: Nan Che and Chenrui Liu and Fei Yu
- Abstract summary: This paper proposes a dataset (called AGS) for home environment sounds.
The dataset covers various types of overlapping audio in the scene as well as background noise.
- Score: 1.5106201893222209
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Environmental sound scene and sound event recognition is important for the
recognition of suspicious events in indoor and outdoor environments (such as
nurseries, smart homes, nursing homes, etc.) and is a fundamental task involved
in many audio surveillance applications. In particular, there is no public
benchmark dataset for research on sound event recognition in indoor
environmental sound scenes. Therefore, this paper proposes a dataset (called
AGS) for home environment sounds. This dataset covers various types of
overlapping audio in the scene as well as background noise.
Moreover, based on the proposed data set, this paper compares and analyzes the
advanced methods for sound event recognition, and then illustrates the
reliability of the data set proposed in this paper, and studies the challenges
raised by the new dataset. Our proposed AGS and the source code of the
corresponding baselines are available at https://github.com/taolunzu11/AGS .
Related papers
- The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection [15.488319837656702]
This paper presents a residential audio dataset to support sound event detection research for smart home applications aimed at promoting wellbeing for older adults.
The dataset is constructed by deploying audio recording systems in the homes of 8 participants aged 55-80 years for a 7-day period.
A novel automated speech removal pipeline is developed, using pre-trained audio neural networks to detect and remove segments containing spoken voice.
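The removal step can be sketched as masking once per-segment speech probabilities are available; in the sketch below, `speech_prob` stands in for the output of a pretrained audio tagger, and the fixed-length segmentation is an assumption for illustration, not a detail taken from the paper:

```python
import numpy as np

def remove_speech_segments(audio, speech_prob, sr=16000, hop_s=1.0, thresh=0.5):
    """Zero out fixed-length audio segments flagged as containing speech.

    audio: 1-D waveform array
    speech_prob: per-segment speech probabilities, e.g. from a pretrained
        audio tagger (the tagger itself is assumed, not shown here)
    hop_s: segment length in seconds; thresh: removal threshold
    """
    hop = int(sr * hop_s)
    out = audio.copy()
    for i, p in enumerate(speech_prob):
        if p >= thresh:
            out[i * hop : (i + 1) * hop] = 0.0  # silence the flagged segment
    return out
```

In practice one might excise the flagged segments entirely rather than zero them, depending on whether downstream sound event detection tolerates gaps.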
arXiv Detail & Related papers (2024-09-17T15:10:36Z)
- AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
Novel view acoustic synthesis aims to render audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene.
Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio.
We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment.
Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.
arXiv Detail & Related papers (2024-06-13T08:34:12Z)
- ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling [57.1025908604556]
An environment acoustic model represents how sound is transformed by the physical characteristics of an indoor environment.
We propose active acoustic sampling, a new task for efficiently building an environment acoustic model of an unmapped environment.
We introduce ActiveRIR, a reinforcement learning policy that leverages information from audio-visual sensor streams to guide agent navigation and determine optimal acoustic data sampling positions.
arXiv Detail & Related papers (2024-04-24T21:30:01Z)
- Self-Supervised Visual Acoustic Matching [63.492168778869726]
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment.
We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio.
Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment, via a conditional GAN framework and a novel metric.
arXiv Detail & Related papers (2023-07-27T17:59:59Z)
- A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes [10.512055210540668]
We study the solutions proposed by participants to analyze their robustness to varying target-to-non-target signal-to-noise ratios and to the temporal localization of target sound events.
Results show that systems tend to spuriously predict short events when non-target events are present.
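Varying the target-to-non-target SNR in a synthetic soundscape typically amounts to scaling the interfering signal before mixing. A minimal sketch of that scaling follows; the function name and interface are illustrative, not taken from the benchmark:

```python
import numpy as np

def mix_at_snr(target, interferer, snr_db):
    """Mix a target event with an interfering signal at a requested SNR (dB).

    The interferer is rescaled so that
    10 * log10(P_target / P_interferer) == snr_db.
    """
    p_t = np.mean(target ** 2)          # target power
    p_i = np.mean(interferer ** 2)      # interferer power before scaling
    scale = np.sqrt(p_t / (p_i * 10 ** (snr_db / 10.0)))
    return target + scale * interferer
```

Soundscape generators repeat this per event, which is how a benchmark can sweep the target-to-non-target ratio systematically.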
arXiv Detail & Related papers (2022-02-03T09:41:31Z)
- Proposal-based Few-shot Sound Event Detection for Speech and Environmental Sounds with Perceivers [0.7776497736451751]
We propose a region proposal-based approach to few-shot sound event detection utilizing the Perceiver architecture.
Motivated by a lack of suitable benchmark datasets, we generate two new few-shot sound event localization datasets.
arXiv Detail & Related papers (2021-07-28T19:46:55Z)
- DASEE A Synthetic Database of Domestic Acoustic Scenes and Events in Dementia Patients Environment [0.0]
We generate an unbiased synthetic domestic audio database, consisting of sound scenes and events, emulated in both quiet and noisy environments.
Data is carefully curated such that it reflects issues commonly faced in a dementia patient's environment.
We present an 11-class database containing excerpts of clean and noisy signals at 5-seconds duration each, uniformly sampled at 16 kHz.
arXiv Detail & Related papers (2021-04-27T18:51:44Z)
- Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain [103.3388198420822]
Estimating the positions of multiple speakers can be helpful for tasks like automatic speech recognition or speaker diarization.
This paper proposes a novel audiovisual data fusion framework for speaker localization by assigning individual dynamic stream weights to specific regions.
A performance evaluation using audiovisual recordings yields promising results, with the proposed fusion approach outperforming all baseline models.
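As a rough illustration of region-wise fusion with dynamic stream weights, a convex per-region combination of audio and visual likelihood maps could look like the following (a simplified sketch of the idea, not the paper's actual model):

```python
import numpy as np

def fuse_localization(audio_map, video_map, weights):
    """Fuse audio and visual speaker-position likelihood maps.

    audio_map, video_map: likelihood maps over spatial regions, same shape
    weights: per-region dynamic stream weights in [0, 1]; 1 trusts audio
        fully, 0 trusts video fully (values are clipped for safety)
    """
    w = np.clip(weights, 0.0, 1.0)
    return w * audio_map + (1.0 - w) * video_map
```

In the paper's setting the weights themselves would be predicted dynamically per frame and region; here they are simply passed in.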
arXiv Detail & Related papers (2021-02-23T09:59:31Z)
- Speech Enhancement for Wake-Up-Word detection in Voice Assistants [60.103753056973815]
Keyword spotting, and in particular Wake-Up-Word (WUW) detection, is a very important task for voice assistants.
This paper proposes a Speech Enhancement model adapted to the task of WUW detection.
It aims at increasing the recognition rate and reducing false alarms in noisy conditions.
arXiv Detail & Related papers (2021-01-29T18:44:05Z)
- Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
arXiv Detail & Related papers (2020-04-30T23:56:05Z)
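The described band-wise alignment of first- and second-order statistics can be sketched as per-band standardization followed by re-scaling to source-domain statistics. This is a minimal illustration assuming log-mel spectrogram inputs; the variable names are ours, not from the paper:

```python
import numpy as np

def bandwise_stats_match(target_spec, source_mean, source_std, eps=1e-8):
    """Align each frequency band of a target-domain spectrogram to the
    first- and second-order statistics of the source domain.

    target_spec: (n_bands, n_frames) log-mel spectrogram from the target domain
    source_mean, source_std: (n_bands,) per-band statistics computed over
        the source-domain training dataset
    """
    t_mean = target_spec.mean(axis=1, keepdims=True)
    t_std = target_spec.std(axis=1, keepdims=True)
    # standardize each band, then re-scale to the source statistics
    normed = (target_spec - t_mean) / (t_std + eps)
    return normed * source_std[:, None] + source_mean[:, None]
```

Since the method is unsupervised, the source statistics are computed once from training data and the transform is applied to each target-domain clip before classification.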
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.