Audio-Based Pedestrian Detection in the Presence of Vehicular Noise
- URL: http://arxiv.org/abs/2509.19295v1
- Date: Tue, 23 Sep 2025 17:57:44 GMT
- Title: Audio-Based Pedestrian Detection in the Presence of Vehicular Noise
- Authors: Yonghyun Kim, Chaeyeon Han, Akash Sarode, Noah Posner, Subhrajit Guhathakurta, Alexander Lerch,
- Abstract summary: We present a new dataset, results, and a detailed analysis of the state-of-the-art in audio-based pedestrian detection in the presence of vehicular noise. In our study, we conduct three analyses: (i) cross-dataset evaluation between noisy and noise-limited environments, (ii) an assessment of the impact of noisy data on model performance, and (iii) an evaluation of the model's predictive robustness on out-of-domain sounds.
- Score: 39.631104350049945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Audio-based pedestrian detection is a challenging task and has, thus far, only been explored in noise-limited environments. We present a new dataset, results, and a detailed analysis of the state-of-the-art in audio-based pedestrian detection in the presence of vehicular noise. In our study, we conduct three analyses: (i) cross-dataset evaluation between noisy and noise-limited environments, (ii) an assessment of the impact of noisy data on model performance, highlighting the influence of acoustic context, and (iii) an evaluation of the model's predictive robustness on out-of-domain sounds. The new dataset is a comprehensive 1321-hour roadside dataset incorporating traffic-rich soundscapes. Each recording includes 16 kHz audio synchronized with frame-level pedestrian annotations and 1 fps video thumbnails.
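The abstract describes 16 kHz audio paired with frame-level annotations at 1 fps. A minimal sketch of how such a pairing could be set up (the dataset's actual file format and field names are not specified here, so the function and its inputs are illustrative):

```python
import numpy as np

def pair_audio_with_labels(audio, labels, sr=16000, fps=1):
    """Slice a waveform into fixed windows matching per-frame labels.

    With sr=16000 and fps=1, each 1 fps annotation frame corresponds to
    exactly one second (16000 samples) of audio.
    """
    hop = sr // fps                                # samples per annotation frame
    n = min(len(audio) // hop, len(labels))        # drop any unmatched tail
    windows = audio[: n * hop].reshape(n, hop)     # (n_frames, samples_per_frame)
    return windows, np.asarray(labels[: n])

# Toy example: 3 seconds of audio with per-second pedestrian flags.
audio = np.zeros(3 * 16000, dtype=np.float32)
windows, y = pair_audio_with_labels(audio, [0, 1, 1])
```

This windowing assumes the audio and annotations start at the same instant; a real loader would also have to handle the synchronization offsets the dataset provides.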
Related papers
- Evaluation of Deep Audio Representations for Hearables [1.5646349560044959]
This dataset includes 1,158 audio tracks, each 30 seconds long, created by spatially mixing proprietary monologues with high-quality recordings of everyday acoustic scenes. Our benchmark encompasses eight tasks that assess the general context, speech sources, and technical acoustic properties of the audio scenes. This superiority underscores the advantage of models trained on diverse audio collections, confirming their applicability to a wide array of auditory tasks, including encoding the environment properties necessary for hearable steering.
arXiv Detail & Related papers (2025-02-10T16:51:11Z) - Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation [55.752737615873464]
This study investigates the impact of white noise at various Signal-to-Noise Ratio (SNR) levels on state-of-the-art APT models.
We hope this research provides valuable insights as preliminary work toward developing transcription models that maintain consistent performance across a range of acoustic conditions.
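The study above injects white noise at controlled SNR levels. A standard way to do this, sketched here as a generic utility (the paper's exact procedure is not given, so parameter choices are illustrative):

```python
import numpy as np

def add_white_noise(signal, snr_db, rng=None):
    """Add white Gaussian noise so the result has the requested SNR in dB."""
    rng = rng or np.random.default_rng(0)
    sig_power = np.mean(signal ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))   # SNR = 10*log10(Ps/Pn)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Verify on a 440 Hz tone: the realized SNR should land near the target.
t = np.linspace(0, 1, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 440 * t)
noisy = add_white_noise(clean, snr_db=10)
measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

Sweeping `snr_db` over a range (e.g. 0 to 30 dB) reproduces the kind of graded degradation the study evaluates.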
arXiv Detail & Related papers (2024-10-18T02:31:36Z) - NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification [7.464154519547575]
Existing research on learning with noisy labels predominantly focuses on synthetic noise patterns.
We constructed a benchmark dataset to better understand label noise in real-world text classification settings.
Our findings reveal that while pre-trained models are resilient to synthetic noise, they struggle against instance-dependent noise.
arXiv Detail & Related papers (2024-07-09T06:18:40Z) - Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z) - Impact of Noisy Supervision in Foundation Model Learning [91.56591923244943]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets. We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Enhancement of 3D Camera Synthetic Training Data with Noise Models [1.6385815610837167]
The goal of this paper is to assess the impact of noise in 3D camera-captured data by modeling the noise of the imaging process.
We compiled a dataset of specifically constructed scenes to obtain a noise model.
We specifically model lateral noise, affecting the position of captured points in the image plane, and axial noise, affecting the position along the axis to the image plane.
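The decomposition above separates perturbations in the image plane (lateral) from perturbations along the optical axis (axial). A minimal sketch of applying such a two-component noise model to 3D points (the sigma values and independence assumptions are illustrative, not taken from the paper):

```python
import numpy as np

def apply_camera_noise(points, lateral_sigma=0.5, axial_sigma=2.0, rng=None):
    """Perturb Nx3 camera-space points with independent Gaussian noise:
    lateral noise on (x, y) in the image plane, axial noise on z along
    the optical axis. Axial noise is typically larger for depth cameras.
    """
    rng = rng or np.random.default_rng(0)
    noisy = points.astype(float).copy()
    noisy[:, :2] += rng.normal(0.0, lateral_sigma, size=(len(points), 2))
    noisy[:, 2] += rng.normal(0.0, axial_sigma, size=len(points))
    return noisy

points = np.array([[0.0, 0.0, 100.0],
                   [1.0, 2.0, 50.0]])
noisy = apply_camera_noise(points)
```

In practice the sigmas would be fitted from the calibration scenes the paper describes, often as functions of depth rather than constants.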
arXiv Detail & Related papers (2024-02-26T11:50:42Z) - Sample selection with noise rate estimation in noise learning of medical image analysis [3.9934250802854376]
This paper introduces a new sample selection method that enhances the performance of neural networks when trained on noisy datasets.
Our approach features estimating the noise rate of a dataset by analyzing the distribution of loss values using Linear Regression.
We employ sparse regularization to further enhance the noise robustness of our model.
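The entry above estimates a dataset's noise rate from the distribution of per-sample loss values using linear regression. One hypothetical reading of that idea, sketched here (the paper's actual estimator is not specified in this summary, so the fit-and-threshold logic and the `tol` parameter are assumptions):

```python
import numpy as np

def estimate_noise_rate(losses, tol=2.0):
    """Fit a line to sorted per-sample losses; samples whose loss sits far
    above the linear trend are flagged as likely label noise."""
    losses = np.sort(np.asarray(losses, dtype=float))
    x = np.arange(len(losses))
    slope, intercept = np.polyfit(x, losses, 1)      # linear regression fit
    residual = losses - (slope * x + intercept)
    noisy = residual > tol * residual.std()          # far above the trend line
    return noisy.mean()

# Toy data: 90 low-loss (clean) samples, 10 high-loss (mislabeled) samples.
rng = np.random.default_rng(0)
losses = np.concatenate([rng.normal(0.3, 0.05, 90),
                         rng.normal(2.5, 0.30, 10)])
rate = estimate_noise_rate(losses)
```

The estimated rate would then drive the sample-selection step, e.g. keeping the (1 - rate) fraction of lowest-loss samples for training.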
arXiv Detail & Related papers (2023-12-23T11:57:21Z) - AdVerb: Visually Guided Audio Dereverberation [49.958724234969445]
We present AdVerb, a novel audio-visual dereverberation framework.
It uses visual cues in addition to the reverberant sound to estimate clean audio.
arXiv Detail & Related papers (2023-08-23T18:20:59Z) - Self-Supervised Visual Acoustic Matching [63.492168778869726]
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment.
We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio.
Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment, via a conditional GAN framework and a novel metric.
arXiv Detail & Related papers (2023-07-27T17:59:59Z) - A Study on Robustness to Perturbations for Representations of Environmental Sound [16.361059909912758]
We evaluate two embeddings, YAMNet and OpenL3, on monophonic (UrbanSound8K) and polyphonic (SONYC UST) datasets.
We imitate channel effects by injecting perturbations to the audio signal and measure the shift in the new embeddings with three distance measures.
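Measuring the shift of an embedding under a perturbation reduces to comparing the clean and perturbed embedding vectors under several distances. A sketch with stand-in vectors (the paper's three specific distance measures are not named in this summary, so the three below are common illustrative choices; real YAMNet/OpenL3 embeddings would come from the respective models):

```python
import numpy as np

def embedding_shift(clean_emb, perturbed_emb):
    """Quantify how far an embedding moves under a channel perturbation."""
    diff = perturbed_emb - clean_emb
    cosine_dist = 1.0 - np.dot(clean_emb, perturbed_emb) / (
        np.linalg.norm(clean_emb) * np.linalg.norm(perturbed_emb))
    return {"cosine": cosine_dist,
            "euclidean": np.linalg.norm(diff),
            "manhattan": np.abs(diff).sum()}

rng = np.random.default_rng(0)
clean = rng.normal(size=512)                          # stand-in embedding vector
perturbed = clean + rng.normal(scale=0.1, size=512)   # after a channel effect
shift = embedding_shift(clean, perturbed)
```

A robust representation is one where these distances stay small across the injected perturbations.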
arXiv Detail & Related papers (2022-03-20T01:04:38Z) - Adaptive noise imitation for image denoising [58.21456707617451]
We develop a new adaptive noise imitation (ADANI) algorithm that can synthesize noisy data from naturally noisy images.
To produce realistic noise, a noise generator takes unpaired noisy/clean images as input, where the noisy image is a guide for noise generation.
Coupling the noisy data output from ADANI with the corresponding ground-truth, a denoising CNN is then trained in a fully-supervised manner.
arXiv Detail & Related papers (2020-11-30T02:49:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.