EnvId: A Metric Learning Approach for Forensic Few-Shot Identification of Unseen Environments
- URL: http://arxiv.org/abs/2405.02119v2
- Date: Tue, 11 Feb 2025 21:40:27 GMT
- Title: EnvId: A Metric Learning Approach for Forensic Few-Shot Identification of Unseen Environments
- Authors: Denise Moussa, Germans Hirsch, Christian Riess
- Abstract summary: We propose a representation learning framework called EnvId, short for environment identification.
EnvId avoids case-specific retraining by modeling the task as a few-shot classification problem.
It provides good quality predictions even under unseen signal degradations, out-of-distribution reverberation characteristics or recording position mismatches.
- Score: 6.570712059945705
- Abstract: Audio recordings may provide important evidence in criminal investigations. One such case is the forensic association of a recorded audio to its recording location. For example, a voice message may be the only investigative cue to narrow down the candidate sites for a crime. Up to now, several works provide supervised classification tools for closed-set recording environment identification under relatively clean recording conditions. However, in forensic investigations, the candidate locations are case-specific. Thus, supervised learning techniques are not applicable without retraining a classifier on a sufficient amount of training samples for each case and respective candidate set. In addition, a forensic tool has to deal with audio material from uncontrolled sources with variable properties and quality. In this work, we therefore attempt a major step towards practical forensic application scenarios. We propose a representation learning framework called EnvId, short for environment identification. EnvId avoids case-specific retraining by modeling the task as a few-shot classification problem. We demonstrate that EnvId can handle forensically challenging material. It provides good quality predictions even under unseen signal degradations, out-of-distribution reverberation characteristics or recording position mismatches.
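The abstract frames environment identification as few-shot classification over a learned embedding space, so that new case-specific candidate locations need no retraining. As an illustration only, the following sketch shows the nearest-prototype decision rule commonly used in few-shot metric learning; the embeddings are random stand-ins, and the helper names (`prototypes`, `classify`) are hypothetical, not from the paper.

```python
import numpy as np

# Hypothetical sketch of few-shot classification via metric learning:
# each candidate environment is represented by the mean ("prototype") of
# a few embedded reference recordings, and a query recording is assigned
# to the nearest prototype. EnvId would produce the embeddings with its
# learned representation network; here they are random stand-ins.

rng = np.random.default_rng(0)

def prototypes(support, labels):
    """Average the support embeddings of each candidate environment."""
    classes = sorted(set(labels))
    return classes, np.stack([support[np.array(labels) == c].mean(axis=0)
                              for c in classes])

def classify(query, protos):
    """Return the index of the nearest prototype (Euclidean distance)."""
    dists = np.linalg.norm(protos - query, axis=1)
    return int(np.argmin(dists))

# Toy data: 2 environments, 3 reference clips each, 8-dim embeddings.
support = np.concatenate([rng.normal(0.0, 0.1, (3, 8)),
                          rng.normal(1.0, 0.1, (3, 8))])
labels = [0, 0, 0, 1, 1, 1]
classes, protos = prototypes(support, labels)

query = rng.normal(1.0, 0.1, 8)  # drawn near environment 1
print(classes[classify(query, protos)])
```

Because only the prototypes change per case, a new candidate set requires just a few reference recordings per location rather than classifier retraining.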
Related papers
- Self-Supervised Visual Acoustic Matching [63.492168778869726]
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment.
We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio.
Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment, via a conditional GAN framework and a novel metric.
arXiv Detail & Related papers (2023-07-27T17:59:59Z)
- Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations [17.043435238200605]
We develop procedures for pretraining suitable representations, and methods which transfer them to our few shot learning scenario.
Our experiments evaluate the general purpose utility of our pretrained representations on AudioSet.
Our pretrained embeddings are suitable to the proposed task, and enable multiple aspects of our few shot framework.
arXiv Detail & Related papers (2023-05-03T18:41:24Z)
- Learning to Adapt to Domain Shifts with Few-shot Samples in Anomalous Sound Detection [7.631596468553607]
Anomaly detection has many important applications, such as monitoring industrial equipment.
We propose a framework that adapts to new conditions with few-shot samples.
We evaluate our proposed method on a recently-released dataset of audio measurements from different machine types.
arXiv Detail & Related papers (2022-04-05T00:22:25Z)
- Robust Feature Learning on Long-Duration Sounds for Acoustic Scene Classification [54.57150493905063]
Acoustic scene classification (ASC) aims to identify the type of scene (environment) in which a given audio signal is recorded.
We propose a robust feature learning (RFL) framework to train the CNN.
arXiv Detail & Related papers (2021-08-11T03:33:05Z)
- Speech Enhancement for Wake-Up-Word detection in Voice Assistants [60.103753056973815]
Keywords spotting and in particular Wake-Up-Word (WUW) detection is a very important task for voice assistants.
This paper proposes a Speech Enhancement model adapted to the task of WUW detection.
It aims to increase the recognition rate and reduce false alarms in noisy conditions.
arXiv Detail & Related papers (2021-01-29T18:44:05Z)
- Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias [59.788358876316295]
We propose a pipeline solution to improve speaker verification on a small actual forensic field dataset.
By leveraging large-scale out-of-domain datasets, a knowledge distillation based objective function is proposed for teacher-student learning.
We show that the proposed objective function can efficiently improve the performance of teacher-student learning on short utterances.
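The summary above describes a knowledge-distillation objective for teacher-student learning. The paper's exact objective differs; as a generic illustration only, the sketch below shows a standard distillation loss that blends softened teacher probabilities with the hard-label loss (the names `softmax` and `kd_loss` are hypothetical helpers, not from the paper).

```python
import numpy as np

# Generic knowledge-distillation sketch: the student is trained to match
# the teacher's temperature-softened class probabilities via cross-entropy,
# combined with the usual hard-label cross-entropy loss.

def softmax(z, T=1.0):
    z = z / T
    e = np.exp(z - z.max())
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, hard_label, T=4.0, alpha=0.5):
    p_t = softmax(teacher_logits, T)          # softened teacher targets
    p_s = softmax(student_logits, T)          # softened student predictions
    distill = -np.sum(p_t * np.log(p_s + 1e-12)) * T * T
    hard = -np.log(softmax(student_logits)[hard_label] + 1e-12)
    return alpha * distill + (1 - alpha) * hard

loss = kd_loss(np.array([2.0, 0.5, -1.0]), np.array([2.5, 0.2, -0.8]), 0)
print(loss > 0.0)
```

The temperature `T` exposes the teacher's relative confidence across wrong classes, which is the extra signal that helps on short utterances with little labeled data.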
arXiv Detail & Related papers (2020-09-21T00:58:40Z)
- Cross-domain Adaptation with Discrepancy Minimization for Text-independent Forensic Speaker Verification [61.54074498090374]
This study introduces a CRSS-Forensics audio dataset collected in multiple acoustic environments.
We pre-train a CNN-based network using the VoxCeleb data, followed by an approach which fine-tunes part of the high-level network layers with clean speech from CRSS-Forensics.
arXiv Detail & Related papers (2020-09-05T02:54:33Z)
- Unsupervised Domain Adaptation for Acoustic Scene Classification Using Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
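The alignment step described above can be illustrated concretely: each frequency band of a target-domain spectrogram is shifted and scaled so its first- and second-order statistics match those estimated on source-domain training data. This is a minimal sketch under that reading, not the paper's implementation; `match_band_stats` is a hypothetical helper name.

```python
import numpy as np

# Band-wise statistics matching sketch: normalize each frequency band of
# the target-domain data to zero mean / unit variance, then rescale it to
# the per-band mean and standard deviation of the source domain.

def match_band_stats(target_spec, src_mean, src_std, eps=1e-8):
    """target_spec: (bands, frames); src_mean, src_std: (bands,)."""
    t_mean = target_spec.mean(axis=1, keepdims=True)
    t_std = target_spec.std(axis=1, keepdims=True)
    normalized = (target_spec - t_mean) / (t_std + eps)
    return normalized * src_std[:, None] + src_mean[:, None]

rng = np.random.default_rng(1)
source = rng.normal(0.0, 1.0, (4, 1000))  # stand-in source-domain bands
target = rng.normal(3.0, 2.0, (4, 200))   # shifted/scaled target domain

aligned = match_band_stats(target, source.mean(axis=1), source.std(axis=1))
print(np.allclose(aligned.mean(axis=1), source.mean(axis=1), atol=1e-6))
```

Because the transform uses only per-band statistics, it needs no target-domain labels, which is what makes the adaptation unsupervised.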
arXiv Detail & Related papers (2020-04-30T23:56:05Z)
- Sound of Guns: Digital Forensics of Gun Audio Samples meets Artificial Intelligence [0.7734726150561086]
We introduce a novel technique that requires zero knowledge about the recording setup and is completely agnostic to the relative positions of both the microphone and shooter.
Our solution can identify the category, caliber, and model of the gun, reaching over 90% accuracy on a dataset composed of 3655 samples.
arXiv Detail & Related papers (2020-04-15T09:12:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.