Locate This, Not That: Class-Conditioned Sound Event DOA Estimation
- URL: http://arxiv.org/abs/2203.04197v1
- Date: Tue, 8 Mar 2022 16:49:15 GMT
- Title: Locate This, Not That: Class-Conditioned Sound Event DOA Estimation
- Authors: Olga Slizovskaia, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux
- Abstract summary: We propose an alternative class-conditioned SELD model for situations where we may not be interested in all classes all of the time.
This class-conditioned SELD model takes as input the spatial and spectral features from the sound file, and also a one-hot vector indicating the class we are currently interested in localizing.
- Score: 50.74947937253836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing systems for sound event localization and detection (SELD) typically
operate by estimating a source location for all classes at every time instant.
In this paper, we propose an alternative class-conditioned SELD model for
situations where we may not be interested in localizing all classes all of the
time. This class-conditioned SELD model takes as input the spatial and spectral
features from the sound file, and also a one-hot vector indicating the class we
are currently interested in localizing. We inject the conditioning information
at several points in our model using feature-wise linear modulation (FiLM)
layers. Through experiments on the DCASE 2020 Task 3 dataset, we show that the
proposed class-conditioned SELD model performs better in terms of common SELD
metrics than the baseline model that locates all classes simultaneously, and
also outperforms specialist models that are trained to locate only a single
class of interest. We also evaluate performance on the DCASE 2021 Task 3
dataset, which includes directional interference (sound events from classes we
are not interested in localizing) and notice especially strong improvement from
the class-conditioned model.
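The conditioning mechanism described in the abstract can be sketched in code. The following is a minimal, hypothetical illustration of a FiLM layer driven by a one-hot class vector, not the authors' implementation: all names, the channel count, and the use of 14 classes (the DCASE Task 3 class count) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FiLMLayer(nn.Module):
    """Feature-wise linear modulation (FiLM): scale and shift each feature
    channel using parameters predicted from a conditioning vector."""

    def __init__(self, num_classes: int, num_channels: int):
        super().__init__()
        # Predict a per-channel scale (gamma) and shift (beta)
        # from the one-hot class-of-interest vector.
        self.gamma = nn.Linear(num_classes, num_channels)
        self.beta = nn.Linear(num_classes, num_channels)

    def forward(self, features: torch.Tensor, class_onehot: torch.Tensor) -> torch.Tensor:
        # features: (batch, channels, time, freq); class_onehot: (batch, num_classes)
        g = self.gamma(class_onehot)[:, :, None, None]  # broadcast over time/freq
        b = self.beta(class_onehot)[:, :, None, None]
        return g * features + b

# Usage sketch: condition a 16-channel feature map on one of 14 event classes.
film = FiLMLayer(num_classes=14, num_channels=16)
feats = torch.randn(2, 16, 50, 64)  # stand-in for spatial/spectral features
onehot = nn.functional.one_hot(torch.tensor([3, 7]), num_classes=14).float()
out = film(feats, onehot)
```

Injecting such layers "at several points" would mean interleaving FiLM with the backbone's convolutional or recurrent blocks, so each stage can modulate its features toward the requested class.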
Related papers
- LETS-C: Leveraging Language Embedding for Time Series Classification [15.520883566827608]
We propose an alternative approach to leveraging the success of language modeling in the time series domain.
We utilize a language embedding model to embed time series and then pair the embeddings with a simple classification head composed of convolutional neural networks (CNN) and a multilayer perceptron (MLP).
Our findings suggest that leveraging language encoders to embed time series data, combined with a simple yet effective classification head, offers a promising direction for achieving high-performance time series classification.
arXiv Detail & Related papers (2024-07-09T04:07:57Z)
- Universal Semi-supervised Model Adaptation via Collaborative Consistency Training [92.52892510093037]
We introduce a realistic and challenging domain adaptation problem called Universal Semi-supervised Model Adaptation (USMA).
We propose a collaborative consistency training framework that regularizes the prediction consistency between two models.
Experimental results demonstrate the effectiveness of our method on several benchmark datasets.
arXiv Detail & Related papers (2023-07-07T08:19:40Z)
- CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z)
- Learning to Discover and Detect Objects [43.52208526783969]
We tackle the problem of novel class discovery, detection, and localization (NCDL).
In this setting, we assume a source dataset with labels for objects of commonly observed classes.
By training our detection network with this objective in an end-to-end manner, it learns to classify all region proposals for a large variety of classes.
arXiv Detail & Related papers (2022-10-19T17:59:55Z)
- Current Trends in Deep Learning for Earth Observation: An Open-source Benchmark Arena for Image Classification [7.511257876007757]
'AiTLAS: Benchmark Arena' is an open-source benchmark framework for evaluating state-of-the-art deep learning approaches for image classification.
We present a comprehensive comparative analysis of more than 400 models derived from nine different state-of-the-art architectures.
arXiv Detail & Related papers (2022-07-14T20:18:58Z)
- A Gating Model for Bias Calibration in Generalized Zero-shot Learning [18.32369721322249]
Generalized zero-shot learning (GZSL) aims at training a model that can generalize to unseen class data by only using auxiliary information.
One of the main challenges in GZSL is a biased model prediction toward seen classes caused by overfitting on only available seen class data during training.
We propose a two-stream autoencoder-based gating model for GZSL.
arXiv Detail & Related papers (2022-03-08T16:41:06Z)
- Unsupervised Domain Adaptation for Spatio-Temporal Action Localization [69.12982544509427]
Spatio-temporal action localization is an important problem in computer vision.
We propose an end-to-end unsupervised domain adaptation algorithm.
We show that significant performance gain can be achieved when spatial and temporal features are adapted separately or jointly.
arXiv Detail & Related papers (2020-10-19T04:25:10Z)
- Fine-Grained Visual Classification with Efficient End-to-end Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z)
- Multi-label learning for dynamic model type recommendation [13.304462985219237]
We propose a problem-independent dynamic base-classifier model recommendation for the online local pool (OLP) technique.
Our proposed framework builds a multi-label meta-classifier responsible for recommending a set of relevant model types.
Experimental results show that different data distributions favored different model types on a local scope.
arXiv Detail & Related papers (2020-04-01T16:42:12Z)
- Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective [98.70226503904402]
Object frequency in the real world often follows a power law, leading to a mismatch between training datasets with long-tailed class distributions and the balanced evaluation expected of recognition models.
We propose to augment the classic class-balanced learning by explicitly estimating the differences between the class-conditioned distributions with a meta-learning approach.
arXiv Detail & Related papers (2020-03-24T11:28:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.