Conditioned Time-Dilated Convolutions for Sound Event Detection
- URL: http://arxiv.org/abs/2007.05183v1
- Date: Fri, 10 Jul 2020 06:05:23 GMT
- Title: Conditioned Time-Dilated Convolutions for Sound Event Detection
- Authors: Konstantinos Drossos and Stylianos I. Mimilakis and Tuomas Virtanen
- Abstract summary: We present a novel algorithm for the conditioning of the time-dilated convolutions which functions similarly to language modelling.
We employ the freely available TUT-SED Synthetic dataset, and we assess the performance of our method using the average per-frame $\text{F}_{1}$ score and average per-frame error rate.
- Score: 20.883760606514937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sound event detection (SED) is the task of identifying sound events along
with their onset and offset times. A recent SED method based on convolutional
neural networks proposed the use of depthwise separable (DWS) and
time-dilated convolutions. DWS and time-dilated convolutions yielded
state-of-the-art results for SED with a considerably smaller number of parameters.
In this work we propose expanding the time-dilated convolutions by
conditioning them with jointly learned embeddings of the SED predictions from the
SED classifier. We present a novel algorithm for the conditioning of the
time-dilated convolutions which functions similarly to language modelling, and
enhances the performance of these convolutions. We employ the freely
available TUT-SED Synthetic dataset, and we assess the performance of our
method using the average per-frame $\text{F}_{1}$ score and average per-frame
error rate over the 10 experiments. We achieve an increase of 2\% (from 0.63
to 0.65) in the average $\text{F}_{1}$ score (the higher the better) and a
decrease of 3\% (from 0.50 to 0.47) in the error rate (the lower the better).
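The abstract does not spell out the conditioning algorithm, but the general idea can be sketched: run a causal time-dilated 1-D convolution whose input is modulated by an embedding of the classifier's per-frame predictions, in the spirit of language-model conditioning. The function name `dilated_conv1d`, the additive conditioning, and the random stand-ins for the learned embedding matrix and SED predictions below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Causal time-dilated 1-D convolution.
    x: (T, C_in) frame features, w: (K, C_in, C_out) kernel."""
    K, C_in, C_out = w.shape
    T = x.shape[0]
    pad = (K - 1) * dilation          # left-pad so output stays causal
    xp = np.vstack([np.zeros((pad, C_in)), x])
    y = np.zeros((T, C_out))
    for t in range(T):
        for k in range(K):
            y[t] += xp[t + k * dilation] @ w[k]
    return y

rng = np.random.default_rng(0)
T, C, n_classes, K, d = 50, 8, 4, 3, 2
x = rng.normal(size=(T, C))                  # frame features
prev_preds = rng.random(size=(T, n_classes)) # stand-in SED predictions
E = rng.normal(size=(n_classes, C)) * 0.1    # stand-in learned embedding
x_cond = x + prev_preds @ E                  # condition the conv input
w_conv = rng.normal(size=(K, C, C))
y = dilated_conv1d(x_cond, w_conv, dilation=d)
print(y.shape)  # → (50, 8)
```

With dilation 2 and kernel size 3, each output frame sees a receptive field of 5 past frames, which is how stacked time-dilated layers cover long contexts cheaply.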
Related papers
- Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models [57.474294329887236]
Diffusion large language models (dLLMs) generate text through iterative denoising. Current decoding strategies discard rich intermediate predictions in favor of the final output. We introduce two complementary methods that exploit temporal consistency.
arXiv Detail & Related papers (2025-08-12T17:59:57Z) - Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models [13.312007032203857]
Adapting a pretrained diffusion model to new objectives at inference time remains an open problem in generative modeling. We introduce a tree-based approach that samples from the reward-aligned target density by propagating terminal rewards back through the diffusion chain. By reusing information from previous generations, we get an anytime algorithm that turns additional compute into steadily better samples.
arXiv Detail & Related papers (2025-06-25T17:59:10Z) - Noise Conditional Variational Score Distillation [60.38982038894823]
Noise Conditional Variational Score Distillation (NCVSD) is a novel method for distilling pretrained diffusion models into generative denoisers. By integrating this insight into the Variational Score Distillation framework, we enable scalable learning of generative denoisers.
arXiv Detail & Related papers (2025-06-11T06:01:39Z) - Test-Time Scaling of Diffusion Models via Noise Trajectory Search [7.243632426715941]
We introduce an $\epsilon$-greedy search algorithm that globally explores at extreme timesteps and locally exploits during the intermediate steps where de-mixing occurs. Experiments on EDM and Stable Diffusion reveal state-of-the-art scores for class-conditioned/text-to-image generation.
arXiv Detail & Related papers (2025-05-24T19:13:29Z) - MR-EEGWaveNet: Multiresolutional EEGWaveNet for Seizure Detection from Long EEG Recordings [7.9595266728435545]
We propose a novel end-to-end model, ''Multiresolutional EEGWaveNet (MR-EEGWaveNet),'' which efficiently distinguishes seizure events from background electroencephalogram (EEG) artifacts/noise. The model has three modules: convolution, feature extraction, and predictor. The proposed MR-EEGWaveNet significantly outperformed the conventional non-multiresolution approach.
arXiv Detail & Related papers (2025-05-23T14:40:50Z) - Noisy Test-Time Adaptation in Vision-Language Models [73.14136220844156]
Test-time adaptation (TTA) aims to address distribution shifts between source and target data by relying solely on target data during testing.
This paper introduces Zero-Shot Noisy TTA (ZS-NTTA), focusing on adapting the model to target data with noisy samples during test-time in a zero-shot manner.
We introduce the Adaptive Noise Detector (AdaND), which utilizes the frozen model's outputs as pseudo-labels to train a noise detector.
arXiv Detail & Related papers (2025-02-20T14:37:53Z) - ProtoSeg: A Prototype-Based Point Cloud Instance Segmentation Method [6.632158868486343]
This paper presents a novel neural network architecture for performing instance segmentation on 3D point clouds.
We propose to jointly learn coefficients and prototypes in parallel which can be combined to obtain the instance predictions.
The proposed method is not only 28% faster than the state-of-the-art, it also exhibits the lowest standard deviation.
arXiv Detail & Related papers (2024-10-03T10:05:27Z) - Few-shot Learning using Data Augmentation and Time-Frequency
Transformation for Time Series Classification [6.830148185797109]
We propose a novel few-shot learning framework through data augmentation.
We also develop a sequence-spectrogram neural network (SSNN)
Our methodology demonstrates its applicability of addressing the few-shot problems for time series classification.
arXiv Detail & Related papers (2023-11-06T15:32:50Z) - Enhancing Cross-Dataset Performance of Distracted Driving Detection With Score Softmax Classifier And Dynamic Gaussian Smoothing Supervision [6.891556476231427]
Deep neural networks enable real-time monitoring of in-vehicle drivers, facilitating the timely prediction of distractions, fatigue, and potential hazards.
Recent research has exposed unreliable cross-dataset driver behavior recognition due to a limited number of data samples and background noise.
We propose a Score-Softmax classifier, which reduces the model overconfidence by enhancing category independence.
arXiv Detail & Related papers (2023-10-08T15:28:01Z) - Single and Few-step Diffusion for Generative Speech Enhancement [18.487296462927034]
Diffusion models have shown promising results in speech enhancement.
In this paper, we address these limitations through a two-stage training approach.
We show that our proposed method keeps a steady performance and therefore largely outperforms the diffusion baseline in this setting.
arXiv Detail & Related papers (2023-09-18T11:30:58Z) - DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the groundtruth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z) - Post-Processing Temporal Action Detection [134.26292288193298]
Temporal Action Detection (TAD) methods typically take a pre-processing step in converting an input varying-length video into a fixed-length snippet representation sequence.
This pre-processing step would temporally downsample the video, reducing the inference resolution and hampering the detection performance in the original temporal resolution.
We introduce a novel model-agnostic post-processing method without model redesign and retraining.
arXiv Detail & Related papers (2022-11-27T19:50:37Z) - Asynchronous Training Schemes in Distributed Learning with Time Delay [17.259708772713164]
In the context of distributed deep learning, the issue of stale weights or gradients could result in poor algorithmic performance.
In this paper, we propose a different approach to tackle the issue of stale weights or gradients.
One practical variant of PC-ASGD is also proposed by adopting a condition to help with the determination of the tradeoff parameter.
arXiv Detail & Related papers (2022-08-28T07:14:59Z) - Inception Convolution with Efficient Dilation Search [121.41030859447487]
Dilation convolution is a critical mutant of standard convolution neural network to control effective receptive fields and handle large scale variance of objects.
We propose a new mutant of dilated convolution, namely inception (dilated) convolution where the convolutions have independent dilation among different axes, channels and layers.
To fit the complex inception convolution to the data, we develop a simple yet effective dilation search algorithm (EDO) based on statistical optimization.
arXiv Detail & Related papers (2020-12-25T14:58:35Z) - Displacement-Invariant Cost Computation for Efficient Stereo Matching [122.94051630000934]
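The "independent dilation among channels" idea from the inception-convolution summary above can be illustrated with a depthwise 1-D convolution in which each channel uses its own dilation rate. The function below is a minimal NumPy sketch of that concept, not the paper's 2-D inception convolution or its EDO search.

```python
import numpy as np

def per_channel_dilated_conv(x, w, dilations):
    """Depthwise causal 1-D convolution with a distinct dilation
    rate per channel. x: (T, C), w: (K, C), dilations: C ints."""
    T, C = x.shape
    K = w.shape[0]
    y = np.zeros_like(x)
    for c in range(C):
        d = dilations[c]
        pad = (K - 1) * d                       # causal left padding
        xc = np.concatenate([np.zeros(pad), x[:, c]])
        for t in range(T):
            y[t, c] = sum(xc[t + k * d] * w[k, c] for k in range(K))
    return y

x = np.arange(24, dtype=float).reshape(8, 3)
w = np.ones((2, 3))
# each channel gets a different effective receptive field
y = per_channel_dilated_conv(x, w, dilations=[1, 2, 4])
print(y.shape)  # → (8, 3)
```

A search procedure such as EDO would choose the per-channel dilation rates from data rather than fixing them by hand as done here.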
Deep learning methods have dominated stereo matching leaderboards by yielding unprecedented disparity accuracy.
But their inference time is typically slow, on the order of seconds for a pair of 540p images.
We propose a displacement-invariant cost module to compute the matching costs without needing a 4D feature volume.
arXiv Detail & Related papers (2020-12-01T23:58:16Z) - Score-Based Generative Modeling through Stochastic Differential
Equations [114.39209003111723]
We present a differential equation that transforms a complex data distribution to a known prior distribution by injecting noise.
A corresponding reverse-time SDE transforms the prior distribution back into the data distribution by slowly removing the noise.
By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks.
We demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.
arXiv Detail & Related papers (2020-11-26T19:39:10Z) - ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning [91.13797346047984]
We introduce ADAHESSIAN, a second order optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates.
We show that ADAHESSIAN achieves new state-of-the-art results by a large margin as compared to other adaptive optimization methods.
arXiv Detail & Related papers (2020-06-01T05:00:51Z) - Sound Event Detection with Depthwise Separable and Dilated Convolutions [23.104644393058123]
State-of-the-art sound event detection (SED) methods usually employ a series of convolutional neural networks (CNNs) to extract useful features from the input audio signal.
We propose the replacement of the CNNs with depthwise separable convolutions and the replacement of the RNNs with dilated convolutions.
We achieve a reduction of the number of parameters by 85% and of the average training time per epoch by 78%, together with an increase of the average frame-wise F1 score by 4.6% and a reduction of the average error rate by 3.8%.
arXiv Detail & Related papers (2020-02-02T19:50:51Z)
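The parameter savings of depthwise separable convolutions cited in the entry above follow from a simple count: a standard convolution needs one full kernel per input-output channel pair, while a DWS convolution factors this into a depthwise stage plus a 1x1 pointwise stage. The layer sizes below are arbitrary examples; the ~85% figure reported in the paper also reflects other architectural choices.

```python
# Parameter count: standard conv vs. depthwise separable (DWS) conv.
def conv_params(c_in, c_out, k):
    """Standard k x k convolution: one kernel per (in, out) pair."""
    return c_in * c_out * k * k

def dws_params(c_in, c_out, k):
    """Depthwise (k*k per input channel) + pointwise (1x1) stage."""
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 128, 256, 3
std = conv_params(c_in, c_out, k)
dws = dws_params(c_in, c_out, k)
print(std, dws, f"{1 - dws / std:.1%} fewer")  # → 294912 33920 88.5% fewer
```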
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.