DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion
- URL: http://arxiv.org/abs/2303.14863v2
- Date: Fri, 14 Jul 2023 12:06:13 GMT
- Title: DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion
- Authors: Sauradip Nag, Xiatian Zhu, Jiankang Deng, Yi-Zhe Song and Tao Xiang
- Abstract summary: We propose a new formulation of temporal action detection (TAD) with denoising diffusion, DiffTAD.
Taking as input random temporal proposals, it can yield action proposals accurately given an untrimmed long video.
- Score: 137.8749239614528
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new formulation of temporal action detection (TAD) with
denoising diffusion, DiffTAD in short. Taking as input random temporal
proposals, it can yield action proposals accurately given an untrimmed long
video. This presents a generative modeling perspective, against previous
discriminative learning manners. This capability is achieved by first diffusing
the ground-truth proposals to random ones (i.e., the forward/noising process)
and then learning to reverse the noising process (i.e., the backward/denoising
process). Concretely, we establish the denoising process in the Transformer
decoder (e.g., DETR) by introducing a temporal location query design with
faster convergence in training. We further propose a cross-step selective
conditioning algorithm for inference acceleration. Extensive evaluations on
ActivityNet and THUMOS show that our DiffTAD achieves top performance compared
to previous art alternatives. The code will be made available at
https://github.com/sauradip/DiffusionTAD.
Related papers
- Diffusion Models With Learned Adaptive Noise [12.530583016267768]
In this paper, we explore whether the diffusion process can be learned from data.
A widely held assumption is that the ELBO is invariant to the noise process.
We propose MULAN, a learned diffusion process that applies noise at different rates across an image.
arXiv Detail & Related papers (2023-12-20T18:00:16Z) - Towards More Accurate Diffusion Model Acceleration with A Timestep
Aligner [84.97253871387028]
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed.
We propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost.
Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z) - Speeding Up Speech Synthesis In Diffusion Models By Reducing Data Distribution Recovery Steps Via Content Transfer [3.2634122554914002]
Diffusion based vocoders have been criticised for being slow due to the many steps required during sampling.
We propose a setup where the targets are the different outputs of forward process time steps.
We show through extensive evaluation that the proposed technique generates high-fidelity speech in competitive time.
arXiv Detail & Related papers (2023-09-18T10:35:27Z) - DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and
Highlight Detection [38.12212015133935]
A novel framework, DiffusionVMR, is proposed to redefine the two tasks as a unified conditional denoising generation process.
Experiments conducted on five widely-used benchmarks demonstrate the effectiveness and flexibility of the proposed DiffusionVMR.
arXiv Detail & Related papers (2023-08-29T08:20:23Z) - DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the groundtruth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z) - AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models [103.41269503488546]
Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models with user-provided concepts.
This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents.
We propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs.
It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters.
arXiv Detail & Related papers (2023-07-20T09:06:21Z) - Boundary-Denoising for Video Activity Localization [57.9973253014712]
We study the video activity localization problem from a denoising perspective.
Specifically, we propose an encoder-decoder model named DenoiseLoc.
Experiments show that DenoiseLoc advances %in several video activity understanding tasks.
arXiv Detail & Related papers (2023-04-06T08:48:01Z) - Post-Processing Temporal Action Detection [134.26292288193298]
Temporal Action Detection (TAD) methods typically take a pre-processing step in converting an input varying-length video into a fixed-length snippet representation sequence.
This pre-processing step would temporally downsample the video, reducing the inference resolution and hampering the detection performance in the original temporal resolution.
We introduce a novel model-agnostic post-processing method without model redesign and retraining.
arXiv Detail & Related papers (2022-11-27T19:50:37Z) - Speech Enhancement and Dereverberation with Diffusion-based Generative
Models [14.734454356396157]
We present a detailed overview of the diffusion process that is based on a differential equation.
We show that this procedure enables using only 30 diffusion steps to generate high-quality clean speech estimates.
In an extensive cross-dataset evaluation, we show that the improved method can compete with recent discriminative models.
arXiv Detail & Related papers (2022-08-11T13:55:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.