Post-Processing Temporal Action Detection
- URL: http://arxiv.org/abs/2211.14924v1
- Date: Sun, 27 Nov 2022 19:50:37 GMT
- Title: Post-Processing Temporal Action Detection
- Authors: Sauradip Nag, Xiatian Zhu, Yi-Zhe Song and Tao Xiang
- Abstract summary: Temporal Action Detection (TAD) methods typically apply a pre-processing step that converts an input variable-length video into a fixed-length snippet representation sequence.
This pre-processing step temporally downsamples the video, reducing the inference resolution and hampering detection performance at the original temporal resolution.
We introduce a novel model-agnostic post-processing method that requires neither model redesign nor retraining.
- Score: 134.26292288193298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing Temporal Action Detection (TAD) methods typically apply a
pre-processing step that converts an input variable-length video into a
fixed-length snippet representation sequence before temporal boundary
estimation and action classification. This pre-processing step temporally
downsamples the video, reducing the inference resolution and hampering
detection performance at the original temporal resolution. In essence, this is
due to a temporal quantization error introduced during resolution downsampling
and recovery. This error can negatively impact TAD performance, yet it is
largely ignored by existing methods. To address this problem, in this work we
introduce a novel model-agnostic post-processing method that requires neither
model redesign nor retraining. Specifically, we model the start and end points
of action instances with a Gaussian distribution, enabling temporal boundary
inference at a sub-snippet level. We further introduce an efficient
Taylor-expansion based approximation, dubbed Gaussian Approximated
Post-processing (GAP). Extensive experiments demonstrate that GAP consistently
improves a wide variety of pre-trained, off-the-shelf TAD models on the
challenging ActivityNet (+0.2% to +0.7% in average mAP) and THUMOS (+0.2% to
+0.5% in average mAP) benchmarks. Such performance gains are already
significant and comparable to those achieved by novel model designs. Moreover,
GAP can be integrated with model training for further performance gains.
Importantly, GAP enables lower temporal resolutions for more efficient
inference, facilitating low-resource applications. The code will be available
at https://github.com/sauradip/GAP
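The abstract describes the mechanism only at a high level. The sketch below is not the authors' implementation (the linked repository holds that); it is a minimal illustration of the Gaussian-plus-Taylor-expansion idea, where the function name `refine_boundary`, the edge-case handling, and the synthetic `scores` array are assumptions made purely for this example.

```python
import numpy as np

def refine_boundary(scores, eps=1e-8):
    """Refine a snippet-level boundary estimate to sub-snippet precision.

    `scores` holds boundary-confidence values, one per snippet. Assuming the
    scores near the peak are roughly Gaussian, the continuous mode can be
    recovered from a second-order Taylor expansion of the log-scores around
    the discrete argmax (a single Newton step).
    """
    p = np.log(np.maximum(scores, eps))
    m = int(np.argmax(scores))
    if m == 0 or m == len(scores) - 1:
        return float(m)                      # no neighbours to expand around
    d1 = 0.5 * (p[m + 1] - p[m - 1])         # first central difference
    d2 = p[m + 1] - 2.0 * p[m] + p[m - 1]    # second central difference
    if d2 >= 0:                              # not locally concave: keep the snippet estimate
        return float(m)
    return m - d1 / d2                       # sub-snippet boundary location

# Toy example: the true boundary lies between snippets 4 and 5.
scores = np.exp(-0.5 * ((np.arange(10) - 4.4) / 1.5) ** 2)
print(refine_boundary(scores))  # ~4.4 rather than the quantized 4
```

For an exactly Gaussian score profile the single Newton step recovers the continuous mode; the refined snippet index can then be mapped back to seconds with the usual snippet stride.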
Related papers
- Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along probability flow (PF) ODE trajectories to their corresponding endpoints.
We empirically find that this training paradigm limits the one-step generation performance of consistency models.
We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z)
- Minimizing Energy Costs in Deep Learning Model Training: The Gaussian Sampling Approach [11.878350833222711]
We propose a method called GradSamp for sampling gradient updates from a Gaussian distribution.
GradSamp not only streamlines gradient computation but also enables skipping entire epochs, thereby enhancing overall efficiency (a toy sketch of this idea appears after this entry).
We rigorously validate our hypothesis across a diverse set of standard and non-standard CNN and transformer-based models.
arXiv Detail & Related papers (2024-06-11T15:01:20Z)
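The summary above names the idea (sampling gradient updates from a Gaussian) but not the procedure. The toy sketch below is therefore only an illustration under assumptions made here: a Gaussian is fitted to the last few computed updates and sampled from on alternating "skipped" epochs, with a quadratic stand-in objective. None of these choices are claimed to be GradSamp's actual criteria.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_gradient(w):
    # Stand-in objective: gradient of 0.5 * ||w||^2; a real model's loss gradient goes here.
    return w

w = rng.normal(size=5)
lr, history = 0.1, []

for epoch in range(20):
    if len(history) >= 3 and epoch % 2 == 1:
        # "Skipped" epoch: draw the update from a Gaussian fitted to the last few true updates.
        recent = np.stack(history[-3:])
        g = rng.normal(recent.mean(axis=0), recent.std(axis=0) + 1e-8)
    else:
        g = true_gradient(w)
        history.append(g.copy())
    w -= lr * g

print(np.linalg.norm(w))  # the toy loss still decreases although half the epochs skip gradient computation
```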
- Test-Time Model Adaptation with Only Forward Passes [68.11784295706995]
Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts.
We propose a test-time Forward-Optimization Adaptation (FOA) method.
FOA runs on a quantized 8-bit ViT, outperforms gradient-based TENT on a full-precision 32-bit ViT, and achieves up to a 24-fold memory reduction on ImageNet-C.
arXiv Detail & Related papers (2024-04-02T05:34:33Z)
- Decoupled Prototype Learning for Reliable Test-Time Adaptation [50.779896759106784]
Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference.
One popular approach involves fine-tuning the model with a cross-entropy loss computed on estimated pseudo-labels.
This study reveals that minimizing the classification error of each sample makes the cross-entropy loss vulnerable to label noise.
We propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation.
arXiv Detail & Related papers (2024-01-15T03:33:39Z)
- Zero-Shot Temporal Action Detection via Vision-Language Prompting [134.26292288193298]
We propose a novel zero-Shot Temporal Action detection model via vision-LanguagE prompting (STALE).
Our model significantly outperforms state-of-the-art alternatives.
Our model also yields superior results on supervised TAD over recent strong competitors.
arXiv Detail & Related papers (2022-07-17T13:59:46Z)
- Reducing the Amortization Gap in Variational Autoencoders: A Bayesian Random Function Approach [38.45568741734893]
Inference in our GP model is done by a single feed-forward pass through the network, which is significantly faster than semi-amortized methods.
We show that our approach attains a higher test-data likelihood than the state of the art on several benchmark datasets.
arXiv Detail & Related papers (2021-02-05T13:01:12Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)