Action Detection via an Image Diffusion Process
- URL: http://arxiv.org/abs/2404.01051v1
- Date: Mon, 1 Apr 2024 11:12:06 GMT
- Title: Action Detection via an Image Diffusion Process
- Authors: Lin Geng Foo, Tianjiao Li, Hossein Rahmani, Jun Liu
- Abstract summary: Action detection aims to localize the starting and ending points of action instances in untrimmed videos.
We tackle action detection via a three-image generation process that produces starting-point, ending-point and action-class predictions as images.
Our ADI-Diff framework achieves state-of-the-art results on two widely-used datasets.
- Score: 19.013962634522485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Action detection aims to localize the starting and ending points of action instances in untrimmed videos, and predict the classes of those instances. In this paper, we make the observation that the outputs of the action detection task can be formulated as images. Thus, from a novel perspective, we tackle action detection via a three-image generation process to generate starting point, ending point and action-class predictions as images via our proposed Action Detection Image Diffusion (ADI-Diff) framework. Furthermore, since our images differ from natural images and exhibit special properties, we further explore a Discrete Action-Detection Diffusion Process and a Row-Column Transformer design to better handle their processing. Our ADI-Diff framework achieves state-of-the-art results on two widely-used datasets.
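The abstract's key observation is that the outputs of action detection (per-frame start points, end points, and class labels) can be arranged as arrays and treated as images for a diffusion model to generate. As a minimal illustrative sketch only (the encoding, shapes, and function name below are assumptions for exposition, not the authors' ADI-Diff implementation), one could encode ground-truth action instances into three such target maps:

```python
def encode_targets(instances, num_frames, num_classes):
    """Encode action instances as three per-frame target maps, i.e. rows of
    the 'images' a diffusion model would learn to generate.

    instances: list of (start_frame, end_frame, class_id) tuples.
    Returns (start_map, end_map, class_map), each of length num_frames.
    """
    start_map = [0.0] * num_frames
    end_map = [0.0] * num_frames
    # Use num_classes as an extra "background" label for frames with no action
    class_map = [num_classes] * num_frames
    for s, e, c in instances:
        start_map[s] = 1.0            # mark the starting point
        end_map[e] = 1.0              # mark the ending point
        for t in range(s, e + 1):     # per-frame class labels inside the instance
            class_map[t] = c
    return start_map, end_map, class_map

# Example: one action of class 2 spanning frames 3..6 of a 10-frame clip
start, end, cls = encode_targets([(3, 6, 2)], num_frames=10, num_classes=5)
```

A diffusion model trained to denoise such maps would then recover instances by reading off peaks in the start/end maps and the dominant class labels between them.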
Related papers
- Uncertainty-Guided Appearance-Motion Association Network for Out-of-Distribution Action Detection [4.938957922033169]
Out-of-distribution (OOD) detection aims to detect and reject test samples with semantic shifts.
We propose a novel Uncertainty-Guided Appearance-Motion Association Network (UAAN)
We show that UAAN beats state-of-the-art methods by a significant margin, illustrating its effectiveness.
arXiv Detail & Related papers (2024-09-16T02:53:49Z)
- Mismatched: Evaluating the Limits of Image Matching Approaches and Benchmarks [9.388897214344572]
Three-dimensional (3D) reconstruction from two-dimensional images is an active research field in computer vision.
Traditionally, parametric techniques have been employed for this task.
Recent advancements have seen a shift towards learning-based methods.
arXiv Detail & Related papers (2024-08-29T11:16:34Z)
- Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation [34.11373539564126]
This study focuses on a novel task in text-to-image (T2I) generation, namely action customization.
The objective of this task is to learn the co-existing action from limited data and generalize it to unseen humans or even animals.
arXiv Detail & Related papers (2023-11-27T14:07:13Z)
- Object-centric Cross-modal Feature Distillation for Event-based Object Detection [87.50272918262361]
RGB detectors still outperform event-based detectors due to sparsity of the event data and missing visual details.
We develop a novel knowledge distillation approach to shrink the performance gap between these two modalities.
We show that object-centric distillation significantly improves the performance of the event-based student object detector.
arXiv Detail & Related papers (2023-11-09T16:33:08Z)
- Localizing Object-level Shape Variations with Text-to-Image Diffusion Models [60.422435066544814]
We present a technique to generate a collection of images that depicts variations in the shape of a specific object.
A particular challenge when generating object variations is accurately localizing the manipulation applied over the object's shape.
To localize the image-space operation, we present two techniques that use the self-attention layers in conjunction with the cross-attention layers.
arXiv Detail & Related papers (2023-03-20T17:45:08Z)
- ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations.
We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings.
We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z)
- Recent Trends in 2D Object Detection and Applications in Video Event Recognition [0.76146285961466]
We discuss the pioneering works in object detection, followed by the recent breakthroughs that employ deep learning.
We highlight recent datasets for 2D object detection both in images and videos, and present a comparative performance summary of various state-of-the-art object detection techniques.
arXiv Detail & Related papers (2022-02-07T14:15:11Z)
- One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the potential action possibilities that objects in an image afford.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z)
- Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
- Ensembling object detectors for image and video data analysis [98.26061123111647]
We propose a method for ensembling the outputs of multiple object detectors for improving detection performance and precision of bounding boxes on image data.
We extend it to video data by proposing a two-stage tracking-based scheme for detection refinement.
arXiv Detail & Related papers (2021-02-09T12:38:16Z)
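The ensembling entry above gives no implementation details. As a generic sketch of box-level ensembling for a single class (the greedy IoU clustering and confidence-weighted averaging here are common choices, not necessarily the paper's scheme), overlapping boxes from multiple detectors can be merged as follows:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def ensemble_boxes(detections, iou_thr=0.5):
    """detections: list of (box, score) pairs pooled from all detectors for
    one class. Overlapping boxes are clustered greedily by IoU against each
    cluster's highest-scoring seed, then fused into a confidence-weighted
    average box with an averaged score."""
    detections = sorted(detections, key=lambda d: -d[1])
    clusters = []
    for box, score in detections:
        for cluster in clusters:
            if iou(box, cluster[0][0]) >= iou_thr:
                cluster.append((box, score))
                break
        else:
            clusters.append([(box, score)])
    fused = []
    for cluster in clusters:
        w = sum(s for _, s in cluster)
        avg_box = tuple(sum(b[i] * s for b, s in cluster) / w for i in range(4))
        fused.append((avg_box, w / len(cluster)))
    return fused

# Example: two detectors report overlapping boxes for one object,
# plus one distinct detection elsewhere
dets = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.7), ((50, 50, 60, 60), 0.8)]
fused = ensemble_boxes(dets)
```

Weighting coordinates by confidence lets more certain detectors dominate the fused box, which is the usual motivation for averaging over simply keeping the top-scoring box.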
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.