Efficient Adaptive Human-Object Interaction Detection with
Concept-guided Memory
- URL: http://arxiv.org/abs/2309.03696v1
- Date: Thu, 7 Sep 2023 13:10:06 GMT
- Authors: Ting Lei, Fabian Caba, Qingchao Chen, Hailin Jin, Yuxin Peng, Yang Liu
- Abstract summary: We propose an efficient Adaptive HOI Detector with Concept-guided Memory (ADA-CM).
ADA-CM has two operating modes. The first mode makes it tunable without learning new parameters in a training-free paradigm.
Our proposed method achieves results competitive with the state of the art on the HICO-DET and V-COCO datasets with much less training time.
- Score: 64.11870454160614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-Object Interaction (HOI) detection aims to localize and infer the
relationships between a human and an object. Arguably, training supervised
models for this task from scratch presents challenges due to the performance
drop over rare classes and the high computational cost and time required to
handle long-tailed distributions of HOIs in complex HOI scenes in realistic
settings. This observation motivates us to design an HOI detector that can be
trained even with long-tailed labeled data and can leverage existing knowledge
from pre-trained models. Inspired by the powerful generalization ability of the
large Vision-Language Models (VLM) on classification and retrieval tasks, we
propose an efficient Adaptive HOI Detector with Concept-guided Memory (ADA-CM).
ADA-CM has two operating modes. The first mode makes it tunable without
learning new parameters in a training-free paradigm. Its second mode
incorporates an instance-aware adapter mechanism that can further efficiently
boost performance if updating a lightweight set of parameters can be afforded.
Our proposed method achieves results competitive with the state of the art on the
HICO-DET and V-COCO datasets with much less training time. Code can be found at
https://github.com/ltttpku/ADA-CM.
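The training-free mode described in the abstract can be illustrated with a small sketch. The abstract does not specify how the concept-guided memory is built or queried, so the code below assumes a generic key-value feature cache combined with zero-shot text similarity, in the style of cache-based CLIP adaptation. It is not the authors' implementation: the function names, the hyperparameters `alpha` and `beta`, and the toy features are all hypothetical, and a real system would use features from a pre-trained VLM for detected human-object pairs.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along the given axis."""
    x = np.asarray(x, dtype=np.float64)
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def build_memory(features, labels, num_classes):
    """Store normalized features as keys and one-hot labels as values.

    No parameters are learned: the memory is filled directly from
    labeled examples (assumption: one feature vector per example).
    """
    keys = l2_normalize(features)                      # (N, D)
    values = np.eye(num_classes)[np.asarray(labels)]   # (N, C)
    return keys, values


def memory_logits(query, keys, values, text_embeds, alpha=1.0, beta=5.0):
    """Combine zero-shot text similarity with evidence from the memory.

    alpha weights the memory term and beta sharpens the affinity;
    both are hypothetical hyperparameters chosen for illustration.
    """
    q = l2_normalize(query)                            # (Q, D)
    affinity = q @ keys.T                              # cosine similarity, (Q, N)
    cache = np.exp(-beta * (1.0 - affinity)) @ values  # (Q, C)
    zero_shot = q @ l2_normalize(text_embeds).T        # (Q, C)
    return zero_shot + alpha * cache


if __name__ == "__main__":
    # Toy data: 3 interaction classes with 4-dimensional features.
    prototypes = np.eye(3, 4)
    keys, values = build_memory(prototypes, [0, 1, 2], num_classes=3)
    query = np.array([[0.1, 0.9, 0.0, 0.2]])
    logits = memory_logits(query, keys, values, text_embeds=prototypes)
    print(logits.argmax(axis=1))
```

Under this assumption, adapting to new data only requires writing new entries into the memory. A second mode like the one the abstract mentions would additionally update a small set of parameters, which this sketch does not cover.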
Related papers
- Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z)
- Disentangled Pre-training for Human-Object Interaction Detection [22.653500926559833]
We propose an efficient disentangled pre-training method for HOI detection (DP-HOI).
DP-HOI utilizes object detection and action recognition datasets to pre-train the detection and interaction decoder layers.
It significantly enhances the performance of existing HOI detection models on a broad range of rare categories.
arXiv Detail & Related papers (2024-04-02T08:21:16Z)
- Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
arXiv Detail & Related papers (2024-03-18T08:00:23Z)
- Pre-train, Adapt and Detect: Multi-Task Adapter Tuning for Camouflaged Object Detection [38.5505943598037]
We propose a novel 'pre-train, adapt and detect' paradigm to detect camouflaged objects.
By introducing a large pre-trained model, abundant knowledge learned from massive multi-modal data can be directly transferred to COD.
Our method outperforms existing state-of-the-art COD models by large margins.
arXiv Detail & Related papers (2023-07-20T08:25:38Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performances are sub-optimal or even lag far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
- Cross-modal Knowledge Distillation for Vision-to-Sensor Action Recognition [12.682984063354748]
This study introduces an end-to-end Vision-to-Sensor Knowledge Distillation (VSKD) framework.
In this VSKD framework, only time-series data, i.e., accelerometer data, is needed from wearable devices during the testing phase.
This framework will not only reduce the computational demands on edge devices, but also produce a learning model that closely matches the performance of the computationally expensive multi-modal approach.
arXiv Detail & Related papers (2021-10-08T15:06:38Z)
- MM-FSOD: Meta and metric integrated few-shot object detection [14.631208179789583]
We present an effective object detection framework (MM-FSOD) that integrates metric learning and meta-learning.
Our model is a class-agnostic detection model that can accurately recognize new categories that do not appear in the training samples.
arXiv Detail & Related papers (2020-12-30T14:02:52Z)
- DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets.
We propose an efficient and effective data augmentation method called DecAug for HOI detection.
Experiments show that our method brings improvements of up to 3.3 mAP and 1.6 mAP on the V-COCO and HICO-DET datasets.
arXiv Detail & Related papers (2020-10-02T13:59:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.