Improving Human-Object Interaction Detection via Virtual Image Learning
- URL: http://arxiv.org/abs/2308.02606v1
- Date: Fri, 4 Aug 2023 10:28:48 GMT
- Title: Improving Human-Object Interaction Detection via Virtual Image Learning
- Authors: Shuman Fang, Shuai Liu, Jie Li, Guannan Jiang, Xianming Lin, Rongrong Ji
- Abstract summary: Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
- Score: 68.56682347374422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-Object Interaction (HOI) detection aims to understand the interactions
between humans and objects, which plays a crucial role in high-level semantic
understanding tasks. However, most works pursue designing better architectures
to learn overall features more efficiently, while ignoring the long-tail nature
of interaction-object pair categories. In this paper, we propose to alleviate
the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
Firstly, a novel label-to-image approach, Multiple Steps Image Creation
(MUSIC), is proposed to create a high-quality dataset that has a consistent
distribution with real images. In this stage, virtual images are generated
based on prompts with specific characterizations and selected by
multi-filtering processes. Secondly, we use both virtual and real images to
train the model with the teacher-student framework. Considering the initial
labels of some virtual images are inaccurate and inadequate, we devise an
Adaptive Matching-and-Filtering (AMF) module to construct pseudo-labels. Our
method is independent of the internal structure of HOI detectors, so it can be
combined with off-the-shelf methods by training merely 10 additional epochs.
With the assistance of our method, multiple methods obtain significant
improvements, and new state-of-the-art results are achieved on two benchmarks.
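The abstract describes a teacher-student stage in which the teacher's predictions on virtual images are filtered into pseudo-labels before training the student. A minimal sketch of that idea is below; the function names, the EMA momentum, and the confidence threshold are hypothetical stand-ins, and the simple threshold filter only approximates the role of the paper's Adaptive Matching-and-Filtering (AMF) module, whose internals the abstract does not specify.

```python
def ema_update(teacher_weights, student_weights, momentum=0.99):
    """Exponential-moving-average teacher update, as is typical in
    teacher-student frameworks (momentum value is an assumption)."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_weights, student_weights)]

def filter_pseudo_labels(teacher_preds, threshold=0.7):
    """Keep only confident teacher predictions on virtual images as
    pseudo-labels. A plain confidence cutoff standing in for the AMF
    module; each prediction is an (interaction-object pair, score)."""
    return [(pair, score) for pair, score in teacher_preds if score >= threshold]

# Example: filter the teacher's predictions for one virtual image.
preds = [("ride-bicycle", 0.92), ("hold-bicycle", 0.41)]
kept = filter_pseudo_labels(preds)
print(kept)  # only the confident interaction-object pair survives
```

In a full pipeline, `kept` would supervise the student on the virtual batch alongside ground-truth labels on real images, and `ema_update` would refresh the teacher after each student step.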
Related papers
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Neural Congealing: Aligning Images to a Joint Semantic Atlas [14.348512536556413]
We present a zero-shot self-supervised framework for aligning semantically-common content across a set of images.
Our approach harnesses pre-trained DINO-ViT features.
We show that our method performs favorably compared to a state-of-the-art method that requires extensive training on large-scale datasets.
arXiv Detail & Related papers (2023-02-08T09:26:22Z)
- Mix-up Self-Supervised Learning for Contrast-agnostic Applications [33.807005669824136]
We present the first mix-up self-supervised learning framework for contrast-agnostic applications.
We address the low variance across images based on cross-domain mix-up and build the pretext task based on image reconstruction and transparency prediction.
arXiv Detail & Related papers (2022-04-02T16:58:36Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
- iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection [48.83883375118966]
iFAN aims to precisely align feature distributions on both image and instance levels.
It outperforms state-of-the-art methods with a boost of 10%+ AP over the source-only baseline.
arXiv Detail & Related papers (2020-03-09T13:27:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.