Detecting Human-Object Interaction with Mixed Supervision
- URL: http://arxiv.org/abs/2011.04971v2
- Date: Thu, 12 Nov 2020 14:14:21 GMT
- Title: Detecting Human-Object Interaction with Mixed Supervision
- Authors: Suresh Kirthi Kumaraswamy (1), Miaojing Shi (2) and Ewa Kijak (3) ((1)
Univ Le Mans, CNRS, IRISA, (2) King's College London, (3) Univ Rennes, Inria,
CNRS, IRISA)
- Abstract summary: Human object interaction (HOI) detection is an important task in image understanding and reasoning.
We propose a mixed-supervised HOI detection pipeline built on a specific design of momentum-independent learning.
Our method is evaluated on the challenging HICO-DET dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Human object interaction (HOI) detection is an important task in image
understanding and reasoning. Its output takes the form of an HOI triplet <human; verb;
object>, which requires bounding boxes for the human and the object as well as the
action between them. In other words, this task requires strong supervision
for training, which is however hard to procure. A natural solution to overcome
this is to pursue weakly-supervised learning, where we only know the presence
of certain HOI triplets in images but their exact location is unknown. Most
weakly-supervised learning methods do not make provision for leveraging data
with strong supervision when it is available; indeed, a naïve
combination of these two paradigms in HOI detection fails to let them contribute
to each other. In this regard, we propose a mixed-supervised HOI detection
pipeline built on a specific design of momentum-independent learning that
learns seamlessly across these two types of supervision. Moreover, in light of
the annotation insufficiency in mixed supervision, we introduce an HOI element
swapping technique to synthesize diverse and hard negatives across images and
improve the robustness of the model. Our method is evaluated on the challenging
HICO-DET dataset. It performs close to or even better than many
fully-supervised methods by using a mixed amount of strong and weak
annotations; furthermore, it outperforms representative state-of-the-art weakly-
and fully-supervised methods under the same supervision.
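As an illustration of the two ideas named in the abstract, the HOI triplet and the swapping of HOI elements across images to obtain negatives, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how swapping is performed, so the `HOITriplet` class, the `swap_objects` function, and the rule that a swapped (verb, object) pair counts as a negative only when it is absent from a set of known positive pairs are all assumptions made for this example.

```python
from dataclasses import dataclass, replace
from typing import List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed box format


@dataclass(frozen=True)
class HOITriplet:
    """Hypothetical container for one <human; verb; object> annotation."""
    human_box: Box
    verb: str
    object_label: str
    object_box: Box


def swap_objects(
    a: HOITriplet,
    b: HOITriplet,
    known_positives: Set[Tuple[str, str]],
) -> List[HOITriplet]:
    """Exchange the object element of two triplets (possibly from different images).

    Assumption for this sketch: a swapped triplet is kept as a negative only
    if its (verb, object) pair is not a known positive interaction.
    """
    swapped = [
        replace(a, object_label=b.object_label, object_box=b.object_box),
        replace(b, object_label=a.object_label, object_box=a.object_box),
    ]
    return [t for t in swapped if (t.verb, t.object_label) not in known_positives]


# Two annotated triplets, imagined to come from two different images.
ride = HOITriplet((10.0, 20.0, 110.0, 220.0), "ride", "bicycle", (40.0, 120.0, 160.0, 240.0))
eat = HOITriplet((0.0, 0.0, 90.0, 200.0), "eat", "apple", (50.0, 60.0, 70.0, 80.0))
positives = {("ride", "bicycle"), ("eat", "apple")}

negatives = swap_objects(ride, eat, positives)
for t in negatives:
    print(t.verb, t.object_label)
```

In this toy setting the swap yields the implausible pairs "ride apple" and "eat bicycle"; a real pipeline would more likely operate on detected regions or features than on label strings.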
Related papers
- Joint Salient Object Detection and Camouflaged Object Detection via Uncertainty-aware Learning [47.253370009231645]
We introduce an uncertainty-aware learning pipeline to explore the contradictory information of salient object detection (SOD) and camouflaged object detection (COD).
Our solution leads to both state-of-the-art performance and informative uncertainty estimation.
arXiv Detail & Related papers (2023-07-10T15:49:37Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning [66.00600682711995]
Human object interaction (HOI) detection plays a crucial role in human-centric scene understanding and serves as a fundamental building-block for many vision tasks.
One generalizable and scalable strategy for HOI detection is to use weak supervision, learning from image-level annotations only.
This is inherently challenging due to ambiguous human-object associations, large search space of detecting HOIs and highly noisy training signal.
We develop a CLIP-guided HOI representation capable of incorporating the prior knowledge at both image level and HOI instance level, and adopt a self-taught mechanism to prune incorrect human-object associations.
arXiv Detail & Related papers (2023-03-02T14:41:31Z)
- Imitation from Observation With Bootstrapped Contrastive Learning [12.048166025000976]
Imitation from observation (IfO) is a learning paradigm that consists of training autonomous agents in a Markov Decision Process.
We present BootIfOL, an IfO algorithm that aims to learn a reward function that takes an agent trajectory and compares it to an expert.
We evaluate our approach on a variety of control tasks showing that we can train effective policies using a limited number of demonstrative trajectories.
arXiv Detail & Related papers (2023-02-13T17:32:17Z)
- Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness [69.39073806630583]
Adversarial training (AT) for robust representation learning and self-supervised learning (SSL) for unsupervised representation learning are two active research fields.
We propose a two-stage framework termed Decoupled Adversarial Contrastive Learning (DeACL).
arXiv Detail & Related papers (2022-07-22T06:30:44Z)
- On Higher Adversarial Susceptibility of Contrastive Self-Supervised Learning [104.00264962878956]
Contrastive self-supervised learning (CSL) has managed to match or surpass the performance of supervised learning in image and video classification.
It is still largely unknown if the nature of the representation induced by the two learning paradigms is similar.
We identify the uniform distribution of data representation over a unit hypersphere in the CSL representation space as the key contributor to this phenomenon.
We devise strategies that are simple, yet effective in improving model robustness with CSL training.
arXiv Detail & Related papers (2022-07-22T03:49:50Z)
- CGUA: Context-Guided and Unpaired-Assisted Weakly Supervised Person Search [54.106662998673514]
We introduce a Context-Guided and Unpaired-Assisted (CGUA) weakly supervised person search framework.
Specifically, we propose a novel Context-Guided Cluster (CGC) algorithm to leverage context information in the clustering process.
Our method achieves performance comparable to or better than that of state-of-the-art supervised methods by leveraging more diverse unlabeled data.
arXiv Detail & Related papers (2022-03-27T13:57:30Z)
- Unsupervised Clustering Active Learning for Person Re-identification [5.705895028045853]
Unsupervised re-id methods rely on unlabeled data to train models.
We present an Unsupervised Clustering Active Learning (UCAL) re-id deep learning approach.
It is capable of incrementally discovering the representative centroid-pairs.
arXiv Detail & Related papers (2021-12-26T02:54:35Z)
- Combining Self-Training and Self-Supervised Learning for Unsupervised Disfluency Detection [80.68446022994492]
In this work, we explore the unsupervised learning paradigm which can potentially work with unlabeled text corpora.
Our model builds upon the recent work on Noisy Student Training, a semi-supervised learning approach that extends the idea of self-training.
arXiv Detail & Related papers (2020-10-29T05:29:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.