Weakly-Supervised Action Localization with Expectation-Maximization
Multi-Instance Learning
- URL: http://arxiv.org/abs/2004.00163v2
- Date: Tue, 25 Aug 2020 19:26:17 GMT
- Title: Weakly-Supervised Action Localization with Expectation-Maximization
Multi-Instance Learning
- Authors: Zhekun Luo, Devin Guillory, Baifeng Shi, Wei Ke, Fang Wan, Trevor
Darrell, Huijuan Xu
- Abstract summary: Weakly-supervised action localization requires training a model to localize the action segments in a video given only the video-level action label.
It can be solved under the Multiple Instance Learning (MIL) framework, where a bag (video) contains multiple instances (action segments).
We show that our EM-MIL approach more accurately models both the learning objective and the MIL assumptions.
- Score: 82.41415008107502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly-supervised action localization requires training a model to
localize the action segments in a video given only the video-level action
label. It can be solved under the Multiple Instance Learning (MIL) framework,
where a bag (video) contains multiple instances (action segments). Since only
the bag's label is known, the main challenge is determining which key
instances within the bag trigger the bag's label. Most previous models take
attention-based approaches, applying attention to build the bag's
representation from its instances and then training it via bag
classification. These models, however, implicitly violate the MIL assumption
that instances in negative bags should be uniformly negative. In this work, we
explicitly model the key-instance assignment as a hidden variable and adopt an
Expectation-Maximization (EM) framework. We derive two pseudo-label generation
schemes to model the E and M steps and iteratively optimize a lower bound on
the likelihood. We show that our EM-MIL approach more accurately models both
the learning objective and the MIL assumptions. It achieves state-of-the-art
performance on two standard benchmarks, THUMOS14 and ActivityNet1.2.
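The E/M alternation described in the abstract can be sketched on toy data. Everything below (a linear instance scorer, top-K pseudo-labeling in positive bags, a single logistic-loss gradient step as the M-step, the mean-difference initialization) is an illustrative assumption, not the paper's actual architecture or pseudo-label schemes:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_INST, K = 8, 20, 3  # feature dim, segments per video, assumed key segments

def make_bag(positive):
    """Toy video: positive bags hide K shifted 'action' segments."""
    x = rng.normal(size=(N_INST, DIM))
    if positive:
        x[rng.choice(N_INST, K, replace=False)] += 2.0
    return x

bag_labels = np.array([1, 0] * 20)
bags = [make_bag(y == 1) for y in bag_labels]

# Heuristic init: difference of class-wise bag means (our assumption).
w = (np.mean([b.mean(0) for b, y in zip(bags, bag_labels) if y], axis=0)
     - np.mean([b.mean(0) for b, y in zip(bags, bag_labels) if not y], axis=0))

for _ in range(30):  # EM iterations
    X, t = [], []
    for x, y in zip(bags, bag_labels):
        pseudo = np.zeros(N_INST)
        if y == 1:
            # E-step: the current top-K segments get positive pseudo-labels
            # (the key-instance assignment treated as a hidden variable).
            pseudo[np.argsort(-(x @ w))[:K]] = 1.0
        # Negative bags stay uniformly negative: the MIL assumption
        # the paper makes explicit.
        X.append(x)
        t.append(pseudo)
    X, t = np.vstack(X), np.concatenate(t)
    # M-step: one logistic-regression gradient step toward the pseudo-labels.
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.5 * X.T @ (p - t) / len(t)

# Bag-level score via MIL max pooling over instance logits.
bag_logits = np.array([np.max(b @ w) for b in bags])
```

After a few iterations the max instance logit separates positive from negative bags on this synthetic data, mirroring how the pseudo-labels and the scorer reinforce each other.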
Related papers
- Sm: enhanced localization in Multiple Instance Learning for medical imaging classification [11.727293641333713]
Multiple Instance Learning (MIL) is widely used in medical imaging classification to reduce the labeling effort.
We propose a novel, principled, and flexible mechanism to model local dependencies.
Our module leads to state-of-the-art performance in localization while being competitive or superior in classification.
arXiv Detail & Related papers (2024-10-04T09:49:28Z)
- MOWA: Multiple-in-One Image Warping Model [65.73060159073644]
We propose a Multiple-in-One image warping model (named MOWA) in this work.
We mitigate the difficulty of multi-task learning by disentangling the motion estimation at both the region level and pixel level.
To our knowledge, this is the first work that solves multiple practical warping tasks in one single model.
arXiv Detail & Related papers (2024-04-16T16:50:35Z)
- Reproducibility in Multiple Instance Learning: A Case For Algorithmic Unit Tests [59.623267208433255]
Multiple Instance Learning (MIL) is a sub-domain of classification problems with positive and negative labels and a "bag" of inputs.
In this work, we examine five of the most prominent deep-MIL models and find that none of them respects the standard MIL assumption.
We identify and demonstrate this problem via a proposed "algorithmic unit test", where we create synthetic datasets that can be solved by a MIL-respecting model.
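One such algorithmic unit test can be sketched as follows. The specific property checked (padding a positive bag with background instances must not flip its label), the two toy scoring functions, and the threshold are illustrative assumptions, not the tests proposed in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def max_pool_model(bag):
    """MIL-respecting reference: bag score = max instance score."""
    return float(np.max(bag @ np.ones(4)))  # hypothetical linear instance scorer

def mean_pool_model(bag):
    """Averages instances, so background dilutes the key instance."""
    return float(np.mean(bag @ np.ones(4)))

def dilution_unit_test(model, threshold=2.0):
    """Padding a positive bag with background must not flip its label.

    Under the standard MIL assumption, a bag is positive iff it contains
    at least one positive instance, regardless of how many negative
    instances accompany it.
    """
    key = np.full((1, 4), 2.0)  # one clearly positive instance (logit 8.0)
    for n_background in (0, 5, 50):
        bag = np.vstack([key, rng.normal(0.0, 0.1, size=(n_background, 4))])
        if model(bag) <= threshold:
            return False
    return True
```

Max pooling passes this test by construction, while mean pooling fails as soon as background instances outweigh the key instance, which is the kind of assumption violation the paper reports in prominent deep-MIL models.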
arXiv Detail & Related papers (2023-10-27T03:05:11Z)
- Rethinking Multiple Instance Learning for Whole Slide Image Classification: A Good Instance Classifier is All You Need [18.832471712088353]
We propose, for the first time under the MIL setting, an instance-level weakly supervised contrastive learning algorithm.
We also propose an accurate pseudo label generation method through prototype learning.
arXiv Detail & Related papers (2023-07-05T12:44:52Z)
- Disambiguated Attention Embedding for Multi-Instance Partial-Label Learning [68.56193228008466]
In many real-world tasks, the concerned objects can be represented as a multi-instance bag associated with a candidate label set.
Existing MIPL approaches follow the instance-space paradigm by assigning augmented candidate label sets of bags to each instance and aggregating bag-level labels from instance-level labels.
We propose an intuitive algorithm named DEMIPL, i.e., Disambiguated attention Embedding for Multi-Instance Partial-Label learning.
arXiv Detail & Related papers (2023-05-26T13:25:17Z)
- MoBYv2AL: Self-supervised Active Learning for Image Classification [57.4372176671293]
We present MoBYv2AL, a novel self-supervised active learning framework for image classification.
Our contribution lies in lifting MoBY, one of the most successful self-supervised learning algorithms, to the AL pipeline.
We achieve state-of-the-art results when compared to recent AL methods.
arXiv Detail & Related papers (2023-01-04T10:52:02Z)
- Feature Re-calibration based MIL for Whole Slide Image Classification [7.92885032436243]
Whole slide image (WSI) classification is a fundamental task for the diagnosis and treatment of diseases.
We propose to re-calibrate the distribution of a WSI bag (instances) by using the statistics of the max-instance (critical) feature.
We employ a position encoding module (PEM) to model spatial/morphological information, and perform pooling by multi-head self-attention (PSMA) with a Transformer encoder.
arXiv Detail & Related papers (2022-06-22T07:00:39Z)
- Model Agnostic Interpretability for Multiple Instance Learning [7.412445894287708]
In Multiple Instance Learning (MIL), models are trained using bags of instances, where only a single label is provided for each bag.
In this work, we establish the key requirements for interpreting MIL models.
We then go on to develop several model-agnostic approaches that meet these requirements.
arXiv Detail & Related papers (2022-01-27T17:55:32Z)
- CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition [52.66360172784038]
We propose a clustering-based model, which considers all training samples at once, instead of optimizing for each instance individually.
We call the proposed method CLASTER and observe that it consistently improves over the state-of-the-art in all standard datasets.
arXiv Detail & Related papers (2021-01-18T12:46:24Z)
- Dual-stream Maximum Self-attention Multi-instance Learning [11.685285490589981]
Multi-instance learning (MIL) is a form of weakly supervised learning where a single class label is assigned to a bag of instances while the instance-level labels are not available.
We propose a dual-stream maximum self-attention MIL model (DSMIL) parameterized by neural networks.
Our method achieves superior performance compared to the best MIL methods and demonstrates state-of-the-art performance on benchmark MIL datasets.
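The dual-stream aggregation idea can be roughly sketched as follows. The linear instance scorer, unscaled dot-product attention against the critical instance, and the final readout are simplifying assumptions for illustration, not DSMIL's exact formulation:

```python
import numpy as np

def dual_stream_bag_score(H, w_inst):
    """Rough two-stream MIL aggregation over instance features H (n x d).

    Stream 1 scores instances and picks the critical (max-scoring) one;
    stream 2 attends every instance against that critical instance and
    pools an attention-weighted bag representation.
    """
    inst_scores = H @ w_inst               # stream 1: instance logits
    critical = H[np.argmax(inst_scores)]   # critical instance feature
    sim = H @ critical                     # stream 2: similarity to critical
    attn = np.exp(sim - sim.max())
    attn /= attn.sum()                     # softmax attention weights
    bag_repr = attn @ H                    # attention-pooled bag feature
    return float(bag_repr @ w_inst)        # bag-level logit
```

Keying the attention on the max-scoring instance ties the bag representation to the strongest positive evidence, which is what lets the max stream and the attention stream reinforce each other.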
arXiv Detail & Related papers (2020-06-09T22:44:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.