An Action Is Worth Multiple Words: Handling Ambiguity in Action
Recognition
- URL: http://arxiv.org/abs/2210.04933v1
- Date: Mon, 10 Oct 2022 18:06:43 GMT
- Title: An Action Is Worth Multiple Words: Handling Ambiguity in Action
Recognition
- Authors: Kiyoon Kim, Davide Moltisanti, Oisin Mac Aodha, Laura Sevilla-Lara
- Abstract summary: We address the challenge of training multi-label action recognition models from only single positive training labels.
We propose two approaches that are based on generating pseudo training examples sampled from similar instances within the train set.
We create a new evaluation benchmark by manually annotating a subset of EPIC-Kitchens-100's validation set with multiple verb labels.
- Score: 18.937012620464465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Precisely naming the action depicted in a video can be a challenging and
oftentimes ambiguous task. In contrast to object instances represented as nouns
(e.g. dog, cat, chair, etc.), in the case of actions, human annotators
typically lack a consensus as to what constitutes a specific action (e.g.
jogging versus running). In practice, a given video can contain multiple valid
positive annotations for the same action. As a result, video datasets often
contain significant levels of label noise and overlap between the atomic action
classes. In this work, we address the challenge of training multi-label action
recognition models from only single positive training labels. We propose two
approaches that are based on generating pseudo training examples sampled from
similar instances within the train set. Unlike other approaches that use
model-derived pseudo-labels, our pseudo-labels come from human annotations and
are selected based on feature similarity. To validate our approaches, we create
a new evaluation benchmark by manually annotating a subset of
EPIC-Kitchens-100's validation set with multiple verb labels. We present
results on this new test set along with additional results on a new version of
HMDB-51, called Confusing-HMDB-102, where we outperform existing methods in
both cases. Data and code are available at
https://github.com/kiyoon/verb_ambiguity
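The abstract's core idea, borrowing human-annotated labels from similar training instances, can be illustrated with a small sketch. This is not the authors' implementation (see the repository above for that). It is a minimal pure-Python illustration that assumes each clip already has a feature vector and one positive verb label. The function names `cosine` and `neighbour_pseudo_labels`, the parameter `k`, and the toy data are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def neighbour_pseudo_labels(features, labels, num_classes, k=1):
    """Build one multi-hot target per clip.

    Each target contains the clip's own single positive label plus the
    human-annotated labels of its k most similar clips (assumption: cosine
    similarity over precomputed features stands in for "feature similarity").
    """
    targets = []
    for i, feat in enumerate(features):
        # Rank every other clip by similarity to clip i, most similar first.
        ranked = sorted(
            ((cosine(feat, other), j) for j, other in enumerate(features) if j != i),
            reverse=True,
        )
        target = [0] * num_classes
        target[labels[i]] = 1
        for _, j in ranked[:k]:
            target[labels[j]] = 1  # borrow the neighbour's human label
        targets.append(target)
    return targets


# Toy data: classes 0 = "jog", 1 = "run", 2 = "sit" (made up for illustration).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 1, 2, 2]
targets = neighbour_pseudo_labels(features, labels, num_classes=3, k=1)
print(targets)
```

In this toy case the "jog" and "run" clips are nearest neighbours, so each ends up with both verbs as positives, while the two "sit" clips keep a single positive.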
Related papers
- Rethinking Multiple Instance Learning for Whole Slide Image Classification: A Good Instance Classifier is All You Need [18.832471712088353]
We propose an instance-level weakly supervised contrastive learning algorithm for the first time under the MIL setting.
We also propose an accurate pseudo label generation method through prototype learning.
arXiv Detail & Related papers (2023-07-05T12:44:52Z)
- Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification [85.76130799062379]
We study how false negative labels affect the model's explanation.
We propose to boost the attribution scores of the model trained with partial labels to make its explanation resemble that of the model trained with full labels.
arXiv Detail & Related papers (2023-04-04T14:00:59Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- ActiveLab: Active Learning with Re-Labeling by Multiple Annotators [19.84626033109009]
ActiveLab is a method to decide what to label next in batch active learning.
It automatically estimates when it is more informative to re-label examples vs. labeling entirely new ones.
It reliably trains more accurate classifiers with far fewer annotations than a wide variety of popular active learning methods.
arXiv Detail & Related papers (2023-01-27T17:00:11Z)
- Learning with Different Amounts of Annotation: From Zero to Many Labels [19.869498599986006]
Training NLP systems typically assumes access to annotated data that has a single human label per example.
We explore new annotation distribution schemes, assigning multiple labels per example for a small subset of training examples.
Introducing such multi-label examples at the cost of annotating fewer examples brings clear gains on natural language inference and entity typing tasks.
arXiv Detail & Related papers (2021-09-09T16:48:41Z)
- BABEL: Bodies, Action and Behavior with English Labels [53.83774092560076]
We present BABEL, a large dataset with language labels describing the actions being performed in mocap sequences.
There are over 28k sequence labels, and 63k frame labels in BABEL, which belong to over 250 unique action categories.
We demonstrate the value of BABEL as a benchmark, and evaluate the performance of models on 3D action recognition.
arXiv Detail & Related papers (2021-06-17T17:51:14Z)
- All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training [32.45488147013166]
Pseudo-labeling is a key component in semi-supervised learning (SSL).
We propose SemCo, a method which leverages label semantics and co-training to address this problem.
We show that our method achieves state-of-the-art performance across various SSL tasks, including a 5.6% accuracy improvement on the Mini-ImageNet dataset with 1000 labeled examples.
arXiv Detail & Related papers (2021-04-12T07:33:16Z)
- Discovering Multi-Label Actor-Action Association in a Weakly Supervised Setting [22.86745487695168]
We propose a baseline based on multi-instance and multi-label learning.
We propose a novel approach that uses sets of actions as representation instead of modeling individual action classes.
We evaluate the proposed approach on a challenging dataset, where it outperforms the MIML baseline and is competitive with fully supervised approaches.
arXiv Detail & Related papers (2021-01-21T11:59:47Z)
- Few-shot Learning for Multi-label Intent Detection [59.66787898744991]
State-of-the-art work estimates label-instance relevance scores and uses a threshold to select multiple associated intent labels.
Experiments on two datasets show that the proposed model significantly outperforms strong baselines in both one-shot and five-shot settings.
arXiv Detail & Related papers (2020-10-11T14:42:18Z)
- CycAs: Self-supervised Cycle Association for Learning Re-identifiable Descriptions [61.724894233252414]
This paper proposes a self-supervised learning method for the person re-identification (re-ID) problem.
Existing unsupervised methods usually rely on pseudo labels, such as those from video tracklets or clustering.
We introduce a different unsupervised method that allows us to learn pedestrian embeddings from raw videos, without resorting to pseudo labels.
arXiv Detail & Related papers (2020-07-15T09:52:35Z)
- Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning [82.41415008107502]
Weakly-supervised action localization requires training a model to localize the action segments in a video given only video-level action labels.
It can be solved under the Multiple Instance Learning (MIL) framework, where a bag (video) contains multiple instances (action segments).
We show that our EM-MIL approach more accurately models both the learning objective and the MIL assumptions.
arXiv Detail & Related papers (2020-03-31T23:36:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.