Related papers: An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition

URL: http://arxiv.org/abs/2210.04933v1
Date: Mon, 10 Oct 2022 18:06:43 GMT
Title: An Action Is Worth Multiple Words: Handling Ambiguity in Action Recognition
Authors: Kiyoon Kim, Davide Moltisanti, Oisin Mac Aodha, Laura Sevilla-Lara
Abstract summary: We address the challenge of training multi-label action recognition models from only single positive training labels. We propose two approaches that are based on generating pseudo training examples sampled from similar instances within the train set. We create a new evaluation benchmark by manually annotating a subset of EPIC-Kitchens-100's validation set with multiple verb labels.
Score: 18.937012620464465
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Precisely naming the action depicted in a video can be a challenging and oftentimes ambiguous task. In contrast to object instances represented as nouns (e.g. dog, cat, chair, etc.), in the case of actions, human annotators typically lack a consensus as to what constitutes a specific action (e.g. jogging versus running). In practice, a given video can contain multiple valid positive annotations for the same action. As a result, video datasets often contain significant levels of label noise and overlap between the atomic action classes. In this work, we address the challenge of training multi-label action recognition models from only single positive training labels. We propose two approaches that are based on generating pseudo training examples sampled from similar instances within the train set. Unlike other approaches that use model-derived pseudo-labels, our pseudo-labels come from human annotations and are selected based on feature similarity. To validate our approaches, we create a new evaluation benchmark by manually annotating a subset of EPIC-Kitchens-100's validation set with multiple verb labels. We present results on this new test set along with additional results on a new version of HMDB-51, called Confusing-HMDB-102, where we outperform existing methods in both cases. Data and code are available at https://github.com/kiyoon/verb_ambiguity
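The abstract says that the pseudo-labels come from human annotations of similar training instances, selected by feature similarity. The sketch below illustrates that general idea only and is not the authors' implementation (their code is at the repository linked above). The function name `knn_pseudo_labels`, the use of cosine similarity, the neighbour count `k`, and the way neighbour labels are merged into a target vector are all assumptions made for illustration.

```python
import numpy as np


def knn_pseudo_labels(features, labels, num_classes, k=2):
    """Turn single positive labels into soft multi-label targets.

    Hypothetical helper: each sample keeps its own annotated label and
    additionally receives the human-annotated labels of its k most
    similar training samples, weighted by how often they occur.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # L2-normalise so that a dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    similarity = unit @ unit.T

    # A sample must not count as its own neighbour.
    np.fill_diagonal(similarity, -np.inf)

    # Indices of the k most similar samples for every row.
    neighbours = np.argsort(-similarity, axis=1)[:, :k]

    # One-hot encode the single annotated label of every sample.
    one_hot = np.eye(num_classes)[labels]

    # Fraction of neighbours that carry each label, shape (n, num_classes).
    neighbour_votes = one_hot[neighbours].mean(axis=1)

    # The original annotation always stays a full positive.
    return np.maximum(one_hot, neighbour_votes)


if __name__ == "__main__":
    # Toy data: samples 0 and 1 look alike but were annotated differently.
    toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    toy_labels = [0, 1, 2, 2]
    print(knn_pseudo_labels(toy_features, toy_labels, num_classes=3, k=1))
```

In the toy data, samples 0 and 1 are near-duplicates in feature space with different single labels, so each ends up with both labels as positives, while samples 2 and 3 already agree and keep a single positive.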

Related papers

Rethinking Multiple Instance Learning for Whole Slide Image Classification: A Good Instance Classifier is All You Need [18.832471712088353]
We propose an instance-level weakly supervised contrastive learning algorithm for the first time under the MIL setting. We also propose an accurate pseudo label generation method through prototype learning.
arXiv Detail & Related papers (2023-07-05T12:44:52Z)
Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification [85.76130799062379]
We study how false negative labels affect the model's explanation. We propose to boost the attribution scores of the model trained with partial labels to make its explanation resemble that of the model trained with full labels.
arXiv Detail & Related papers (2023-04-04T14:00:59Z)
DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding. Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition. We present a decoupled one-stage network dubbed DOAD, to improve the efficiency for spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
ActiveLab: Active Learning with Re-Labeling by Multiple Annotators [19.84626033109009]
ActiveLab is a method to decide what to label next in batch active learning. It automatically estimates when it is more informative to re-label examples vs. labeling entirely new ones. It reliably trains more accurate classifiers with far fewer annotations than a wide variety of popular active learning methods.
arXiv Detail & Related papers (2023-01-27T17:00:11Z)
Learning with Different Amounts of Annotation: From Zero to Many Labels [19.869498599986006]
Training NLP systems typically assumes access to annotated data that has a single human label per example. We explore new annotation distribution schemes, assigning multiple labels per example for a small subset of training examples. Introducing such multi-label examples at the cost of annotating fewer examples brings clear gains on natural language inference and entity typing tasks.
arXiv Detail & Related papers (2021-09-09T16:48:41Z)
BABEL: Bodies, Action and Behavior with English Labels [53.83774092560076]
We present BABEL, a large dataset with language labels describing the actions being performed in mocap sequences. There are over 28k sequence labels, and 63k frame labels in BABEL, which belong to over 250 unique action categories. We demonstrate the value of BABEL as a benchmark, and evaluate the performance of models on 3D action recognition.
arXiv Detail & Related papers (2021-06-17T17:51:14Z)
All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training [32.45488147013166]
Pseudo-labeling is a key component in semi-supervised learning (SSL). We propose SemCo, a method which leverages label semantics and co-training to address this problem. We show that our method achieves state-of-the-art performance across various SSL tasks, including a 5.6% accuracy improvement on the Mini-ImageNet dataset with 1000 labeled examples.
arXiv Detail & Related papers (2021-04-12T07:33:16Z)
Discovering Multi-Label Actor-Action Association in a Weakly Supervised Setting [22.86745487695168]
We propose a baseline based on multi-instance and multi-label learning. We propose a novel approach that uses sets of actions as representation instead of modeling individual action classes. We evaluate the proposed approach on the challenging dataset, where it outperforms the MIML baseline and is competitive with fully supervised approaches.
arXiv Detail & Related papers (2021-01-21T11:59:47Z)
Few-shot Learning for Multi-label Intent Detection [59.66787898744991]
State-of-the-art work estimates label-instance relevance scores and uses a threshold to select multiple associated intent labels. Experiments on two datasets show that the proposed model significantly outperforms strong baselines in both one-shot and five-shot settings.
arXiv Detail & Related papers (2020-10-11T14:42:18Z)
CycAs: Self-supervised Cycle Association for Learning Re-identifiable Descriptions [61.724894233252414]
This paper proposes a self-supervised learning method for the person re-identification (re-ID) problem. Existing unsupervised methods usually rely on pseudo labels, such as those from video tracklets or clustering. We introduce a different unsupervised method that allows us to learn pedestrian embeddings from raw videos, without resorting to pseudo labels.
arXiv Detail & Related papers (2020-07-15T09:52:35Z)
Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning [82.41415008107502]
Weakly-supervised action localization requires training a model to localize the action segments in the video given only video-level action labels. It can be solved under the Multiple Instance Learning (MIL) framework, where a bag (video) contains multiple instances (action segments). We show that our EM-MIL approach more accurately models both the learning objective and the MIL assumptions.
arXiv Detail & Related papers (2020-03-31T23:36:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.