Evaluation for Weakly Supervised Object Localization: Protocol, Metrics,
and Datasets
- URL: http://arxiv.org/abs/2007.04178v2
- Date: Tue, 7 Dec 2021 05:21:18 GMT
- Authors: Junsuk Choe, Seong Joon Oh, Sanghyuk Chun, Seungho Lee, Zeynep Akata,
Hyunjung Shim
- Abstract summary: We argue that the weakly-supervised object localization (WSOL) task is ill-posed with only image-level labels.
We propose a new evaluation protocol where full supervision is limited to only a small held-out set not overlapping with the test set.
- Score: 65.73451960585571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly-supervised object localization (WSOL) has gained popularity over the
last years for its promise to train localization models with only image-level
labels. Since the seminal WSOL work of class activation mapping (CAM), the
field has focused on how to expand the attention regions to cover objects more
broadly and localize them better. However, these strategies rely on full
localization supervision for validating hyperparameters and model selection,
which is in principle prohibited under the WSOL setup. In this paper, we argue
that the WSOL task is ill-posed with only image-level labels, and propose a new
evaluation protocol where full supervision is limited to only a small held-out
set not overlapping with the test set. We observe that, under our protocol, the
five most recent WSOL methods have not made a major improvement over the CAM
baseline. Moreover, we report that existing WSOL methods have not reached the
few-shot learning baseline, where the full supervision at validation time is
used for model training instead. Based on our findings, we discuss some future
directions for WSOL.
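The protocol described in the abstract can be illustrated with a small sketch: a localization hyperparameter (here, the threshold applied to a score map) is chosen using box annotations from a held-out validation set only, and the test set is scored once with the chosen value. This is not the authors' code. The function names, the toy score maps, the "tightest box around all above-threshold cells" rule, and the IoU >= 0.5 success criterion are all assumptions made for illustration.

```python
# Minimal sketch of held-out hyperparameter selection for WSOL-style evaluation.
# All names and data below are hypothetical; boxes are (x0, y0, x1, y1) with
# inclusive pixel coordinates.

def iou(a, b):
    """Intersection over union of two inclusive-coordinate boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0 + 1) * max(0, iy1 - iy0 + 1)
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def box_from_map(score_map, threshold):
    """Tightest box around all cells scoring >= threshold, or None if empty."""
    hits = [(x, y) for y, row in enumerate(score_map)
            for x, score in enumerate(row) if score >= threshold]
    if not hits:
        return None
    xs, ys = zip(*hits)
    return (min(xs), min(ys), max(xs), max(ys))


def box_accuracy(maps, gt_boxes, threshold, min_iou=0.5):
    """Fraction of images whose predicted box overlaps the true box enough."""
    correct = 0
    for score_map, gt in zip(maps, gt_boxes):
        pred = box_from_map(score_map, threshold)
        if pred is not None and iou(pred, gt) >= min_iou:
            correct += 1
    return correct / len(maps)


def select_threshold(val_maps, val_boxes, candidates):
    """Pick the threshold with the best box accuracy on the held-out set."""
    return max(candidates, key=lambda t: box_accuracy(val_maps, val_boxes, t))


# Toy held-out validation split (fully annotated, disjoint from the test split).
VAL_MAPS = [[[0.1, 0.1, 0.1, 0.1],
             [0.1, 0.9, 0.8, 0.1],
             [0.1, 0.7, 0.9, 0.3],
             [0.1, 0.1, 0.3, 0.1]]]
VAL_BOXES = [(1, 1, 2, 2)]

# Toy test split: its boxes are used only for the final score.
TEST_MAPS = [[[0.6, 0.8, 0.1, 0.1],
              [0.7, 0.9, 0.1, 0.1],
              [0.1, 0.1, 0.1, 0.1],
              [0.1, 0.1, 0.1, 0.1]]]
TEST_BOXES = [(0, 0, 1, 1)]

best_t = select_threshold(VAL_MAPS, VAL_BOXES, candidates=[0.2, 0.5, 0.95])
test_acc = box_accuracy(TEST_MAPS, TEST_BOXES, best_t)
print(best_t, test_acc)
```

The point of the split is that `TEST_BOXES` never influences `best_t`; tuning the threshold directly on the test annotations would be the kind of implicit full supervision the abstract argues against.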
Related papers
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- Open-Vocabulary Spatio-Temporal Action Detection [59.91046192096296]
Open-vocabulary spatio-temporal action detection (OV-STAD) is an important fine-grained video understanding task.
OV-STAD requires training a model on a limited set of base classes with box and label supervision.
To better adapt the holistic VLM for the fine-grained action detection task, we carefully fine-tune it on the localized video region-text pairs.
arXiv Detail & Related papers (2024-05-17T14:52:47Z)
- Bagging Regional Classification Activation Maps for Weakly Supervised Object Localization [11.25759292976175]
BagCAMs is a plug-and-play mechanism to better project a well-trained classifier for the localization task.
BagCAMs adopts a regional localizer generation strategy to define a set of regional localizers.
Experiments indicate that adopting our proposed BagCAMs can improve the performance of baseline WSOL methods.
arXiv Detail & Related papers (2022-07-16T03:03:01Z)
- Weakly Supervised Object Localization as Domain Adaption [19.854125742336688]
Weakly supervised object localization (WSOL) focuses on localizing objects with only image-level classification labels as supervision.
Most previous WSOL methods follow the classification activation map (CAM) that localizes objects based on the classification structure with the multi-instance learning (MIL) mechanism.
This work provides a novel perspective that models WSOL as a domain adaption (DA) task, where the score estimator trained on the source/image domain is tested on the target/pixel domain to locate objects.
arXiv Detail & Related papers (2022-03-03T13:50:22Z)
- Shallow Feature Matters for Weakly Supervised Object Localization [35.478997006168484]
Weakly supervised object localization (WSOL) aims to localize objects by only utilizing image-level labels.
Previous CAM-based methods did not take full advantage of the shallow features, despite their importance for WSOL.
In this paper, we propose a simple but effective Shallow feature-aware Pseudo supervised Object localization (SPOL) model for accurate WSOL.
arXiv Detail & Related papers (2021-08-02T13:16:48Z)
- UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)
- Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization [53.99850033746663]
We study the problem of learning a localization model on target classes with weakly supervised image labels.
In this work, we argue that learning only an objectness function is a weak form of knowledge transfer.
Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of a pairwise similarity function.
arXiv Detail & Related papers (2020-03-18T17:53:33Z)
- Rethinking the Route Towards Weakly Supervised Object Localization [28.90792512056726]
We show that weakly supervised object localization should be divided into two parts: class-agnostic object localization and object classification.
For class-agnostic object localization, we should use class-agnostic methods to generate noisy pseudo annotations and then perform bounding box regression on them without class labels.
Our PSOL models have good transferability across different datasets without fine-tuning.
arXiv Detail & Related papers (2020-02-26T08:54:20Z)
- Evaluating Weakly Supervised Object Localization Methods Right [65.73451960585571]
We argue that the weakly-supervised object localization (WSOL) task is ill-posed with only image-level labels.
We propose a new evaluation protocol where full supervision is limited to only a small held-out set not overlapping with the test set.
arXiv Detail & Related papers (2020-01-21T10:50:06Z)