Towards Omni-supervised Referring Expression Segmentation
- URL: http://arxiv.org/abs/2311.00397v2
- Date: Mon, 27 Nov 2023 09:02:06 GMT
- Title: Towards Omni-supervised Referring Expression Segmentation
- Authors: Minglang Huang, Yiyi Zhou, Gen Luo, Guannan Jiang, Weilin Zhuang,
Xiaoshuai Sun
- Abstract summary: Referring Expression Segmentation (RES) is an emerging task in computer vision, which segments the target instances in images based on text descriptions.
We propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled and weakly labeled data.
- Score: 36.0543534772681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Referring Expression Segmentation (RES) is an emerging task in
computer vision, which segments the target instances in images based on text
descriptions. However, its development is hindered by expensive segmentation
labels. To address this issue, we propose a new learning task for RES called
Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to
make full use of unlabeled, fully labeled and weakly labeled data, e.g.,
referring points or grounding boxes, for efficient RES training. To accomplish
this task, we also propose a novel yet strong baseline method for Omni-RES
based on the recently popular teacher-student learning paradigm, where the
weak labels are not directly transformed into supervision signals but used as
a yardstick to select and refine high-quality pseudo-masks for teacher-student
learning. To validate the proposed Omni-RES method, we apply it to a set of
state-of-the-art RES models and conduct extensive experiments on several RES
datasets. The experimental results demonstrate the clear merits of Omni-RES
over fully supervised and semi-supervised training schemes. For instance, with
only 10% fully labeled data, Omni-RES can help the base model achieve 100%
fully supervised performance, and it also outperforms the semi-supervised
alternative by a large margin, e.g., +14.93% on RefCOCO and +14.95% on
RefCOCO+. More importantly, Omni-RES also enables the use of large-scale
vision-language data like Visual Genome to facilitate low-cost RES training,
achieving new SOTA performance on RES, e.g., 80.66 on RefCOCO.
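The abstract's core mechanism, using weak labels as a yardstick to filter teacher pseudo-masks rather than converting them directly into supervision, can be illustrated with a minimal sketch. The sketch below assumes the weak labels are grounding boxes and scores each teacher pseudo-mask by the IoU between the mask's tight bounding box and the weak box; the function names, the scoring rule, and the 0.5 threshold are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch: use weak grounding boxes as a yardstick to select
# high-quality teacher pseudo-masks for teacher-student training.
# The IoU-based rule and threshold below are assumptions for
# illustration, not the paper's exact selection criterion.
import numpy as np

def mask_to_box(mask: np.ndarray) -> tuple:
    """Tight bounding box (x0, y0, x1, y1) of a binary mask."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()

def box_iou(a: tuple, b: tuple) -> float:
    """IoU between two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def select_pseudo_masks(pseudo_masks, weak_boxes, iou_thresh=0.5):
    """Keep teacher pseudo-masks that agree with their weak box label.

    pseudo_masks: list of HxW binary arrays predicted by the teacher.
    weak_boxes:   list of (x0, y0, x1, y1) weak grounding-box labels.
    Returns the masks whose tight box matches the weak box well enough
    to serve as pseudo supervision for the student.
    """
    kept = []
    for mask, box in zip(pseudo_masks, weak_boxes):
        if mask.any() and box_iou(mask_to_box(mask), box) >= iou_thresh:
            kept.append(mask)
    return kept
```

A referring-point weak label would admit the same structure, e.g., keeping only the masks that contain the annotated point; the selected pseudo-masks would then supervise the student branch.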