Improving Long-tailed Object Detection with Image-Level Supervision by
Multi-Task Collaborative Learning
- URL: http://arxiv.org/abs/2210.05568v1
- Date: Tue, 11 Oct 2022 16:02:14 GMT
- Title: Improving Long-tailed Object Detection with Image-Level Supervision by
Multi-Task Collaborative Learning
- Authors: Bo Li, Yongqiang Yao, Jingru Tan, Xin Lu, Fengwei Yu, Ye Luo, Jianwei
Lu
- Abstract summary: We propose a novel framework, CLIS, which leverages image-level supervision to enhance detection ability in a multi-task collaborative way.
CLIS achieves an overall AP of 31.1 with a 10.1-point improvement on tail categories, establishing a new state of the art.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data in real-world object detection often exhibits a long-tailed
distribution. Existing solutions tackle this problem by mitigating the
competition between the head and tail categories. However, due to the scarcity
of training samples, tail categories are still unable to learn discriminative
representations. Bringing more data into training may alleviate the problem,
but collecting instance-level annotations is an excruciating task. In contrast,
image-level annotations are easily accessible but not fully exploited. In this
paper, we propose a novel framework, CLIS (multi-task Collaborative Learning
with Image-level Supervision), which leverages image-level supervision to
enhance detection ability in a multi-task collaborative way. Specifically, our
framework comprises an object detection task (itself consisting of an
instance-classification task and a localization task) and an
image-classification task, which are responsible for utilizing the two types of
supervision. The tasks are trained collaboratively through three key designs:
(1) task-specialized sub-networks that learn task-specific representations
without feature entanglement; (2) a siamese sub-network for the
image-classification task that shares its knowledge with the
instance-classification task, enriching the detector's features; and (3) a
contrastive learning regularization that maintains representation consistency,
bridging the feature gap between the two types of supervision. Extensive
experiments are conducted on the challenging LVIS dataset. Without
sophisticated loss engineering, CLIS achieves an overall AP of 31.1 with a
10.1-point improvement on tail categories, establishing a new state of the art.
Code will be available at https://github.com/waveboo/CLIS.
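The abstract does not spell out the form of the contrastive regularization; one plausible reading is an InfoNCE-style consistency loss that pulls each instance-classification feature toward its image-classification counterpart. A minimal sketch, assuming paired (N, D) feature batches (all names and shapes here are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def contrastive_consistency_loss(inst_feats, img_feats, temperature=0.1):
    """InfoNCE-style loss: the i-th instance feature should match the
    i-th image-classification feature more than any other in the batch.

    inst_feats, img_feats: (N, D) arrays of task-specific embeddings.
    """
    # L2-normalize so dot products become cosine similarities
    inst = inst_feats / np.linalg.norm(inst_feats, axis=1, keepdims=True)
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)

    logits = inst @ img.T / temperature          # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal (matching instance/image pairs)
    return -np.mean(np.diag(log_prob))
```

Minimizing such a term keeps the two task-specific representations aligned without forcing the sub-networks to share weights, which is consistent with the paper's stated goal of avoiding feature entanglement.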
Related papers
- Self-supervised Learning via Cluster Distance Prediction for Operating Room Context Awareness [44.15562068190958]
In the Operating Room, semantic segmentation is central to making robots aware of their clinical surroundings.
State-of-the-art semantic segmentation and activity recognition approaches are fully supervised, which is not scalable.
We propose a new 3D self-supervised task for OR scene understanding utilizing OR scene images captured with ToF cameras.
arXiv Detail & Related papers (2024-07-07T17:17:52Z)
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can succeed on classification tasks even with few or non-overlapping annotations.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- AIMS: All-Inclusive Multi-Level Segmentation [93.5041381700744]
We propose a new task, All-Inclusive Multi-Level Segmentation (AIMS), which segments visual regions into three levels: part, entity, and relation.
We also build a unified AIMS model through multi-dataset multi-task training to address the two major challenges of annotation inconsistency and task correlation.
arXiv Detail & Related papers (2023-05-28T16:28:49Z)
- Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate few-shot task generalization as a reinforcement learning problem in which a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
- Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism [120.1998866178014]
We present a flexible framework for continual object detection via a pRotOtypical taSk corrElaTion guided gaTing mechAnism (ROSETTA).
Concretely, a unified framework is shared by all tasks while task-aware gates are introduced to automatically select sub-models for specific tasks.
Experiments on COCO-VOC, KITTI-Kitchen, class-incremental detection on VOC and sequential learning of four tasks show that ROSETTA yields state-of-the-art performance.
arXiv Detail & Related papers (2022-05-06T07:31:28Z)
- UniVIP: A Unified Framework for Self-Supervised Visual Pre-training [50.87603616476038]
We propose a novel self-supervised framework to learn versatile visual representations on either single-centric-object or non-iconic datasets.
Massive experiments show that UniVIP pre-trained on non-iconic COCO achieves state-of-the-art transfer performance.
Our method can also exploit single-centric-object datasets such as ImageNet, and outperforms BYOL by 2.5% with the same pre-training epochs in linear probing.
arXiv Detail & Related papers (2022-03-14T10:04:04Z)
- Tasks Integrated Networks: Joint Detection and Retrieval for Image Search [99.49021025124405]
In many real-world searching scenarios (e.g., video surveillance), the objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
arXiv Detail & Related papers (2020-09-03T03:57:50Z)
- Taskology: Utilizing Task Relations at Scale [28.09712466727001]
We show that we can leverage the inherent relationships among collections of tasks as they are trained jointly.
Explicitly utilizing the relationships between tasks improves their performance while dramatically reducing the need for labeled data.
We demonstrate our framework on subsets of the following collection of tasks: depth and normal prediction, semantic segmentation, 3D motion and ego-motion estimation, and object tracking and 3D detection in point clouds.
arXiv Detail & Related papers (2020-05-14T22:53:46Z)
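Taskology's core idea, supervising tasks through their known mutual relationships, can be illustrated with the depth/normal pair listed above: surface normals are a deterministic function of depth, so the two predictions can be checked against each other without labels. A simplified sketch under an orthographic-camera assumption (function names and the exact loss are illustrative, not the paper's):

```python
import numpy as np

def normals_from_depth(depth):
    """Derive unit surface normals from a depth map (H, W) by finite
    differences, treating pixel spacing as 1 (orthographic simplification)."""
    dz_dy, dz_dx = np.gradient(depth)  # gradients along rows, then columns
    # the un-normalized normal direction is (-dz/dx, -dz/dy, 1)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def depth_normal_consistency(pred_depth, pred_normals):
    """L1 penalty between predicted normals and normals implied by depth."""
    return np.abs(normals_from_depth(pred_depth) - pred_normals).mean()
```

On unlabeled images, gradients from such a consistency term can flow into both networks, which is where the reduction in labeled-data requirements comes from.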
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.