MOD-CL: Multi-label Object Detection with Constrained Loss
- URL: http://arxiv.org/abs/2403.07885v1
- Date: Wed, 31 Jan 2024 23:13:20 GMT
- Title: MOD-CL: Multi-label Object Detection with Constrained Loss
- Authors: Sota Moriyama, Koji Watanabe, Katsumi Inoue, Akihiro Takemura
- Abstract summary: In this paper, we use $\mathrm{MOD_{YOLO}}$, a multi-label object detection model built upon the state-of-the-art object detection model YOLOv8.
In Task 1, we introduce the Corrector Model and Blender Model, two new models that follow after the object detection process, aiming to generate a more constrained output.
For Task 2, constrained losses have been incorporated into the $\mathrm{MOD_{YOLO}}$ architecture using Product T-Norm.
- Score: 3.92610460921618
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce MOD-CL, a multi-label object detection framework that utilizes constrained loss in the training process to produce outputs that better satisfy the given requirements. In this paper, we use $\mathrm{MOD_{YOLO}}$, a multi-label object detection model built upon the state-of-the-art object detection model YOLOv8, which has been published in recent years. In Task 1, we introduce the Corrector Model and Blender Model, two new models that follow after the object detection process, aiming to generate a more constrained output. For Task 2, constrained losses have been incorporated into the $\mathrm{MOD_{YOLO}}$ architecture using Product T-Norm. The results show that these implementations are instrumental to improving the scores for both Task 1 and Task 2.
Related papers
- DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model [67.56918651825056]
The performance of object detection lags behind that of instance segmentation (i.e., performance imbalance) when investigating the intermediate results from the beginning transformer decoder layer of MaskDINO.
This paper proposes DI-MaskDINO model, the core idea of which is to improve the final performance by alleviating the detection-segmentation imbalance.
DI-MaskDINO outperforms existing joint object detection and instance segmentation models on COCO and BDD100K benchmarks.
arXiv Detail & Related papers (2024-10-22T05:22:49Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z) - TIDE: Test Time Few Shot Object Detection [11.036762620105383]
Few-shot object detection (FSOD) aims to extract semantic knowledge from limited object instances of novel categories within a target domain.
Recent advances in FSOD focus on fine-tuning the base model based on a few objects via meta-learning or data augmentation.
We formalize a novel FSOD task, referred to as Test TIme Few Shot DEtection (TIDE), where the model is un-tuned in the configuration procedure.
arXiv Detail & Related papers (2023-11-30T09:00:44Z) - Contrastive Learning for Multi-Object Tracking with Transformers [79.61791059432558]
We show how DETR can be turned into a MOT model by employing an instance-level contrastive loss.
Our training scheme learns object appearances while preserving detection capabilities and with little overhead.
Its performance surpasses the previous state-of-the-art by +2.6 mMOTA on the challenging BDD100K dataset.
arXiv Detail & Related papers (2023-11-14T10:07:52Z) - Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefiting from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z)
- D2Q-DETR: Decoupling and Dynamic Queries for Oriented Object Detection with Transformers [14.488821968433834]
We propose an end-to-end framework for oriented object detection.
Our framework is based on DETR, with the box regression head replaced with a points prediction head.
Experiments on the largest and challenging DOTA-v1.0 and DOTA-v1.5 datasets show that D2Q-DETR outperforms existing NMS-based and NMS-free oriented object detection methods.
arXiv Detail & Related papers (2023-03-01T14:36:19Z)
- Few-shot Object Counting and Detection [25.61294147822642]
We tackle a new task of few-shot object counting and detection. Given a few exemplar bounding boxes of a target object class, we seek to count and detect all objects of the target class.
This task shares the same supervision as the few-shot object counting but additionally outputs the object bounding boxes along with the total object count.
We introduce a novel two-stage training strategy and a novel uncertainty-aware few-shot object detector: Counting-DETR.
arXiv Detail & Related papers (2022-07-22T10:09:18Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.