Related papers: Improving Object Detection via Local-global Contrastive Learning

URL: http://arxiv.org/abs/2410.05058v2
Date: Fri, 25 Oct 2024 11:53:55 GMT
Title: Improving Object Detection via Local-global Contrastive Learning
Authors: Danai Triantafyllidou, Sarah Parisot, Ales Leonardis, Steven McDonagh
Abstract summary: We present a novel image-to-image translation method that specifically targets cross-domain object detection. We learn to represent objects by contrasting local-global information. This affords investigation of an under-explored challenge: obtaining performant detection under domain shifts.
Score: 27.660633883387753
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Visual domain gaps often impact object detection performance. Image-to-image translation can mitigate this effect, where contrastive approaches enable learning of the image-to-image mapping under unsupervised regimes. However, existing methods often fail to handle content-rich scenes with multiple object instances, which manifests in unsatisfactory detection performance. Sensitivity to such instance-level content is typically only gained through object annotations, which can be expensive to obtain. Towards addressing this issue, we present a novel image-to-image translation method that specifically targets cross-domain object detection. We formulate our approach as a contrastive learning framework with an inductive prior that optimises the appearance of object instances through spatial attention masks, implicitly delineating the scene into foreground regions associated with the target object instances and background non-object regions. Instead of relying on object annotations to explicitly account for object instances during translation, our approach learns to represent objects by contrasting local-global information. This affords investigation of an under-explored challenge: obtaining performant detection, under domain shifts, without relying on object annotations or detector model fine-tuning. We experiment with multiple cross-domain object detection settings across three challenging benchmarks and report state-of-the-art performance. Project page: https://local-global-detection.github.io

Related papers

ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding [42.10086029931937]
Visual grounding aims to localize the object referred to in an image based on a natural language query. Existing methods demonstrate a significant performance drop when there are multiple distractions in an image. We propose a novel approach, the Relation and Semantic-sensitive Visual Grounding (ResVG) model, to address this issue.
arXiv Detail & Related papers (2024-08-29T07:32:01Z)
Image Segmentation-based Unsupervised Multiple Objects Discovery [1.7674345486888503]
Unsupervised object discovery aims to localize objects in images. We propose a fully unsupervised, bottom-up approach, for multiple objects discovery. We provide state-of-the-art results for both unsupervised class-agnostic object detection and unsupervised image segmentation.
arXiv Detail & Related papers (2022-12-20T09:48:24Z)
Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels. Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions. We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
Context-Aware Transfer Attacks for Object Detection [51.65308857232767]
We present a new approach to generate context-aware attacks for object detectors. We show that by using co-occurrence of objects and their relative locations and sizes as context information, we can successfully generate targeted mis-categorization attacks.
arXiv Detail & Related papers (2021-12-06T18:26:39Z)
Towards Accurate Localization by Instance Search [2.0539994999823334]
A self-paced learning framework is proposed to achieve accurate object localization on the rank list returned by instance search. The proposed framework mines the target instance gradually from the queries and their corresponding top-ranked search results. In addition to performing localization on instance search, the issue of few-shot object detection is also addressed under the same framework.
arXiv Detail & Related papers (2021-07-11T10:03:31Z)
A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection [56.82077636126353]
We take advantage of object-centric images to improve object detection in scene-centric images. We present a simple yet surprisingly effective framework to do so. Our approach can improve the object detection (and instance segmentation) accuracy of rare objects by 50% (and 33%) relatively.
arXiv Detail & Related papers (2021-02-17T17:27:21Z)
Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization. We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning. Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
Slender Object Detection: Diagnoses and Improvements [74.40792217534]
In this paper, we are concerned with the detection of a particular type of object with extreme aspect ratios, namely slender objects. For a classical object detection method, a drastic drop of 18.9% mAP on COCO is observed if solely evaluated on slender objects.
arXiv Detail & Related papers (2020-11-17T09:39:42Z)
Learning Object Detection from Captions via Textual Scene Attributes [70.90708863394902]
We argue that captions contain much richer information about the image, including attributes of objects and their relations. We present a method that uses the attributes in this "textual scene graph" to train object detectors. We empirically demonstrate that the resulting model achieves state-of-the-art results on several challenging object detection datasets.
arXiv Detail & Related papers (2020-09-30T10:59:20Z)
Cross-Supervised Object Detection [42.783400918552765]
We show how to build better object detectors from weakly labeled images of new categories by leveraging knowledge learned from fully labeled base categories. We propose a unified framework that combines a detection head trained from instance-level annotations and a recognition head learned from image-level annotations.
arXiv Detail & Related papers (2020-06-26T15:33:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
