A Multi Camera Unsupervised Domain Adaptation Pipeline for Object
Detection in Cultural Sites through Adversarial Learning and Self-Training
- URL: http://arxiv.org/abs/2210.00808v1
- Date: Mon, 3 Oct 2022 10:40:58 GMT
- Authors: Giovanni Pasqualino and Antonino Furnari and Giovanni Maria Farinella
- Abstract summary: We present a new dataset collected in a cultural site to study the problem of domain adaptation for object detection.
We present a new domain adaptation method which outperforms current state-of-the-art approaches.
- Score: 23.186208885878926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object detection algorithms enable many interesting applications
which can be implemented on different devices, such as smartphones and wearable
devices. In the context of a cultural site, implementing these algorithms on a
wearable device, such as a pair of smart glasses, enables the use of
augmented reality (AR) to show extra information about the artworks and enrich
the visitors' experience during their tour. However, object detection
algorithms need to be trained on many well-annotated examples to achieve
reasonable results. This is a major limitation, since the annotation process
requires human supervision, which makes it expensive in terms of time and cost.
A possible solution to reduce these costs consists in exploiting tools to
automatically generate synthetic labeled images from a 3D model of the site.
However, models trained on synthetic data do not generalize to real images
acquired in the target scenario in which they are supposed to be used.
Furthermore, object detectors should be able to work with different wearable
devices or different mobile devices, which makes generalization even harder. In
this paper, we present a new dataset collected in a cultural site to study the
problem of domain adaptation for object detection in the presence of multiple
unlabeled target domains corresponding to different cameras and a labeled
source domain consisting of synthetic images used for training. We
present a new domain adaptation method which outperforms current
state-of-the-art approaches by combining the benefits of aligning the domains at
the feature and pixel levels with a self-training process. We release the
dataset at https://iplab.dmi.unict.it/OBJ-MDA/ and the code
of the proposed architecture at https://github.com/fpv-iplab/STMDA-RetinaNet.
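The self-training step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common confidence-threshold form of pseudo-labeling: a detector trained on the labeled synthetic source is run on unlabeled target images, and only confident detections are kept as pseudo-labels for the next training round. The `Detection` type, the `select_pseudo_labels` function, the 0.8 threshold, and the sample data are all hypothetical and are not taken from the paper or its released code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Detection:
    """One predicted box (x1, y1, x2, y2) with a class label and confidence."""
    box: Tuple[int, int, int, int]
    label: str
    score: float


def select_pseudo_labels(
    predictions: Dict[str, List[Detection]],
    threshold: float = 0.8,  # hypothetical value, not from the paper
) -> Dict[str, List[Detection]]:
    """Keep only confident detections on unlabeled target images.

    Images with no detection at or above the threshold are dropped,
    so they do not contribute to the next training round.
    """
    pseudo: Dict[str, List[Detection]] = {}
    for image_id, detections in predictions.items():
        kept = [d for d in detections if d.score >= threshold]
        if kept:
            pseudo[image_id] = kept
    return pseudo


# Hypothetical detector outputs on unlabeled images from two target cameras.
predictions = {
    "camera_a/0001.jpg": [
        Detection((10, 20, 110, 220), "artwork_3", 0.93),
        Detection((300, 40, 380, 160), "artwork_7", 0.41),
    ],
    "camera_b/0001.jpg": [
        Detection((5, 5, 50, 50), "artwork_1", 0.35),
    ],
}

pseudo_labels = select_pseudo_labels(predictions, threshold=0.8)
```

In this sketch only the confident `artwork_3` box survives. A higher threshold gives cleaner but fewer pseudo-labels, which is the usual trade-off in this style of self-training.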
Related papers
- Transfer learning with generative models for object detection on limited datasets [1.4999444543328293]
In some fields, such as marine biology, it is necessary to have correctly labeled bounding boxes around each object.
We propose a transfer learning framework that is valid for a generic scenario.
Our results pave the way for new generative AI-based protocols for machine learning applications in various domains.
arXiv Detail & Related papers (2024-02-09T21:17:31Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Label-Free Synthetic Pretraining of Object Detectors [67.17371526567325]
We propose a new approach, Synthetic optimized layout with Instance Detection (SOLID), to pretrain object detectors with synthetic images.
Our "SOLID" approach consists of two main components: (1) generating synthetic images using a collection of unlabelled 3D models with optimized scene arrangement; (2) pretraining an object detector on "instance detection" task.
Our approach does not need any semantic labels for pretraining and allows the use of arbitrary, diverse 3D models.
arXiv Detail & Related papers (2022-08-08T16:55:17Z)
- Multi-modal Transformers Excel at Class-agnostic Object Detection [105.10403103027306]
We argue that existing methods lack a top-down supervision signal governed by human-understandable semantics.
We develop an efficient and flexible MViT architecture using multi-scale feature processing and deformable self-attention.
We show the significance of MViT proposals in a diverse range of applications.
arXiv Detail & Related papers (2021-11-22T18:59:29Z)
- Data Augmentation for Object Detection via Differentiable Neural Rendering [71.00447761415388]
It is challenging to train a robust object detector when annotated data is scarce.
Existing approaches to tackle this problem include semi-supervised learning that interpolates labeled data from unlabeled data.
We introduce an offline data augmentation method for object detection, which semantically interpolates the training data with novel views.
arXiv Detail & Related papers (2021-03-04T06:31:06Z)
- Co-training for On-board Deep Object Detection [0.0]
Best performing deep vision-based object detectors are trained in a supervised manner by relying on human-labeled bounding boxes.
Co-training is a semi-supervised learning method for self-labeling objects in unlabeled images.
We show that co-training is a paradigm worth pursuing for alleviating object labeling, working both alone and together with task-agnostic domain adaptation.
arXiv Detail & Related papers (2020-08-12T19:08:59Z)
- An Unsupervised Domain Adaptation Scheme for Single-Stage Artwork Recognition in Cultural Sites [20.99718135562034]
We consider the problem of Unsupervised Domain Adaptation for object detection in cultural sites.
We create a new dataset containing both synthetic and real images of 16 different artworks.
We propose a new method which builds on RetinaNet and feature alignment that we called DA-RetinaNet.
arXiv Detail & Related papers (2020-08-04T23:51:06Z)
- Improving Object Detection with Selective Self-supervised Self-training [62.792445237541145]
We study how to leverage Web images to augment human-curated object detection datasets.
We retrieve Web images by image-to-image search, which incurs less domain shift from the curated data than other search methods.
We propose a novel learning method motivated by two parallel lines of work that explore unlabeled data for image classification.
arXiv Detail & Related papers (2020-07-17T18:05:01Z)
- Exploring Bottom-up and Top-down Cues with Attentive Learning for Webly Supervised Object Detection [76.9756607002489]
We propose a novel webly supervised object detection (WebSOD) method for novel classes.
Our proposed method combines bottom-up and top-down cues for novel class detection.
We demonstrate our proposed method on PASCAL VOC dataset with three different novel/base splits.
arXiv Detail & Related papers (2020-03-22T03:11:24Z)
- Real-Time Object Detection and Recognition on Low-Compute Humanoid Robots using Deep Learning [0.12599533416395764]
We describe a novel architecture that enables multiple low-compute NAO robots to perform real-time detection, recognition and localization of objects in their camera view.
The proposed algorithm for object detection and localization is an empirical modification of YOLOv3, based on indoor experiments in multiple scenarios.
The architecture also comprises an effective end-to-end pipeline to feed the real-time frames from the camera feed to the neural net and use its results for guiding the robot.
arXiv Detail & Related papers (2020-01-20T05:24:58Z)
- Virtual to Real adaptation of Pedestrian Detectors [9.432150710329607]
ViPeD is a new synthetically generated set of images collected with the graphical engine of the video game GTA V - Grand Theft Auto V.
We propose two different Domain Adaptation techniques suitable for the pedestrian detection task, but possibly applicable to general object detection.
Experiments show that the network trained with ViPeD can generalize over unseen real-world scenarios better than the detector trained over real-world data.
arXiv Detail & Related papers (2020-01-09T14:50:11Z)