An Unsupervised Domain Adaptation Scheme for Single-Stage Artwork
Recognition in Cultural Sites
- URL: http://arxiv.org/abs/2008.01882v3
- Date: Mon, 21 Dec 2020 20:37:19 GMT
- Title: An Unsupervised Domain Adaptation Scheme for Single-Stage Artwork
Recognition in Cultural Sites
- Authors: Giovanni Pasqualino and Antonino Furnari and Giovanni Signorello and
Giovanni Maria Farinella
- Abstract summary: We consider the problem of Unsupervised Domain Adaptation for object detection in cultural sites.
We create a new dataset containing both synthetic and real images of 16 different artworks.
We propose a new method, called DA-RetinaNet, which builds on RetinaNet and feature alignment.
- Score: 20.99718135562034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognizing artworks in a cultural site using images acquired from the
user's point of view (First Person Vision) makes it possible to build interesting
applications for both the visitors and the site managers. However, current object
detection algorithms working in fully supervised settings need to be trained with
large quantities of labeled data to achieve good performance, and collecting such
data is time-consuming and costly. Using synthetic data generated from
the 3D model of the cultural site to train the algorithms can reduce these
costs. On the other hand, when these models are tested with real images, a
significant drop in performance is observed due to the differences between real
and synthetic images. In this study we consider the problem of Unsupervised
Domain Adaptation for object detection in cultural sites. To address this
problem, we created a new dataset containing both synthetic and real images of
16 different artworks. We then investigated different domain adaptation techniques
based on one-stage and two-stage object detectors, image-to-image translation, and
feature alignment. Based on the observation that single-stage detectors are more
robust to the domain shift in the considered settings, we proposed a new method,
called DA-RetinaNet, which builds on RetinaNet and feature alignment. The proposed
approach achieves better results than the compared methods on the proposed dataset
and on Cityscapes. To support research in this field, we release the dataset at
https://iplab.dmi.unict.it/EGO-CH-OBJ-UDA/ and the code of the proposed
architecture at https://github.com/fpv-iplab/DA-RetinaNet.
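The abstract names the two ingredients of DA-RetinaNet (a RetinaNet one-stage detector plus feature alignment) without giving implementation details, so the snippet below is only a minimal PyTorch sketch of the general recipe it refers to: adversarial feature alignment through a gradient reversal layer and a small domain classifier applied to detector feature maps. The class names, layer sizes, and loss weighting are illustrative assumptions, not the released DA-RetinaNet code (see the GitHub link above for the actual implementation).

```python
# Hypothetical sketch of adversarial feature alignment for a one-stage detector.
# Not the authors' released code: names and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; multiplies gradients by -lambda in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class DomainClassifier(nn.Module):
    """Small convolutional head that predicts whether a feature map comes from
    the synthetic (source) or the real (target) domain."""

    def __init__(self, in_channels=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
        )

    def forward(self, feature_map, lambd=1.0):
        # Reverse gradients so the backbone is pushed toward domain-invariant features
        # while the classifier still learns to separate the two domains.
        return self.net(GradientReversal.apply(feature_map, lambd))


def domain_alignment_loss(src_features, tgt_features, classifier, lambd=1.0):
    """Binary cross-entropy on domain labels (0 = synthetic, 1 = real) for one feature level."""
    src_logits = classifier(src_features, lambd)
    tgt_logits = classifier(tgt_features, lambd)
    return (
        F.binary_cross_entropy_with_logits(src_logits, torch.zeros_like(src_logits))
        + F.binary_cross_entropy_with_logits(tgt_logits, torch.ones_like(tgt_logits))
    )
```

In a training step, such an alignment term would be added to the detector's usual classification and regression losses, which are computed only on the labeled synthetic images, while the real (target) images contribute only through the adversarial term.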
Related papers
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- A Multi Camera Unsupervised Domain Adaptation Pipeline for Object Detection in Cultural Sites through Adversarial Learning and Self-Training [23.186208885878926]
We present a new dataset collected in a cultural site to study the problem of domain adaptation for object detection.
We present a new domain adaptation method which outperforms current state-of-the-art approaches.
arXiv Detail & Related papers (2022-10-03T10:40:58Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image (a minimal sketch of this step is given after this entry).
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
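The segment-swapping entry above builds its training data by pasting an object segment from one image into another, so that the two images share a known corresponding region. Below is a minimal, hypothetical NumPy sketch of that kind of copy-paste pair generation; the mask handling and the hard (non-blended) compositing are simplifying assumptions, not the authors' exact pipeline.

```python
# Hypothetical copy-paste pair generation; a simplification of the idea, not the paper's code.
import numpy as np


def make_copy_paste_pair(image_a, image_b, segment_mask):
    """Paste the masked segment of image_a into image_b.

    image_a, image_b: uint8 arrays of shape (H, W, 3), assumed to have the same size here.
    segment_mask:     boolean array of shape (H, W), True inside the object segment.
    Returns the composite image and the mask, which marks the region shared by the pair
    and can serve as correspondence supervision for co-segmentation training.
    """
    composite = image_b.copy()
    composite[segment_mask] = image_a[segment_mask]
    return composite, segment_mask.copy()


# Example with random arrays standing in for real images and a real object segment.
rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
img_b = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
mask = np.zeros((256, 256), dtype=bool)
mask[64:160, 80:200] = True  # stand-in for a segmented object
synthetic, shared_region = make_copy_paste_pair(img_a, img_b, mask)
```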
- Free Lunch for Co-Saliency Detection: Context Adjustment [14.688461235328306]
We propose a "cost-free" group-cut-paste (GCP) procedure to leverage images from off-the-shelf saliency detection datasets and synthesize new samples.
We collect a novel dataset called Context Adjustment Training. The two variants of our dataset, i.e., CAT and CAT+, consist of 16,750 and 33,500 images, respectively.
arXiv Detail & Related papers (2021-08-04T14:51:37Z)
- You Better Look Twice: a new perspective for designing accurate detectors with reduced computations [56.34005280792013]
BLT-net is a new low-computation two-stage object detection architecture.
It reduces computation by separating objects from the background using a very lightweight first stage.
The resulting image proposals are then processed in the second stage by a highly accurate model.
arXiv Detail & Related papers (2021-07-21T12:39:51Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Six-channel Image Representation for Cross-domain Object Detection [17.854940064699985]
Deep learning models are data-driven, and their performance depends heavily on abundant and diverse datasets.
Image-to-image translation techniques are often employed to generate fake data for specific scenes to train the models.
We propose to pair the original 3-channel images with their corresponding GAN-generated fake images to form 6-channel representations of the dataset, as sketched after this entry.
arXiv Detail & Related papers (2021-01-03T04:50:03Z)
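The six-channel entry above pairs each 3-channel image with a GAN-generated (image-to-image translated) counterpart and stacks the two along the channel axis. A minimal sketch of that stacking is given below, under the assumption that the translated image is already available and that the detector's first convolution is widened to accept 6 input channels; it illustrates the representation only and is not the authors' code.

```python
# Hypothetical sketch of the 6-channel representation; the translated image would come
# from an image-to-image translation model (e.g., CycleGAN), which is not shown here.
import numpy as np


def to_six_channels(original_rgb, translated_rgb):
    """Stack an original 3-channel image with its translated counterpart into (H, W, 6)."""
    assert original_rgb.shape == translated_rgb.shape and original_rgb.shape[-1] == 3
    return np.concatenate([original_rgb, translated_rgb], axis=-1)


# Stand-in data for an original image and its GAN-translated version.
original = np.random.rand(512, 512, 3).astype(np.float32)
translated = np.random.rand(512, 512, 3).astype(np.float32)
six_channel = to_six_channels(original, translated)
assert six_channel.shape == (512, 512, 6)
```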
- Virtual to Real adaptation of Pedestrian Detectors [9.432150710329607]
ViPeD is a new synthetically generated set of images collected with the graphical engine of the video game GTA V (Grand Theft Auto V).
We propose two different Domain Adaptation techniques suitable for the pedestrian detection task, but possibly applicable to general object detection.
Experiments show that the network trained with ViPeD generalizes to unseen real-world scenarios better than a detector trained on real-world data.
arXiv Detail & Related papers (2020-01-09T14:50:11Z)