Bounding Box-Free Instance Segmentation Using Semi-Supervised Learning
for Generating a City-Scale Vehicle Dataset
- URL: http://arxiv.org/abs/2111.12122v1
- Date: Tue, 23 Nov 2021 19:42:12 GMT
- Title: Bounding Box-Free Instance Segmentation Using Semi-Supervised Learning
for Generating a City-Scale Vehicle Dataset
- Authors: Osmar Luiz Ferreira de Carvalho, Osmar Abílio de Carvalho Júnior,
Anesmar Olino de Albuquerque, Nickolas Castro Santana, Dibio Leandro Borges,
Roberto Arnaldo Trancoso Gomes, Renato Fontes Guimarães
- Abstract summary: Vehicle classification is a hot computer vision topic, with studies ranging from ground-view up to top-view imagery.
In this paper, we propose a novel semi-supervised iterative learning approach using GIS software.
The results show better pixel-wise metrics than Mask R-CNN (82% against 67% in IoU).
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vehicle classification is a hot computer vision topic, with studies ranging
from ground-view up to top-view imagery. In remote sensing, the usage of
top-view images allows for understanding city patterns, vehicle concentration,
traffic management, and others. However, there are some difficulties when
aiming for pixel-wise classification: (a) most vehicle classification studies
use object detection methods, and most publicly available datasets are designed
for this task, (b) creating instance segmentation datasets is laborious, and
(c) traditional instance segmentation methods underperform on this task since
the objects are small. Thus, the present research objectives are: (1) propose a
novel semi-supervised iterative learning approach using GIS software, (2)
propose a box-free instance segmentation approach, and (3) provide a city-scale
vehicle dataset. The iterative learning procedure is: (1) label a small
number of vehicles, (2) train on those samples, (3) use the model to classify
the entire image, (4) convert the image prediction into a polygon shapefile,
(5) correct some areas with errors and include them in the training data, and
(6) repeat until results are satisfactory. To separate instances, we considered
vehicle interiors and vehicle borders, and the DL model was a U-Net with an
EfficientNet-B7 backbone. When the borders are removed, each vehicle interior
becomes isolated, allowing for unique object identification. To recover the
deleted 1-pixel borders, we proposed a simple method to expand each prediction.
The results show better pixel-wise metrics than Mask R-CNN (82%
against 67% in IoU). On per-object analysis, the overall accuracy, precision,
and recall were greater than 90%. This pipeline applies to any remote sensing
target, being very efficient for segmentation and generating datasets.
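The box-free instance separation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class ids (`BACKGROUND`, `INTERIOR`, `BORDER`), the function names, the 4-connected labelling, and the rule that a shared border pixel goes to the first labelled neighbour in scan order are all assumptions made for illustration. The sketch labels each isolated interior region as one instance, then grows every instance by one pixel into the pixels predicted as border.

```python
from collections import deque

# Hypothetical class ids for a per-pixel prediction map (assumption).
BACKGROUND, INTERIOR, BORDER = 0, 1, 2


def label_interiors(pred):
    """Give every 4-connected region of INTERIOR pixels a unique id (1, 2, ...)."""
    h, w = len(pred), len(pred[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if pred[r][c] != INTERIOR or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one interior region
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and pred[ny][nx] == INTERIOR
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def expand_one_pixel(labels, pred):
    """Grow each instance by one pixel, but only into BORDER pixels.

    A border pixel takes the id of the first labelled pixel found in its
    8-neighbourhood (scan order); this tie-break is an assumption.
    """
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(h):
        for c in range(w):
            if pred[r][c] != BORDER:
                continue
            neighbours = [
                labels[r + dy][c + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if 0 <= r + dy < h and 0 <= c + dx < w and labels[r + dy][c + dx]
            ]
            if neighbours:
                out[r][c] = neighbours[0]
    return out


# Toy prediction: two vehicles sharing a 1-pixel border, background on the right.
demo_pred = [
    [2, 2, 2, 2, 2, 2, 2, 0],
    [2, 1, 1, 2, 1, 1, 2, 0],
    [2, 1, 1, 2, 1, 1, 2, 0],
    [2, 2, 2, 2, 2, 2, 2, 0],
]
demo_labels, demo_count = label_interiors(demo_pred)
demo_expanded = expand_one_pixel(demo_labels, demo_pred)
```

On the toy map, the two interiors are touching only through border pixels, so they receive different ids even though no bounding box is ever predicted. The expansion step then reassigns the border pixels to an adjacent instance and leaves background pixels untouched.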
Related papers
- Lidar Annotation Is All You Need [0.0]
This paper aims to improve the efficiency of image segmentation using a convolutional neural network in a multi-sensor setup.
The key innovation of our approach is the masked loss, addressing sparse ground-truth masks from point clouds.
Experimental validation of the approach on benchmark datasets shows comparable performance to a high-quality image segmentation model.
arXiv Detail & Related papers (2023-11-08T15:55:18Z)
- Improving Online Lane Graph Extraction by Object-Lane Clustering [106.71926896061686]
We propose an architecture and loss formulation to improve the accuracy of local lane graph estimates.
The proposed method learns to assign the objects to centerlines by considering the centerlines as cluster centers.
We show that our method can achieve significant performance improvements by using the outputs of existing 3D object detection methods.
arXiv Detail & Related papers (2023-07-20T15:21:28Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data [19.63193201107591]
7DoF prediction of vehicles at an intersection is an important task for assessing potential conflicts between road users.
We develop an approach using a weakly supervised method of fine tuning 3D object detectors for traffic observation cameras.
Our method achieves vehicle 7DoF pose prediction accuracy on our dataset comparable to the top performing monocular 3D object detectors on autonomous vehicle datasets.
arXiv Detail & Related papers (2021-10-21T08:26:48Z)
- Self-supervised Learning of 3D Object Understanding by Data Association and Landmark Estimation for Image Sequence [15.815583594196488]
3D object understanding from a 2D image is a challenging task that infers an additional dimension from reduced-dimensional information.
It is challenging to obtain a large 3D dataset since 3D annotation is expensive and time-consuming.
We propose a strategy to exploit multiple observations of the object in the image sequence in order to surpass the self-performance.
arXiv Detail & Related papers (2021-04-14T18:59:08Z)
- Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation [77.60050239225086]
We propose an effective training data generation process by fitting a 3D car model with dynamic parts to vehicles in real images.
Our approach is fully automatic without any human interaction.
We present a multi-task network for VUS parsing and a multi-stream network for VHI parsing.
arXiv Detail & Related papers (2020-12-15T03:03:38Z)
- Learning Collision-Free Space Detection from Stereo Images: Homography Matrix Brings Better Data Augmentation [16.99302954185652]
It remains an open challenge to train deep convolutional neural networks (DCNNs) using only a small quantity of training samples.
This paper explores an effective training data augmentation approach that can be employed to improve the overall DCNN performance.
arXiv Detail & Related papers (2020-12-14T19:14:35Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scaled pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
- PerMO: Perceiving More at Once from a Single Image for Autonomous Driving [76.35684439949094]
We present a novel approach to detect, segment, and reconstruct complete textured 3D models of vehicles from a single image.
Our approach combines the strengths of deep learning and the elegance of traditional techniques.
We have integrated these algorithms with an autonomous driving system.
arXiv Detail & Related papers (2020-07-16T05:02:45Z)
- VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification [116.1587709521173]
We propose to build a large-scale vehicle dataset (called VehicleNet) by harnessing four public vehicle datasets.
We design a simple yet effective two-stage progressive approach to learning more robust visual representation from VehicleNet.
We achieve state-of-the-art accuracy of 86.07% mAP on the private test set of the AICity Challenge.
arXiv Detail & Related papers (2020-04-14T05:06:38Z)