DIAL: Deep Interactive and Active Learning for Semantic Segmentation in
Remote Sensing
- URL: http://arxiv.org/abs/2201.01047v1
- Date: Tue, 4 Jan 2022 09:11:58 GMT
- Title: DIAL: Deep Interactive and Active Learning for Semantic Segmentation in
Remote Sensing
- Authors: Gaston Lenczner, Adrien Chan-Hon-Tong, Bertrand Le Saux, Nicola
Luminari, Guy Le Besnerais
- Abstract summary: We propose to build up a collaboration between a deep neural network and a human in the loop.
In a nutshell, the agent iteratively interacts with the network to correct its initially flawed predictions.
We show that active learning based on uncertainty estimation quickly leads the user towards mistakes.
- Score: 34.209686918341475
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this article, we propose to build a collaboration between a
deep neural network and a human in the loop to swiftly obtain accurate
segmentation maps of remote sensing images. In a nutshell, the agent
iteratively interacts with the network to correct its initially flawed
predictions. Concretely, these interactions are annotations representing the
semantic labels. Our methodological contribution is twofold. First, we propose
two interactive learning schemes to integrate user inputs into deep neural
networks. The first one concatenates the annotations with the network's other
inputs. The second one uses the annotations as a sparse ground truth to
retrain the network. Second, we propose an active learning strategy to guide
the user towards the most relevant areas to annotate. For this purpose, we
compare different state-of-the-art acquisition functions that estimate the
network's uncertainty, such as ConfidNet, entropy, and ODIN. Through
experiments on three remote sensing datasets, we show the effectiveness of the
proposed methods. Notably, we show that active learning based on uncertainty
estimation quickly leads the user towards mistakes and is thus relevant for
guiding user interventions.
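The two interaction schemes and the uncertainty-based guidance described in the abstract can be sketched in a few lines of code. The snippet below is a minimal sketch, assuming a generic PyTorch encoder-decoder segmentation network; the function names, tensor shapes, and patch-based ranking are illustrative choices rather than the authors' exact implementation, and only entropy is shown among the acquisition functions (the paper also evaluates ConfidNet and ODIN).

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of the two interaction schemes and the entropy-based
# acquisition function; shapes and names are assumptions, not the paper's code.

def concat_annotations(image, annotation_map):
    """Scheme 1: feed the user annotations as extra input channels.

    image:          (B, C, H, W) remote sensing image
    annotation_map: (B, K, H, W) one-hot user clicks/strokes per class,
                    zeros where the user gave no input
    """
    # The segmentation network must then accept C + K input channels.
    return torch.cat([image, annotation_map], dim=1)


def sparse_retraining_loss(logits, sparse_labels, ignore_index=-1):
    """Scheme 2: use the annotations as a sparse ground truth to retrain.

    sparse_labels: (B, H, W) with a class index at annotated pixels and
                   `ignore_index` everywhere else, so unlabeled pixels
                   contribute nothing to the loss.
    """
    return F.cross_entropy(logits, sparse_labels, ignore_index=ignore_index)


def entropy_acquisition(logits):
    """Per-pixel entropy map: high values flag likely mistakes."""
    probs = F.softmax(logits, dim=1)                       # (B, K, H, W)
    return -(probs * torch.log(probs + 1e-12)).sum(dim=1)  # (B, H, W)


def rank_patches(uncertainty, patch_size=64):
    """Rank image patches by mean uncertainty to pick areas to annotate first."""
    scores = F.avg_pool2d(uncertainty.unsqueeze(1), patch_size)  # (B, 1, H/p, W/p)
    return scores.flatten(1).argsort(dim=1, descending=True)
```

In the interactive loop, the network's current prediction is shown to the user, the highest-ranked patches are proposed for annotation, and the new annotations either re-enter the network as extra channels (scheme 1) or drive a few retraining steps with the sparse loss (scheme 2).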
Related papers
- Identifying Sub-networks in Neural Networks via Functionally Similar Representations [41.028797971427124]
We take a step toward automating the understanding of the network by investigating the existence of distinct sub-networks.
Our approach offers meaningful insights into the behavior of neural networks with minimal human and computational cost.
arXiv Detail & Related papers (2024-10-21T20:19:00Z) - Deception Detection from Linguistic and Physiological Data Streams Using Bimodal Convolutional Neural Networks [19.639533220155965]
This paper explores the application of convolutional neural networks for the purpose of multimodal deception detection.
We use a dataset built by interviewing 104 subjects about two topics, with one truthful and one falsified response from each subject about each topic.
arXiv Detail & Related papers (2023-11-18T02:44:33Z) - DeepSI: Interactive Deep Learning for Semantic Interaction [5.188825486231326]
We propose a framework that integrates deep learning into the human-in-the-loop interactive sensemaking pipeline.
Deep learning extracts meaningful representations from raw data, which improves semantic interaction inference.
Semantic interactions are exploited to fine-tune the deep learning representations, which then improves semantic interaction inference.
arXiv Detail & Related papers (2023-05-26T18:05:57Z) - DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z) - ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z) - Introspective Learning : A Two-Stage Approach for Inference in Neural
Networks [18.32369721322249]
We advocate for two stages in a neural network's decision making process.
The first is the existing feed-forward inference framework where patterns in given data are sensed and associated with previously learned patterns.
The second is a slower reflection stage where we ask the network to reflect on its feed-forward decision by considering and evaluating all available choices.
arXiv Detail & Related papers (2022-09-17T23:31:03Z) - X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task
Distillation [69.9604394044652]
We propose a novel method to improve the self-supervised training of monocular depth via cross-task knowledge distillation.
During training, we utilize a pretrained semantic segmentation teacher network and transfer its semantic knowledge to the depth network.
We extensively evaluate the efficacy of our proposed approach on the KITTI benchmark and compare it with the latest state of the art.
arXiv Detail & Related papers (2021-10-24T19:47:14Z) - Variational Structured Attention Networks for Deep Visual Representation
Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
arXiv Detail & Related papers (2021-03-05T07:37:24Z) - Bridging the gap between Human Action Recognition and Online Action
Detection [0.456877715768796]
Action recognition, early prediction, and online action detection are complementary disciplines that are often studied independently.
We address the task-specific feature extraction with a teacher-student framework between the aforementioned disciplines.
Our network embeds online early prediction and online temporal segment proposal networks in parallel.
arXiv Detail & Related papers (2021-01-21T21:01:46Z) - SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from
Monocular images [94.36401543589523]
We introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks.
We then propose a Semantic Object and Depth Estimation Network (SOSD-Net) based on the objectness assumption.
To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation.
arXiv Detail & Related papers (2021-01-19T02:41:03Z) - BiDet: An Efficient Binarized Object Detector [96.19708396510894]
We propose a binarized neural network learning method called BiDet for efficient object detection.
Our BiDet fully utilizes the representational capacity of the binary neural networks for object detection by redundancy removal.
Our method outperforms the state-of-the-art binary neural networks by a sizable margin.
arXiv Detail & Related papers (2020-03-09T08:16:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.