SyntEO: Synthetic Dataset Generation for Earth Observation with Deep
Learning -- Demonstrated for Offshore Wind Farm Detection
- URL: http://arxiv.org/abs/2112.02829v1
- Date: Mon, 6 Dec 2021 07:33:34 GMT
- Authors: Thorsten Hoeser and Claudia Kuenzer
- Abstract summary: The proposed SyntEO approach enables Earth observation researchers to automatically generate large deep learning ready datasets.
SyntEO does this by including expert knowledge in the data generation process in a highly structured manner.
We demonstrate the SyntEO approach by predicting offshore wind farms in Sentinel-1 images on two of the world's largest offshore wind energy production sites.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the emergence of deep learning in the last years, new opportunities
arose in Earth observation research. Nevertheless, they also brought with them
new challenges. The data-hungry training processes of deep learning models
demand large, resource expensive, annotated datasets and partly replaced
knowledge-driven approaches, so that model behaviour and the final prediction
process became a black box. The proposed SyntEO approach enables Earth
observation researchers to automatically generate large deep learning ready
datasets and thus free up otherwise occupied resources. SyntEO does this by
including expert knowledge in the data generation process in a highly
structured manner. In this way, fully controllable experiment environments are
set up, which support insights into the model training. Thus, SyntEO makes the
learning process approachable and model behaviour interpretable, an important
cornerstone for explainable machine learning. We demonstrate the SyntEO
approach by predicting offshore wind farms in Sentinel-1 images on two of the
world's largest offshore wind energy production sites. The largest generated
dataset has 90,000 training examples. A basic convolutional neural network for
object detection, that is only trained on this synthetic data, confidently
detects offshore wind farms by minimising false detections in challenging
environments. In addition, four sequential datasets are generated,
demonstrating how the SyntEO approach can precisely define the dataset
structure and influence the training process. SyntEO is thus a hybrid approach
that creates an interface between expert knowledge and data-driven image
analysis.
Related papers
- Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis [0.0]
This paper introduces a novel approach that leverages three generative models of varying complexity to synthesize Malicious Network Traffic.
Our approach transforms numerical data into text, re-framing data generation as a language modeling task.
Our method surpasses state-of-the-art generative models in producing high-fidelity synthetic data.
arXiv Detail & Related papers (2024-11-04T09:51:10Z)
- Behaviour Distillation [10.437472004180883]
We formalize behaviour distillation, a setting that aims to discover and condense information required for training an expert policy into a synthetic dataset.
We then introduce Hallucinating datasets with Evolution Strategies (HaDES), a method for behaviour distillation that can discover datasets of just four state-action pairs.
We show that these datasets generalize out of distribution to training policies with a wide range of architectures.
We also demonstrate application to a downstream task, namely training multi-task agents in a zero-shot fashion.
arXiv Detail & Related papers (2024-06-21T10:45:43Z)
- High-Resolution Detection of Earth Structural Heterogeneities from Seismic Amplitudes using Convolutional Neural Networks with Attention layers [0.31457219084519]
We propose an efficient and cost-effective architecture for detecting seismic structural heterogeneities using Convolutional Neural Networks (CNNs) combined with Attention layers.
Our model has half the parameters compared to the state-of-the-art, and it outperforms previous methods in terms of Intersection over Union (IoU) by 0.6% and precision by 0.4%.
arXiv Detail & Related papers (2024-04-15T22:49:37Z)
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- A spectrum of physics-informed Gaussian processes for regression in engineering [0.0]
Despite the growing availability of sensing and data in general, we remain unable to fully characterise many in-service engineering systems and structures from a purely data-driven approach.
This paper pursues the combination of machine learning technology and physics-based reasoning to enhance our ability to make predictive models with limited data.
arXiv Detail & Related papers (2023-09-19T14:39:03Z)
- Methodology for generating synthetic labeled datasets for visual container inspection [0.0]
In this paper we present an innovative methodology to generate a realistic, varied, balanced, and labelled dataset for the visual inspection of containers.
We show that the generated synthetic labelled dataset makes it possible to train a deep neural network that can be used in a real-world scenario.
arXiv Detail & Related papers (2023-06-26T10:51:18Z)
- Exploring the Effectiveness of Dataset Synthesis: An application of Apple Detection in Orchards [68.95806641664713]
We explore the usability of Stable Diffusion 2.1-base for generating synthetic datasets of apple trees for object detection.
We train a YOLOv5m object detection model to predict apples in a real-world apple detection dataset.
Results demonstrate that the model trained on generated data slightly underperforms a baseline model trained on real-world images.
arXiv Detail & Related papers (2023-06-20T09:46:01Z)
- Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
arXiv Detail & Related papers (2023-05-02T17:59:31Z)
- Bridging the Gap to Real-World Object-Centric Learning [66.55867830853803]
We show that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way.
Our approach, DINOSAUR, significantly outperforms existing object-centric learning models on simulated data.
arXiv Detail & Related papers (2022-09-29T15:24:47Z)
- Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets [83.749895930242]
We propose two techniques for producing high-quality naturalistic synthetic occluded faces.
We empirically show the effectiveness and robustness of both methods, even for unseen occlusions.
We present two high-resolution real-world occluded face datasets with fine-grained annotations, RealOcc and RealOcc-Wild.
arXiv Detail & Related papers (2022-05-12T17:03:57Z)