ESTemd: A Distributed Processing Framework for Environmental Monitoring
based on Apache Kafka Streaming Engine
- URL: http://arxiv.org/abs/2104.01082v1
- Date: Fri, 2 Apr 2021 15:04:15 GMT
- Title: ESTemd: A Distributed Processing Framework for Environmental Monitoring
based on Apache Kafka Streaming Engine
- Authors: Adeyinka Akanbi
- Abstract summary: Distributed networks and real-time systems are becoming the most important components for the new computer age, the Internet of Things.
Data generated offers the ability to measure, infer and understand environmental indicators, from delicate ecologies to natural resources to urban environments.
We propose a distributed framework Event STream Processing Engine for Environmental Monitoring Domain (ESTemd) for the application of stream processing on heterogeneous environmental data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distributed networks and real-time systems are becoming the most important
components for the new computer age, the Internet of Things (IoT), with huge
data streams or data sets generated from sensors and data generated from
existing legacy systems. The data generated offers the ability to measure,
infer and understand environmental indicators, from delicate ecologies and
natural resources to urban environments. This can be achieved through the
analysis of the heterogeneous data sources (structured and unstructured). In
this paper, we propose a distributed framework Event STream Processing Engine
for Environmental Monitoring Domain (ESTemd) for the application of stream
processing on heterogeneous environmental data. Our work in this area
demonstrates the useful role big data techniques can play in an environmental
decision support system, early warning and forecasting systems. The proposed
framework addresses the challenges of data heterogeneity from heterogeneous
systems and real-time processing of huge environmental datasets through a
publish/subscribe method via a unified data pipeline with the application of
Apache Kafka for real-time analytics.
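The publish/subscribe idea in the abstract can be illustrated with a minimal sketch. This is not the ESTemd implementation and does not use the Kafka client API: `InMemoryBroker` is a hypothetical in-process stand-in for a topic-based broker, and the topic names (`raw.csv`, `raw.json`, `unified`), the two input formats, and the unified record fields (`station`, `metric`, `value`) are all assumptions chosen for illustration. It only shows how source-specific normalizers might feed heterogeneous records into one unified topic.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBroker:
    """Toy stand-in for a topic-based broker (hypothetical; not the Kafka API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        # Register a handler that is called for every message on the topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver the message synchronously to every subscriber of the topic.
        for handler in self._subscribers[topic]:
            handler(message)


def normalize_csv(line: str) -> dict:
    """Assumed legacy format: 'station,metric,value'."""
    station, metric, value = line.strip().split(",")
    return {"station": station, "metric": metric, "value": float(value)}


def normalize_json(payload: str) -> dict:
    """Assumed sensor format: {"id": ..., "type": ..., "reading": ...}."""
    raw = json.loads(payload)
    return {"station": raw["id"], "metric": raw["type"], "value": float(raw["reading"])}


def run_demo() -> List[dict]:
    broker = InMemoryBroker()
    unified: List[dict] = []

    # Each source-specific topic is normalized and republished to one unified topic.
    broker.subscribe(
        "raw.csv", lambda m: broker.publish("unified", normalize_csv(m["body"]))
    )
    broker.subscribe(
        "raw.json", lambda m: broker.publish("unified", normalize_json(m["body"]))
    )
    # A downstream consumer only needs to know the unified schema.
    broker.subscribe("unified", unified.append)

    broker.publish("raw.csv", {"body": "st-01,temperature,21.5"})
    broker.publish(
        "raw.json", {"body": '{"id": "st-02", "type": "humidity", "reading": 64}'}
    )
    return unified


if __name__ == "__main__":
    for record in run_demo():
        print(record)
```

In a real deployment the broker role would be played by Apache Kafka and delivery would be asynchronous and durable; the sketch keeps delivery synchronous so that the data flow is easy to follow.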
Related papers
- Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting.
Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server.
We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z) - Digital Ecosystem for FAIR Time Series Data Management in Environmental System Science [0.0]
This paper introduces a versatile and transferable digital ecosystem for managing time series data.
The system is highly adaptable, cloud-ready, and suitable for deployment in a wide range of settings.
arXiv Detail & Related papers (2024-09-05T08:53:23Z) - Development of Semantics-Based Distributed Middleware for Heterogeneous Data Integration and its Application for Drought [0.0]
Drought is a complex environmental phenomenon that affects millions of people and communities all over the globe.
This research develops a semantics-based data integration that encompasses and integrates data models of local indigenous knowledge and sensor data.
The local indigenous knowledge on drought gathered from the domain experts is transformed into rules to be used for performing deductive inference.
arXiv Detail & Related papers (2024-05-17T11:44:22Z) - Rule based Complex Event Processing for an Air Quality Monitoring System in Smart City [0.929965561686354]
The research work proposes an integrated framework for monitoring air quality using rule-based Complex Event Processing (CEP) and SPARQL queries.
The dataset was collected from the Central Pollution Control Board (CPCB) of India and this data was then preprocessed and passed through Apache Kafka.
The preprocessed data is then converted into Resource Description Framework (RDF) data and integrated with a knowledge graph, which is ingested into the CEP engine.
arXiv Detail & Related papers (2024-03-16T10:35:34Z) - SpaCE: The Spatial Confounding Environment [2.572906392867547]
SpaCE provides realistic benchmark datasets and tools for evaluating causal inference methods.
Each dataset includes training data, true counterfactuals, a spatial graph with coordinates, and smoothness and confounding scores.
SpaCE facilitates an automated end-to-end pipeline, simplifying data loading, experimental setup, and evaluating machine learning and causal inference models.
arXiv Detail & Related papers (2023-12-01T16:42:57Z) - OEBench: Investigating Open Environment Challenges in Real-World
Relational Data Streams [32.898349646434326]
We develop an Open Environment Benchmark named OEBench to evaluate open environment challenges in real-world relational data streams.
We find that increased data quantity may not consistently enhance the model accuracy when applied in open environment scenarios.
arXiv Detail & Related papers (2023-08-29T06:43:29Z) - Multimodal Dataset from Harsh Sub-Terranean Environment with Aerosol
Particles for Frontier Exploration [55.41644538483948]
This paper introduces a multimodal dataset from the harsh and unstructured underground environment with aerosol particles.
It contains synchronized raw data measurements from all onboard sensors in Robot Operating System (ROS) format.
The focus of this paper is not only to capture both temporal and spatial data diversities but also to present the impact of harsh conditions on captured data.
arXiv Detail & Related papers (2023-04-27T20:21:18Z) - TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual
Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z) - Optical flow-based branch segmentation for complex orchard environments [73.11023209243326]
We train a neural network system in simulation only using simulated RGB data and optical flow.
This resulting neural network is able to perform foreground segmentation of branches in a busy orchard environment without additional real-world training or using any special setup or equipment beyond a standard camera.
Our results show that our system is highly accurate and, when compared to a network using manually labeled RGBD data, achieves significantly more consistent and robust performance across environments that differ from the training set.
arXiv Detail & Related papers (2022-02-26T03:38:20Z) - SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach, however, does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production-grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z) - Bridging the Gap Between Clean Data Training and Real-World Inference
for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely-used dataset, Snips, and large scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on real-world (noisy) corpus but also enhances the robustness, that is, it produces high-quality results under a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)