Wide-Area Data Analytics
- URL: http://arxiv.org/abs/2006.10188v1
- Date: Wed, 17 Jun 2020 22:44:33 GMT
- Title: Wide-Area Data Analytics
- Authors: Rachit Agarwal and Jen Rexford (workshop co-chairs) with contributions
from numerous workshop attendees
- Abstract summary: We increasingly live in a data-driven world, with diverse kinds of data distributed across many locations.
The Computing Community Consortium (CCC) convened a 1.5-day workshop focused on wide-area data analytics in October 2019.
This report summarizes the challenges discussed and the conclusions generated at the workshop.
- Score: 4.080171822768553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We increasingly live in a data-driven world, with diverse kinds of data
distributed across many locations. In some cases, the datasets are collected
from multiple locations, such as sensors (e.g., mobile phones and street
cameras) spread throughout a geographic region. The data may need to be
analyzed close to where they are produced, particularly when the applications
require low latency or low cost, or must respect user privacy and regulatory constraints.
In other cases, large datasets are distributed across public clouds, private
clouds, or edge-cloud computing sites with more plentiful computation, storage,
bandwidth, and energy resources. Often, some portion of the analysis may take
place on the end-host or edge cloud (to respect user privacy and reduce the
volume of data) while relying on remote clouds to complete the analysis (to
leverage greater computation and storage resources).
Wide-area data analytics is any analysis of data that is generated by, or
stored at, geographically dispersed entities. Over the past few years, several
parts of the computer science research community have started to explore
effective ways to analyze data spread over multiple locations. In particular,
several areas of "systems" research - including databases, distributed systems,
computer networking, and security and privacy - have delved into these topics.
These research subcommunities often focus on different aspects of the problem,
consider different motivating applications and use cases, and design and
evaluate their solutions differently. To address these challenges, the Computing
Community Consortium (CCC) convened a 1.5-day workshop focused on wide-area
data analytics in October 2019. This report summarizes the challenges discussed
and the conclusions generated at the workshop.
Related papers
- Enabling High Data Throughput Reinforcement Learning on GPUs: A Domain Agnostic Framework for Data-Driven Scientific Research [90.91438597133211]
We introduce WarpSci, a framework designed to overcome crucial system bottlenecks in the application of reinforcement learning.
We eliminate the need for data transfer between the CPU and GPU, enabling the concurrent execution of thousands of simulations.
arXiv Detail & Related papers (2024-08-01T21:38:09Z)
- A Survey on Differential Privacy for SpatioTemporal Data in Transportation Research [0.9790236766474202]
In transportation, we are seeing a surge in spatiotemporal data collection.
Recent developments in differential privacy in the context of such data have led to research in applied privacy.
To address the need for such data in research and inference without exposing private information, significant work has been proposed.
arXiv Detail & Related papers (2024-07-18T03:19:29Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy-sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
- Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey [67.7834898542701]
We focus on reviewing privacy-preserving techniques of graph machine learning.
We first review methods for generating privacy-preserving graph data.
Then we describe methods for transmitting privacy-preserved information.
arXiv Detail & Related papers (2023-07-10T04:30:23Z)
- LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning.
However, the promising results achieved on current public datasets may not be applicable to practical scenarios.
We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z)
- Towards Confidential Computing: A Secure Cloud Architecture for Big Data Analytics and AI [0.0]
Cloud computing has become a viable solution for big data analytics and artificial intelligence.
Data security in certain fields such as biomedical research remains a major concern when moving to the cloud.
arXiv Detail & Related papers (2023-05-28T16:08:44Z)
- Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep Learning [77.34726150561087]
We propose an approach for creating a multi-modal and large-temporal dataset comprised of publicly available Remote Sensing data.
We use Convolutional Neural Networks (CNN) models that are capable of separating different classes of vegetation.
arXiv Detail & Related papers (2022-09-28T18:51:59Z)
- A communication efficient distributed learning framework for smart environments [0.4898659895355355]
This paper proposes a distributed learning framework to move data analytics closer to where data is generated.
Using distributed machine learning techniques, it is possible to drastically reduce the network overhead, while obtaining performance comparable to the cloud solution.
The analysis also shows when each distributed learning approach is preferable, based on the specific distribution of the data on the nodes.
arXiv Detail & Related papers (2021-09-27T13:44:34Z)
- Opening practice: supporting Reproducibility and Critical spatial data science [0.0]
This paper reflects on a number of trends towards a more open and reproducible approach to spatial data science.
In particular it considers trends towards Big Data, and the impacts this is having on spatial data analysis and modelling.
It identifies a turn in academia towards coding as a core analytic tool, and away from proprietary software tools offering 'black boxes'.
arXiv Detail & Related papers (2020-07-20T07:50:08Z)
- Data Mining with Big Data in Intrusion Detection Systems: A Systematic Literature Review [68.15472610671748]
Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation.
The rapid rate and volume of data creation have begun to pose significant challenges for data management and security.
The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance.
arXiv Detail & Related papers (2020-05-23T20:57:12Z)
- Towards an Integrated Platform for Big Data Analysis [4.5257812998381315]
This paper presents the vision of an integrated platform for big data analysis that combines all these aspects.
Main benefits of this approach are an enhanced scalability of the whole platform, a better parameterization of algorithms, and an improved usability during the end-to-end data analysis process.
arXiv Detail & Related papers (2020-04-27T03:15:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.