Related papers: Wide-Area Data Analytics

Wide-Area Data Analytics

URL: http://arxiv.org/abs/2006.10188v1
Date: Wed, 17 Jun 2020 22:44:33 GMT
Title: Wide-Area Data Analytics
Authors: Rachit Agarwal and Jen Rexford (workshop co-chairs) with contributions from numerous workshop attendees
Abstract summary: We increasingly live in a data-driven world, with diverse kinds of data distributed across many locations. The Computing Community Consortium (CCC) convened a 1.5-day workshop focused on wide-area data analytics in October 2019. This report summarizes the challenges discussed and the conclusions generated at the workshop.
Score: 4.080171822768553
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We increasingly live in a data-driven world, with diverse kinds of data distributed across many locations. In some cases, the datasets are collected from multiple locations, such as sensors (e.g., mobile phones and street cameras) spread throughout a geographic region. The data may need to be analyzed close to where they are produced, particularly when the applications require low latency, high, low cost, user privacy, and regulatory constraints. In other cases, large datasets are distributed across public clouds, private clouds, or edge-cloud computing sites with more plentiful computation, storage, bandwidth, and energy resources. Often, some portion of the analysis may take place on the end-host or edge cloud (to respect user privacy and reduce the volume of data) while relying on remote clouds to complete the analysis (to leverage greater computation and storage resources). Wide-area data analytics is any analysis of data that is generated by, or stored at, geographically dispersed entities. Over the past few years, several parts of the computer science research community have started to explore effective ways to analyze data spread over multiple locations. In particular, several areas of "systems" research - including databases, distributed systems, computer networking, and security and privacy - have delved into these topics. These research subcommunities often focus on different aspects of the problem, consider different motivating applications and use cases, and design and evaluate their solutions differently. To address these challenges, the Computing Community Consortium (CCC) convened a 1.5-day workshop focused on wide-area data analytics in October 2019. This report summarizes the challenges discussed and the conclusions generated at the workshop.

Related papers

Harnessing Rich Multi-Modal Data for Spatial-Temporal Homophily-Embedded Graph Learning Across Domains and Localities [2.5065738436850835]
This research proposes a heterogeneous data pipeline that performs cross-domain data fusion. We aim to address complex urban problems across multiple domains and localities by harnessing the rich information in over 50 data sources.
arXiv Detail & Related papers (2025-12-11T23:51:54Z)
Enabling High Data Throughput Reinforcement Learning on GPUs: A Domain Agnostic Framework for Data-Driven Scientific Research [90.91438597133211]
We introduce WarpSci, a framework designed to overcome crucial system bottlenecks in the application of reinforcement learning. We eliminate the need for data transfer between the CPU and GPU, enabling the concurrent execution of thousands of simulations.
arXiv Detail & Related papers (2024-08-01T21:38:09Z)
A Survey on Differential Privacy for SpatioTemporal Data in Transportation Research [0.9790236766474202]
In transportation, we are seeing a surge in spatiotemporal data collection. Recent developments in differential privacy in the context of such data have led to research in applied privacy. To address the need for such data in research and inference without exposing private information, significant work has been proposed.
arXiv Detail & Related papers (2024-07-18T03:19:29Z)
A Survey on Federated Analytics: Taxonomy, Enabling Techniques, Applications and Open Issues [28.096861605150075]
Federated analytics (FA) is an emerging technique to support collaborative data analytics among diverse data owners without centralizing the raw data. This survey aims to bridge the gap by first providing an overview of FA, elucidating key concepts, and discussing its relationship with similar concepts.
arXiv Detail & Related papers (2024-04-19T07:06:40Z)
A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing. Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy-sensitive data. Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)
Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey [67.7834898542701]
We focus on reviewing privacy-preserving techniques of graph machine learning. We first review methods for generating privacy-preserving graph data. Then we describe methods for transmitting privacy-preserved information.
arXiv Detail & Related papers (2023-07-10T04:30:23Z)
LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting [65.71129509623587]
Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning. However, the promising results achieved on current public datasets may not be applicable to practical scenarios. We introduce the LargeST benchmark dataset, which includes a total of 8,600 sensors in California with a 5-year time coverage.
arXiv Detail & Related papers (2023-06-14T05:48:36Z)
Towards Confidential Computing: A Secure Cloud Architecture for Big Data Analytics and AI [0.0]
Cloud computing has become a viable solution for big data analytics and artificial intelligence. Data security in certain fields such as biomedical research remains a major concern when moving to the cloud.
arXiv Detail & Related papers (2023-05-28T16:08:44Z)
Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep Learning [77.34726150561087]
We propose an approach for creating a multi-modal and large-temporal dataset comprised of publicly available Remote Sensing data. We use Convolutional Neural Networks (CNN) models that are capable of separating different classes of vegetation.
arXiv Detail & Related papers (2022-09-28T18:51:59Z)
Federated Learning for Big Data: A Survey on Opportunities, Applications, and Future Directions [18.95670953718066]
Federated Learning (FL) emerges as a sub-field of machine learning. This paper reviews the potential of FL in big data acquisition, storage, big data analytics and further privacy preservation. The potential of FL in big data applications, such as smart city, smart healthcare, smart transportation, smart grid, and social media, is also explored.
arXiv Detail & Related papers (2021-10-08T14:36:43Z)
A communication efficient distributed learning framework for smart environments [0.4898659895355355]
This paper proposes a distributed learning framework to move data analytics closer to where data is generated. Using distributed machine learning techniques, it is possible to drastically reduce the network overhead, while obtaining performance comparable to the cloud solution. The analysis also shows when each distributed learning approach is preferable, based on the specific distribution of the data on the nodes.
arXiv Detail & Related papers (2021-09-27T13:44:34Z)
Opening practice: supporting Reproducibility and Critical spatial data science [0.0]
This paper reflects on a number of trends towards a more open and reproducible approach to spatial data science. In particular, it considers trends towards Big Data, and the impacts this is having on spatial data analysis and modelling. It identifies a turn in academia towards coding as a core analytic tool, and away from proprietary software tools offering 'black boxes'.
arXiv Detail & Related papers (2020-07-20T07:50:08Z)
Data Mining with Big Data in Intrusion Detection Systems: A Systematic Literature Review [68.15472610671748]
Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation. The rapid rate and volume of data creation has begun to pose significant challenges for data management and security. The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance.
arXiv Detail & Related papers (2020-05-23T20:57:12Z)
Towards an Integrated Platform for Big Data Analysis [4.5257812998381315]
This paper presents the vision of an integrated platform for big data analysis that combines all these aspects. The main benefits of this approach are an enhanced scalability of the whole platform, a better parameterization of algorithms, and an improved usability during the end-to-end data analysis process.
arXiv Detail & Related papers (2020-04-27T03:15:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.