Distributed data analytics
- URL: http://arxiv.org/abs/2203.14088v1
- Date: Sat, 26 Mar 2022 14:10:51 GMT
- Title: Distributed data analytics
- Authors: Richard Mortier, Hamed Haddadi, Sandra Servia, Liang Wang
- Abstract summary: Recommendation systems are a key component of online service providers.
The financial industry has adopted ML to harness large volumes of data in areas such as fraud detection, risk-management, and compliance.
- Score: 8.415530878975751
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Machine Learning (ML) techniques have begun to dominate data analytics
applications and services. Recommendation systems are a key component of online
service providers. The financial industry has adopted ML to harness large
volumes of data in areas such as fraud detection, risk-management, and
compliance. Deep Learning is the technology behind voice-based personal
assistants, among other applications. Deployment of ML technologies onto cloud computing
infrastructures has benefited numerous aspects of our daily life. The
advertising and associated online industries in particular have fuelled a rapid
rise in the deployment of personal data collection and analytics tools.
Traditionally, behavioural analytics relies on collecting vast amounts of data
in centralised cloud infrastructure before using it to train machine learning
models that allow user behaviour and preferences to be inferred. A contrasting
approach, distributed data analytics, where code and models for training and
inference are distributed to the places where data is collected, has been
boosted by two recent, ongoing developments: increased processing power and
memory capacity available in user devices at the edge of the network, such as
smartphones and home assistants; and increased sensitivity to the highly
intrusive nature of many of these devices and services and the attendant
demands for improved privacy. Indeed, the potential for increased privacy is
not the only benefit of distributing data analytics to the edges of the
network: reducing the movement of large volumes of data can also improve energy
efficiency, helping to ameliorate the ever-increasing carbon footprint of our
digital infrastructure, and can enable much lower latency for service interactions
than is possible when services are cloud-hosted. These approaches often
introduce challenges in privacy, utility, and efficiency trade-offs, while
having to ensure fruitful user engagement.
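The abstract contrasts centralised training with sending code and models to where the data is collected. It does not name a specific algorithm, so the following is only a minimal sketch that assumes federated averaging as one well-known instance of the distributed approach. The function names (`local_update`, `federated_average`, `train`), the linear model, and the hyperparameters are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Weights = List[float]               # [slope, intercept] of a toy linear model
Sample = Tuple[float, float]        # one (x, y) observation held on a device


def local_update(weights: Weights, data: List[Sample],
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Run a few gradient-descent steps on one device's private data.

    Only the updated weights are returned; the raw samples stay local.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error for y ~ w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine device models, weighting each by its number of samples."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(wts[i] * n for wts, n in updates) / total for i in range(dim)]


def train(devices: List[List[Sample]], rounds: int = 200) -> Weights:
    """Alternate local training on every device with central averaging."""
    global_weights: Weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [(local_update(global_weights, data), len(data))
                   for data in devices]
        global_weights = federated_average(updates)
    return global_weights


if __name__ == "__main__":
    # Two hypothetical devices, each holding samples of y = 2x + 1.
    device_a = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]
    device_b = [(1.5, 4.0), (2.0, 5.0)]
    print(train([device_a, device_b]))
```

In this sketch the coordinator only ever sees model weights and sample counts, which is the data-movement reduction the abstract points to. It does not by itself provide privacy guarantees, since model updates can still leak information about the underlying data.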
Related papers
- Effective Intrusion Detection in Heterogeneous Internet-of-Things Networks via Ensemble Knowledge Distillation-based Federated Learning [52.6706505729803]
We introduce Federated Learning (FL) to collaboratively train a decentralized shared model of Intrusion Detection Systems (IDS).
FLEKD enables a more flexible aggregation method than conventional model fusion techniques.
Experimental results show that the proposed approach outperforms local training and traditional FL in terms of both speed and performance.
arXiv Detail & Related papers (2024-01-22T14:16:37Z)
- Federated Learning-Empowered AI-Generated Content in Wireless Networks [58.48381827268331]
Federated learning (FL) can be leveraged to improve learning efficiency and achieve privacy protection for AIGC.
We present FL-based techniques for empowering AIGC, and aim to enable users to generate diverse, personalized, and high-quality content.
arXiv Detail & Related papers (2023-07-14T04:13:11Z)
- Enabling Inter-organizational Analytics in Business Networks Through Meta Machine Learning [0.0]
Fear of disclosing sensitive information as well as the sheer volume of the data that would need to be exchanged are key inhibitors for the creation of effective system-wide solutions.
We propose a meta machine learning method that deals with these obstacles to enable comprehensive analyses within a business network.
arXiv Detail & Related papers (2023-03-28T09:06:28Z)
- Federated Learning for Autoencoder-based Condition Monitoring in the Industrial Internet of Things [0.07646713951724012]
Condition monitoring and predictive maintenance methods are key pillars for an efficient and robust manufacturing production cycle in the Industrial Internet of Things.
The employment of machine learning models to detect and predict deteriorating behavior by analyzing a variety of data collected across several industrial environments shows promising results in recent works.
Although collaborating and sharing knowledge between industry sites yields large benefits, it is often prohibited due to data privacy issues.
We propose an Autoencoder-based Federated Learning method that uses vibration sensor data from rotating machines and allows distributed training on edge devices located on-premise, close to the monitored machines.
arXiv Detail & Related papers (2022-11-14T18:40:50Z)
- Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling [49.87637449243698]
Traditional outsourcing requires uploading device data to the cloud server.
We propose to leverage widely available open-source data, which is a massive dataset collected from public and heterogeneous sources.
We develop a novel strategy called Efficient Collaborative Open-source Sampling (ECOS) to construct a proximal proxy dataset from open-source data for cloud training.
arXiv Detail & Related papers (2022-10-23T00:12:18Z)
- Distributed intelligence on the Edge-to-Cloud Continuum: A systematic literature review [62.997667081978825]
This review aims at providing a comprehensive vision of the main state-of-the-art libraries and frameworks for machine learning and data analytics available today.
The main simulation, emulation, deployment systems, and testbeds for experimental research on the Edge-to-Cloud Continuum available today are also surveyed.
arXiv Detail & Related papers (2022-04-29T08:06:05Z)
- Deep Reinforcement Learning Assisted Federated Learning Algorithm for Data Management of IIoT [82.33080550378068]
The continuously expanding scale of the industrial Internet of Things (IIoT) leads to IIoT equipment generating massive amounts of user data every moment.
How to manage these time series data in an efficient and safe way in the field of IIoT is still an open issue.
This paper studies applications of FL technology to manage IIoT equipment data in wireless network environments.
arXiv Detail & Related papers (2022-02-03T07:12:36Z)
- Privacy-Preserving Serverless Edge Learning with Decentralized Small Data [13.254530176359182]
Distributed training strategies have recently become a promising approach to ensure data privacy when training deep models.
This paper extends conventional serverless platforms with serverless edge learning architectures and provides an efficient distributed training framework from the networking perspective.
arXiv Detail & Related papers (2021-11-29T21:04:49Z)
- A Review of Privacy-preserving Federated Learning for the Internet-of-Things [3.3517146652431378]
This work reviews federated learning as an approach for performing machine learning on distributed data.
We aim to protect the privacy of user-generated data as well as reduce the communication costs associated with data transfer.
We identify the strengths and weaknesses of different methods applied to federated learning.
arXiv Detail & Related papers (2020-04-24T15:27:23Z)
- A Privacy-Preserving Distributed Architecture for Deep-Learning-as-a-Service [68.84245063902908]
This paper introduces a novel distributed architecture for deep-learning-as-a-service.
It is able to preserve users' sensitive data while providing Cloud-based machine and deep learning services.
arXiv Detail & Related papers (2020-03-30T15:12:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.