A Survey on Large-scale Machine Learning
- URL: http://arxiv.org/abs/2008.03911v1
- Date: Mon, 10 Aug 2020 06:07:52 GMT
- Title: A Survey on Large-scale Machine Learning
- Authors: Meng Wang, Weijie Fu, Xiangnan He, Shijie Hao, Xindong Wu
- Abstract summary: Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning (LML) aims to efficiently learn patterns from big data while maintaining comparable performance.
- Score: 67.6997613600942
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning can provide deep insights into data, allowing machines to make high-quality predictions, and has been widely used in real-world applications such as text mining, visual classification, and recommender systems. However, most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data. This issue calls for Large-scale Machine Learning (LML), which aims to learn patterns from big data efficiently while maintaining comparable performance. In this paper, we offer a systematic survey of existing LML methods to provide a blueprint for future developments in this area. We first divide these LML methods according to how they improve scalability: 1) model simplification, which reduces computational complexity; 2) optimization approximation, which improves computational efficiency; and 3) computation parallelism, which expands computational capability. We then categorize the methods in each perspective according to their targeted scenarios and introduce representative methods in line with their intrinsic strategies. Lastly, we analyze their limitations and discuss potential directions and open issues that are promising to address in the future.
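The second perspective, optimization approximation, is the easiest to make concrete in code: mini-batch stochastic gradient descent replaces the exact full-data gradient with an estimate from a small random subset, trading a little accuracy per step for a large drop in per-step cost. A minimal NumPy sketch for least-squares regression (the batch size, learning rate, and toy data are illustrative choices, not taken from the survey):

```python
import numpy as np

def minibatch_sgd(X, y, batch_size=64, lr=0.01, epochs=5, seed=0):
    """Least-squares regression with mini-batch SGD: each step
    approximates the full gradient using a small random subset."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            grad = (2.0 / len(idx)) * Xb.T @ (Xb @ w - yb)  # batch estimate
            w -= lr * grad
    return w

# toy usage: 100k examples, 20 features
rng = np.random.default_rng(1)
X = rng.normal(size=(100_000, 20))
y = X @ np.arange(20, dtype=float) + 0.1 * rng.normal(size=100_000)
print(np.round(minibatch_sgd(X, y)[:5], 2))  # should approach [0, 1, 2, 3, 4]
```

Each epoch touches every example once, but every update costs O(batch_size * d) rather than O(n * d), which is the scalability gain the survey's second category targets.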
Related papers
- Topological Methods in Machine Learning: A Tutorial for Practitioners [4.297070083645049]
Topological Machine Learning (TML) is an emerging field that leverages techniques from algebraic topology to analyze complex data structures.
This tutorial provides a comprehensive introduction to two key TML techniques, persistent homology and the Mapper algorithm.
To enhance accessibility, we adopt a data-centric approach, enabling readers to gain hands-on experience applying these techniques to relevant tasks.
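As a hands-on taste of the first technique, persistent homology summarizes a point cloud by the birth and death of topological features across scales. A minimal sketch assuming the third-party ripser package (one of several persistent-homology libraries; the tutorial itself may use different tooling):

```python
import numpy as np
from ripser import ripser  # pip install ripser (one of several PH libraries)

# noisy points on a circle: H1 should contain one long-lived loop
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
points = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(200, 2))

dgms = ripser(points, maxdim=1)["dgms"]    # [H0 diagram, H1 diagram]
lifetimes = dgms[1][:, 1] - dgms[1][:, 0]  # persistence of each 1-cycle
print("most persistent loop lifetime:", lifetimes.max())
```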
arXiv Detail & Related papers (2024-09-04T17:44:52Z)
- LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on the roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
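For context, the roofline model bounds a kernel's attainable throughput by min(peak compute, memory bandwidth × arithmetic intensity), which is what makes hardware bottlenecks easy to read off. A back-of-the-envelope sketch with illustrative hardware numbers (not taken from the survey):

```python
def roofline_bound(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Attainable FLOP/s for a kernel under the roofline model."""
    intensity = flops / bytes_moved  # FLOPs per byte of memory traffic
    return min(peak_flops, peak_bandwidth * intensity)

# illustrative GPU: 300 TFLOP/s peak compute, 2 TB/s memory bandwidth
# decode-phase matrix-vector product: ~0.5 FLOPs/byte -> memory-bound
print(roofline_bound(flops=2e9, bytes_moved=4e9,
                     peak_flops=300e12, peak_bandwidth=2e12))  # 1e12, far below peak
```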
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances using generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems [14.355768064425598]
Generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data.
However, the computational intensity and memory consumption of deploying these models present substantial challenges in terms of serving efficiency.
This survey addresses the imperative need for efficient LLM serving methodologies from a machine learning system (MLSys) research perspective.
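To make the memory-consumption challenge concrete: serving memory is dominated by the model weights plus a per-request KV cache that grows with batch size and sequence length. A rough estimator under standard transformer assumptions (the model shape below is hypothetical, not from the survey):

```python
def serving_memory_gb(params, layers, kv_heads, head_dim,
                      seq_len, batch, bytes_per_elem=2):
    """Rough serving-memory estimate for a transformer LLM:
    weights plus KV cache (keys and values, one pair per layer)."""
    weights = params * bytes_per_elem
    kv_cache = 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem
    return (weights + kv_cache) / 1e9

# hypothetical 7B model in fp16: 32 layers, 32 KV heads of dim 128
print(serving_memory_gb(7e9, 32, 32, 128, seq_len=4096, batch=8))  # ~31 GB
```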
arXiv Detail & Related papers (2023-12-23T11:57:53Z)
- Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly [62.473245910234304]
This paper takes a hardware-centric approach to explore how Large Language Models can be brought to modern edge computing systems.
We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions.
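Model FLOP utilization (MFU) is the ratio of the FLOP/s a model usefully achieves to the hardware's theoretical peak. A small sketch using the common ~6 × parameters FLOPs-per-token estimate for training; all numbers are placeholders, not the paper's measurements:

```python
def model_flop_utilization(tokens_per_s, params, peak_flops):
    """MFU: achieved useful FLOP/s over hardware peak, using the
    common ~6 * params FLOPs-per-token estimate for training."""
    return 6 * params * tokens_per_s / peak_flops

# placeholder edge device: 1B-parameter model, 20 tok/s, 10 TFLOP/s peak
print(f"MFU: {model_flop_utilization(20, 1e9, 10e12):.1%}")  # 1.2%
```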
arXiv Detail & Related papers (2023-10-04T20:27:20Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
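A loose sketch of the multi-step inverse-kinematics idea MusIK builds on: train an encoder whose embeddings of observations k steps apart suffice to predict the intervening action. The module shapes and head-per-offset design below are illustrative, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiStepInverse(nn.Module):
    """Sketch of a multi-step inverse-kinematics objective: predict the
    action a_t from encodings of observations x_t and x_{t+k}."""
    def __init__(self, obs_dim, latent_dim, num_actions, max_k):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim))
        # a separate classification head per step offset k (illustrative)
        self.heads = nn.ModuleList(
            [nn.Linear(2 * latent_dim, num_actions) for _ in range(max_k)])

    def loss(self, x_t, x_tk, a_t, k):
        z = torch.cat([self.encoder(x_t), self.encoder(x_tk)], dim=-1)
        return F.cross_entropy(self.heads[k - 1](z), a_t)
```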
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- A Survey on the Integration of Machine Learning with Sampling-based Motion Planning [9.264471872135623]
This survey reviews machine learning efforts to improve the computational efficiency and applicability of Sampling-Based Motion Planners (SBMPs).
It first discusses how learning has been used to enhance key components of SBMPs, such as node sampling, collision detection, distance or nearest-neighbor computation, local planning, and termination conditions; node sampling is sketched below.
It also discusses how machine learning has been used to provide data-driven models of robots, which can then be used by an SBMP.
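Node sampling is the most drop-in of these components: the planner can draw candidate configurations from a learned model while occasionally falling back to uniform sampling to retain coverage. A schematic sketch in which learned_sampler stands for any trained generative model (hypothetical, not from the survey):

```python
import random

def sample_node(learned_sampler, bounds, bias=0.5):
    """Candidate-configuration sampling for an SBMP: with probability
    `bias`, defer to a learned sampler; otherwise sample uniformly so
    the planner retains its coverage behavior."""
    if random.random() < bias:
        return learned_sampler()  # hypothetical trained generative model
    return [random.uniform(lo, hi) for lo, hi in bounds]

# toy usage in a 2D workspace with a trivial stand-in "model"
print(sample_node(lambda: [0.5, 0.5], bounds=[(0, 1), (0, 1)]))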
arXiv Detail & Related papers (2022-11-15T18:13:49Z)
- Interpretable AI-based Large-scale 3D Pathloss Prediction Model for enabling Emerging Self-Driving Networks [3.710841042000923]
We propose a Machine Learning-based model that leverages novel key predictors for estimating pathloss.
By quantitatively evaluating various ML algorithms in terms of predictive, generalization, and computational performance, our results show that the Light Gradient Boosting Machine (LightGBM) algorithm outperforms the others overall.
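For reference, fitting a gradient-boosted pathloss regressor of the kind evaluated takes only a few lines with the lightgbm package; the synthetic features and hyperparameters here are placeholders, not the paper's predictors:

```python
import numpy as np
import lightgbm as lgb  # pip install lightgbm

# placeholder features, e.g. (distance, frequency, antenna heights, ...)
rng = np.random.default_rng(0)
X = rng.random((10_000, 6))
y = 40 + 30 * np.log10(1 + 1000 * X[:, 0]) + rng.normal(0, 2, 10_000)  # synthetic pathloss (dB)

model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05)
model.fit(X, y)
print(model.predict(X[:5]))  # predicted pathloss for the first five samples
```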
arXiv Detail & Related papers (2022-01-30T19:50:16Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Existing approaches, however, do not supply the procedures and pipelines needed to actually deploy machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor frameworks and script-language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- MLPerf™ HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems [32.621917787044396]
We introduce MLPerf™ HPC, a benchmark suite of scientific machine learning training applications driven by the MLCommons™ Association.
We develop a systematic framework for their joint analysis and compare them in terms of data staging, algorithmic convergence, and compute performance.
We conclude by characterizing each benchmark with respect to low-level memory, I/O, and network behavior.
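MLPerf benchmarks typically score time-to-quality: the wall-clock time for a training run to first reach a target metric, which couples compute performance with algorithmic convergence. A minimal harness sketch (the train_step and evaluate callables are assumed, not part of the suite):

```python
import time

def time_to_quality(train_step, evaluate, target, max_steps=10_000):
    """Sketch of a time-to-quality measurement: wall-clock seconds
    until training first reaches a target evaluation score."""
    start = time.perf_counter()
    for step in range(1, max_steps + 1):
        train_step()              # assumed: runs one optimization step
        if evaluate() >= target:  # assumed: returns current model quality
            return time.perf_counter() - start, step
    return None  # target quality never reached within the budget
```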
arXiv Detail & Related papers (2021-10-21T20:30:12Z)