Triclustering in Big Data Setting
- URL: http://arxiv.org/abs/2010.12933v1
- Date: Sat, 24 Oct 2020 16:55:55 GMT
- Title: Triclustering in Big Data Setting
- Authors: Dmitry Egurnov, Dmitry I. Ignatov, and Dmitry Tochilkin
- Abstract summary: We describe versions of triclustering algorithms adapted for efficient calculation in distributed environments with the MapReduce model or the parallelisation mechanisms provided by modern programming languages.
The OAC family of triclustering algorithms shows good parallelisation capabilities due to the independent processing of triples of a triadic formal context.
- Score: 2.752817022620644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we describe versions of triclustering algorithms adapted for
efficient calculation in distributed environments with the MapReduce model or the
parallelisation mechanisms provided by modern programming languages. The OAC
family of triclustering algorithms shows good parallelisation capabilities due to the
independent processing of triples of a triadic formal context. We provide the
time and space complexity of the algorithms and justify their relevance. We
also evaluate the performance gain from using a distributed system and assess scalability.
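The independence property mentioned in the abstract can be illustrated with a minimal single-machine sketch of the prime-operator (OAC-prime) idea: each triple of the context yields one candidate tricluster, so the per-triple work maps directly onto a MapReduce map step followed by deduplication. This is an illustrative sketch, not the authors' distributed implementation; the data layout (a set of `(object, attribute, condition)` triples) and the function name are assumptions.

```python
from collections import defaultdict

def oac_prime_triclusters(context):
    """Compute OAC-prime triclusters of a triadic context.

    context: a set of (object, attribute, condition) triples.
    Each triple is processed independently, which is what makes
    the algorithm easy to parallelise or express in MapReduce.
    """
    # Precompute the three "prime" maps in one pass over the context.
    objs = defaultdict(set)   # (attribute, condition) -> objects
    attrs = defaultdict(set)  # (object, condition)    -> attributes
    conds = defaultdict(set)  # (object, attribute)    -> conditions
    for g, m, b in context:
        objs[(m, b)].add(g)
        attrs[(g, b)].add(m)
        conds[(g, m)].add(b)

    # "Map" step: one candidate tricluster per triple; a set dedupes.
    triclusters = set()
    for g, m, b in context:
        triclusters.add((frozenset(objs[(m, b)]),
                         frozenset(attrs[(g, b)]),
                         frozenset(conds[(g, m)])))
    return triclusters
```

Because the loop body touches only the maps keyed by its own triple, the triples can be partitioned across workers and the resulting tricluster sets merged, which is the parallelisation pattern the paper exploits.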
Related papers
- Superior Parallel Big Data Clustering through Competitive Stochastic Sample Size Optimization in Big-means [0.3069335774032178]
This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology.
The proposed method efficiently integrates parallel processing, sampling, and competitive optimization to create a scalable variant designed for big data applications.
arXiv Detail & Related papers (2024-03-27T17:05:03Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute the covariance matrices involved.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Randomized Polar Codes for Anytime Distributed Machine Learning [66.46612460837147]
We present a novel distributed computing framework that is robust to slow compute nodes, and is capable of both approximate and exact computation of linear operations.
We propose a sequential decoding algorithm designed to handle real valued data while maintaining low computational complexity for recovery.
We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization.
arXiv Detail & Related papers (2023-09-01T18:02:04Z)
- Late Fusion Multi-view Clustering via Global and Local Alignment Maximization [61.89218392703043]
Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance.
Most existing approaches directly fuse multiple pre-specified similarities to learn an optimal similarity matrix for clustering.
We propose late fusion MVC via alignment to address these issues.
arXiv Detail & Related papers (2022-08-02T01:49:31Z)
- ExClus: Explainable Clustering on Low-dimensional Data Representations [9.496898312608307]
Dimensionality reduction and clustering techniques are frequently used to analyze complex data sets, but their results are often not easy to interpret.
We consider how to support users in interpreting apparent cluster structure on scatter plots where the axes are not directly interpretable.
We propose a new method to compute an interpretable clustering automatically, where the explanation is in the original high-dimensional space and the clustering is coherent in the low-dimensional projection.
arXiv Detail & Related papers (2021-11-04T21:24:01Z)
- A New Parallel Adaptive Clustering and its Application to Streaming Data [0.0]
This paper presents a parallel adaptive clustering (PAC) algorithm to automatically classify data while simultaneously choosing a suitable number of classes.
We develop regularized set k-means to efficiently cluster the results from the parallel threads.
We provide theoretical analysis and numerical experiments to characterize the performance of the method.
arXiv Detail & Related papers (2021-04-06T17:18:56Z)
- A Two-stage Framework and Reinforcement Learning-based Optimization Algorithms for Complex Scheduling Problems [54.61091936472494]
We develop a two-stage framework, in which reinforcement learning (RL) and traditional operations research (OR) algorithms are combined together.
The scheduling problem is solved in two stages, including a finite Markov decision process (MDP) and a mixed-integer programming process, respectively.
Results show that the proposed algorithms could stably and efficiently obtain satisfactory scheduling schemes for agile Earth observation satellite scheduling problems.
arXiv Detail & Related papers (2021-03-10T03:16:12Z)
- Fuzzy clustering algorithms with distance metric learning and entropy regularization [0.0]
This paper proposes fuzzy clustering algorithms based on Euclidean, City-block and Mahalanobis distances and entropy regularization.
Several experiments on synthetic and real datasets, including its application to noisy image texture segmentation, demonstrate the usefulness of these adaptive clustering methods.
arXiv Detail & Related papers (2021-02-18T18:19:04Z)
- Temporal Parallelization of Inference in Hidden Markov Models [0.0]
This paper presents algorithms for the parallelization of inference in hidden Markov models (HMMs).
We propose parallel backward-forward filtering and smoothing algorithms, as well as a parallel Viterbi-type maximum-a-posteriori (MAP) algorithm.
We empirically compare the performance of the proposed methods to classical methods on a highly parallel processing unit (GPU).
arXiv Detail & Related papers (2021-02-10T21:26:09Z)
- Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving [106.63673243937492]
Feedforward computation, such as evaluating a neural network or sampling from an autoregressive model, is ubiquitous in machine learning.
We frame the task of feedforward computation as solving a system of nonlinear equations. We then propose to find the solution using a Jacobi or Gauss-Seidel fixed-point method, as well as hybrid methods of both.
Our method is guaranteed to give exactly the same values as the original feedforward computation with a reduced (or equal) number of parallelizable iterations, and hence reduced time given sufficient parallel computing power.
arXiv Detail & Related papers (2020-02-10T10:11:31Z)
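The Jacobi fixed-point idea in the last entry can be illustrated on a scalar recurrence x_t = f(x_{t-1}): all positions are updated in parallel from the previous iterate, and after T iterations the result matches the sequential computation exactly, since information propagates one step per iteration. This is a minimal sketch of the general technique, not the paper's implementation; the function and problem size are illustrative.

```python
def sequential(f, x0, T):
    """Standard sequential evaluation: x_t = f(x_{t-1})."""
    xs = [x0]
    for _ in range(T):
        xs.append(f(xs[-1]))
    return xs

def jacobi(f, x0, T, iters):
    """Jacobi fixed-point evaluation of the same recurrence.

    Every position is updated from the *previous* iterate, so all
    T updates within one iteration are independent and could run
    in parallel. After `iters >= T` iterations the result equals
    the sequential computation exactly.
    """
    xs = [x0] * (T + 1)  # any initial guess works
    for _ in range(iters):
        xs = [x0] + [f(xs[t - 1]) for t in range(1, T + 1)]
    return xs
```

With enough parallel hardware, each Jacobi iteration costs one step of wall-clock time, so convergence in fewer than T iterations yields a net speed-up over the sequential evaluation.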
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.