Mathematical Computation on High-dimensional Data via Array Programming and Parallel Acceleration
- URL: http://arxiv.org/abs/2506.22929v1
- Date: Sat, 28 Jun 2025 15:42:23 GMT
- Title: Mathematical Computation on High-dimensional Data via Array Programming and Parallel Acceleration
- Authors: Chen Zhang
- Abstract summary: We propose a parallel computation architecture based on space completeness, decomposing high-dimensional data into dimension-independent structures for distributed processing. This framework enables seamless integration of data mining and parallel-optimized machine learning methods, supporting scientific computations across diverse data types like medical and natural images within a unified system.
- Score: 6.920979776722456
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While deep learning excels in natural image and language processing, its application to high-dimensional data faces computational challenges due to the dimensionality curse. Current large-scale data tools focus on business-oriented descriptive statistics, lacking mathematical statistics support for advanced analysis. We propose a parallel computation architecture based on space completeness, decomposing high-dimensional data into dimension-independent structures for distributed processing. This framework enables seamless integration of data mining and parallel-optimized machine learning methods, supporting scientific computations across diverse data types like medical and natural images within a unified system.
Related papers
- Mathematical artificial data for operator learning [1.4579344926652846]
We present the Mathematical Artificial Data (MAD) framework, a new paradigm that integrates physical laws with data-driven learning to facilitate large-scale operator discovery. We show MAD's generalizability and superior efficiency/accuracy across various differential equations scenarios.
arXiv Detail & Related papers (2025-07-09T11:23:05Z) - RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library [58.404895570822184]
RV-Syn is a novel mathematical synthesis approach. It generates graphs as solutions by combining Python-formatted functions from this library. Based on the constructed graph, we achieve solution-guided logic-aware problem generation.
arXiv Detail & Related papers (2025-04-29T04:42:02Z) - An Incremental Non-Linear Manifold Approximation Method [0.0]
This research develops an incremental non-linear dimension reduction method using the Geometric Multi-Resolution Analysis (GMRA) framework for streaming data. The proposed method enables real-time data analysis and visualization by incrementally updating the cluster map, basis PCA vectors, and wavelet coefficients.
arXiv Detail & Related papers (2025-04-12T03:54:05Z) - A Novel Approach for Intrinsic Dimension Estimation [0.0]
Real-life data have a complex and non-linear structure due to their nature. Finding the nearly optimal representation of the dataset in a lower-dimensional space offers an applicable mechanism for improving the success of machine learning tasks. We propose a highly efficient and robust intrinsic dimension estimation approach.
arXiv Detail & Related papers (2025-03-12T15:42:39Z) - Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models [64.28420991770382]
Data-Juicer 2.0 is a data processing system backed by data processing operators spanning text, image, video, and audio modalities. It supports more critical tasks including data analysis, annotation, and foundation model post-training. It has been widely adopted in diverse research fields and real-world products such as Alibaba Cloud PAI.
arXiv Detail & Related papers (2024-12-23T08:29:57Z) - Enabling High Data Throughput Reinforcement Learning on GPUs: A Domain Agnostic Framework for Data-Driven Scientific Research [90.91438597133211]
We introduce WarpSci, a framework designed to overcome crucial system bottlenecks in the application of reinforcement learning.
We eliminate the need for data transfer between the CPU and GPU, enabling the concurrent execution of thousands of simulations.
arXiv Detail & Related papers (2024-08-01T21:38:09Z) - Computing with Residue Numbers in High-Dimensional Representation [7.736925756277564]
We introduce Residue Hyperdimensional Computing, a computing framework that unifies residue number systems with an algebra defined over random, high-dimensional vectors.
We show how residue numbers can be represented as high-dimensional vectors in a manner that allows algebraic operations to be performed with component-wise, parallelizable operations on the vector elements.
arXiv Detail & Related papers (2023-11-08T18:19:45Z) - Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey [67.7834898542701]
We focus on reviewing privacy-preserving techniques of graph machine learning.
We first review methods for generating privacy-preserving graph data.
Then we describe methods for transmitting privacy-preserved information.
arXiv Detail & Related papers (2023-07-10T04:30:23Z) - Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z) - Understanding High Dimensional Spaces through Visual Means Employing Multidimensional Projections [0.0]
Two of the relevant algorithms in the data visualisation field are t-distributed stochastic neighbour embedding (t-SNE) and Least-Square Projection (LSP).
These algorithms can be used to understand several ranges of mathematical functions including their impact on datasets.
We illustrate ways of employing the visual results of multidimensional projection algorithms to understand and fine-tune the parameters of their mathematical framework.
arXiv Detail & Related papers (2022-07-12T20:30:33Z) - Semi-Parametric Inducing Point Networks and Neural Processes [15.948270454686197]
Semi-parametric inducing point networks (SPIN) can query the training set at inference time in a compute-efficient manner.
SPIN attains linear complexity via a cross-attention mechanism between datapoints inspired by inducing point methods.
In our experiments, SPIN reduces memory requirements, improves accuracy across a range of meta-learning tasks, and improves state-of-the-art performance on an important practical problem, genotype imputation.
arXiv Detail & Related papers (2022-05-24T01:42:46Z) - A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data efficiently while retaining comparable performance.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)