Related papers: Towards an Integrated Platform for Big Data Analysis

URL: http://arxiv.org/abs/2004.13021v1
Date: Mon, 27 Apr 2020 03:15:23 GMT
Title: Towards an Integrated Platform for Big Data Analysis
Authors: Mahdi Bohlouli, Frank Schulz, Lefteris Angelis, David Pahor, Ivona Brandic, David Atlan, Rosemary Tate
Abstract summary: This paper presents the vision of an integrated platform for big data analysis that combines all these aspects. Main benefits of this approach are an enhanced scalability of the whole platform, a better parameterization of algorithms, and an improved usability during the end-to-end data analysis process.
Score: 4.5257812998381315
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The amount of data in the world is expanding rapidly. Every day, huge amounts of data are created by scientific experiments, companies, and end users' activities. These large data sets have been labeled as "Big Data", and their storage, processing and analysis presents a plethora of new challenges to computer science researchers and IT professionals. In addition to efficient data management, additional complexity arises from dealing with semi-structured or unstructured data, and from time critical processing requirements. In order to understand these massive amounts of data, advanced visualization and data exploration techniques are required. Innovative approaches to these challenges have been developed during recent years, and continue to be a hot topic for research and industry in the future. An investigation of current approaches reveals that usually only one or two aspects are addressed, either in the data management, processing, analysis or visualization. This paper presents the vision of an integrated platform for big data analysis that combines all these aspects. Main benefits of this approach are an enhanced scalability of the whole platform, a better parameterization of algorithms, a more efficient usage of system resources, and an improved usability during the end-to-end data analysis process.

Related papers

CoDA: Agentic Systems for Collaborative Data Visualization [57.270599188947294]
Deep research has revolutionized data analysis, yet data scientists still devote substantial time to manually crafting visualizations. Existing approaches, including simple single- or multi-agent systems, often oversimplify the task. We introduce CoDA, a multi-agent system that employs specialized LLM agents for metadata analysis, task planning, code generation, and self-reflection.
arXiv Detail & Related papers (2025-10-03T17:30:16Z)
Scaling Generalist Data-Analytic Agents [95.05161133349242]
DataMind is a scalable data synthesis and agent training recipe designed to build generalist data-analytic agents. DataMind tackles three key challenges in building open-source data-analytic agents.
arXiv Detail & Related papers (2025-09-29T17:23:08Z)
A Comprehensive Survey on Imbalanced Data Learning [56.65067795190842]
Imbalanced data is prevalent in various types of raw data and hinders the performance of machine learning. This survey systematically analyzes various real-world data formats. It concludes existing researches for different data formats into four categories: data re-balancing, feature representation, training strategy, and ensemble learning.
arXiv Detail & Related papers (2025-02-13T04:53:17Z)
Making Sense of Data in the Wild: Data Analysis Automation at Scale [0.1747623282473278]
We propose a novel approach that combines intelligent agents with retrieval augmented generation to automate data analysis, dataset curation and indexing at scale. We demonstrate that our approach results in more detailed dataset descriptions, higher hit rates and greater diversity in dataset retrieval tasks.
arXiv Detail & Related papers (2025-01-27T10:04:10Z)
Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models [64.28420991770382]
Data-Juicer 2.0 is a data processing system backed by data processing operators spanning text, image, video, and audio modalities. It supports more critical tasks including data analysis, annotation, and foundation model post-training. It has been widely adopted in diverse research fields and real-world products such as Alibaba Cloud PAI.
arXiv Detail & Related papers (2024-12-23T08:29:57Z)
Deep Learning, Machine Learning, Advancing Big Data Analytics and Management [26.911181864764117]
Advances in artificial intelligence, machine learning, and deep learning have catalyzed the transformation of big data analytics and management. This work explores the theoretical foundations, methodological advancements, and practical implementations of these technologies. It equips researchers, practitioners, and data enthusiasts with the tools to navigate the complexities of modern data analytics.
arXiv Detail & Related papers (2024-12-03T05:59:34Z)
A Survey on Data Synthesis and Augmentation for Large Language Models [35.59526251210408]
This paper reviews and summarizes data generation techniques throughout the lifecycle of Large Language Models. We discuss the current constraints faced by these methods and investigate potential pathways for future development and research.
arXiv Detail & Related papers (2024-10-16T16:12:39Z)
Big data searching using words [0.0]
We introduce some fundamental ideas related to the neighborhood structure of words in data searching. We also introduce big data primal in big data searching and discuss the application of neighborhood structures in detecting anomalies in data searching.
arXiv Detail & Related papers (2024-09-10T13:46:14Z)
Data Issues in Industrial AI System: A Meta-Review and Research Strategy [10.540603300770885]
Artificial intelligence (AI) is assuming an increasingly pivotal role within industrial systems. Despite the recent trend within various industries to adopt AI, the actual adoption of AI is not as developed as perceived. How to address these data issues stands as a significant concern confronting both industry and academia.
arXiv Detail & Related papers (2024-06-22T08:36:59Z)
Benchmarking Data Science Agents [11.582116078653968]
Large Language Models (LLMs) have emerged as promising aids as data science agents, assisting humans in data analysis and processing. Yet their practical efficacy remains constrained by the varied demands of real-world applications and complicated analytical process. We introduce DSEval -- a novel evaluation paradigm, as well as a series of innovative benchmarks tailored for assessing the performance of these agents.
arXiv Detail & Related papers (2024-02-27T03:03:06Z)
Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey [67.7834898542701]
We focus on reviewing privacy-preserving techniques of graph machine learning. We first review methods for generating privacy-preserving graph data. Then we describe methods for transmitting privacy-preserved information.
arXiv Detail & Related papers (2023-07-10T04:30:23Z)
Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System [48.62158108517576]
We introduce InsightPilot, an automated data exploration system designed to simplify the data exploration process. InsightPilot automatically selects appropriate analysis intents, such as understanding, summarizing, and explaining. In brief, an IQuery is an abstraction and automation of data analysis operations, which mimics the approach of data analysts.
arXiv Detail & Related papers (2023-04-02T07:27:49Z)
A Comprehensive Survey of Dataset Distillation [73.15482472726555]
It has become challenging to handle the unlimited growth of data with limited computing power. Deep learning technology has developed unprecedentedly in the last decade. This paper provides a holistic understanding of dataset distillation from multiple aspects.
arXiv Detail & Related papers (2023-01-13T15:11:38Z)
Advanced Data Augmentation Approaches: A Comprehensive Survey and Future directions [57.30984060215482]
We provide a background of data augmentation, a novel and comprehensive taxonomy of reviewed data augmentation techniques, and the strengths and weaknesses (wherever possible) of each technique. We also provide comprehensive results of the data augmentation effect on three popular computer vision tasks, such as image classification, object detection and semantic segmentation.
arXiv Detail & Related papers (2023-01-07T11:37:32Z)
Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature. We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
Occams Razor for Big Data? On Detecting Quality in Large Unstructured Datasets [0.0]
New trend towards analytic complexity represents a severe challenge for the principle of parsimony or Occams Razor in science. Computational building block approaches for data clustering can help to deal with large unstructured datasets in minimized computation time. The review concludes on how cultural differences between East and West are likely to affect the course of big data analytics.
arXiv Detail & Related papers (2020-11-12T16:06:01Z)
Data Mining with Big Data in Intrusion Detection Systems: A Systematic Literature Review [68.15472610671748]
Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation. The rapid rate and volume of data creation has begun to pose significant challenges for data management and security. The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance.
arXiv Detail & Related papers (2020-05-23T20:57:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.