Exploratory Visual Analysis for Increasing Data Readiness in Artificial Intelligence Projects
- URL: http://arxiv.org/abs/2409.03805v1
- Date: Thu, 5 Sep 2024 09:57:14 GMT
- Title: Exploratory Visual Analysis for Increasing Data Readiness in Artificial Intelligence Projects
- Authors: Mattias Tiger, Daniel Jakobsson, Anders Ynnerman, Fredrik Heintz, Daniel Jönsson,
- Abstract summary: We contribute a mapping between data readiness aspects and visual analysis techniques suitable for different data types.
In addition to the mapping, we extend the data readiness concept to better take aspects of the task and solution into account.
We report on our experiences in using the presented visual analysis techniques to aid future artificial intelligence projects in raising the data readiness level.
- Score: 7.982715506261976
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present experiences and lessons learned from increasing data readiness of heterogeneous data for artificial intelligence projects using visual analysis methods. Increasing the data readiness level involves understanding both the data as well as the context in which it is used, which are challenges well suitable to visual analysis. For this purpose, we contribute a mapping between data readiness aspects and visual analysis techniques suitable for different data types. We use the defined mapping to increase data readiness levels in use cases involving time-varying data, including numerical, categorical, and text. In addition to the mapping, we extend the data readiness concept to better take aspects of the task and solution into account and explicitly address distribution shifts during data collection time. We report on our experiences in using the presented visual analysis techniques to aid future artificial intelligence projects in raising the data readiness level.
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z) - Data Analysis in the Era of Generative AI [56.44807642944589]
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges.
We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of data analysis workflow.
We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps.
arXiv Detail & Related papers (2024-09-27T06:31:03Z) - PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation [2.1184929769291294]
This paper presents a novel synthetic dataset designed to evaluate the proficiency of large language models in interpreting data visualizations.
Our dataset is generated using controlled parameters to ensure comprehensive coverage of potential real-world scenarios.
We employ multimodal text prompts with questions related to visual data in images to benchmark several state-of-the-art models.
arXiv Detail & Related papers (2024-09-04T11:19:17Z) - A Comprehensive Survey on Data Augmentation [55.355273602421384]
Data augmentation is a technique that generates high-quality artificial data by manipulating existing data samples.
Existing literature surveys only focus on a certain type of specific modality data.
We propose a more enlightening taxonomy that encompasses data augmentation techniques for different common data modalities.
arXiv Detail & Related papers (2024-05-15T11:58:08Z) - Navigating Dataset Documentations in AI: A Large-Scale Analysis of
Dataset Cards on Hugging Face [46.60562029098208]
We analyze all 7,433 dataset documentation on Hugging Face.
Our study offers a unique perspective on analyzing dataset documentation through large-scale data science analysis.
arXiv Detail & Related papers (2024-01-24T21:47:13Z) - Towards Explainable Artificial Intelligence (XAI): A Data Mining
Perspective [35.620874971064765]
This work takes a "data-centric" view, examining how data collection, processing, and analysis contribute to explainable AI (XAI)
We categorize existing work into three categories subject to their purposes: interpretations of deep models, influences of training data, and insights of domain knowledge.
Specifically, we distill XAI methodologies into data mining operations on training and testing data across modalities.
arXiv Detail & Related papers (2024-01-09T06:27:09Z) - Capture the Flag: Uncovering Data Insights with Large Language Models [90.47038584812925]
This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data.
We propose a new evaluation methodology based on a "capture the flag" principle, measuring the ability of such models to recognize meaningful and pertinent information (flags) in a dataset.
arXiv Detail & Related papers (2023-12-21T14:20:06Z) - Demonstration of InsightPilot: An LLM-Empowered Automated Data
Exploration System [48.62158108517576]
We introduce InsightPilot, an automated data exploration system designed to simplify the data exploration process.
InsightPilot automatically selects appropriate analysis intents, such as understanding, summarizing, and explaining.
In brief, an IQuery is an abstraction and automation of data analysis operations, which mimics the approach of data analysts.
arXiv Detail & Related papers (2023-04-02T07:27:49Z) - TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual
Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z) - Visualization Techniques to Enhance Automated Event Extraction [0.0]
This case study seeks to identify potential triggers of state-led mass killings from news articles using NLP.
We demonstrate how visualizations can aid in each stage, from exploratory analysis of raw data, to machine learning training analysis, and finally post-inference validation.
arXiv Detail & Related papers (2021-06-11T19:24:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.