Visual Analytics For Machine Learning: A Data Perspective Survey
- URL: http://arxiv.org/abs/2307.07712v1
- Date: Sat, 15 Jul 2023 05:13:06 GMT
- Title: Visual Analytics For Machine Learning: A Data Perspective Survey
- Authors: Junpeng Wang, Shixia Liu, Wei Zhang
- Abstract summary: This survey focuses on summarizing VIS4ML works from the data perspective.
We categorize the common data handled by ML models into five types, explain the unique features of each type, and highlight the corresponding ML models that are good at learning from them.
Second, from the large number of VIS4ML works, we tease out six tasks that operate on these types of data at different stages of the ML pipeline to understand, diagnose, and refine ML models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The past decade has witnessed a plethora of works that leverage the power of
visualization (VIS) to interpret machine learning (ML) models. The
corresponding research topic, VIS4ML, keeps growing at a fast pace. To better
organize this enormous body of work and shed light on developing trends in VIS4ML,
we provide a systematic review of these works through this survey. Since data
quality greatly impacts the performance of ML models, our survey focuses
specifically on summarizing VIS4ML works from the data perspective. First, we
categorize the common data handled by ML models into five types, explain the
unique features of each type, and highlight the corresponding ML models that
are good at learning from them. Second, from the large number of VIS4ML works,
we tease out six tasks that operate on these types of data (i.e., data-centric
tasks) at different stages of the ML pipeline to understand, diagnose, and
refine ML models. Lastly, by studying the distribution of 143 surveyed papers
across the five data types, six data-centric tasks, and their intersections, we
analyze the prospective research directions and envision future research
trends.
Related papers
- Empirical Insights on Fine-Tuning Large Language Models for Question-Answering [50.12622877002846]
Large language models (LLMs) encode extensive world knowledge through pre-training on massive datasets, which can be fine-tuned for the question-answering (QA) task.
We categorize supervised fine-tuning (SFT) data based on the extent of knowledge memorized by the pretrained LLMs.
Our experiments show that as few as 60 data points during the SFT stage can activate the knowledge encoded during pre-training, enabling LLMs to perform the QA task.
arXiv Detail & Related papers (2024-09-24T07:38:38Z) - The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective [53.48484062444108]
We find that the development of models and data is not two separate paths but rather interconnected.
On the one hand, vaster and higher-quality data contribute to better performance of MLLMs; on the other hand, MLLMs can facilitate the development of data.
To promote the data-model co-development for MLLM community, we systematically review existing works related to MLLMs from the data-model co-development perspective.
arXiv Detail & Related papers (2024-07-11T15:08:11Z) - A Survey of Multimodal Large Language Model from A Data-centric Perspective [46.57232264950785]
Multimodal large language models (MLLMs) enhance the capabilities of standard large language models by integrating and processing data from multiple modalities.
Data plays a pivotal role in the development and refinement of these models.
arXiv Detail & Related papers (2024-05-26T17:31:21Z) - MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [86.61052121715689]
MatPlotAgent is a model-agnostic framework designed to automate scientific data visualization tasks.
MatPlotBench is a high-quality benchmark consisting of 100 human-verified test cases.
arXiv Detail & Related papers (2024-02-18T04:28:28Z) - Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator [63.762209407570715]
Genixer is a comprehensive data generation pipeline consisting of four key steps.
Training LLaVA1.5 with a synthetic VQA-like dataset enhances performance on 10 out of 12 multimodal benchmarks.
MLLMs trained with task-specific datasets can surpass GPT-4V in generating complex instruction tuning data.
arXiv Detail & Related papers (2023-12-11T09:44:41Z) - Large Language Models as Data Preprocessors [9.99065004972981]
Large Language Models (LLMs) have marked a significant advancement in artificial intelligence.
This study explores their potential in data preprocessing, a critical stage in data mining and analytics applications.
We propose an LLM-based framework for data preprocessing, which integrates cutting-edge prompt engineering techniques.
arXiv Detail & Related papers (2023-08-30T23:28:43Z) - Are We Closing the Loop Yet? Gaps in the Generalizability of VIS4ML Research [26.829392755701843]
We survey recent VIS4ML papers to assess the generalizability of research contributions and claims in enabling human-in-the-loop ML.
Our results show potential gaps between the current scope of VIS4ML research and aspirations for its use in practice.
arXiv Detail & Related papers (2023-08-10T21:44:48Z) - Measuring Progress in Fine-grained Vision-and-Language Understanding [23.377634283746698]
We investigate four competitive vision-and-language models on fine-grained benchmarks.
We find that X-VLM consistently outperforms other baselines.
We highlight the importance of both novel losses and rich data sources for learning fine-grained skills.
arXiv Detail & Related papers (2023-05-12T15:34:20Z) - Vision-Language Models for Vision Tasks: A Survey [62.543250338410836]
Vision-Language Models (VLMs) learn rich vision-language correlation from web-scale image-text pairs.
This paper provides a systematic review of visual language models for various visual recognition tasks.
arXiv Detail & Related papers (2023-04-03T02:17:05Z) - Towards Perspective-Based Specification of Machine Learning-Enabled Systems [1.3406258114080236]
This paper describes our work towards a perspective-based approach for specifying ML-enabled systems.
The approach involves analyzing a set of 45 ML concerns grouped into five perspectives: objectives, user experience, infrastructure, model, and data.
The main contribution of this paper is two new artifacts that can be used to help specify ML-enabled systems.
arXiv Detail & Related papers (2022-06-20T13:09:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.