A Tour of Visualization Techniques for Computer Vision Datasets
- URL: http://arxiv.org/abs/2204.08601v1
- Date: Tue, 19 Apr 2022 01:04:28 GMT
- Title: A Tour of Visualization Techniques for Computer Vision Datasets
- Authors: Bilal Alsallakh, Pamela Bhattacharya, Vanessa Feng, Narine Kokhlikyan,
Orion Reblitz-Richardson, Rahul Rajan, David Yan
- Abstract summary: We survey a number of data visualization techniques for analyzing Computer Vision (CV) datasets.
These techniques help us understand properties and latent patterns in such data, by applying dataset-level analysis.
We present various examples of how such analysis helps predict the potential impact of the dataset properties on CV models.
- Score: 4.5916483318867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We survey a number of data visualization techniques for analyzing Computer
Vision (CV) datasets. These techniques help us understand properties and latent
patterns in such data, by applying dataset-level analysis. We present various
examples of how such analysis helps predict the potential impact of the dataset
properties on CV models and informs appropriate mitigation of their
shortcomings. Finally, we explore avenues for further visualization techniques
of different modalities of CV datasets as well as ones that are tailored to
support specific CV tasks and analysis needs.
Related papers
- Exploratory Visual Analysis for Increasing Data Readiness in Artificial Intelligence Projects [7.982715506261976]
We contribute a mapping between data readiness aspects and visual analysis techniques suitable for different data types.
In addition to the mapping, we extend the data readiness concept to better take aspects of the task and solution into account.
We report on our experiences in using the presented visual analysis techniques to aid future artificial intelligence projects in raising the data readiness level.
arXiv Detail & Related papers (2024-09-05T09:57:14Z)
- PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation [2.1184929769291294]
This paper presents a novel synthetic dataset designed to evaluate the proficiency of large language models in interpreting data visualizations.
Our dataset is generated using controlled parameters to ensure comprehensive coverage of potential real-world scenarios.
We employ multimodal text prompts with questions related to visual data in images to benchmark several state-of-the-art models.
arXiv Detail & Related papers (2024-09-04T11:19:17Z)
- Diffusion Models as Data Mining Tools [87.77999285241219]
This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining.
We show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure.
This measure assesses how typical visual elements are for different data labels, such as geographic location, time stamps, semantic labels, or even the presence of a disease.
arXiv Detail & Related papers (2024-07-20T17:14:31Z)
- From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models [98.41645229835493]
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making.
Large foundation models, such as large language models, have revolutionized various natural language processing tasks.
This survey paper serves as a comprehensive resource for researchers and practitioners in the fields of natural language processing, computer vision, and data analysis.
arXiv Detail & Related papers (2024-03-18T17:57:09Z)
- SeeBel: Seeing is Believing [0.9790236766474201]
We propose three visualizations that enable users to compare dataset statistics and AI performance for segmenting all images.
Our project tries to further increase the interpretability of the trained AI model for segmentation by visualizing its image attention weights.
We propose to conduct surveys on real users to study the efficacy of our visualization tool in computer vision and AI domain.
arXiv Detail & Related papers (2023-12-18T05:11:00Z)
- Instruction Tuning for Large Language Models: A Survey [52.86322823501338]
We make a systematic review of the literature, including the general methodology of instruction tuning (IT), the construction of IT datasets, the training of IT models, and applications to different modalities and domains.
We also review the potential pitfalls of IT and the criticism against it, along with efforts pointing out current deficiencies of existing strategies, and suggest some avenues for fruitful research.
arXiv Detail & Related papers (2023-08-21T15:35:16Z)
- Measuring Data [79.89948814583805]
We identify the task of measuring data to quantitatively characterize the composition of machine learning data and datasets.
Data measurements quantify different attributes of data along common dimensions that support comparison.
We conclude with a discussion of the many avenues of future work, the limitations of data measurements, and how to leverage these measurement approaches in research and practice.
arXiv Detail & Related papers (2022-12-09T22:10:46Z)
- Addressing Bias in Visualization Recommenders by Identifying Trends in Training Data: Improving VizML Through a Statistical Analysis of the Plotly Community Feed [55.41644538483948]
Machine learning is a promising approach to visualization recommendation due to its high scalability and representational power.
Our research project aims to address training bias in machine learning visualization recommendation systems by identifying trends in the training data through statistical analysis.
arXiv Detail & Related papers (2022-03-09T18:36:46Z)
- Exploring Data Pipelines through the Process Lens: a Reference Model for Computer Vision [0.0]
We argue that we could further systematize our analysis of harms by examining CV data pipelines through a process-oriented lens.
As a step towards cultivating a process-oriented lens, we embarked on an empirical study of CV data pipelines.
arXiv Detail & Related papers (2021-07-05T07:15:57Z)
- Visualization Techniques to Enhance Automated Event Extraction [0.0]
This case study seeks to identify potential triggers of state-led mass killings from news articles using NLP.
We demonstrate how visualizations can aid in each stage, from exploratory analysis of raw data, to machine learning training analysis, and finally post-inference validation.
arXiv Detail & Related papers (2021-06-11T19:24:54Z)
- Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data [85.43008636875345]
We show that diverse representation in training data is key to increasing subgroup performances and achieving population level objectives.
Our analysis and experiments describe how dataset compositions influence performance and provide constructive results for using trends in existing data, alongside domain knowledge, to help guide intentional, objective-aware dataset design.
arXiv Detail & Related papers (2021-03-05T00:27:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.