SeeBel: Seeing is Believing
- URL: http://arxiv.org/abs/2312.10933v1
- Date: Mon, 18 Dec 2023 05:11:00 GMT
- Title: SeeBel: Seeing is Believing
- Authors: Sourajit Saha, Shubhashis Roy Dipta
- Abstract summary: We propose three visualizations that let users compare dataset statistics with AI segmentation performance across all images.
Our project further increases the interpretability of the trained segmentation model by visualizing its image attention weights.
We propose to conduct surveys of real users to study the efficacy of our visualization tool in the computer vision and AI domains.
- Score: 0.9790236766474201
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic segmentation is a significant research field in computer vision.
Despite it being a widely studied subject, few visualization tools exist that
capture segmentation quality and dataset statistics, such as class imbalance,
in the same view. While the significance of discovering and introspecting the
correlation between dataset statistics and AI model performance for dense
prediction tasks such as semantic segmentation is well established in the
computer vision literature, to the best of our knowledge no visualization tool
has been proposed for viewing and analyzing that correlation. Our project aims
to bridge this gap with three visualizations that, within a single tool, let
users compare dataset statistics against AI segmentation performance over all
images or a single image in the dataset, explore the trained AI model's
attention on image regions, and browse the quality of the masks the AI
predicts for any user-selected number of objects. Our project further seeks to
increase the interpretability of the trained segmentation model by visualizing
its image attention weights. For visualization, we use a scatterplot and a
heatmap to encode correlation and features, respectively. We further propose
to conduct surveys of real users to study the efficacy of our visualization
tool in the computer vision and AI domains.
The full system can be accessed at https://github.com/dipta007/SeeBel
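The abstract names two concrete encodings: a scatterplot for correlation and a heatmap for features. As a rough illustration of that coupled view, not SeeBel's actual code, here is a minimal matplotlib sketch; the class-imbalance measure, IoU values, and attention grid are synthetic placeholders.

```python
# Hypothetical sketch of the two encodings the abstract names:
# a scatterplot linking a dataset statistic (class imbalance) to a
# per-image segmentation score, and a heatmap of attention weights.
# All data here is random; the real tool would use model outputs.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_images = 200

# Per-image dataset statistic and model score (placeholders).
class_imbalance = rng.uniform(0.0, 1.0, n_images)  # e.g. skew of class pixel counts
mean_iou = 0.9 - 0.4 * class_imbalance + rng.normal(0, 0.05, n_images)

# Attention weights for one image (placeholder 16x16 grid).
attention = rng.random((16, 16))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Scatterplot: correlation between dataset statistic and AI performance.
ax1.scatter(class_imbalance, mean_iou, s=10, alpha=0.6)
ax1.set_xlabel("class imbalance (per image)")
ax1.set_ylabel("mean IoU")
ax1.set_title("Dataset statistic vs. segmentation quality")

# Heatmap: the model's attention over image regions.
im = ax2.imshow(attention, cmap="viridis")
ax2.set_title("Attention weights (one image)")
fig.colorbar(im, ax=ax2)

plt.tight_layout()
plt.show()
```

In the real tool these arrays would come from dataset statistics and the trained model's outputs rather than a random generator.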
Related papers
- AiSciVision: A Framework for Specializing Large Multimodal Models in Scientific Image Classification [2.4515373478215343]
We introduce AiSciVision, a framework that specializes Large Multimodal Models (LMMs) into interactive research partners.
Our framework uses two key components: Visual Retrieval-Augmented Generation (VisRAG) and domain-specific tools utilized in an agentic workflow.
We evaluate AiSciVision on three real-world scientific image classification datasets: detecting the presence of aquaculture ponds, eelgrass, and solar panels.
arXiv Detail & Related papers (2024-10-28T19:35:47Z)
- PUB: Plot Understanding Benchmark and Dataset for Evaluating Large Language Models on Synthetic Visual Data Interpretation [2.1184929769291294]
This paper presents a novel synthetic dataset designed to evaluate the proficiency of large language models in interpreting data visualizations.
Our dataset is generated using controlled parameters to ensure comprehensive coverage of potential real-world scenarios.
We employ multimodal text prompts with questions related to visual data in images to benchmark several state-of-the-art models.
arXiv Detail & Related papers (2024-09-04T11:19:17Z)
- BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation [57.40024206484446]
We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models.
BVS supports a large number of adjustable parameters at the scene level.
We showcase three example application scenarios.
arXiv Detail & Related papers (2024-05-15T17:57:56Z)
- Neural Clustering based Visual Representation Learning [61.72646814537163]
Clustering is among the most classical approaches in machine learning and data analysis.
We propose feature extraction with clustering (FEC), which views feature extraction as a process of selecting representatives from data.
FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives.
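The alternation described above can be sketched loosely in NumPy; the k-means-style grouping, the convex-combination feature update, and all shapes below are assumptions for illustration, not FEC's actual procedure.

```python
# A loose sketch of the alternation described above: group pixel
# features into clusters, take cluster centers as representatives,
# then pull each pixel feature toward its representative. The update
# rule and the k-means choice are assumptions, not FEC's exact method.
import numpy as np

def fec_like_step(feats, k=8, iters=5, mix=0.5, seed=0):
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        # Assign each pixel feature to its nearest representative.
        d = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=-1)
        assign = d.argmin(axis=1)
        # Recompute representatives as cluster means.
        for j in range(k):
            members = feats[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)
        # Update pixel features with their current representatives.
        feats = (1 - mix) * feats + mix * centers[assign]
    return feats, assign

pixels = np.random.default_rng(1).normal(size=(1024, 32))  # toy 32x32 image, D=32
new_feats, labels = fec_like_step(pixels)
```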
arXiv Detail & Related papers (2024-03-26T06:04:50Z)
- Pushing Boundaries: Exploring Zero Shot Object Classification with Large Multimodal Models [0.09264362806173355]
Large Language and Vision Assistant models (LLVAs) engage users in rich conversational experiences intertwined with image-based queries.
This paper takes a unique perspective on LMMs, exploring their efficacy in performing image classification tasks using tailored prompts.
Our study includes a benchmarking analysis across four diverse datasets: MNIST, Cats Vs. Dogs, Hymenoptera (Ants Vs. Bees), and an unconventional dataset comprising Pox Vs. Non-Pox skin images.
arXiv Detail & Related papers (2023-12-30T03:19:54Z)
- Heuristic Vision Pre-Training with Self-Supervised and Supervised Multi-Task Learning [0.0]
We propose a novel pre-training framework by adopting both self-supervised and supervised visual pre-text tasks in a multi-task manner.
Results show that our pre-trained models can deliver results on par with or better than state-of-the-art (SOTA) results on multiple visual tasks.
arXiv Detail & Related papers (2023-10-11T14:06:04Z)
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We explore Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
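A common way to obtain such zero-shot perceptual scores from CLIP is to score an image against an antonym prompt pair. The sketch below uses the Hugging Face transformers CLIP API and only illustrates that idea; the paper's exact prompt design may differ.

```python
# A minimal sketch of zero-shot "look" assessment with CLIP using an
# antonym prompt pair, via Hugging Face transformers. The prompt pair
# here is an assumption for illustration.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")           # any test image
prompts = ["Good photo.", "Bad photo."]   # antonym pair for quality

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-text similarities
score = logits.softmax(dim=-1)[0, 0].item()   # probability of "Good photo."
print(f"zero-shot quality score: {score:.3f}")
```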
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- Exploiting the relationship between visual and textual features in social networks for image classification with zero-shot deep learning [0.0]
In this work, we propose a classifier ensemble based on the transferable learning capabilities of the CLIP neural network architecture.
Our experiments, on image classification over the labels of the Places dataset, first consider only the visual part.
Taking the texts associated with the images into account can further improve accuracy, depending on the goal.
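An ensemble of this kind typically amounts to late fusion of per-class probabilities from the visual and textual pathways. The weighting scheme and the numbers below are placeholders, not the paper's reported setup.

```python
# Schematic late fusion of zero-shot class probabilities from an image
# pathway and from the post's associated text. Weights and inputs are
# placeholders for illustration only.
import numpy as np

def fuse(p_image, p_text, w_image=0.7):
    """Weighted average of per-class probabilities from two pathways."""
    p = w_image * p_image + (1 - w_image) * p_text
    return p / p.sum()  # renormalize defensively

p_img = np.array([0.6, 0.3, 0.1])  # e.g. CLIP zero-shot scores over 3 labels
p_txt = np.array([0.2, 0.7, 0.1])  # scores from the associated caption text
fused = fuse(p_img, p_txt)
print(fused.round(3), "-> predicted class:", fused.argmax())
```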
arXiv Detail & Related papers (2021-07-08T10:54:59Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.