Evaluating how interactive visualizations can assist in finding samples where and how computer vision models make mistakes
- URL: http://arxiv.org/abs/2305.11927v2
- Date: Fri, 15 Mar 2024 18:23:16 GMT
- Title: Evaluating how interactive visualizations can assist in finding samples where and how computer vision models make mistakes
- Authors: Hayeong Song, Gonzalo Ramos, Peter Bodik,
- Abstract summary: We present two interactive visualizations in the context of Sprite, a system for creating Computer Vision (CV) models.
We study how these visualizations help Sprite's users identify (evaluate) and select (plan) images where a model is struggling and can lead to improved performance.
- Score: 1.76602679361245
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Creating Computer Vision (CV) models remains a complex practice, despite their ubiquity. Access to data, the requirement for ML expertise, and model opacity are just a few points of complexity that limit the ability of end-users to build, inspect, and improve these models. Interactive ML perspectives have helped address some of these issues by considering a teacher in the loop where planning, teaching, and evaluating tasks take place. We present and evaluate two interactive visualizations in the context of Sprite, a system for creating CV classification and detection models for images originating from videos. We study how these visualizations help Sprite's users identify (evaluate) and select (plan) images where a model is struggling and can lead to improved performance, compared to a baseline condition where users used a query language. We found that users who had used the visualizations found more images across a wider set of potential types of model errors.
Related papers
- Evaluation and Comparison of Visual Language Models for Transportation Engineering Problems [16.49637074299509]
We have explored state-of-the-art vision language models (VLM) for vision-based transportation engineering tasks.
The image classification task involves congestion detection and crack identification, whereas, for object detection, helmet violations were identified.
We have applied open-source models such as CLIP, BLIP, OWL-ViT, Llava-Next, and closed-source GPT-4o to evaluate the performance of these VLM models.
arXiv Detail & Related papers (2024-09-03T20:24:37Z) - Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z) - Enhancing Large Vision Language Models with Self-Training on Image Comprehension [99.9389737339175]
We introduce Self-Training on Image (STIC), which emphasizes a self-training approach specifically for image comprehension.
First, the model self-constructs a preference for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
arXiv Detail & Related papers (2024-05-30T05:53:49Z) - A Vision Check-up for Language Models [61.852026871772914]
We show how a preliminary visual representation learning system can be trained using models of text.
Experiments on self-supervised visual representation learning highlight the potential to train vision models capable of making semantic assessments of natural images.
arXiv Detail & Related papers (2024-01-03T18:09:33Z) - Sequential Modeling Enables Scalable Learning for Large Vision Models [120.91839619284431]
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.
We define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources.
arXiv Detail & Related papers (2023-12-01T18:59:57Z) - SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for
Few-shot Image Classification [84.05253637260743]
We propose a new framework, named Semantic-guided Visual Adapting (SgVA), to extend vision-language pre-trained models.
SgVA produces discriminative task-specific visual features by comprehensively using a vision-specific contrastive loss, a cross-modal contrastive loss, and an implicit knowledge distillation.
State-of-the-art results on 13 datasets demonstrate that the adapted visual features can well complement the cross-modal features to improve few-shot image classification.
arXiv Detail & Related papers (2022-11-28T14:58:15Z) - Interactive Visual Feature Search [8.255656003475268]
We introduce Visual Feature Search, a novel interactive visualization that is adaptable to any CNN.
Our tool allows a user to highlight an image region and search for images from a given dataset with the most similar model features.
We demonstrate how our tool elucidates different aspects of model behavior by performing experiments on a range of applications, such as in medical imaging and wildlife classification.
arXiv Detail & Related papers (2022-11-28T04:39:03Z) - Impact of Feedback Type on Explanatory Interactive Learning [4.039245878626345]
Explanatory Interactive Learning (XIL) collects user feedback on visual model explanations to implement a Human-in-the-Loop (HITL) based interactive learning scenario.
We compare the effectiveness of two different user feedback types in image classification tasks.
We show that identifying and annotating spurious image features that a model finds salient results in superior classification and explanation accuracy than user feedback that tells a model to focus on valid image features.
arXiv Detail & Related papers (2022-09-26T07:33:54Z) - Detection and Captioning with Unseen Object Classes [12.894104422808242]
Test images may contain visual objects with no corresponding visual or textual training examples.
We propose a detection-driven approach based on a generalized zero-shot detection model and a template-based sentence generation model.
Our experiments show that the proposed zero-shot detection model obtains state-of-the-art performance on the MS-COCO dataset.
arXiv Detail & Related papers (2021-08-13T10:43:20Z) - Intuitively Assessing ML Model Reliability through Example-Based
Explanations and Editing Model Inputs [19.09848738521126]
Interpretability methods aim to help users build trust in and understand the capabilities of machine learning models.
We present two interface modules to facilitate a more intuitive assessment of model reliability.
arXiv Detail & Related papers (2021-02-17T02:41:32Z) - VinVL: Revisiting Visual Representations in Vision-Language Models [96.39332942534368]
We develop an improved object detection model to provide object-centric representations of images.
New visual features significantly improve the performance across all vision language (VL) tasks.
We will release the new object detection model to public.
arXiv Detail & Related papers (2021-01-02T23:35:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.