Related papers: Evaluating how interactive visualizations can assist in finding samples where and how computer vision models make mistakes

URL: http://arxiv.org/abs/2305.11927v2
Date: Fri, 15 Mar 2024 18:23:16 GMT
Title: Evaluating how interactive visualizations can assist in finding samples where and how computer vision models make mistakes
Authors: Hayeong Song, Gonzalo Ramos, Peter Bodik
Abstract summary: We present two interactive visualizations in the context of Sprite, a system for creating Computer Vision (CV) models. We study how these visualizations help Sprite's users identify (evaluate) and select (plan) images where a model is struggling and that can lead to improved performance.
Score: 1.76602679361245
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Creating Computer Vision (CV) models remains a complex practice, despite their ubiquity. Access to data, the requirement for ML expertise, and model opacity are just a few points of complexity that limit the ability of end-users to build, inspect, and improve these models. Interactive ML perspectives have helped address some of these issues by considering a teacher in the loop where planning, teaching, and evaluating tasks take place. We present and evaluate two interactive visualizations in the context of Sprite, a system for creating CV classification and detection models for images originating from videos. We study how these visualizations help Sprite's users identify (evaluate) and select (plan) images where a model is struggling and that can lead to improved performance, compared to a baseline condition where users used a query language. We found that users who had used the visualizations found more images across a wider set of potential types of model errors.
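As a rough illustration of the evaluate/plan step, the sketch below flags images where a model appears to struggle. It is not Sprite's implementation or its query language: it assumes per-image predictions are available as (label, confidence) pairs, and the helper name `find_struggling_images`, the 0.5 threshold, and the frame ids are hypothetical. Low confidence is only one proxy for model error; confident misclassifications, for example, would not be caught this way.

```python
from collections import Counter


def find_struggling_images(predictions, threshold=0.5):
    """Flag images whose most confident prediction falls below `threshold`.

    predictions: dict mapping an image id to a list of (label, confidence)
    pairs (an assumed format). Images with no predictions at all (possible
    missed detections) are flagged too. Returns the sorted flagged ids and
    a Counter of the top predicted label for each flagged image.
    """
    flagged = []
    labels = Counter()
    for image_id, preds in predictions.items():
        if not preds:
            # Nothing predicted: the model may have missed an object entirely.
            flagged.append(image_id)
            labels["<no prediction>"] += 1
            continue
        # Take the single most confident prediction for this image.
        label, confidence = max(preds, key=lambda p: p[1])
        if confidence < threshold:
            flagged.append(image_id)
            labels[label] += 1
    return sorted(flagged), labels


# Hypothetical predictions for three video frames.
predictions = {
    "frame_001": [("car", 0.92)],
    "frame_002": [("car", 0.41), ("truck", 0.38)],
    "frame_003": [],
}
flagged, labels = find_struggling_images(predictions)
print(flagged)  # ['frame_002', 'frame_003']
```

Grouping flagged images by label is a crude way to see whether the problems cluster in one class or spread across several kinds of failure.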

Related papers

The Role of Visual Modality in Multimodal Mathematical Reasoning: Challenges and Insights [26.85150689408895]
We show that existing multimodal mathematical models minimally leverage visual information. We attribute this to the dominance of textual information and answer options that inadvertently guide the model to correct answers. When we tested leading models, their failure to detect subtle visual differences suggested limitations in current visual perception capabilities.
Date: 2025-03-06T07:29:33Z
Interactive Visual Assessment for Text-to-Image Generation Models [28.526897072724662]
We propose DyEval, a dynamic interactive visual assessment framework for generative models. DyEval features an intuitive visual interface that enables users to interactively explore and analyze model behaviors. Our framework provides valuable insights for improving generative models and has broad implications for advancing the reliability and capabilities of visual generation systems.
Date: 2024-11-23T10:06:18Z
Evaluation and Comparison of Visual Language Models for Transportation Engineering Problems [16.49637074299509]
We have explored state-of-the-art vision language models (VLMs) for vision-based transportation engineering tasks. The image classification tasks are congestion detection and crack identification, while the object detection task is identifying helmet violations. We have applied open-source models such as CLIP, BLIP, OWL-ViT, and LLaVA-NeXT, as well as the closed-source GPT-4o, to evaluate the performance of these VLMs.
Date: 2024-09-03T20:24:37Z
Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images. We identify model weaknesses by testing the model using the counterfactual image dataset. We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
Date: 2024-06-19T08:07:14Z
Enhancing Large Vision Language Models with Self-Training on Image Comprehension [131.14381425260706]
We introduce Self-Training on Image (STIC), which emphasizes a self-training approach specifically for image comprehension. First, the model self-constructs a preference dataset for image descriptions using unlabeled images. To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
Date: 2024-05-30T05:53:49Z
CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning [61.21923643289266]
Chain of Manipulations is a mechanism that enables Vision-Language Models to solve problems step-by-step with evidence. After training, models can solve various visual problems by eliciting intrinsic manipulations (e.g., grounding, zoom in) actively without involving external tools. Our trained model, CogCoM, achieves state-of-the-art performance across 9 benchmarks from 4 categories.
Date: 2024-02-06T18:43:48Z
A Vision Check-up for Language Models [61.852026871772914]
We show how a preliminary visual representation learning system can be trained using models of text. Experiments on self-supervised visual representation learning highlight the potential to train vision models capable of making semantic assessments of natural images.
Date: 2024-01-03T18:09:33Z
Sequential Modeling Enables Scalable Learning for Large Vision Models [120.91839619284431]
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data. We define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources.
Date: 2023-12-01T18:59:57Z
SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification [84.05253637260743]
We propose a new framework, named Semantic-guided Visual Adapting (SgVA), to extend vision-language pre-trained models. SgVA produces discriminative task-specific visual features by comprehensively using a vision-specific contrastive loss, a cross-modal contrastive loss, and an implicit knowledge distillation. State-of-the-art results on 13 datasets demonstrate that the adapted visual features complement the cross-modal features well, improving few-shot image classification.
Date: 2022-11-28T14:58:15Z
Interactive Visual Feature Search [8.255656003475268]
We introduce Visual Feature Search, a novel interactive visualization that is adaptable to any CNN. Our tool allows a user to highlight an image region and search for images from a given dataset with the most similar model features. We demonstrate how our tool elucidates different aspects of model behavior by performing experiments on a range of applications, such as in medical imaging and wildlife classification.
Date: 2022-11-28T04:39:03Z
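The retrieval step described for Visual Feature Search can be sketched as a nearest-neighbour search over feature vectors. This is an illustrative sketch, not the tool's code: it assumes the highlighted region and every dataset image have already been reduced to a single feature vector (for example, pooled CNN activations), and it uses cosine similarity as the measure. The names `cosine` and `feature_search` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def feature_search(query, dataset, k=3):
    """Return the ids of the k dataset items most similar to the query features.

    dataset: dict mapping an item id to its (assumed precomputed) feature vector.
    """
    scored = [(cosine(query, feat), item_id) for item_id, feat in dataset.items()]
    # Sort by similarity only, highest first.
    scored.sort(key=lambda s: s[0], reverse=True)
    return [item_id for _, item_id in scored[:k]]


# Toy 3-dimensional "features" standing in for real CNN activations.
dataset = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]
top = feature_search(query, dataset, k=2)
print(top)  # ['img_a', 'img_b']
```

A real system over a large dataset would typically precompute normalized features and use a vectorized or approximate nearest-neighbour index rather than this linear scan.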
Impact of Feedback Type on Explanatory Interactive Learning [4.039245878626345]
Explanatory Interactive Learning (XIL) collects user feedback on visual model explanations to implement a Human-in-the-Loop (HITL) based interactive learning scenario. We compare the effectiveness of two different user feedback types in image classification tasks. We show that identifying and annotating spurious image features that a model finds salient results in higher classification and explanation accuracy than user feedback that tells a model to focus on valid image features.
Date: 2022-09-26T07:33:54Z
Detection and Captioning with Unseen Object Classes [12.894104422808242]
Test images may contain visual objects with no corresponding visual or textual training examples. We propose a detection-driven approach based on a generalized zero-shot detection model and a template-based sentence generation model. Our experiments show that the proposed zero-shot detection model obtains state-of-the-art performance on the MS-COCO dataset.
Date: 2021-08-13T10:43:20Z
VinVL: Revisiting Visual Representations in Vision-Language Models [96.39332942534368]
We develop an improved object detection model to provide object-centric representations of images. New visual features significantly improve the performance across all vision language (VL) tasks. We will release the new object detection model to the public.
Date: 2021-01-02T23:35:27Z

This list is automatically generated from the titles and abstracts of the papers on this site.