VLSlice: Interactive Vision-and-Language Slice Discovery
- URL: http://arxiv.org/abs/2309.06703v1
- Date: Wed, 13 Sep 2023 04:02:38 GMT
- Title: VLSlice: Interactive Vision-and-Language Slice Discovery
- Authors: Eric Slyman, Minsuk Kahng, Stefan Lee
- Abstract summary: VLSlice is an interactive system enabling user-guided discovery of coherent representation-level subgroups with consistent visiolinguistic behavior.
We show that VLSlice enables users to quickly generate diverse high-coherency slices in a user study and release the tool publicly.
- Score: 17.8634551024147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work in vision-and-language demonstrates that large-scale pretraining
can learn generalizable models that are efficiently transferable to downstream
tasks. While this may improve dataset-scale aggregate metrics, analyzing
performance around hand-crafted subgroups targeting specific bias dimensions
reveals systemic undesirable behaviors. However, this subgroup analysis is
frequently stalled by annotation efforts, which require extensive time and
resources to collect the necessary data. Prior art attempts to automatically
discover subgroups to circumvent these constraints, but such methods typically
leverage model behavior on existing task-specific annotations, rapidly degrade
on more complex inputs beyond "tabular" data, and do not study
vision-and-language models. This paper presents VLSlice, an interactive system
enabling user-guided discovery of coherent representation-level subgroups with
consistent visiolinguistic behavior, denoted as vision-and-language slices,
from unlabeled image sets. We show that VLSlice enables users to quickly
generate diverse high-coherency slices in a user study (n=22) and release the
tool publicly.
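The slice-discovery loop described above can be approximated offline. Below is a minimal sketch, assuming images are already embedded by a CLIP-style encoder (random vectors stand in for real embeddings here); the top-k retrieval, the k-means step, and the coherence statistic are illustrative choices, not the paper's algorithm.
```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-ins for CLIP-style image embeddings and one text-query embedding.
image_embs = rng.normal(size=(1000, 512)).astype(np.float32)
query_emb = rng.normal(size=(512,)).astype(np.float32)

def l2n(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

image_embs, query_emb = l2n(image_embs), l2n(query_emb)

# 1) Retrieve unlabeled images whose representation aligns with the query.
sims = image_embs @ query_emb
candidates = np.argsort(sims)[-200:]  # top-200 by similarity (illustrative)

# 2) Cluster the candidates in representation space to propose slices.
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(
    image_embs[candidates])

# 3) Rank proposals by intra-slice coherence (mean cosine, self-pairs included).
for k in range(8):
    members = image_embs[candidates[labels == k]]
    coherence = float((members @ members.T).mean())
    print(f"slice {k}: size={len(members):3d} coherence={coherence:.3f}")
```
In the interactive tool the user would refine these proposals; the sketch only shows why representation-level grouping needs no task labels.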
Related papers
- VL4AD: Vision-Language Models Improve Pixel-wise Anomaly Detection [5.66050466694651]
We propose incorporating Vision-Language (VL) encoders into existing anomaly detectors to leverage the semantically broad VL pre-training for improved outlier awareness.
We also propose a new scoring function that enables data- and training-free outlier supervision via textual prompts.
The resulting VL4AD model achieves competitive performance on widely used benchmark datasets.
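The prompt-based, training-free outlier supervision can be illustrated with a CLIP-style scoring rule. A minimal sketch with stand-in embeddings; the prompt sets, the temperature, and the softmax-mass score are assumptions rather than the paper's exact function.
```python
import numpy as np

rng = np.random.default_rng(1)

def l2n(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-ins: flattened per-pixel visual features plus text embeddings for
# known inlier classes and a few free-form outlier prompts.
pixel_feats   = l2n(rng.normal(size=(4096, 512)))  # N x D
inlier_texts  = l2n(rng.normal(size=(19, 512)))    # e.g. the segmentation classes
outlier_texts = l2n(rng.normal(size=(5, 512)))     # e.g. "an unknown object"

tau = 0.07  # temperature, an illustrative choice

# Softmax over the union of prompts; the anomaly score is the probability
# mass assigned to outlier prompts -- no extra data or training involved.
logits = np.concatenate([pixel_feats @ inlier_texts.T,
                         pixel_feats @ outlier_texts.T], axis=1) / tau
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
anomaly_score = probs[:, inlier_texts.shape[0]:].sum(axis=1)

print(anomaly_score.shape, anomaly_score[:3])
```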
arXiv Detail & Related papers (2024-09-25T20:12:10Z)
- Interactive dense pixel visualizations for time series and model attribution explanations [8.24039921933289]
DAVOTS is an interactive visual analytics approach to explore raw time series data, activations of neural networks, and attributions in a dense-pixel visualization.
We apply clustering approaches to the visualized data domains to highlight groups and present ordering strategies for individual and combined data exploration.
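A dense-pixel display of this kind (one row per series, one pixel per time step, rows reordered so similar series sit together) is easy to reproduce. A minimal sketch on synthetic data; hierarchical clustering is one plausible ordering strategy among those the summary alludes to.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, leaves_list

rng = np.random.default_rng(2)

# Synthetic stand-in for raw series, activations, or attributions: 100 x 256.
data = np.cumsum(rng.normal(size=(100, 256)), axis=1)

# Reorder rows by hierarchical clustering so similar series sit together,
# which is what makes group structure visible in a dense-pixel display.
order = leaves_list(linkage(data, method="ward"))

plt.figure(figsize=(8, 4))
plt.imshow(data[order], aspect="auto", cmap="viridis", interpolation="none")
plt.xlabel("time step")
plt.ylabel("series (cluster-ordered)")
plt.title("Dense-pixel view, one row per series")
plt.colorbar(label="value")
plt.show()
```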
arXiv Detail & Related papers (2024-08-27T14:02:21Z)
- Multi-modal Auto-regressive Modeling via Visual Words [96.25078866446053]
We propose the concept of visual words, which maps visual features to probability distributions over a Large Multi-modal Model's (LMM) vocabulary.
We further explore the distribution of visual features in the semantic space within LMM and the possibility of using text embeddings to represent visual information.
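The mapping from continuous visual features to distributions over an LMM's vocabulary can be sketched as a learned projection followed by similarity-plus-softmax against the word-embedding table. The shapes and temperature below are assumptions; random tensors stand in for a real model.
```python
import torch
import torch.nn.functional as F

torch.manual_seed(3)

vocab_size, dim = 32000, 768
vocab_emb = torch.randn(vocab_size, dim)   # the LMM's word-embedding table
visual_feats = torch.randn(16, 196, dim)   # 16 images x 196 patch features

# A learned projection aligning the vision space with the text space.
proj = torch.nn.Linear(dim, dim, bias=False)

# Each patch becomes a distribution over the vocabulary ("visual words"),
# so image regions can be supervised like text tokens.
logits = proj(visual_feats) @ vocab_emb.T / 0.05  # temperature assumed
visual_words = F.softmax(logits, dim=-1)          # 16 x 196 x 32000

print(visual_words.shape, visual_words.sum(-1)[0, 0])  # each row sums to 1
```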
arXiv Detail & Related papers (2024-03-12T14:58:52Z)
- Learning Representations without Compositional Assumptions [79.12273403390311]
We propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges.
We also introduce LEGATO, a novel hierarchical graph autoencoder that learns a smaller, latent graph to aggregate information from multiple views dynamically.
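A schematic reading of the graph idea, not LEGATO itself: feature sets become nodes, edges are a learnable adjacency, and a soft assignment pools nodes into a smaller latent graph that is decoded back for an autoencoding loss.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(4)

class TinyGraphPool(nn.Module):
    """Feature sets as nodes, learnable edges, pooled to a smaller latent graph."""
    def __init__(self, n_views=6, dim=32, n_latent=3):
        super().__init__()
        self.edge_logits = nn.Parameter(torch.zeros(n_views, n_views))
        self.assign = nn.Linear(dim, n_latent)  # soft node-to-latent assignment
        self.decode = nn.Linear(dim, dim)

    def forward(self, x):                          # x: batch x views x dim
        adj = torch.softmax(self.edge_logits, -1)  # learned dependencies
        h = torch.einsum("ij,bjd->bid", adj, x)    # one message-passing step
        s = torch.softmax(self.assign(h), -1)      # batch x views x latent
        latent = torch.einsum("bvl,bvd->bld", s, h)
        recon = torch.einsum("bvl,bld->bvd", s, self.decode(latent))
        return latent, recon

x = torch.randn(8, 6, 32)                # 8 samples, 6 feature sets (views)
latent, recon = TinyGraphPool()(x)
loss = F.mse_loss(recon, x)              # autoencoding objective
print(latent.shape, loss.item())
```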
arXiv Detail & Related papers (2023-05-31T10:36:10Z)
- Visualizing Linguistic Diversity of Text Datasets Synthesized by Large Language Models [9.808214545408541]
LinguisticLens is a novel interactive visualization tool for making sense of and analyzing the syntactic diversity of such datasets.
It supports hierarchical visualization of a text dataset, allowing users to quickly scan for an overview and inspect individual examples.
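One rough proxy for syntactic diversity is to group texts by a shallow syntactic signature and inspect group sizes, mirroring the overview-then-inspect workflow. A minimal sketch using word-shape patterns in place of a real parser; the signature function is an invented stand-in.
```python
from collections import defaultdict

texts = [
    "Write a poem about the sea.",
    "Write a story about the moon.",
    "Summarize this article in one sentence.",
    "Write a poem about the stars.",
    "List 3 facts about whales.",
]

def signature(text):
    """Crude stand-in for a parse: a word-shape pattern per token."""
    def shape(tok):
        if tok.isdigit():
            return "9"
        return "Xx" if tok[0].isupper() else "x"
    return " ".join(shape(t) for t in text.split())

groups = defaultdict(list)
for t in texts:
    groups[signature(t)].append(t)

# A few large groups suggests low syntactic diversity; many singletons, high.
for sig, members in sorted(groups.items(), key=lambda kv: -len(kv[1])):
    print(f"{len(members)}x  {sig}")
    for m in members:
        print(f"     {m}")
```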
arXiv Detail & Related papers (2023-05-19T00:53:45Z)
- Diagnosing and Rectifying Vision Models using Language [31.588965563961573]
Recent contrastive learning models have demonstrated the ability to learn an embedding space suitable for building strong vision classifiers.
Our work highlights a distinct advantage of this multi-modal embedding space: the ability to diagnose vision classifiers through natural language.
Our proposed method can discover high-error data slices, identify influential attributes and further rectify undesirable model behaviors.
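The diagnosis step can be sketched in a shared image-text embedding space: embed candidate attribute phrases, embed the evaluation images, and rank attributes by how much more they align with misclassified images. Stand-in embeddings below; the mean-difference statistic is an assumption.
```python
import numpy as np

rng = np.random.default_rng(5)

def l2n(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

image_embs = l2n(rng.normal(size=(500, 512)))  # eval images (CLIP-like space)
attr_embs  = l2n(rng.normal(size=(50, 512)))   # "a photo of a {attribute}"
errors     = rng.random(500) < 0.2             # per-image classifier mistakes

sims = image_embs @ attr_embs.T                # image-attribute affinity

# An attribute is suspicious if misclassified images align with it more
# than correctly classified ones do (mean-difference statistic assumed).
gap = sims[errors].mean(axis=0) - sims[~errors].mean(axis=0)
suspects = np.argsort(gap)[::-1][:5]
print("attributes most associated with errors:", suspects, gap[suspects])
```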
arXiv Detail & Related papers (2023-02-08T18:59:42Z)
- Visual Auditor: Interactive Visualization for Detection and Summarization of Model Biases [18.434430375939755]
As machine learning (ML) systems become increasingly widespread, it is necessary to audit these systems for biases prior to their deployment.
Recent research has developed algorithms for effectively identifying intersectional bias in the form of interpretable, underperforming subsets (or slices) of the data.
We propose Visual Auditor, an interactive visualization tool for auditing and summarizing model biases.
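The underlying object, interpretable underperforming slices, reduces to group-bys over metadata columns with per-group accuracy. A minimal sketch with pandas; the column names and the planted night-time failure are invented for illustration.
```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
n = 2000

# Hypothetical audit table: metadata columns plus per-example correctness.
df = pd.DataFrame({
    "age_group": rng.choice(["<30", "30-60", ">60"], n),
    "lighting":  rng.choice(["day", "night"], n),
    "correct":   rng.random(n) < 0.9,
})
# Plant an intersectional failure so the audit has something to find.
mask = (df.age_group == ">60") & (df.lighting == "night")
df.loc[mask, "correct"] = rng.random(mask.sum()) < 0.6

# Enumerate one- and two-column slices and rank them by accuracy.
rows = []
for cols in (["age_group"], ["lighting"], ["age_group", "lighting"]):
    g = df.groupby(cols)["correct"].agg(["mean", "size"]).reset_index()
    g["slice"] = g[cols].astype(str).agg(" & ".join, axis=1)
    rows.append(g[["slice", "mean", "size"]])
report = pd.concat(rows).sort_values("mean")

print(f"overall accuracy: {df.correct.mean():.3f}")
print(report.head(5).to_string(index=False))
```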
arXiv Detail & Related papers (2022-06-25T02:48:27Z)
- Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
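The data-driven semantic slots can be sketched as learnable prototypes to which dense features are softly assigned, followed by a contrastive objective that matches slots across two augmented views. A schematic reading with random features, not the paper's implementation.
```python
import torch
import torch.nn.functional as F

torch.manual_seed(7)

K, D = 64, 256                            # number of slots, feature dim
prototypes = torch.randn(K, D, requires_grad=True)

def pool_into_slots(feats, protos, tau=0.1):
    """Softly group dense features (B x HW x D) into K slot vectors."""
    assign = F.softmax(F.normalize(feats, dim=-1)
                       @ F.normalize(protos, dim=-1).T / tau, dim=-1)
    return F.normalize(torch.einsum("bnk,bnd->bkd", assign, feats), dim=-1)

feats_v1 = torch.randn(4, 196, D)                        # dense features, view 1
feats_v2 = feats_v1 + 0.1 * torch.randn_like(feats_v1)   # augmented view 2

s1 = pool_into_slots(feats_v1, prototypes)
s2 = pool_into_slots(feats_v2, prototypes)

# Contrast matching slots across the two views (positives on the diagonal).
logits = torch.einsum("bkd,bjd->bkj", s1, s2) / 0.1
labels = torch.arange(K).repeat(4)
loss = F.cross_entropy(logits.reshape(-1, K), labels)
print(loss.item())
```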
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
- CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
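Training with attribution maps can be sketched as adding an input-gradient regularizer to the task loss. The L1 sparsity penalty and its weight below are assumptions standing in for the paper's specific use of attributions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(8)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(),
                      nn.Linear(64, 10))
x = torch.randn(32, 1, 28, 28, requires_grad=True)
y = torch.randint(0, 10, (32,))

logits = model(x)
task_loss = F.cross_entropy(logits, y)

# Attribution map: gradient of the predicted-class score w.r.t. the input
# (simple saliency; create_graph=True keeps it differentiable for training).
score = logits.gather(1, logits.argmax(1, keepdim=True)).sum()
attribution, = torch.autograd.grad(score, x, create_graph=True)

# Fold the attribution map into the objective (L1 penalty assumed).
loss = task_loss + 1e-3 * attribution.abs().mean()
loss.backward()
print(f"task={task_loss.item():.3f} total={loss.item():.3f}")
```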
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
- Learning Dual Dynamic Representations on Time-Sliced User-Item Interaction Graphs for Sequential Recommendation [62.30552176649873]
We devise a novel Dynamic Representation Learning model for Sequential Recommendation (DRL-SRe).
To better model the user-item interactions for characterizing the dynamics from both sides, the proposed model builds a global user-item interaction graph for each time slice.
To enable the model to capture fine-grained temporal information, we propose an auxiliary temporal prediction task over consecutive time slices.
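Building one global user-item interaction graph per time slice is mostly bookkeeping: bin timestamped interactions into slices and emit one sparse adjacency per slice. A minimal sketch; scipy sparse matrices stand in for whatever graph type a GNN library expects.
```python
import numpy as np
from scipy.sparse import coo_matrix

rng = np.random.default_rng(9)

n_users, n_items, n_events = 100, 50, 5000
users = rng.integers(0, n_users, n_events)
items = rng.integers(0, n_items, n_events)
times = rng.random(n_events)              # normalized timestamps in [0, 1)

n_slices = 8
slice_ids = np.minimum((times * n_slices).astype(int), n_slices - 1)

# One global user-item interaction graph per time slice; consecutive
# slices can then feed an auxiliary temporal prediction task.
graphs = []
for s in range(n_slices):
    m = slice_ids == s
    g = coo_matrix((np.ones(m.sum()), (users[m], items[m])),
                   shape=(n_users, n_items))
    graphs.append(g)
    print(f"slice {s}: {g.nnz} interactions")
```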
arXiv Detail & Related papers (2021-09-24T07:44:27Z)
- Exploit Clues from Views: Self-Supervised and Regularized Learning for Multiview Object Recognition [66.87417785210772]
This work investigates the problem of multiview self-supervised learning (MV-SSL).
A novel surrogate task for self-supervised learning is proposed by pursuing an "object invariant" representation.
Experiments show that the recognition and retrieval results using view-invariant prototype embedding (VISPE) outperform other self-supervised learning methods.
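The "object invariant" objective can be sketched as pulling every view of an object toward a shared prototype via a cross-entropy over objects. A schematic reading with random features standing in for rendered multi-view images.
```python
import torch
import torch.nn.functional as F

torch.manual_seed(10)

n_objects, n_views, dim = 16, 12, 128
encoder = torch.nn.Linear(512, dim)            # stand-in for a CNN backbone
views = torch.randn(n_objects, n_views, 512)   # multi-view image features

z = F.normalize(encoder(views), dim=-1)        # objects x views x dim

# A view-invariant prototype per object: the mean of its view embeddings.
prototypes = F.normalize(z.mean(dim=1), dim=-1)

# Classify every view into its object's prototype, pushing all views of
# one object toward a shared, view-invariant embedding.
logits = z.reshape(-1, dim) @ prototypes.T / 0.1
labels = torch.arange(n_objects).repeat_interleave(n_views)
loss = F.cross_entropy(logits, labels)
print(loss.item())
```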
arXiv Detail & Related papers (2020-03-28T07:06:06Z)