Agile Modeling: From Concept to Classifier in Minutes
- URL: http://arxiv.org/abs/2302.12948v2
- Date: Fri, 12 May 2023 23:50:47 GMT
- Title: Agile Modeling: From Concept to Classifier in Minutes
- Authors: Otilia Stretcu, Edward Vendrow, Kenji Hata, Krishnamurthy Viswanathan,
Vittorio Ferrari, Sasan Tavakkol, Wenlei Zhou, Aditya Avinash, Enming Luo,
Neil Gordon Alldrin, MohammadHossein Bateni, Gabriel Berger, Andrew Bunner,
Chun-Ta Lu, Javier A Rey, Giulia DeSalvo, Ranjay Krishna, Ariel Fuxman
- Abstract summary: We introduce the problem of Agile Modeling: the process of turning any subjective visual concept into a computer vision model.
We show through a user study that users can create classifiers with minimal effort in under 30 minutes.
We compare this user-driven process with the traditional crowdsourcing paradigm and find that the crowd's notion of a concept often differs from the user's.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The application of computer vision to nuanced subjective use cases is
growing. While crowdsourcing has served the vision community well for most
objective tasks (such as labeling a "zebra"), it now falters on tasks where
there is substantial subjectivity in the concept (such as identifying "gourmet
tuna"). However, empowering any user to develop a classifier for their concept
is technically difficult: users are neither machine learning experts, nor have
the patience to label thousands of examples. In response, we introduce the
problem of Agile Modeling: the process of turning any subjective visual concept
into a computer vision model through real-time user-in-the-loop interactions.
We instantiate an Agile Modeling prototype for image classification and show
through a user study (N=14) that users can create classifiers with minimal
effort in under 30 minutes. We compare this user-driven process with the
traditional crowdsourcing paradigm and find that the crowd's notion often
differs from that of the user's, especially as the concepts become more
subjective. Finally, we scale our experiments with simulations of users
training classifiers for ImageNet21k categories to further demonstrate its
efficacy.
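The abstract describes a process rather than a fixed algorithm, but the core loop can be pictured as active learning over frozen image embeddings. The following is a minimal, hypothetical sketch of such a user-in-the-loop session; the function names (embed_images, ask_user_to_label) and the logistic-regression head are illustrative assumptions, not the authors' actual system.

```python
# Hypothetical sketch of a user-in-the-loop "Agile Modeling" session.
# embed_images and ask_user_to_label are illustrative stand-ins, not the
# authors' actual interface.
import numpy as np
from sklearn.linear_model import LogisticRegression

def agile_modeling_loop(images, embed_images, ask_user_to_label,
                        seed_positives, rounds=5, batch_size=20):
    """Grow a binary concept classifier from a few user-provided positives."""
    embeddings = embed_images(images)          # frozen features from a pretrained model
    labeled_idx = list(seed_positives)
    labels = [1] * len(seed_positives)
    clf = None
    for _ in range(rounds):
        if len(set(labels)) > 1:               # need both classes before fitting
            clf = LogisticRegression(max_iter=1000)
            clf.fit(embeddings[labeled_idx], labels)
            probs = clf.predict_proba(embeddings)[:, 1]
            priority = -np.abs(probs - 0.5)    # most uncertain images first
        else:
            priority = np.random.rand(len(images))  # bootstrap with a random batch
        seen = set(labeled_idx)
        batch = [i for i in np.argsort(-priority) if i not in seen][:batch_size]
        labels += ask_user_to_label([images[i] for i in batch])  # 0/1 per image
        labeled_idx += batch
    return clf                                 # None if the user never gave a negative
```

Training only a small head on frozen embeddings is what plausibly keeps each round fast enough for the real-time interaction the abstract describes.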
Related papers
- Restyling Unsupervised Concept Based Interpretable Networks with Generative Models [14.604305230535026]
We propose a novel method that relies on mapping the concept features to the latent space of a pretrained generative model.
We quantitatively ascertain the efficacy of our method in terms of accuracy of the interpretable prediction network, fidelity of reconstruction, as well as faithfulness and consistency of learnt concepts.
arXiv Detail & Related papers (2024-07-01T14:39:41Z)
- Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use [14.2527771630478]
We propose a new framework that alleviates manual effort by replacing human labeling with natural language interactions.
Our framework eliminates the need for crowd-sourced annotations.
Our trained models outperform traditional Agile Modeling as well as state-of-the-art zero-shot classification models.
arXiv Detail & Related papers (2024-03-05T03:34:11Z)
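As a rough illustration of what this summary describes, per-image crowd labels could be replaced by a vision-language model that answers a natural-language question about each image. The sketch below is an assumption about how such a labeler might look; vlm_judge is a hypothetical stand-in, not the paper's interface.

```python
# Hedged sketch: label images by asking a vision-language model a yes/no
# question derived from the user's concept description. vlm_judge is a
# hypothetical callable returning the model's text answer.
def vlm_label(images, concept_description, vlm_judge):
    question = (f"Does this image show the concept '{concept_description}'? "
                "Answer yes or no.")
    return [1 if vlm_judge(img, question).strip().lower().startswith("yes") else 0
            for img in images]
```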
- Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models [64.24227572048075]
We propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models.
Our approach takes inspiration from human intelligence, which usually incorporates external knowledge when recognizing novel categories of objects.
arXiv Detail & Related papers (2023-08-22T04:24:45Z)
- Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis [20.316056261749946]
We propose an end-to-end vision and language model incorporating explicit knowledge graphs.
We also introduce an interactive out-of-distribution layer using an implicit network operator.
In practice, we apply our model on several vision and language downstream tasks including visual question answering, visual reasoning, and image-text retrieval.
arXiv Detail & Related papers (2023-02-11T05:46:21Z)
- Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
arXiv Detail & Related papers (2021-10-27T12:19:56Z)
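The idea behind the metric can be sketched as follows: shuffle one modality across the evaluation set so it no longer matches the labels, and measure how much accuracy drops. This is a hedged illustration only; the paper's exact definition (e.g., any normalization) may differ, and model.predict is a hypothetical interface.

```python
# Hedged sketch of a perceptual score for the image modality: permute images
# across examples (breaking the image-label link) and measure the accuracy
# drop. A large drop suggests the model relies heavily on images.
import numpy as np

def image_perceptual_score(model, images, texts, labels, seed=0):
    rng = np.random.default_rng(seed)
    base_acc = (model.predict(images, texts) == labels).mean()
    perm = rng.permutation(len(images))
    shuffled_acc = (model.predict(images[perm], texts) == labels).mean()
    return base_acc - shuffled_acc
```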
- Learning to Prompt for Vision-Language Models [82.25005817904027]
Vision-language pre-training has emerged as a promising alternative for representation learning.
It shifts from the tradition of using images and discrete labels to learn a fixed set of weights, seen as visual concepts, to aligning images and raw text with two separate encoders.
Such a paradigm benefits from a broader source of supervision and allows zero-shot transfer to downstream tasks.
arXiv Detail & Related papers (2021-09-02T17:57:31Z)
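A widely used instance of this dual-encoder paradigm is CLIP; the snippet below shows zero-shot transfer with the transformers library, where class names become text prompts and no task-specific weights are trained.

```python
# Zero-shot classification with a dual-encoder vision-language model (CLIP):
# the two encoders embed the image and candidate prompts, and the most
# similar prompt wins.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # any local image
prompts = [f"a photo of a {c}" for c in ["zebra", "tuna dish", "truck"]]
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1)  # image-text similarities
print(dict(zip(prompts, probs[0].tolist())))
```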
- Exploiting Behavioral Consistence for Universal User Representation [11.290137806288191]
We focus on developing a universal user representation model.
The obtained universal representations are expected to contain rich information.
We propose Self-supervised User Modeling Network (SUMN) to encode behavior data into the universal representation.
arXiv Detail & Related papers (2020-12-11T06:10:14Z)
- Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling [19.24454872492008]
Weak supervision offers a promising alternative for producing labeled datasets without ground truth labels.
We develop the first framework for interactive weak supervision, in which a method proposes heuristics and learns from user feedback.
Our experiments demonstrate that only a small amount of user feedback is needed to train models that achieve highly competitive test set performance.
arXiv Detail & Related papers (2020-12-11T00:10:38Z)
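The loop this summary describes can be sketched as follows: the system proposes candidate labeling heuristics, the user gives cheap accept/reject feedback on each heuristic (rather than labeling individual examples), and accepted heuristics are aggregated into training labels. All function names below are illustrative stand-ins, not the paper's API.

```python
# Hypothetical sketch of an interactive weak supervision loop. The user only
# judges proposed heuristics; per-example labels come from aggregating the
# accepted heuristics' votes.
def interactive_weak_supervision(unlabeled_data, propose_heuristics,
                                 user_approves, aggregate_votes, train,
                                 rounds=10):
    accepted = []
    for _ in range(rounds):
        for h in propose_heuristics(unlabeled_data, accepted):  # candidate labeling functions
            if user_approves(h):          # a single yes/no judgment per heuristic
                accepted.append(h)
    votes = [[h(x) for h in accepted] for x in unlabeled_data]
    labels = aggregate_votes(votes)       # e.g., majority vote per example
    return train(unlabeled_data, labels)
```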
- Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning [91.58529629419135]
We consider how to characterise visual groupings discovered automatically by deep neural networks.
We introduce two concepts, visual learnability and describability, that can be used to quantify the interpretability of arbitrary image groupings.
arXiv Detail & Related papers (2020-10-27T18:41:49Z)
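One plausible reading of visual learnability, under the assumption that a grouping is learnable to the extent a simple classifier can predict group membership from image features on held-out data, is sketched below; this is not the paper's exact formulation.

```python
# Hedged sketch of a "learnability" score for an image grouping: the
# cross-validated accuracy of predicting group membership from features.
# Higher accuracy suggests a more coherent, learnable grouping.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def learnability(embeddings, group_ids, folds=5):
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, embeddings, group_ids, cv=folds).mean()
```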
- Self-Supervised Viewpoint Learning From Image Collections [116.56304441362994]
We propose a novel learning framework which incorporates an analysis-by-synthesis paradigm to reconstruct images in a viewpoint-aware manner.
We show that our approach performs competitively with fully-supervised approaches for several object categories like human faces, cars, buses, and trains.
arXiv Detail & Related papers (2020-04-03T22:01:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.