Separating Skills and Concepts for Novel Visual Question Answering
- URL: http://arxiv.org/abs/2107.09106v1
- Date: Mon, 19 Jul 2021 18:55:10 GMT
- Title: Separating Skills and Concepts for Novel Visual Question Answering
- Authors: Spencer Whitehead, Hui Wu, Heng Ji, Rogerio Feris, Kate Saenko
- Abstract summary: Generalization to out-of-distribution data has been a problem for Visual Question Answering (VQA) models.
"Skills" are visual tasks, such as counting or attribute recognition, and are applied to "concepts" mentioned in the question.
We present a novel method for learning to compose skills and concepts that separates these two factors implicitly within a model.
- Score: 66.46070380927372
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization to out-of-distribution data has been a problem for Visual
Question Answering (VQA) models. To measure generalization to novel questions,
we propose to separate them into "skills" and "concepts". "Skills" are visual
tasks, such as counting or attribute recognition, and are applied to "concepts"
mentioned in the question, such as objects and people. VQA methods should be
able to compose skills and concepts in novel ways, regardless of whether the
specific composition has been seen in training, yet we demonstrate that
existing models have substantial room for improvement in handling new compositions. We
present a novel method for learning to compose skills and concepts that
separates these two factors implicitly within a model by learning grounded
concept representations and disentangling the encoding of skills from that of
concepts. We enforce these properties with a novel contrastive learning
procedure that does not rely on external annotations and can be learned from
unlabeled image-question pairs. Experiments demonstrate the effectiveness of
our approach for improving compositional and grounding performance.
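The abstract describes the contrastive procedure only at a high level. As a hedged sketch of the kind of objective it suggests, the following pulls together representations of the same concept used under different skills and pushes apart other concepts, so concept encodings carry no skill information; all names, shapes, the pairing scheme, and the temperature are assumptions, not the authors' code:

```python
# Rough illustration only: every name and hyperparameter here is an assumption.
import torch
import torch.nn.functional as F

def concept_contrastive_loss(anchor, positive, negatives, temperature=0.07):
    """InfoNCE-style loss over grounded concept representations.

    anchor:    (B, D) concept representation from one image-question pair.
    positive:  (B, D) the same concept appearing under a different skill.
    negatives: (B, K, D) representations of other concepts.
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = (anchor * positive).sum(-1, keepdim=True)      # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", anchor, negatives)  # (B, K)

    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    target = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, target)                   # positive is class 0
```

Because the positive pair shares only the concept (not the skill), matching it cannot rely on skill information, which is what encourages the disentanglement the abstract describes.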
Related papers
- Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts? [62.984473889987605]
We present a zero-shot framework for fine-grained visual concept learning that leverages a large language model (LLM) and a Visual Question Answering (VQA) system.
The LLM generates questions about the candidate concept; we pose these questions along with the query image to a VQA system and aggregate the answers to determine the presence or absence of an object in the test images.
Our experiments demonstrate comparable performance with existing zero-shot visual classification methods and few-shot concept learning approaches.
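A minimal sketch of the query-and-aggregate step this summary describes; `generate_probe_questions` and `vqa_answer` are hypothetical placeholders for the LLM and VQA components, not the paper's actual API:

```python
# Hypothetical sketch of the aggregation protocol described above.
def concept_present(image, concept, generate_probe_questions, vqa_answer):
    """Decide whether `concept` appears in `image` by aggregating VQA answers."""
    questions = generate_probe_questions(concept)        # LLM-written questions
    answers = [vqa_answer(image, q) for q in questions]  # ask each about the image
    yes_votes = sum(a.strip().lower() == "yes" for a in answers)
    return yes_votes > len(answers) / 2                  # simple majority vote
```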
arXiv Detail & Related papers (2024-10-17T15:16:10Z) - Explaining Explainability: Understanding Concept Activation Vectors [35.37586279472797]
Recent interpretability methods propose using concept-based explanations to translate internal representations of deep learning models into a language that humans are familiar with: concepts.
This requires understanding which concepts are present in the representation space of a neural network.
In this work, we investigate three properties of Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars.
We introduce tools designed to detect the presence of these properties, provide insight into how they affect the derived explanations, and provide recommendations to minimise their impact.
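For context, a CAV is commonly derived (as in Kim et al.'s TCAV) by fitting a linear probe that separates activations of concept exemplars from those of random examples; the CAV is the probe's weight vector, normal to the decision boundary. A minimal sketch with illustrative variable names:

```python
# Sketch of the standard CAV derivation via a linear probe on activations.
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """concept_acts, random_acts: (N, D) activations from one network layer."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    cav = probe.coef_[0]                 # normal to the decision boundary
    return cav / np.linalg.norm(cav)     # unit-norm concept direction
```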
arXiv Detail & Related papers (2024-04-04T17:46:20Z) - Advancing Ante-Hoc Explainable Models through Generative Adversarial Networks [24.45212348373868]
This paper presents a novel concept learning framework for enhancing model interpretability and performance in visual classification tasks.
Our approach appends an unsupervised explanation generator to the primary classifier network and makes use of adversarial training.
This work presents a significant step towards building inherently interpretable deep vision models with task-aligned concept representations.
arXiv Detail & Related papers (2024-01-09T16:16:16Z) - ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image
Diffusion Models [79.10890337599166]
We introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts.
We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions.
Our results point to a trade-off between learning the concepts and preserving the compositionality which existing approaches struggle to overcome.
arXiv Detail & Related papers (2023-06-07T18:00:38Z) - Succinct Representations for Concepts [12.134564449202708]
Foundation models like ChatGPT have demonstrated remarkable performance on various tasks.
However, for many questions, they may produce false answers that look accurate.
In this paper, we introduce succinct representations of concepts based on category theory.
arXiv Detail & Related papers (2023-03-01T12:11:23Z) - Translational Concept Embedding for Generalized Compositional Zero-shot
Learning [73.60639796305415]
Generalized compositional zero-shot learning is the task of learning composed attribute-object concepts in a zero-shot fashion.
This paper introduces a new approach, termed translational concept embedding, that addresses the key difficulties of this task in a unified framework.
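The summary does not define the embedding; in the TransE tradition, a "translational" composition usually models one factor as a vector translation of the other. A sketch under that assumption, with illustrative names that are not the paper's:

```python
# Assumption: the attribute acts as a translation of the object embedding.
import torch

def compose(obj_emb, attr_translation):
    """Composed concept = object embedding shifted by the attribute's translation."""
    return obj_emb + attr_translation

def score(image_feat, obj_emb, attr_translation):
    """Higher when the image feature matches the composed attribute-object concept."""
    return -torch.norm(image_feat - compose(obj_emb, attr_translation), dim=-1)
```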
arXiv Detail & Related papers (2021-12-20T21:27:51Z) - Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
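A hedged sketch of few-shot classification along concept dimensions as the summary describes it: each concept has its own embedding, class prototypes are formed per concept, and per-concept distances are aggregated. The names and the simple sum aggregation are assumptions; COMET's actual design ties concept embeddings to semantically meaningful input parts:

```python
# Assumption-laden sketch of per-concept prototype classification.
import torch

def comet_logits(query, support, support_labels, concept_encoders, n_classes):
    """query: (D,); support: (S, D); support_labels: (S,) int class ids."""
    logits = torch.zeros(n_classes)
    for encode in concept_encoders:                  # one embedding per concept
        q = encode(query)
        s = encode(support)
        for c in range(n_classes):
            prototype = s[support_labels == c].mean(dim=0)  # per-concept prototype
            logits[c] -= torch.norm(q - prototype)          # closer = higher score
    return logits
```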
arXiv Detail & Related papers (2020-07-14T22:04:17Z) - A Competence-aware Curriculum for Visual Concepts Learning via Question
Answering [95.35905804211698]
We propose a competence-aware curriculum for visual concept learning in a question-answering manner.
We design a neural-symbolic concept learner for learning the visual concepts and a multi-dimensional Item Response Theory (mIRT) model for guiding the learning process.
Experimental results on CLEVR show that with a competence-aware curriculum, the proposed method achieves state-of-the-art performance.
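As a rough illustration of the mIRT idea in standard IRT notation (not the paper's exact parameterization): the probability of answering item j correctly is a logistic function of a multi-dimensional competence vector, and the curriculum can favor items of moderate predicted difficulty:

```python
# 2PL-style multidimensional IRT, used only to illustrate competence-aware
# question selection; the paper's exact model and selection rule may differ.
import numpy as np

def p_correct(theta, a_j, b_j):
    """theta: (K,) competence; a_j: (K,) discrimination; b_j: scalar difficulty."""
    return 1.0 / (1.0 + np.exp(-(a_j @ theta - b_j)))

def select_next(theta, items):
    """Competence-aware choice: the item closest to 50% predicted success."""
    return min(items, key=lambda it: abs(p_correct(theta, it["a"], it["b"]) - 0.5))
```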
arXiv Detail & Related papers (2020-07-03T05:08:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.