Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models
- URL: http://arxiv.org/abs/2211.12503v1
- Date: Thu, 17 Nov 2022 17:12:43 GMT
- Title: Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models
- Authors: Ninareh Mehrabi, Palash Goyal, Apurv Verma, Jwala Dhamala, Varun Kumar, Qian Hu, Kai-Wei Chang, Richard Zemel, Aram Galstyan, Rahul Gupta
- Abstract summary: We study ambiguities that arise in text-to-image generative models.
We propose a framework to mitigate ambiguities in the prompts given to the systems by soliciting clarifications from the user.
- Score: 64.58271886337826
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural language often contains ambiguities that can lead to
misinterpretation and miscommunication. While humans can handle ambiguities
effectively by asking clarifying questions and/or relying on contextual cues
and common-sense knowledge, resolving ambiguities can be notoriously hard for
machines. In this work, we study ambiguities that arise in text-to-image
generative models. We curate a benchmark dataset covering different types of
ambiguities that occur in these systems. We then propose a framework to
mitigate ambiguities in the prompts given to the systems by soliciting
clarifications from the user. Through automatic and human evaluations, we show
the effectiveness of our framework in generating more faithful images aligned
with human intention in the presence of ambiguities.
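The clarification loop can be pictured with a short sketch. Everything below is a minimal illustration, assuming a toy ambiguity detector and a stubbed-out image backend; it is not the paper's implementation.

```python
from typing import Optional

def detect_ambiguity(prompt: str) -> Optional[str]:
    """Toy stand-in for ambiguity detection: flag a trailing modifier after a
    conjunction ('an elephant and a bird flying'), since it could attach to
    either noun. A real detector might be an LLM or a trained classifier."""
    words = prompt.lower().split()
    if "and" in words and words[-1].endswith("ing"):
        return words[-1]
    return None

def clarify_and_generate(prompt: str) -> str:
    """Solicit a clarification for an ambiguous prompt, fold the answer back
    into the prompt, and hand it to a text-to-image backend (stubbed here)."""
    fragment = detect_ambiguity(prompt)
    if fragment is not None:
        answer = input(f"Who or what is '{fragment}'? ")  # ask the user
        prompt = f"{prompt}, and it is {answer} that is {fragment}"
    return f"<image generated from: {prompt!r}>"          # stub backend

print(clarify_and_generate("an elephant and a bird flying"))
```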
Related papers
- Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you! [14.84123301554462]
We present UNPIE, a novel benchmark designed to assess the impact of multimodal inputs in resolving lexical ambiguities.
Our dataset includes 1,000 puns, each accompanied by an image that explains both meanings.
The results indicate that various Socratic Models and Visual-Language Models improve over the text-only models when given visual context.
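For concreteness, here is what an UNPIE-style evaluation might look like. The record layout and field names below are assumptions for illustration, not the dataset's actual schema.

```python
# Illustrative pun record; the real dataset pairs each pun with an image
# that depicts both readings.
example = {
    "pun": "Becoming a vegetarian is a big missed steak.",
    "meanings": ["missed steak (gave up eating meat)",
                 "mistake (an error of judgment)"],
    "image": "untouched_steak.png",
}

def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Share of puns for which the model picked the intended meaning."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# The benchmark contrast: query the model once with the pun text alone and
# once with the image attached, then compare the two accuracy scores.
```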
arXiv Detail & Related papers (2024-10-01T19:32:57Z)
- AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries [56.82807063333088]
We introduce a new benchmark, AMBROSIA, which we hope will inform and inspire the development of text-to-SQL parsers capable of recognizing and interpreting ambiguous requests.
Our dataset contains questions showcasing three different types of ambiguity (scope ambiguity, attachment ambiguity, and vagueness).
In each case, the ambiguity persists even when the database context is provided.
This is achieved through a novel approach that involves controlled generation of databases from scratch.
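As an illustration of why such ambiguity matters for parsing, a single question can map to several distinct queries. The example below is invented for illustration and is not taken from AMBROSIA.

```python
# One attachment-ambiguous question, two defensible SQL readings.
example = {
    "question": "Show books by authors from the 1990s",
    "ambiguity": "attachment",  # does 'from the 1990s' modify books or authors?
    "readings": [
        # reading 1: the *books* were published in the 1990s
        "SELECT b.title FROM books b JOIN authors a ON b.author_id = a.id "
        "WHERE b.year BETWEEN 1990 AND 1999;",
        # reading 2: the *authors* debuted in the 1990s
        "SELECT b.title FROM books b JOIN authors a ON b.author_id = a.id "
        "WHERE a.debut_year BETWEEN 1990 AND 1999;",
    ],
}
```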
arXiv Detail & Related papers (2024-06-27T10:43:04Z)
- A Taxonomy of Ambiguity Types for NLP [53.10379645698917]
We propose a taxonomy of ambiguity types as seen in English to facilitate NLP analysis.
Our taxonomy can help make meaningful splits in language ambiguity data, allowing for more fine-grained assessments of both datasets and model performance.
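A taxonomy like this is straightforward to operationalize as a label set used to bucket evaluation data. The category names below are common linguistic ambiguity types chosen for illustration; the paper's taxonomy is more fine-grained.

```python
from collections import defaultdict
from enum import Enum

class AmbiguityType(Enum):
    LEXICAL = "lexical"      # 'bank' (riverside vs. financial institution)
    SYNTACTIC = "syntactic"  # 'old men and women'
    SCOPE = "scope"          # 'every student read a book'
    ELLIPSIS = "ellipsis"    # 'Mary left early, and Sue did too (did what?)'

def split_by_type(examples: list[dict]) -> dict:
    """Bucket annotated examples so each ambiguity type is scored separately."""
    buckets = defaultdict(list)
    for ex in examples:
        buckets[AmbiguityType(ex["type"])].append(ex)
    return buckets
```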
arXiv Detail & Related papers (2024-03-21T01:47:22Z)
- Zero and Few-shot Semantic Parsing with Ambiguous Inputs [45.285508941560295]
We introduce AmP, a framework, dataset, and challenge for translating ambiguous natural language to formal representations like logic and code.
Using AmP, we investigate how several few-shot text-to-code systems handle ambiguity, introducing three new metrics.
We find that large pre-trained models perform poorly at capturing the distribution of possible meanings without deliberate instruction.
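One way to make "capturing the distribution of possible meanings" measurable is to sample parses and check how many gold readings are recovered. The metric below is a hedged sketch of that idea, not one of AmP's actual three metrics.

```python
def meaning_coverage(sampled_parses: list[str], gold_readings: set[str]) -> float:
    """Fraction of an input's valid readings recovered in the model's samples."""
    recovered = {p for p in sampled_parses if p in gold_readings}
    return len(recovered) / len(gold_readings)

# An ambiguous request with two valid logical forms; the model collapses to one.
gold = {"exists x. brown(x) & dog(x)", "forall x. dog(x) -> brown(x)"}
samples = ["exists x. brown(x) & dog(x)"] * 10
print(meaning_coverage(samples, gold))  # 0.5: half the meanings are missed
```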
arXiv Detail & Related papers (2023-06-01T15:46:36Z)
- We're Afraid Language Models Aren't Modeling Ambiguity [136.8068419824318]
Managing ambiguity is a key part of human language understanding.
We characterize ambiguity in a sentence by its effect on entailment relations with another sentence.
We show that a multilabel NLI model can flag political claims in the wild that are misleading due to ambiguity.
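The multilabel framing is easy to state in code: when the premise is ambiguous, each reading can yield a different entailment label, so a pair carries a set of labels rather than one. The example pair is illustrative, not drawn from the paper's data.

```python
pair = {
    "premise": "I saw her duck.",         # the bird, or the dodging motion?
    "hypothesis": "She owns a duck.",
    "labels": {"entailment", "neutral"},  # one NLI label per reading
}

def signals_ambiguity(labels: set[str]) -> bool:
    """Readings that disagree on the NLI label are evidence of ambiguity."""
    return len(labels) > 1

print(signals_ambiguity(pair["labels"]))  # True
```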
arXiv Detail & Related papers (2023-04-27T17:57:58Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, for languages in which all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Open-domain clarification question generation without question examples [4.34222556313791]
We propose a framework for building a question-asking model capable of producing polar (yes-no) clarification questions.
Our model uses an expected information gain objective to derive informative questions from an off-the-shelf image captioner.
We demonstrate our model's ability to pose questions that improve communicative success in a goal-oriented 20 questions game with synthetic and human answerers.
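The expected-information-gain objective itself is compact. Below is a minimal sketch over a discrete set of candidate targets; the paper estimates the answer likelihoods with an off-the-shelf image captioner, which is abstracted here into a likelihood vector.

```python
import math

def entropy(p: list[float]) -> float:
    return -sum(x * math.log2(x) for x in p if x > 0)

def expected_info_gain(prior: list[float], p_yes_given: list[float]) -> float:
    """Expected drop in uncertainty about the target after a yes/no answer.

    prior[i]       -- current belief that candidate i is the target
    p_yes_given[i] -- P(answer 'yes' | target is candidate i) for this question
    """
    p_yes = sum(p * l for p, l in zip(prior, p_yes_given))
    p_no = 1.0 - p_yes
    post_yes = [p * l / p_yes for p, l in zip(prior, p_yes_given)] if p_yes else prior
    post_no = [p * (1 - l) / p_no for p, l in zip(prior, p_yes_given)] if p_no else prior
    return entropy(prior) - (p_yes * entropy(post_yes) + p_no * entropy(post_no))

# Toy 20-questions step: four equally likely targets, a question splitting 2/2.
print(expected_info_gain([0.25] * 4, [1.0, 1.0, 0.0, 0.0]))  # 1.0 bit, the maximum
```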
arXiv Detail & Related papers (2021-10-19T07:51:54Z)
- Deciding Whether to Ask Clarifying Questions in Large-Scale Spoken Language Understanding [28.195853603190447]
A large-scale conversational agent can struggle to understand user utterances that carry various ambiguities.
We propose a neural self-attentive model that leverages the hypotheses with ambiguities and contextual signals.
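A hedged sketch of that decision model: embed the competing interpretation hypotheses together with contextual signals, let self-attention compare them, and emit a probability of asking. The use of PyTorch and all dimensions are assumptions; the paper's architecture differs in detail.

```python
import torch
import torch.nn as nn

class AskDecider(nn.Module):
    """Binary decision: ask a clarifying question or act on the top hypothesis."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self, hyps: torch.Tensor) -> torch.Tensor:
        # hyps: (batch, n_hypotheses + n_context_signals, dim) embeddings
        mixed, _ = self.attn(hyps, hyps, hyps)   # hypotheses attend to each other
        pooled = mixed.mean(dim=1)               # summarize the candidate set
        return torch.sigmoid(self.head(pooled))  # probability of asking

model = AskDecider()
print(model(torch.randn(2, 5, 64)).shape)        # torch.Size([2, 1])
```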
arXiv Detail & Related papers (2021-09-25T22:32:10Z)
- Provable Limitations of Acquiring Meaning from Ungrounded Form: What will Future Language Models Understand? [87.20342701232869]
We investigate the abilities of ungrounded systems to acquire meaning.
We study whether assertions enable a system to emulate representations preserving semantic relations like equivalence.
We find that assertions enable semantic emulation if all expressions in the language are referentially transparent.
However, if the language uses non-transparent patterns like variable binding, we show that emulation can become an uncomputable problem.
arXiv Detail & Related papers (2021-04-22T01:00:17Z)