Loose LIPS Sink Ships: Asking Questions in Battleship with Language-Informed Program Sampling
- URL: http://arxiv.org/abs/2402.19471v2
- Date: Wed, 1 May 2024 19:00:06 GMT
- Title: Loose LIPS Sink Ships: Asking Questions in Battleship with Language-Informed Program Sampling
- Authors: Gabriel Grand, Valerio Pepe, Jacob Andreas, Joshua B. Tenenbaum,
- Abstract summary: We study tradeoffs in a classic grounded question-asking task based on the board game Battleship.
Our model uses large language models (LLMs) to generate natural language questions, translate them into symbolic programs, and evaluate their expected information gain.
We find that with a surprisingly modest resource budget, this simple Monte Carlo optimization strategy yields informative questions that mirror human performance.
- Score: 80.64715784334936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Questions combine our mastery of language with our remarkable facility for reasoning about uncertainty. How do people navigate vast hypothesis spaces to pose informative questions given limited cognitive resources? We study these tradeoffs in a classic grounded question-asking task based on the board game Battleship. Our language-informed program sampling (LIPS) model uses large language models (LLMs) to generate natural language questions, translate them into symbolic programs, and evaluate their expected information gain. We find that with a surprisingly modest resource budget, this simple Monte Carlo optimization strategy yields informative questions that mirror human performance across varied Battleship board scenarios. In contrast, LLM-only baselines struggle to ground questions in the board state; notably, GPT-4V provides no improvement over non-visual baselines. Our results illustrate how Bayesian models of question-asking can leverage the statistics of language to capture human priors, while highlighting some shortcomings of pure LLMs as grounded reasoners.
Related papers
- Reasoning with Large Language Models, a Survey [2.831296564800826]
This paper reviews the rapidly expanding field of prompt-based reasoning with LLMs.
Our taxonomy identifies different ways to generate, evaluate, and control multi-step reasoning.
We find that self-improvement, self-reflection, and some meta abilities of the reasoning processes are possible through the judicious use of prompts.
arXiv Detail & Related papers (2024-07-16T08:49:35Z) - Crafting Interpretable Embeddings by Asking LLMs Questions [89.49960984640363]
Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks.
We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM.
We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli.
arXiv Detail & Related papers (2024-05-26T22:30:29Z) - Perception of Knowledge Boundary for Large Language Models through Semi-open-ended Question Answering [67.94354589215637]
Large Language Models (LLMs) are widely used for knowledge-seeking yet suffer from hallucinations.
In this paper, we perceive the LLMs' knowledge boundary (KB) with semi-open-ended questions (SoeQ)
We find that GPT-4 performs poorly on SoeQ and is often unaware of its KB.
Our auxiliary model, LLaMA-2-13B, is effective in discovering more ambiguous answers.
arXiv Detail & Related papers (2024-05-23T10:00:14Z) - LAMP: A Language Model on the Map [13.75316123602933]
Large Language Models (LLMs) are poised to play an increasingly important role in our lives, providing assistance across a wide array of tasks.
This study introduces a novel framework for fine-tuning a pre-trained model on city-specific data, to enable it to provide accurate recommendations.
arXiv Detail & Related papers (2024-03-14T02:56:38Z) - CLadder: Assessing Causal Reasoning in Language Models [82.8719238178569]
We investigate whether large language models (LLMs) can coherently reason about causality.
We propose a new NLP task, causal inference in natural language, inspired by the "causal inference engine" postulated by Judea Pearl et al.
arXiv Detail & Related papers (2023-12-07T15:12:12Z) - Improving Zero-shot Visual Question Answering via Large Language Models
with Reasoning Question Prompts [22.669502403623166]
We present Reasoning Question Prompts for VQA tasks, which can further activate the potential of Large Language Models.
We generate self-contained questions as reasoning question prompts via an unsupervised question edition module.
Each reasoning question prompt clearly indicates the intent of the original question.
Then, the candidate answers associated with their confidence scores acting as answer integritys are fed into LLMs.
arXiv Detail & Related papers (2023-11-15T15:40:46Z) - Pragmatic Evaluation of Clarifying Questions with Fact-Level Masking [21.480602733510256]
We present a definition and framework for natural language pragmatic asking of clarifying questions (PACQ)
We also present fact-level masking (FLM), a procedure for converting natural language datasets into self-supervised PACQ datasets.
Our experiments show that current zero-shot models struggle to ask questions that retrieve useful information, as compared to human annotators.
arXiv Detail & Related papers (2023-10-17T20:40:59Z) - Probing the Multi-turn Planning Capabilities of LLMs via 20 Question
Games [14.063311955315077]
Large language models (LLMs) are effective at answering questions that are clearly asked.
When faced with ambiguous queries they can act unpredictably and produce incorrect outputs.
This underscores the need for the development of intelligent agents capable of asking clarification questions to resolve ambiguities effectively.
arXiv Detail & Related papers (2023-10-02T16:55:37Z) - Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z) - Question Answering as Programming for Solving Time-Sensitive Questions [84.07553016489769]
Question answering plays a pivotal role in human daily life because it involves our acquisition of knowledge about the world.
Recently, Large Language Models (LLMs) have shown remarkable intelligence in question answering.
This can be attributed to the LLMs' inability to perform rigorous reasoning based on surface-level text semantics.
We propose a novel approach where we reframe the $textbfQ$uestion $textbfA$rogrogering task.
arXiv Detail & Related papers (2023-05-23T16:35:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.