Researchy Questions: A Dataset of Multi-Perspective, Decompositional
Questions for LLM Web Agents
- URL: http://arxiv.org/abs/2402.17896v1
- Date: Tue, 27 Feb 2024 21:27:16 GMT
- Title: Researchy Questions: A Dataset of Multi-Perspective, Decompositional
Questions for LLM Web Agents
- Authors: Corby Rosset, Ho-Lam Chung, Guanghui Qin, Ethan C. Chau, Zhuo Feng,
Ahmed Awadallah, Jennifer Neville, Nikhil Rao
- Abstract summary: We present Researchy Questions, a dataset of search engine queries tediously filtered to be non-factoid, ``decompositional'' and multi-perspective.
We show that users spend a lot of ``effort'' on these questions in terms of signals like clicks and session length.
We also show that ``slow thinking'' answering techniques, like decomposition into sub-questions, show benefit over answering directly.
- Score: 22.023543164141504
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing question answering (QA) datasets are no longer challenging to most
powerful Large Language Models (LLMs). Traditional QA benchmarks like TriviaQA,
NaturalQuestions, ELI5 and HotpotQA mainly study ``known unknowns'' with clear
indications of both what information is missing, and how to find it to answer
the question. Hence, good performance on these benchmarks provides a false
sense of security. A yet unmet need of the NLP community is a bank of
non-factoid, multi-perspective questions involving a great deal of unclear
information needs, i.e. ``unknown unknowns''. We claim we can find such
questions in search engine logs, which is surprising because most
question-intent queries are indeed factoid. We present Researchy Questions, a
dataset of search engine queries tediously filtered to be non-factoid,
``decompositional'' and multi-perspective. We show that users spend a lot of
``effort'' on these questions in terms of signals like clicks and session
length, and that they are also challenging for GPT-4. We also show that ``slow
thinking'' answering techniques, like decomposition into sub-questions, show
benefit over answering directly. We release $\sim$ 100k Researchy Questions,
along with the Clueweb22 URLs that were clicked.
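
As a rough illustration of the ``slow thinking'' approach the abstract contrasts with direct answering, the sketch below decomposes a multi-perspective question into sub-questions, answers each, and then synthesizes a final answer. This is a minimal sketch assuming the OpenAI Python client; the `chat` helper, model name, and prompt wording are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of "slow thinking" (decompose-then-answer) vs. direct answering.
# The `chat` helper, model name, and prompts are placeholders, not the paper's method.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def chat(prompt: str) -> str:
    """Send a single-turn prompt to the model and return its text reply."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def answer_directly(question: str) -> str:
    """Baseline: answer the question in one shot."""
    return chat(f"Answer the following question:\n{question}")


def answer_with_decomposition(question: str) -> str:
    """'Slow thinking': decompose, answer sub-questions, then synthesize."""
    # Step 1: break the multi-perspective question into simpler sub-questions.
    sub_questions = chat(
        "Decompose this question into 3-5 simpler sub-questions, one per line:\n"
        f"{question}"
    ).splitlines()
    # Step 2: answer each sub-question independently.
    sub_answers = [
        chat(f"Answer concisely:\n{sq}") for sq in sub_questions if sq.strip()
    ]
    # Step 3: synthesize the partial answers into a final, multi-perspective answer.
    return chat(
        "Using these partial answers, write a comprehensive answer to the "
        f"original question.\nQuestion: {question}\nPartial answers:\n"
        + "\n".join(sub_answers)
    )


if __name__ == "__main__":
    # Hypothetical Researchy-style (non-factoid, multi-perspective) question.
    q = "Why did the industrial revolution start in Britain rather than elsewhere?"
    print(answer_with_decomposition(q))
```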
Related papers
- Which questions should I answer? Salience Prediction of Inquisitive Questions [118.097974193544]
We show that highly salient questions are empirically more likely to be answered in the same article.
We further validate our findings by showing that answering salient questions is an indicator of summarization quality in news.
arXiv Detail & Related papers (2024-04-16T21:33:05Z)
- Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions [95.92276099234344]
We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
Our method improves performance by 15% on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs.
arXiv Detail & Related papers (2023-08-16T20:23:16Z)
- CREPE: Open-Domain Question Answering with False Presuppositions [92.20501870319765]
We introduce CREPE, a QA dataset containing a natural distribution of presupposition failures from online information-seeking forums.
We find that 25% of questions contain false presuppositions, and provide annotations for these presuppositions and their corrections.
We show that adaptations of existing open-domain QA models can find presuppositions moderately well, but struggle when predicting whether a presupposition is factually correct.
arXiv Detail & Related papers (2022-11-30T18:54:49Z)
- ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers [93.55268936974971]
We describe a Question Answering dataset that contains complex questions with conditional answers.
We call this dataset ConditionalQA.
We show that ConditionalQA is challenging for many of the existing QA models, especially in selecting answer conditions.
arXiv Detail & Related papers (2021-10-13T17:16:46Z)
- A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z)
- GooAQ: Open Question Answering with Diverse Answer Types [63.06454855313667]
We present GooAQ, a large-scale dataset with a variety of answer types.
This dataset contains over 5 million questions and 3 million answers collected from Google.
arXiv Detail & Related papers (2021-04-18T05:40:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.