What makes us curious? Analysis of a corpus of open-domain questions
- URL: http://arxiv.org/abs/2110.15409v1
- Date: Thu, 28 Oct 2021 19:37:43 GMT
- Title: What makes us curious? Analysis of a corpus of open-domain questions
- Authors: Zhaozhen Xu, Amelia Howarth, Nicole Briggs, Nello Cristianini
- Abstract summary: In 2017, "We the Curious" science centre in Bristol started a project to capture the curiosity of Bristolians.
The project collected more than 10,000 questions on various topics.
We developed an Artificial Intelligence tool that can be used to perform various processing tasks.
- Score: 0.11470070927586014
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Every day people ask short questions through smart devices or online forums
to seek answers to all kinds of queries. With the increasing number of
questions collected, it becomes difficult to provide answers to each of them,
which is one of the reasons behind the growing interest in automated question
answering. Some questions are similar to existing ones that have already been
answered, while others could be answered by an external knowledge source such
as Wikipedia. An important question is what can be revealed by analysing a
large set of questions. In 2017, "We the Curious" science centre in Bristol
started a project to capture the curiosity of Bristolians: the project
collected more than 10,000 questions on various topics. As no rules were given
during collection, the questions are truly open-domain and range across a
variety of topics. One important aim for the science centre was to understand
what concerns its visitors had beyond science, particularly on societal and
cultural issues. We addressed this question by developing an Artificial
Intelligence tool that can be used to perform various processing tasks:
detection of equivalence between questions; detection of topic and type; and
answering of questions. As we focused on the creation of a "generalist"
tool, we trained it with labelled data from different datasets. We called the
resulting model QBERT. This paper describes what information we extracted from
the automated analysis of the WTC corpus of open-domain questions.
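
The abstract names QBERT's three tasks but not how they are implemented. As a rough illustration only, the sketch below frames one of those tasks, detecting equivalence between two questions, as BERT sentence-pair classification; the checkpoint name, the label convention, and the helper function are assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch: question-equivalence detection as BERT sentence-pair
# classification, one of the three tasks the paper's QBERT tool performs.
# Assumptions: a generic checkpoint and the convention "label 1 = equivalent";
# the authors instead train on labelled data drawn from several datasets.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # placeholder, not the actual QBERT weights

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def questions_equivalent(q1: str, q2: str) -> bool:
    """Score a question pair and return True if predicted equivalent."""
    # BERT encodes the pair as one sequence: [CLS] q1 [SEP] q2 [SEP]
    inputs = tokenizer(q1, q2, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).item() == 1  # assumed label: 1 = equivalent

print(questions_equivalent("Why is the sky blue?", "What makes the sky look blue?"))
```

Until the classification head is fine-tuned on paraphrase-labelled question pairs, its outputs are effectively random, so this only shows the wiring; topic and type detection would plausibly follow the same pattern with a different label set.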
Related papers
- Which questions should I answer? Salience Prediction of Inquisitive Questions
We show that highly salient questions are empirically more likely to be answered in the same article.
We further validate our findings by showing that answering salient questions is an indicator of summarization quality in news.
arXiv Detail & Related papers (2024-04-16T21:33:05Z)
- Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions
We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
Our method improves performance by 15% on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs.
arXiv Detail & Related papers (2023-08-16T20:23:16Z)
- What Types of Questions Require Conversation to Answer? A Case Study of AskReddit Questions
We aim to push the boundaries of conversational systems by examining the types of nebulous, open-ended questions that can best be answered through conversation.
We sampled 500 questions from one million open-ended requests posted on AskReddit, and then recruited online crowd workers to answer eight inquiries about these questions.
We found that the issues people believe require conversation to resolve satisfactorily are highly social and personal.
arXiv Detail & Related papers (2023-03-30T21:05:22Z)
- CREPE: Open-Domain Question Answering with False Presuppositions
We introduce CREPE, a QA dataset containing a natural distribution of presupposition failures from online information-seeking forums.
We find that 25% of questions contain false presuppositions, and provide annotations for these presuppositions and their corrections.
We show that adaptations of existing open-domain QA models can find presuppositions moderately well, but struggle when predicting whether a presupposition is factually correct.
arXiv Detail & Related papers (2022-11-30T18:54:49Z)
- CONSISTENT: Open-Ended Question Generation From News Articles
We propose CONSISTENT, a new end-to-end system for generating open-ended questions.
We demonstrate our model's strength over several baselines using both automatic and human-based evaluations.
We discuss potential downstream applications for news media organizations.
arXiv Detail & Related papers (2022-10-20T19:10:07Z)
- A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z)
- GooAQ: Open Question Answering with Diverse Answer Types
We present GooAQ, a large-scale dataset with a variety of answer types.
This dataset contains over 5 million questions and 3 million answers collected from Google.
arXiv Detail & Related papers (2021-04-18T05:40:39Z)
- Inquisitive Question Generation for High Level Text Comprehension
We introduce INQUISITIVE, a dataset of 19K questions that are elicited while a person is reading through a document.
We show that readers engage in a series of pragmatic strategies to seek information.
We evaluate question generation models based on GPT-2 and show that our model is able to generate reasonable questions.
arXiv Detail & Related papers (2020-10-04T19:03:39Z)
- Stay Hungry, Stay Focused: Generating Informative and Specific Questions in Information-Seeking Conversations
We investigate the problem of generating informative questions in information-asymmetric conversations.
To generate pragmatic questions, we use reinforcement learning to optimize an informativeness metric.
We demonstrate that the resulting pragmatic questioner substantially improves the informativeness and specificity of questions generated over a baseline model.
arXiv Detail & Related papers (2020-04-30T00:49:14Z)