WebCPM: Interactive Web Search for Chinese Long-form Question Answering
- URL: http://arxiv.org/abs/2305.06849v2
- Date: Tue, 23 May 2023 13:15:10 GMT
- Title: WebCPM: Interactive Web Search for Chinese Long-form Question Answering
- Authors: Yujia Qin, Zihan Cai, Dian Jin, Lan Yan, Shihao Liang, Kunlun Zhu,
Yankai Lin, Xu Han, Ning Ding, Huadong Wang, Ruobing Xie, Fanchao Qi, Zhiyuan
Liu, Maosong Sun, and Jie Zhou
- Abstract summary: Long-form question answering (LFQA) aims at answering complex, open-ended questions with detailed, paragraph-length responses.
We introduce WebCPM, the first Chinese LFQA dataset.
We collect 5,500 high-quality question-answer pairs, together with 14,315 supporting facts and 121,330 web search actions.
- Score: 104.676752359777
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-form question answering (LFQA) aims at answering complex, open-ended
questions with detailed, paragraph-length responses. The de facto paradigm of
LFQA necessitates two procedures: information retrieval, which searches for
relevant supporting facts, and information synthesis, which integrates these
facts into a coherent answer. In this paper, we introduce WebCPM, the first
Chinese LFQA dataset. One unique feature of WebCPM is that its information
retrieval is based on interactive web search, which engages with a search
engine in real time. Following WebGPT, we develop a web search interface. We
recruit annotators to search for relevant information using our interface and
then answer questions. Meanwhile, the web search behaviors of our annotators
are recorded. In total, we collect 5,500 high-quality question-answer
pairs, together with 14,315 supporting facts and 121,330 web search actions. We
fine-tune pre-trained language models to imitate human behaviors for web search
and to generate answers based on the collected facts. Our LFQA pipeline, built
on these fine-tuned models, generates answers that are no worse than
human-written ones in 32.5% and 47.5% of the cases on our dataset and DuReader,
respectively.
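The two-procedure paradigm described in the abstract (retrieval through recorded search actions, then synthesis from the collected facts) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: `TOY_PAGES` replaces a live search engine, `SearchSession` and its `search`/`quote` actions are illustrative names rather than the WebCPM interface, and `synthesize` simply joins facts where the paper uses a fine-tuned language model.

```python
from dataclasses import dataclass, field

# Hypothetical toy corpus standing in for a live search engine.
TOY_PAGES = {
    "page-1": "Green tea contains caffeine. It is usually brewed below boiling point.",
    "page-2": "Black tea is fully oxidized. Green tea is not oxidized.",
}


@dataclass
class SearchSession:
    question: str
    actions: list = field(default_factory=list)  # recorded (action, argument) pairs
    facts: list = field(default_factory=list)    # supporting facts quoted so far

    def search(self, query):
        """Return ids of pages that share at least one word with the query."""
        self.actions.append(("search", query))
        words = set(query.lower().split())
        return [
            pid for pid, text in TOY_PAGES.items()
            if words & set(text.lower().replace(".", "").split())
        ]

    def quote(self, page_id, keyword):
        """Store every sentence of the page that contains the keyword."""
        self.actions.append(("quote", page_id))
        for sentence in TOY_PAGES[page_id].split(". "):
            if keyword.lower() in sentence.lower():
                self.facts.append(sentence.rstrip(".") + ".")


def synthesize(question, facts):
    """Placeholder for the synthesis model: joins the facts into one paragraph."""
    return " ".join(facts)


def answer(question, query, keyword):
    """Stage 1: retrieve facts via search actions. Stage 2: synthesize an answer."""
    session = SearchSession(question)
    for page_id in session.search(query):
        session.quote(page_id, keyword)
    return synthesize(question, session.facts), session
```

Calling `answer("Is green tea oxidized?", "green tea", "green tea")` returns the joined facts along with the session, whose `actions` list plays the role of the recorded search behavior that the fine-tuned models are trained to imitate.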
Related papers
- Multi-hop Evidence Pursuit Meets the Web: Team Papelo at FEVER 2024 [1.3923460621808879]
We show that the reasoning power of large language models (LLMs) and the retrieval power of modern search engines can be combined to automate this process.
We integrate LLMs and search under a multi-hop evidence pursuit strategy.
Our submitted system achieves .510 AVeriTeC score on the dev set and .477 AVeriTeC score on the test set.
arXiv Detail & Related papers (2024-11-08T18:25:06Z)
- Open Domain Question Answering with Conflicting Contexts [55.739842087655774]
We find that as much as 25% of unambiguous, open domain questions can lead to conflicting contexts when retrieved using Google Search.
We ask our annotators to provide explanations for their selections of correct answers.
arXiv Detail & Related papers (2024-10-16T07:24:28Z)
- Analyzing Human Questioning Behavior and Causal Curiosity through Natural Queries [91.70689724416698]
We present NatQuest, a collection of 13,500 naturally occurring questions from three diverse sources.
Our analysis reveals a significant presence of causal questions (up to 42%) within the dataset.
arXiv Detail & Related papers (2024-05-30T17:55:28Z)
- Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents [22.023543164141504]
We present Researchy Questions, a dataset of search engine queries tediously filtered to be non-factoid, "decompositional" and multi-perspective.
We show that users spend a lot of "effort" on these questions in terms of signals like clicks and session length.
We also show that "slow thinking" answering techniques, such as decomposition into sub-questions, show benefit over answering directly.
arXiv Detail & Related papers (2024-02-27T21:27:16Z)
- Evaluating Mixed-initiative Conversational Search Systems via User Simulation [9.066817876491053]
We propose a conversational User Simulator, called USi, for automatic evaluation of such search systems.
We show that responses generated by USi are both in line with the underlying information need and comparable to human-generated answers.
arXiv Detail & Related papers (2022-04-17T16:27:33Z)
- A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z)
- Mining Implicit Relevance Feedback from User Behavior for Web Question Answering [92.45607094299181]
We make the first study to explore the correlation between user behavior and passage relevance.
Our approach significantly improves the accuracy of passage ranking without extra human labeled data.
In practice, this work has proved effective in substantially reducing the human labeling cost for the QA service in a global commercial search engine.
arXiv Detail & Related papers (2020-06-13T07:02:08Z)
- Conversations with Search Engines: SERP-based Conversational Response Generation [77.1381159789032]
We create a suitable dataset, the Search as a Conversation (SaaC) dataset, for the development of pipelines for conversations with search engines.
We also develop a state-of-the-art pipeline for conversations with search engines, the Conversations with Search Engines (CaSE), using this dataset.
CaSE enhances the state-of-the-art by introducing a supporting token identification module and a prior-aware pointer generator.
arXiv Detail & Related papers (2020-04-29T13:07:53Z)