The First Place Solution of WSDM Cup 2024: Leveraging Large Language
Models for Conversational Multi-Doc QA
- URL: http://arxiv.org/abs/2402.18385v1
- Date: Wed, 28 Feb 2024 15:05:43 GMT
- Title: The First Place Solution of WSDM Cup 2024: Leveraging Large Language
Models for Conversational Multi-Doc QA
- Authors: Yiming Li and Zhao Zhang
- Abstract summary: We introduce our winning approach for the "Conversational Multi-Doc QA" challenge in WSDM Cup 2024.
We first adapt Large Language Models to the task, then devise a hybrid training strategy to make the most of in-domain unlabeled data.
Our solution ranked 1st place in WSDM Cup 2024, surpassing its rivals by a large margin.
- Score: 15.405052113769164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conversational multi-doc question answering aims to answer specific questions
based on the retrieved documents as well as the contextual conversations. In
this paper, we introduce our winning approach for the "Conversational Multi-Doc
QA" challenge in WSDM Cup 2024, which exploits the superior natural language
understanding and generation capability of Large Language Models (LLMs). We
first adapt LLMs to the task, then devise a hybrid training strategy to make
the most of in-domain unlabeled data. Moreover, an advanced text embedding
model is adopted to filter out potentially irrelevant documents, and several
approaches are designed and compared for the model ensemble. Equipped with all
these techniques, our solution finally ranked 1st place in WSDM Cup 2024,
surpassing its rivals by a large margin. The source code has been released at
https://github.com/zhangzhao219/WSDM-Cup-2024.
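As a rough illustration of the document-filtering step described above, the sketch below scores each retrieved document against the question with a text embedding model and drops low-similarity ones. The model name and threshold are illustrative stand-ins, not the authors' actual configuration.

```python
# Hypothetical sketch: filter retrieved documents by embedding similarity to
# the question before building the LLM prompt. Model choice and threshold are
# illustrative, not the competition solution's exact setup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

def filter_documents(question: str, documents: list[str], threshold: float = 0.3) -> list[str]:
    """Keep only documents whose cosine similarity to the question clears the threshold."""
    q_emb = model.encode(question, convert_to_tensor=True)
    d_emb = model.encode(documents, convert_to_tensor=True)
    scores = util.cos_sim(q_emb, d_emb)[0]
    return [doc for doc, s in zip(documents, scores) if s.item() >= threshold]
```

In a pipeline like the one the abstract describes, such a filter would sit between retrieval and generation, shrinking the prompt to the most relevant documents.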
Related papers
- M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization? [49.53982792497275]
We investigate whether Large Vision-Language Models (LVLMs) genuinely comprehend interleaved image-text in the document.
Existing document understanding benchmarks often assess LVLMs using question-answer formats.
We introduce a novel and challenging Multimodal Document Summarization Benchmark (M-DocSum-Bench)
M-DocSum-Bench comprises 500 high-quality arXiv papers, along with interleaved multimodal summaries aligned with human preferences.
arXiv Detail & Related papers (2025-03-27T07:28:32Z)
- RAMQA: A Unified Framework for Retrieval-Augmented Multi-Modal Question Answering [9.915889321513678]
RAMQA is a unified framework combining learning-to-rank methods with generative permutation-enhanced ranking techniques.
Our generative ranking model generates re-ranked document IDs and specific answers from document candidates in various permutations.
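As a loose sketch of the permutation-enhanced ranking idea (not RAMQA's actual implementation), the snippet below queries a ranker on several shuffled orderings of the candidates and aggregates the outputs Borda-style; `generate_ranking` is a hypothetical stand-in for the generative ranking model.

```python
# Hypothetical sketch of permutation-enhanced generative ranking: present the
# candidates in several orders and aggregate the generated rankings.
import random
from collections import defaultdict

def aggregate_rankings(doc_ids, generate_ranking, num_perms=3, seed=0):
    """Borda-style aggregation over rankings produced from permuted inputs."""
    rng = random.Random(seed)
    scores = defaultdict(int)
    for _ in range(num_perms):
        perm = doc_ids[:]
        rng.shuffle(perm)  # a different input order for each ranking pass
        ranking = generate_ranking(perm)  # e.g., an LLM returning IDs best-first
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += len(doc_ids) - rank
    return sorted(doc_ids, key=lambda d: scores[d], reverse=True)
```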
arXiv Detail & Related papers (2025-01-23T00:50:33Z)
- Role-Play Zero-Shot Prompting with Large Language Models for Open-Domain Human-Machine Conversation [1.7436854281619139]
Large Language Models (LLMs) are able to answer user queries, but in a one-way Q&A format rather than a true conversation.
Fine-tuning on particular datasets is the usual way to adapt their style and increase conversational ability, but this is expensive and usually available for only a few languages.
In this study, we explore role-play zero-shot prompting as an efficient and cost-effective solution for open-domain conversation.
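A minimal sketch of role-play zero-shot prompting, assuming an OpenAI-style chat message format; the persona text is invented for illustration, not taken from the paper.

```python
# Hypothetical sketch: a persona description is prepended as the system message;
# no fine-tuning is involved. Persona wording and API shape are illustrative.
ROLE_PROMPT = (
    "You are Alice, a friendly and curious conversation partner. "
    "Chat naturally: ask follow-up questions, share opinions, and keep replies short."
)

def build_messages(history: list[dict]) -> list[dict]:
    """Prepend the role-play instruction to the running conversation history."""
    return [{"role": "system", "content": ROLE_PROMPT}] + history

messages = build_messages([{"role": "user", "content": "Hi! What are you up to today?"}])
```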
arXiv Detail & Related papers (2024-06-26T16:10:53Z)
- Heidelberg-Boston @ SIGTYP 2024 Shared Task: Enhancing Low-Resource Language Analysis With Character-Aware Hierarchical Transformers [2.3020018305241337]
This work focuses on PoS tagging, morphological tagging, and lemmatization for 13 historical languages.
We adapt a hierarchical tokenization method from Sun et al. (2023) and combine it with the advantages of the DeBERTa-V3 architecture.
Our models achieved first place in the constrained subtask, nearly reaching the performance levels of the unconstrained task's winner.
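As a rough illustration of the character-aware idea (not the authors' exact hierarchical encoder, which follows Sun et al. 2023), the sketch below splits words into characters while recording word membership, so character-level states can later be pooled back into word-level representations for tagging.

```python
# Hypothetical sketch of character-aware input preparation: each word is split
# into characters, and a word-id map allows pooling character states back into
# word-level vectors for PoS/morphological tagging. Illustrative only.
def char_segment(words: list[str]) -> tuple[list[str], list[int]]:
    chars, word_ids = [], []
    for i, word in enumerate(words):
        chars.extend(word)
        word_ids.extend([i] * len(word))
    return chars, word_ids

chars, word_ids = char_segment(["gallia", "est", "omnis"])
# chars    -> ['g','a','l','l','i','a','e','s','t','o','m','n','i','s']
# word_ids -> [0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
```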
arXiv Detail & Related papers (2024-05-30T15:23:34Z)
- Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model [22.07414287186125]
Quest is a query-centric data synthesis method that aggregates semantically relevant yet diverse documents.
It uses a generative model to predict potential queries for each document, grouping documents with similar queries and keywords.
Experiments demonstrate Quest's superior performance on long-context tasks, achieving remarkable results with context lengths of up to 1M tokens.
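A minimal sketch of the query-centric grouping idea, with `predict_query` as a hypothetical stand-in for the generative model that proposes a query per document; the keyword signature used for grouping is deliberately crude.

```python
# Hypothetical sketch in the spirit of Quest: documents whose predicted queries
# share a keyword signature are bundled into one long-context training example.
from collections import defaultdict

def group_by_query_keywords(docs: list[str], predict_query) -> list[list[str]]:
    """Group documents by the keyword set of their predicted query."""
    buckets = defaultdict(list)
    for doc in docs:
        query = predict_query(doc)              # e.g., a generative model call
        key = frozenset(query.lower().split())  # crude keyword signature
        buckets[key].append(doc)
    return list(buckets.values())
```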
arXiv Detail & Related papers (2024-05-30T08:50:55Z)
- SEMQA: Semi-Extractive Multi-Source Question Answering [94.04430035121136]
We introduce a new QA task for answering multi-answer questions by summarizing multiple diverse sources in a semi-extractive fashion.
We create the first dataset of this kind, QuoteSum, with human-written semi-extractive answers to natural and generated questions.
arXiv Detail & Related papers (2023-11-08T18:46:32Z)
- Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen
Large Language Models [69.59125732317972]
We propose a simple yet effective Retrieving-to-Answer (R2A) framework for VideoQA.
R2A first retrieves a set of semantically similar texts from a generic text corpus using a pre-trained multi-modal model.
With both the question and the retrieved texts, an LLM can be directly used to yield the desired answer.
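A minimal sketch of the retrieve-then-answer pipeline under stated assumptions: `encode` stands in for the pre-trained (multi-modal) encoder and `llm` for the frozen language model.

```python
# Hypothetical sketch: nearest-neighbour texts are pulled from a corpus with a
# pretrained encoder, then concatenated with the question for a frozen LLM.
import numpy as np

def answer(question: str, corpus: list[str], encode, llm, k: int = 5) -> str:
    corpus_emb = np.stack([encode(t) for t in corpus])  # (N, d)
    q = encode(question)
    sims = corpus_emb @ q / (np.linalg.norm(corpus_emb, axis=1) * np.linalg.norm(q))
    top = [corpus[i] for i in np.argsort(-sims)[:k]]    # k most similar texts
    prompt = "Context:\n" + "\n".join(top) + f"\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)
```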
arXiv Detail & Related papers (2023-06-15T20:56:20Z)
- Peek Across: Improving Multi-Document Modeling via Cross-Document
Question-Answering [49.85790367128085]
We pre-train a generic multi-document model with a novel cross-document question-answering pre-training objective.
This novel multi-document QA formulation directs the model to better recover cross-text informational relations.
Unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation and long text generation.
arXiv Detail & Related papers (2023-05-24T17:48:40Z)
- MGDoc: Pre-training with Multi-granular Hierarchy for Document Image
Understanding [53.03978356918377]
Spatial hierarchical relationships between content at different levels of granularity are crucial for document image understanding tasks.
Existing methods learn features at either the word level or the region level, but fail to consider both simultaneously.
We propose MGDoc, a new multi-modal multi-granular pre-training framework that encodes page-level, region-level, and word-level information at the same time.
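As a rough data-structure sketch of the multi-granular idea (field names are invented, not MGDoc's actual schema), words nest inside regions and regions inside a page so that all three levels are available to an encoder.

```python
# Hypothetical sketch of a multi-granular document representation: word, region,
# and page levels are kept together so features can be encoded at all three.
from dataclasses import dataclass, field

@dataclass
class Word:
    text: str
    bbox: tuple[int, int, int, int]  # (x0, y0, x1, y1) in page coordinates

@dataclass
class Region:
    words: list[Word]
    bbox: tuple[int, int, int, int]

@dataclass
class Page:
    regions: list[Region] = field(default_factory=list)

    def all_words(self) -> list[Word]:
        """Flatten to word level; the nesting preserves region/page context."""
        return [w for r in self.regions for w in r.words]
```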
arXiv Detail & Related papers (2022-11-27T22:47:37Z)
- Generate rather than Retrieve: Large Language Models are Strong Context
Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.
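A minimal sketch of the generate-then-read loop, with `llm` as a stand-in for any text-generation call; the prompt wording is illustrative, not the paper's exact prompt.

```python
# Hypothetical sketch of generate-then-read: instead of retrieving documents,
# ask the LLM to write a background passage, then answer from that passage.
def generate_then_read(question: str, llm) -> str:
    context = llm(f"Generate a background document to answer the question: {question}")
    return llm(
        "Refer to the passage below and answer the question.\n\n"
        f"Passage: {context}\n\nQuestion: {question}\nAnswer:"
    )
```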
arXiv Detail & Related papers (2022-09-21T01:30:59Z)
- IITD-DBAI: Multi-Stage Retrieval with Pseudo-Relevance Feedback and
Query Reformulation [0.0]
Resolving contextual dependencies is one of the most challenging tasks in conversational systems.
Our submission has produced a mean NDCG@3 performance better than the median model.
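A minimal sketch of pseudo-relevance feedback, one of the techniques named in the title: expand the query with frequent terms from the top documents of a first retrieval pass. `search` is a hypothetical stand-in for the underlying retriever (e.g., BM25).

```python
# Hypothetical sketch of pseudo-relevance feedback for query reformulation.
from collections import Counter

def prf_expand(query: str, search, top_k: int = 3, n_terms: int = 5) -> str:
    """Append the most frequent new terms from the top-ranked documents."""
    top_docs = search(query)[:top_k]
    counts = Counter(w for doc in top_docs for w in doc.lower().split())
    for w in query.lower().split():  # don't re-add the original query terms
        counts.pop(w, None)
    expansion = [w for w, _ in counts.most_common(n_terms)]
    return query + " " + " ".join(expansion)
```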
arXiv Detail & Related papers (2022-03-31T14:07:47Z)
- Tradeoffs in Sentence Selection Techniques for Open-Domain Question
Answering [54.541952928070344]
We describe two groups of models for sentence selection: QA-based approaches, which run a full-fledged QA system to identify answer candidates, and retrieval-based models, which find parts of each passage specifically related to each question.
We show that very lightweight QA models can do well at this task, but retrieval-based models are faster still.
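A minimal sketch of the retrieval-based flavour of sentence selection, scoring sentences by TF-IDF overlap with the question; a QA-based selector would instead run a reader model over each sentence.

```python
# Hypothetical sketch of retrieval-based sentence selection via TF-IDF.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_sentences(question: str, sentences: list[str], k: int = 2) -> list[str]:
    """Return the k sentences most lexically similar to the question."""
    vec = TfidfVectorizer().fit(sentences + [question])
    sims = cosine_similarity(vec.transform([question]), vec.transform(sentences))[0]
    ranked = sorted(range(len(sentences)), key=lambda i: -sims[i])
    return [sentences[i] for i in ranked[:k]]
```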
arXiv Detail & Related papers (2020-09-18T23:39:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.