Query-focused Extractive Summarisation for Biomedical and COVID-19
Complex Question Answering
- URL: http://arxiv.org/abs/2209.01815v1
- Date: Mon, 5 Sep 2022 07:56:44 GMT
- Title: Query-focused Extractive Summarisation for Biomedical and COVID-19
Complex Question Answering
- Authors: Diego Mollá (Macquarie University, Sydney, Australia)
- Abstract summary: This paper presents Macquarie University's participation in the two most recent BioASQ Synergy Tasks.
We apply query-focused extractive summarisation techniques to generate complex answers to biomedical questions.
For the Synergy task, we selected the candidate sentences following two phases: document retrieval and snippet retrieval.
We observed an improvement of results when the system was trained on the second half of the BioASQ10b training data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents Macquarie University's participation in the two most
recent BioASQ Synergy Tasks (as per June 2022), and in the BioASQ10 Task B
(BioASQ10b), Phase B. In these tasks, participating systems are expected to
generate complex answers to biomedical questions, where the answers may contain
more than one sentence. We apply query-focused extractive summarisation
techniques. In particular, we follow a sentence classification-based approach
that scores each candidate sentence associated with a question, and the $n$
highest-scoring sentences are returned as the answer. The Synergy Task
corresponds to an end-to-end system that requires document selection, snippet
selection, and finding the final answer, but it has very limited training data.
For the Synergy task, we selected the candidate sentences following two phases:
document retrieval and snippet retrieval, and the final answer was found by
using a DistilBERT/ALBERT classifier that had been trained on the training data
of BioASQ9b. Document retrieval was achieved as a standard search over the
CORD-19 data using the search API provided by the BioASQ organisers, and
snippet retrieval was achieved by re-ranking the sentences of the top retrieved
documents, using the cosine similarity of the question and candidate sentence.
We observed that vectors represented via sBERT have an edge over tf.idf.
BioASQ10b Phase B focuses on finding the specific answers to biomedical
questions. For this task, we followed a data-centric approach. We hypothesised
that the training data of the first BioASQ years might be biased and we
experimented with different subsets of the training data. We observed an
improvement of results when the system was trained on the second half of the
BioASQ10b training data.
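As an illustration of the snippet-retrieval step described in the abstract, the sketch below re-ranks the sentences of the top retrieved documents by their cosine similarity with the question, using either sBERT embeddings or tf.idf vectors. The specific sBERT checkpoint (`all-MiniLM-L6-v2`) and the use of the sentence-transformers and scikit-learn libraries are illustrative assumptions, not details given in the paper.

```python
# Hedged sketch of snippet re-ranking by cosine similarity between the question
# and each candidate sentence, comparing sBERT embeddings with a tf.idf baseline.
# The model name and libraries are assumptions made for illustration.
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rerank_sbert(question, sentences, model_name="all-MiniLM-L6-v2", top_k=10):
    """Score candidate sentences with sBERT embeddings and return the top_k."""
    model = SentenceTransformer(model_name)
    q_vec = model.encode([question])
    s_vecs = model.encode(sentences)
    scores = cosine_similarity(q_vec, s_vecs)[0]
    ranked = sorted(zip(sentences, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]

def rerank_tfidf(question, sentences, top_k=10):
    """Baseline: the same ranking, but with tf.idf vectors fitted on the candidates."""
    vectorizer = TfidfVectorizer()
    s_vecs = vectorizer.fit_transform(sentences)
    q_vec = vectorizer.transform([question])
    scores = cosine_similarity(q_vec, s_vecs)[0]
    ranked = sorted(zip(sentences, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]
```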
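The answer itself is produced by the sentence classification approach: a DistilBERT (or ALBERT) classifier scores each question/sentence pair and the n highest-scoring sentences are returned. A minimal sketch follows, assuming a Hugging Face transformers sequence-classification head and using the positive-class probability as the sentence score; the checkpoint name is a placeholder for the classifier trained on BioASQ9b data.

```python
# Hedged sketch of the sentence-classification answer extraction: score each
# (question, candidate sentence) pair with a fine-tuned classifier and return
# the n highest-scoring sentences as the answer. The checkpoint is a placeholder.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def answer_from_sentences(question, sentences, n=3,
                          checkpoint="distilbert-base-uncased"):  # placeholder checkpoint
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    model.eval()
    scores = []
    with torch.no_grad():
        for sent in sentences:
            enc = tokenizer(question, sent, truncation=True, return_tensors="pt")
            logits = model(**enc).logits
            # positive-class probability used as the sentence score (assumption)
            scores.append(torch.softmax(logits, dim=-1)[0, 1].item())
    ranked = sorted(zip(sentences, scores), key=lambda x: x[1], reverse=True)
    return " ".join(sent for sent, _ in ranked[:n])
```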
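For the BioASQ10b Phase B experiments, the data-centric hypothesis was tested by training on different subsets of the training data. The fragment below sketches how the second half of the training questions might be selected; the file name and the JSON layout (a top-level "questions" list, as in the usual BioASQ releases) are assumptions.

```python
# Hedged sketch of the data-centric subset selection: keep only the second half
# of the BioASQ10b training questions before fine-tuning. File name and JSON
# layout are assumed, not taken from the paper.
import json

def second_half_of_training(path="BioASQ-training10b.json"):  # placeholder file name
    with open(path, encoding="utf-8") as f:
        questions = json.load(f)["questions"]
    midpoint = len(questions) // 2
    return questions[midpoint:]  # later questions, hypothesised to be less biased
```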
Related papers
- RAG-ConfusionQA: A Benchmark for Evaluating LLMs on Confusing Questions [52.33835101586687]
Conversational AI agents use Retrieval Augmented Generation (RAG) to provide verifiable document-grounded responses to user inquiries.
This paper presents a novel synthetic data generation method to efficiently create a diverse set of context-grounded confusing questions from a given document corpus.
arXiv Detail & Related papers (2024-10-18T16:11:29Z) - Using Pretrained Large Language Model with Prompt Engineering to Answer Biomedical Questions [1.0742675209112622]
We propose a two-level information retrieval and question-answering system based on pre-trained large language models (LLMs).
We construct prompts with in-context few-shot examples and utilize post-processing techniques like resampling and malformed response detection.
Our best-performing system achieved 0.14 MAP score on document retrieval, 0.05 MAP score on snippet retrieval, 0.96 F1 score for yes/no questions, 0.38 MRR score for factoid questions and 0.50 F1 score for list questions in Task 12b.
arXiv Detail & Related papers (2024-07-09T11:48:49Z) - Biomedical Entity Linking as Multiple Choice Question Answering [48.74212158495695]
We present BioELQA, a novel model that treats Biomedical Entity Linking as Multiple Choice Question Answering.
BioELQA first obtains candidate entities with a fast retriever, jointly presents the mention and candidate entities to a generator, and then outputs the predicted symbol associated with the chosen entity.
To improve generalization for long-tailed entities, we retrieve similar labeled training instances as clues and concatenate the input with the retrieved instances for the generator.
arXiv Detail & Related papers (2024-02-23T08:40:38Z) - Contributions to the Improvement of Question Answering Systems in the
Biomedical Domain [0.951828574518325]
This thesis work falls within the framework of question answering (QA) in the biomedical domain.
We propose four contributions to improve the performance of QA in the biomedical domain.
We develop a fully automated semantic biomedical QA system called SemBioNLQA.
arXiv Detail & Related papers (2023-07-25T16:31:20Z) - Questions Are All You Need to Train a Dense Passage Retriever [123.13872383489172]
ART is a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data.
It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question.
arXiv Detail & Related papers (2022-06-21T18:16:31Z) - Query-Focused Extractive Summarisation for Finding Ideal Answers to
Biomedical and COVID-19 Questions [7.6997148655751895]
Macquarie University participated in the BioASQ Synergy Task and BioASQ9b Phase B.
We used a query-focused summarisation system that was trained with the BioASQ8b training data set.
Despite the poor quality of the documents and snippets retrieved by our system, the answers returned were of reasonably good quality.
arXiv Detail & Related papers (2021-08-27T09:19:42Z) - TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and
Textual Content in Finance [71.76018597965378]
We build a new large-scale Question Answering dataset containing both Tabular And Textual data, named TAT-QA.
We propose a novel QA model termed TAGOP, which is capable of reasoning over both tables and text.
arXiv Detail & Related papers (2021-05-17T06:12:06Z) - Transferability of Natural Language Inference to Biomedical Question
Answering [17.38537039378825]
We focus on applying BioBERT to transfer the knowledge of natural language inference (NLI) to biomedical question answering (QA).
We observe that BioBERT trained on the NLI dataset obtains better performance on Yes/No (+5.59%), Factoid (+0.53%), List type (+13.58%) questions.
We present a sequential transfer learning method that performed well in the 8th BioASQ Challenge (Phase B).
arXiv Detail & Related papers (2020-07-01T04:05:48Z) - A Study on Efficiency, Accuracy and Document Structure for Answer
Sentence Selection [112.0514737686492]
In this paper, we argue that by exploiting the intrinsic structure of the original rank together with an effective word-relatedness encoder, we can achieve competitive results.
Our model takes 9.5 seconds to train on the WikiQA dataset, i.e., very fast in comparison with the $\sim 18$ minutes required by a standard BERT-base fine-tuning.
arXiv Detail & Related papers (2020-03-04T22:12:18Z) - Pre-training Tasks for Embedding-based Large-scale Retrieval [68.01167604281578]
We consider the large-scale query-document retrieval problem.
Given a query (e.g., a question), return the set of relevant documents from a large document corpus.
We show that the key ingredient of learning a strong embedding-based Transformer model is the set of pre-training tasks.
arXiv Detail & Related papers (2020-02-10T16:44:00Z) - UNCC Biomedical Semantic Question Answering Systems. BioASQ: Task-7B,
Phase-B [1.976652238476722]
We present our approach for Task-7b, Phase B, Exact Answering Task.
These Question Answering (QA) tasks include Factoid, Yes/No, List Type Question answering.
Our system is based on a contextual word embedding model.
arXiv Detail & Related papers (2020-02-05T20:43:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.