NeuralQA: A Usable Library for Question Answering (Contextual Query
Expansion + BERT) on Large Datasets
- URL: http://arxiv.org/abs/2007.15211v2
- Date: Sat, 28 Nov 2020 03:06:09 GMT
- Title: NeuralQA: A Usable Library for Question Answering (Contextual Query
Expansion + BERT) on Large Datasets
- Authors: Victor Dibia
- Abstract summary: NeuralQA is a library for Question Answering (QA) on large datasets.
It integrates with existing infrastructure (e.g., ElasticSearch instances and reader models trained with the HuggingFace Transformers API) and offers helpful defaults for QA subtasks.
Code and documentation for NeuralQA are available as open source on GitHub.
- Score: 0.6091702876917281
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing tools for Question Answering (QA) have challenges that limit their
use in practice. They can be complex to set up or integrate with existing
infrastructure, do not offer configurable interactive interfaces, and do not
cover the full set of subtasks that frequently comprise the QA pipeline (query
expansion, retrieval, reading, and explanation/sensemaking). To help address
these issues, we introduce NeuralQA - a usable library for QA on large
datasets. NeuralQA integrates well with existing infrastructure (e.g.,
ElasticSearch instances and reader models trained with the HuggingFace
Transformers API) and offers helpful defaults for QA subtasks. It introduces
and implements contextual query expansion (CQE) using a masked language model
(MLM) as well as relevant snippets (RelSnip) - a method for condensing large
documents into smaller passages that can be speedily processed by a document
reader model. Finally, it offers a flexible user interface to support workflows
for research explorations (e.g., visualization of gradient-based explanations
to support qualitative inspection of model behaviour) and large scale search
deployment. Code and documentation for NeuralQA are available as open source on
GitHub (https://github.com/victordibia/neuralqa).
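The retriever-reader pipeline described in the abstract (an ElasticSearch index for retrieval plus a HuggingFace Transformers reader) can be sketched roughly as follows. This is a minimal illustration under assumptions, not NeuralQA's actual API: the index name, the "text" document field, the model checkpoint, the local ElasticSearch URL, and the use of the 8.x Python client are all illustrative choices.

```python
# Minimal retriever-reader sketch in the spirit of the pipeline described
# above; NOT NeuralQA's API. Index/field names and checkpoint are assumptions.
from elasticsearch import Elasticsearch   # assumes the 8.x client
from transformers import pipeline

es = Elasticsearch("http://localhost:9200")            # assumed local instance
reader = pipeline("question-answering",
                  model="distilbert-base-cased-distilled-squad")

def answer(question: str, index: str = "documents", top_k: int = 3):
    # Retrieval: BM25 match query against an assumed "text" field.
    hits = es.search(index=index,
                     query={"match": {"text": question}},
                     size=top_k)["hits"]["hits"]
    # Reading: run the span-extraction reader over each retrieved passage.
    candidates = [reader(question=question, context=h["_source"]["text"])
                  for h in hits]
    # Return the highest-scoring span across the retrieved passages.
    return max(candidates, key=lambda c: c["score"]) if candidates else None

print(answer("who invented the telephone"))
```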
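The abstract also mentions contextual query expansion (CQE) with a masked language model (MLM). The sketch below shows one plausible reading of that idea rather than the paper's exact implementation: each non-stopword query term is masked in turn, and the MLM's top fill-mask predictions are collected as candidate expansion terms. The checkpoint and the tiny stopword list are illustrative assumptions.

```python
# Hedged sketch of MLM-based query expansion; not NeuralQA's implementation.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")  # assumed checkpoint
STOPWORDS = {"what", "is", "the", "a", "an", "of", "in", "who", "how"}

def expand_query(query: str, per_term: int = 3) -> list[str]:
    tokens = query.lower().split()
    expansions = set()
    for i, tok in enumerate(tokens):
        if tok in STOPWORDS:
            continue
        masked = tokens.copy()
        masked[i] = fill_mask.tokenizer.mask_token   # e.g. "[MASK]"
        # Keep alphabetic, in-context predictions that are not already in the query.
        for pred in fill_mask(" ".join(masked), top_k=per_term):
            term = pred["token_str"].strip()
            if term.isalpha() and term not in tokens:
                expansions.add(term)
    return sorted(expansions)

print(expand_query("what causes lung cancer"))
```

The expansion terms would then be appended to the original query (or OR-ed into the retrieval query) before hitting the search index.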
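Finally, RelSnip condenses long retrieved documents into smaller passages before reading. Below is a hedged sketch under assumed parameters: split the document into fixed-size word windows, score each window against the question with BM25 (here via the rank_bm25 package, an assumption rather than the paper's implementation), and keep only the top-scoring windows as the reader's context.

```python
# Hedged sketch of the RelSnip idea; chunk size, top_k, and the use of
# rank_bm25 are illustrative assumptions, not the paper's exact parameters.
from rank_bm25 import BM25Okapi

def relevant_snippets(question: str, document: str,
                      chunk_words: int = 100, top_k: int = 2) -> str:
    words = document.split()
    # Fixed-size word windows as candidate passages.
    chunks = [" ".join(words[i:i + chunk_words])
              for i in range(0, len(words), chunk_words)]
    bm25 = BM25Okapi([c.lower().split() for c in chunks])
    scores = bm25.get_scores(question.lower().split())
    ranked = sorted(zip(scores, chunks), key=lambda x: x[0], reverse=True)
    # Concatenate the best passages into one short context for the reader.
    return " ".join(chunk for _, chunk in ranked[:top_k])
```

The condensed context can then be passed to the document reader in place of the full document, which keeps reading latency manageable on large documents.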
Related papers
- IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization [59.06663981902496]
Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization.
We investigate two indispensable characteristics that LLM-based QFS models should harness: Lengthy Document Summarization and Efficient Fine-grained Query-LLM Alignment.
These innovations pave the way for broader application and accessibility in the field of QFS technology.
arXiv Detail & Related papers (2024-07-15T07:14:56Z)
- DEXTER: A Benchmark for open-domain Complex Question Answering using LLMs [3.24692739098077]
Open-domain complex Question Answering (QA) is a difficult task with challenges in evidence retrieval and reasoning.
We evaluate state-of-the-art pre-trained dense and sparse retrieval models in an open-domain setting.
We observe that late interaction models and, surprisingly, lexical models like BM25 perform well compared to other pre-trained dense retrieval models.
arXiv Detail & Related papers (2024-06-24T22:09:50Z)
- LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models [21.95962189710859]
We propose a lightweight, end-to-end framework to execute the Spoken Question Answering (SQA) task on the LibriSQA dataset.
By reforming ASR into the SQA format, we further substantiate our framework's capability in handling ASR tasks.
Our empirical findings bolster the LLMs' aptitude for aligning and comprehending multimodal information, paving the way for the development of universal multimodal LLMs.
arXiv Detail & Related papers (2023-08-20T23:47:23Z)
- A Practical Toolkit for Multilingual Question and Answer Generation [79.31199020420827]
We introduce AutoQG, an online service for multilingual QAG, along with lmqg, an all-in-one Python package for model fine-tuning, generation, and evaluation.
We also release QAG models in eight languages fine-tuned on a few variants of pre-trained encoder-decoder language models, which can be used online via AutoQG or locally via lmqg.
arXiv Detail & Related papers (2023-05-27T08:42:37Z)
- Long-Tailed Question Answering in an Open World [46.67715607552547]
We define Open Long-Tailed QA (OLTQA) as learning from long-tailed distributed data.
We propose an OLTQA model that encourages knowledge sharing between head, tail and unseen tasks.
On a large-scale OLTQA dataset, our model consistently outperforms the state-of-the-art.
arXiv Detail & Related papers (2023-05-11T04:28:58Z)
- PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development [24.022050096797606]
PRIMEQA is a one-stop QA repository with an aim to democratize QA research and facilitate easy replication of state-of-the-art (SOTA) QA methods.
It supports core QA functionalities like retrieval and reading comprehension as well as auxiliary capabilities such as question generation.
It has been designed as an end-to-end toolkit for various use cases: building front-end applications, replicating SOTA methods on public benchmarks, and expanding pre-existing methods.
arXiv Detail & Related papers (2023-01-23T20:43:26Z)
- Generate rather than Retrieve: Large Language Models are Strong Context Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.
arXiv Detail & Related papers (2022-09-21T01:30:59Z)
- Multifaceted Improvements for Conversational Open-Domain Question Answering [54.913313912927045]
We propose a framework with Multifaceted Improvements for Conversational open-domain Question Answering (MICQA).
First, the proposed KL-divergence-based regularization leads to better question understanding for retrieval and answer reading.
Second, the added post-ranker module pushes more relevant passages to the top positions so that they can be selected for the reader under a two-aspect constraint.
Third, the well-designed curriculum learning strategy effectively narrows the gap between the golden-passage settings of training and inference, and encourages the reader to find the true answer without golden-passage assistance.
arXiv Detail & Related papers (2022-04-01T07:54:27Z)
- UKP-SQUARE: An Online Platform for Question Answering Research [50.35348764297317]
We present UKP-SQUARE, an online QA platform that allows researchers to query and analyze a large collection of modern Skills via a user-friendly web interface and integrated tests.
arXiv Detail & Related papers (2022-03-25T15:00:24Z)
- KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT).
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.