Utilizing Bidirectional Encoder Representations from Transformers for
Answer Selection
- URL: http://arxiv.org/abs/2011.07208v1
- Date: Sat, 14 Nov 2020 03:15:26 GMT
- Title: Utilizing Bidirectional Encoder Representations from Transformers for
Answer Selection
- Authors: Md Tahmid Rahman Laskar, Enamul Hoque, Jimmy Xiangji Huang
- Abstract summary: We adopt a transformer-based model pre-trained for language modeling on a large dataset and fine-tune it for downstream tasks.
We find that fine-tuning the BERT model for the answer selection task is very effective and observe a maximum improvement of 13.1% in the QA datasets and 18.7% in the CQA datasets.
- Score: 16.048329028104643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training a transformer-based model on the language modeling
task over a large dataset and then fine-tuning it for downstream tasks has
proven very useful in recent years. One major advantage of such pre-trained
language models
is that they can effectively absorb the context of each word in a sentence.
However, for tasks such as the answer selection task, the pre-trained language
models have not been extensively used yet. To investigate their effectiveness
in such tasks, in this paper, we adopt the pre-trained Bidirectional Encoder
Representations from Transformers (BERT) language model and fine-tune it on two
Question Answering (QA) datasets and three Community Question Answering (CQA)
datasets for the answer selection task. We find that fine-tuning the BERT model
for the answer selection task is very effective and observe a maximum
improvement of 13.1% in the QA datasets and 18.7% in the CQA datasets compared
to the previous state-of-the-art.
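As a rough illustration of this setup (not the authors' released code), the following sketch frames answer selection as sentence-pair classification with a pre-trained BERT model from the Hugging Face transformers library; the checkpoint name, the toy question/candidate pairs, and the single-step training loop are assumptions made for brevity.

    # Answer selection as sentence-pair classification: a minimal, illustrative sketch.
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    # Each example is a (question, candidate answer) pair with a 0/1 relevance label.
    questions = ["Who wrote Hamlet?", "Who wrote Hamlet?"]
    candidates = ["Hamlet was written by William Shakespeare.", "Hamlet is set in Denmark."]
    labels = torch.tensor([1, 0])

    # BERT encodes each pair as one sequence: [CLS] question [SEP] candidate [SEP]
    batch = tokenizer(questions, candidates, padding=True, truncation=True, return_tensors="pt")

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    loss = model(**batch, labels=labels).loss  # cross-entropy over the [CLS] representation
    loss.backward()
    optimizer.step()

    # At inference time, candidates are ranked by the positive-class probability.
    model.eval()
    with torch.no_grad():
        scores = torch.softmax(model(**batch).logits, dim=-1)[:, 1]
    print(scores)

In practice the same pairwise scoring is applied to every candidate answer for a question, and the top-scoring candidate is selected.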
Related papers
- Pre-training Transformer Models with Sentence-Level Objectives for
Answer Sentence Selection [99.59693674455582]
We propose three novel sentence-level transformer pre-training objectives that incorporate paragraph-level semantics within and across documents.
Our experiments on three public and one industrial AS2 datasets demonstrate the empirical superiority of our pre-trained transformers over baseline models.
arXiv Detail & Related papers (2022-05-20T22:39:00Z)
- Paragraph-based Transformer Pre-training for Multi-Sentence Inference [99.59693674455582]
We show that popular pre-trained transformers perform poorly when fine-tuned on multi-candidate inference tasks.
We then propose a new pre-training objective that models the paragraph-level semantics across multiple input sentences.
arXiv Detail & Related papers (2022-05-02T21:41:14Z)
- Parameter-Efficient Abstractive Question Answering over Tables or Text [60.86457030988444]
A long-term ambition of information-seeking QA systems is to reason over multi-modal contexts and generate natural answers to user queries.
Memory-intensive pre-trained language models are adapted to downstream tasks such as QA by fine-tuning the model on QA data in a specific modality, like unstructured text or structured tables.
To avoid training such memory-hungry models while keeping a uniform architecture for each modality, parameter-efficient adapters add and train small task-specific bottleneck layers between transformer layers, as sketched below.
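As a rough illustration of the adapter idea described above (not the paper's exact implementation), the following sketch shows a bottleneck module with a residual connection that could be inserted after a frozen transformer sub-layer; the hidden and bottleneck sizes are assumed values.

    # Minimal adapter sketch (PyTorch); sizes and placement are illustrative assumptions.
    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
            super().__init__()
            self.down = nn.Linear(hidden_size, bottleneck_size)  # down-projection
            self.up = nn.Linear(bottleneck_size, hidden_size)    # up-projection
            self.act = nn.GELU()

        def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
            # Residual connection preserves the frozen backbone's representation.
            return hidden_states + self.up(self.act(self.down(hidden_states)))

    # Only the adapter parameters are trained; the transformer backbone stays frozen,
    # so each new task or modality adds roughly 2 * hidden_size * bottleneck_size parameters.
    adapter = Adapter()
    x = torch.randn(2, 16, 768)  # (batch, sequence length, hidden size)
    print(adapter(x).shape)      # torch.Size([2, 16, 768])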
arXiv Detail & Related papers (2022-04-07T10:56:29Z)
- A Comparative Study of Transformer-Based Language Models on Extractive
Question Answering [0.5079811885340514]
We train various pre-trained language models and fine-tune them on multiple question answering datasets.
Using the F1-score as our metric, we find that the RoBERTa and BART pre-trained models perform the best across all datasets.
arXiv Detail & Related papers (2021-10-07T02:23:19Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- VANiLLa: Verbalized Answers in Natural Language at Large Scale [2.9098477555578333]
This dataset consists of over 100k simple questions adapted from the CSQA and SimpleQuestionsWikidata datasets.
The answer sentences in this dataset are syntactically and semantically closer to the question than to the triple fact.
arXiv Detail & Related papers (2021-05-24T16:57:54Z)
- Multilingual Answer Sentence Reranking via Automatically Translated Data [97.98885151955467]
We present a study on the design of multilingual Answer Sentence Selection (AS2) models, which are a core component of modern Question Answering (QA) systems.
The main idea is to transfer data created in one resource-rich language, e.g., English, to other, less resource-rich languages.
arXiv Detail & Related papers (2021-02-20T03:52:08Z)
- Simplifying Paragraph-level Question Generation via Transformer Language
Models [0.0]
Question generation (QG) is a natural language generation task where a model is trained to ask questions corresponding to some input text.
A single Transformer-based unidirectional language model leveraging transfer learning can be used to produce high quality questions.
Our QG model, fine-tuned from GPT-2 Small, outperforms several paragraph-level QG baselines on the SQuAD dataset by 0.95 METEOR points; a brief generation sketch follows below.
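As a rough sketch of this idea (illustrative only, not the paper's released model), the snippet below prompts a GPT-2 checkpoint to continue a passage with a question; the "Question:" prompt format, the sampling settings, and the base "gpt2" checkpoint are assumptions, since a real QG model would first be fine-tuned on passage-question pairs.

    # Question generation with a unidirectional Transformer LM: minimal illustrative sketch.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    passage = "The Eiffel Tower was completed in 1889 and is located in Paris."
    prompt = passage + " Question:"  # assumed prompt format, not the paper's exact scheme
    inputs = tokenizer(prompt, return_tensors="pt")

    outputs = model.generate(
        **inputs,
        max_new_tokens=30,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))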
arXiv Detail & Related papers (2020-05-03T14:57:24Z)
- Improving Multi-Turn Response Selection Models with Complementary
Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.