Conversational Question Reformulation via Sequence-to-Sequence
Architectures and Pretrained Language Models
- URL: http://arxiv.org/abs/2004.01909v1
- Date: Sat, 4 Apr 2020 11:07:54 GMT
- Title: Conversational Question Reformulation via Sequence-to-Sequence
Architectures and Pretrained Language Models
- Authors: Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai,
Chuan-Ju Wang, Jimmy Lin
- Abstract summary: This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).
We leverage PLMs to address the strong token-to-token independence assumption made in the common objective, maximum likelihood estimation, for the CQR task.
We evaluate fine-tuned PLMs on the recently-introduced CANARD dataset as an in-domain task and validate the models using data from the TREC 2019 CAsT Track as an out-domain task.
- Score: 56.268862325167575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents an empirical study of conversational question
reformulation (CQR) with sequence-to-sequence architectures and pretrained
language models (PLMs). We leverage PLMs to address the strong token-to-token
independence assumption made in the common objective, maximum likelihood
estimation, for the CQR task. In CQR benchmarks of task-oriented dialogue
systems, we evaluate fine-tuned PLMs on the recently-introduced CANARD dataset
as an in-domain task and validate the models using data from the TREC 2019 CAsT
Track as an out-domain task. Examining a variety of architectures with
different numbers of parameters, we demonstrate that the recent text-to-text
transfer transformer (T5) achieves the best results both on CANARD and CAsT
with fewer parameters, compared to similar transformer architectures.
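To make the setup concrete, here is a minimal sketch of seq2seq fine-tuning of T5 for CQR: the source concatenates the conversation history with the current context-dependent question, and the target is the self-contained rewrite, as in CANARD. The Hugging Face transformers library, the t5-base checkpoint, the separator string, and the hyperparameters are illustrative assumptions, not details taken from the paper.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# One CANARD-style training example (illustrative, not taken from the dataset).
history = ["Who wrote The Old Man and the Sea?", "Ernest Hemingway."]
question = "When did he die?"              # context-dependent question
target = "When did Ernest Hemingway die?"  # reformulated, self-contained question

# The separator string is an assumption; the paper simply conditions on the history.
source = " ||| ".join(history + [question])
inputs = tokenizer(source, return_tensors="pt", truncation=True)
labels = tokenizer(target, return_tensors="pt").input_ids

# One step of standard maximum likelihood (teacher-forced seq2seq) training.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()

# After fine-tuning, reformulations are decoded from the same kind of input.
rewritten = tokenizer.decode(
    model.generate(**inputs, max_length=64)[0], skip_special_tokens=True
)
```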
Related papers
- FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
Click-through rate (CTR) prediction serves as a core function module in personalized online services.
Traditional ID-based models for CTR prediction take one-hot encoded ID features of the tabular modality as inputs.
Pretrained Language Models (PLMs) have given rise to another paradigm, which takes sentences of the textual modality as inputs.
We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for CTR prediction.
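As a minimal illustration of the two input modalities contrasted above, the sketch below encodes the same CTR record both as one-hot ID features (for a tabular model) and as a templated sentence (for a PLM). The field names, vocabulary sizes, and template are hypothetical; FLIP's actual preprocessing is not described in this summary.

```python
import numpy as np

# A hypothetical CTR record with categorical fields.
record = {"user_id": 42, "item_id": 7, "category": "electronics"}
vocab_sizes = {"user_id": 1000, "item_id": 500, "category": 20}
category_index = {"electronics": 3}

def one_hot(index: int, size: int) -> np.ndarray:
    """One-hot encode a single categorical index."""
    v = np.zeros(size, dtype=np.float32)
    v[index] = 1.0
    return v

# (a) Tabular modality: concatenated one-hot ID features for an ID-based model.
id_features = np.concatenate([
    one_hot(record["user_id"], vocab_sizes["user_id"]),
    one_hot(record["item_id"], vocab_sizes["item_id"]),
    one_hot(category_index[record["category"]], vocab_sizes["category"]),
])

# (b) Textual modality: the same record verbalized as a sentence for a PLM.
text_features = (f"user {record['user_id']} clicked item {record['item_id']} "
                 f"in category {record['category']}")
```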
arXiv Detail & Related papers (2023-10-30T11:25:03Z)
- Structural Self-Supervised Objectives for Transformers [3.018656336329545]
This thesis focuses on improving the pre-training of natural language models using unsupervised raw data.
In the first part, we introduce three alternative pre-training objectives to BERT's Masked Language Modeling (MLM).
In the second part, we propose self-supervised pre-training tasks that align structurally with downstream applications.
arXiv Detail & Related papers (2023-09-15T09:30:45Z)
- Parameter-Efficient Abstractive Question Answering over Tables or Text [60.86457030988444]
A long-term ambition of information seeking QA systems is to reason over multi-modal contexts and generate natural answers to user queries.
Memory intensive pre-trained language models are adapted to downstream tasks such as QA by fine-tuning the model on QA data in a specific modality like unstructured text or structured tables.
To avoid training such memory-hungry models while utilizing a uniform architecture for each modality, parameter-efficient adapters add and train small task-specific bottleneck layers between transformer layers.
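A rough PyTorch sketch of the bottleneck-adapter idea mentioned above: a small down-projection, nonlinearity, and up-projection with a residual connection, inserted between frozen transformer layers so that only the adapter parameters are trained. The hidden and bottleneck sizes are illustrative; the paper's exact adapter configuration may differ.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Task-specific adapter: down-project, nonlinearity, up-project, residual.
    Only these small layers are trained; the surrounding transformer stays frozen."""

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.activation = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.activation(self.down(hidden_states)))
```

Each transformer layer receives its own adapter, so the trainable parameter count remains a small fraction of the full model.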
arXiv Detail & Related papers (2022-04-07T10:56:29Z)
- Domain Adaptation with Pre-trained Transformers for Query Focused Abstractive Text Summarization [18.791701342934605]
The Query Focused Text Summarization (QFTS) task aims at building systems that generate the summary of the text document(s) based on a given query.
A key challenge in addressing this task is the lack of large labeled data for training the summarization model.
We address this challenge by exploring a series of domain adaptation techniques.
arXiv Detail & Related papers (2021-12-22T05:34:56Z)
- BET: A Backtranslation Approach for Easy Data Augmentation in Transformer-based Paraphrase Identification Context [0.0]
We call this approach BET, with which we analyze backtranslation data augmentation on transformer-based architectures.
Our findings suggest that BET improves paraphrase identification performance on the Microsoft Research Paraphrase Corpus by more than 3% in both accuracy and F1 score.
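For illustration, here is a minimal backtranslation-augmentation sketch with publicly available MarianMT checkpoints (English to German and back). The pivot language, checkpoints, and decoding settings are assumptions; BET's actual translation systems are not specified in this summary.

```python
from transformers import MarianMTModel, MarianTokenizer

def backtranslate(sentences,
                  src2piv="Helsinki-NLP/opus-mt-en-de",
                  piv2src="Helsinki-NLP/opus-mt-de-en"):
    """Translate to a pivot language and back to obtain paraphrase-like variants."""
    def translate(texts, checkpoint):
        tok = MarianTokenizer.from_pretrained(checkpoint)
        mdl = MarianMTModel.from_pretrained(checkpoint)
        batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
        out = mdl.generate(**batch, max_length=128)
        return tok.batch_decode(out, skip_special_tokens=True)
    return translate(translate(sentences, src2piv), piv2src)

# Augmented training pairs: each original sentence paired with its backtranslation.
augmented = backtranslate(["The quick brown fox jumps over the lazy dog."])
```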
arXiv Detail & Related papers (2020-09-25T22:06:06Z)
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [133.93803565077337]
Retrieval-augmented generation (RAG) models combine pre-trained parametric and non-parametric memory for language generation.
We show that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
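A minimal usage sketch of the publicly released RAG checkpoints in Hugging Face transformers (it requires the datasets and faiss packages; the dummy index below stands in for the full Wikipedia index). This illustrates the retrieve-then-generate interface rather than the paper's training procedure.

```python
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained(
    "facebook/rag-sequence-nq", retriever=retriever
)

# Non-parametric memory: the retriever fetches passages; parametric memory: the
# seq2seq generator conditions on them to produce the answer.
inputs = tokenizer("who wrote the old man and the sea", return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```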
arXiv Detail & Related papers (2020-05-22T21:34:34Z)
- Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words".
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
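To make the "relevance labels as target words" idea in the last entry concrete, here is a sketch of scoring a query-document pair with a T5 model by comparing the probabilities of the words "true" and "false" at the first decoding step. The t5-base checkpoint is a placeholder (the model must first be fine-tuned on relevance-labeled data), and the prompt format is an assumption.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base").eval()

def relevance_score(query: str, document: str) -> float:
    """Return P('true') vs. P('false') at the first decoded position."""
    prompt = f"Query: {query} Document: {document} Relevant:"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    # Decode a single step from the decoder start token and inspect its logits.
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, -1]
    # First subword ids of the two target words.
    true_id = tokenizer.encode("true", add_special_tokens=False)[0]
    false_id = tokenizer.encode("false", add_special_tokens=False)[0]
    probs = torch.softmax(logits[[true_id, false_id]], dim=0)
    return probs[0].item()

score = relevance_score("what causes tides",
                        "Tides are caused by the gravitational pull of the moon.")
```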