Breakpoint Transformers for Modeling and Tracking Intermediate Beliefs
- URL: http://arxiv.org/abs/2211.07950v1
- Date: Tue, 15 Nov 2022 07:28:14 GMT
- Title: Breakpoint Transformers for Modeling and Tracking Intermediate Beliefs
- Authors: Kyle Richardson, Ronen Tamari, Oren Sultan, Reut Tsarfaty, Dafna
Shahaf, Ashish Sabharwal
- Abstract summary: We propose a representation learning framework called breakpoint modeling.
Our approach trains models in an efficient and end-to-end fashion to build intermediate representations.
We show the benefit of our main breakpoint transformer, based on T5, over conventional representation learning approaches.
- Score: 37.754787051387034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Can we teach natural language understanding models to track their beliefs
through intermediate points in text? We propose a representation learning
framework called breakpoint modeling that allows for learning of this type.
Given any text encoder and data marked with intermediate states (breakpoints)
along with corresponding textual queries viewed as true/false propositions
(i.e., the candidate beliefs of a model, consisting of information changing
through time) our approach trains models in an efficient and end-to-end fashion
to build intermediate representations that facilitate teaching and direct
querying of beliefs at arbitrary points alongside solving other end tasks. To
show the benefit of our approach, we experiment with a diverse set of NLU tasks
including relational reasoning on CLUTRR and narrative understanding on bAbI.
Using novel belief prediction tasks for both tasks, we show the benefit of our
main breakpoint transformer, based on T5, over conventional representation
learning approaches in terms of processing efficiency, prediction accuracy and
prediction consistency, all with minimal to no effect on corresponding QA end
tasks. To show the feasibility of incorporating our belief tracker into more
complex reasoning pipelines, we also obtain SOTA performance on the
three-tiered reasoning challenge for the TRIP benchmark (around 23-32% absolute
improvement on Tasks 2-3).
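The setup described in the abstract (text marked with breakpoints, plus true/false propositions queried at each breakpoint) can be illustrated with a small sketch. This is only a toy under stated assumptions, not the paper's implementation: the `[B]` marker, the `BreakpointExample` container, and the rule-based `toy_scorer` are hypothetical names introduced here, and the scorer stands in for the learned T5-based model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical marker token; the abstract does not name one.
BREAKPOINT = "[B]"


@dataclass
class BreakpointExample:
    """A text with breakpoint markers and labeled propositions."""
    text: str
    # (breakpoint index, proposition) -> true/false label
    beliefs: Dict[Tuple[int, str], bool]


def prefixes_at_breakpoints(text: str) -> List[str]:
    """Return the text seen so far at each breakpoint, in order."""
    segments = [s.strip() for s in text.split(BREAKPOINT)]
    prefixes: List[str] = []
    seen: List[str] = []
    # Text after the last marker precedes no breakpoint, so skip it.
    for segment in segments[:-1]:
        seen.append(segment)
        prefixes.append(" ".join(s for s in seen if s))
    return prefixes


def toy_scorer(prefix: str, proposition: str) -> bool:
    """Rule-based stand-in for a learned belief scorer (assumption)."""
    last_pick = prefix.rfind("picked up the apple")
    last_drop = prefix.rfind("dropped the apple")
    return proposition == "John has the apple" and last_pick > last_drop


def belief_accuracy(
    example: BreakpointExample,
    scorer: Callable[[str, str], bool],
) -> float:
    """Fraction of (breakpoint, proposition) labels the scorer gets right."""
    prefixes = prefixes_at_breakpoints(example.text)
    correct = sum(
        scorer(prefixes[index], proposition) == label
        for (index, proposition), label in example.beliefs.items()
    )
    return correct / len(example.beliefs)


example = BreakpointExample(
    text="John picked up the apple. [B] John dropped the apple. [B]",
    beliefs={
        (0, "John has the apple"): True,
        (1, "John has the apple"): False,
    },
)
accuracy = belief_accuracy(example, toy_scorer)
```

The same proposition receives different labels at different breakpoints, which is the "information changing through time" the abstract refers to. In the actual framework a trained encoder would replace `toy_scorer`.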
Related papers
- Investigating the Efficacy of Large Language Models in Reflective
Assessment Methods through Chain of Thoughts Prompting [0.2552922646705803]
The Chain of Thought (CoT) prompting method has been proposed as a means to enhance LLMs' proficiency in complex reasoning tasks.
The primary aim of this research is to assess how well four language models can grade reflective essays of third-year medical students.
arXiv Detail & Related papers (2023-09-30T06:25:27Z)
- Effective Cross-Task Transfer Learning for Explainable Natural Language
Inference with T5 [50.574918785575655]
We compare sequential fine-tuning with a model for multi-task learning in the context of boosting performance on two tasks.
Our results show that while sequential multi-task learning can be tuned to be good at the first of two target tasks, it performs less well on the second and additionally struggles with overfitting.
arXiv Detail & Related papers (2022-10-31T13:26:08Z)
- Task Formulation Matters When Learning Continually: A Case Study in
Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z)
- Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
- The Unreliability of Explanations in Few-Shot In-Context Learning [50.77996380021221]
We focus on two NLP tasks that involve reasoning over text, namely question answering and natural language inference.
We show that explanations judged as good by humans--those that are logically consistent with the input--usually indicate more accurate predictions.
We present a framework for calibrating model predictions based on the reliability of the explanations.
arXiv Detail & Related papers (2022-05-06T17:57:58Z)
- Distant finetuning with discourse relations for stance classification [55.131676584455306]
We propose a new method to extract data with silver labels from raw text to finetune a model for stance classification.
We also propose a 3-stage training framework where the noise level in the data used for finetuning decreases over different stages.
Our approach ranks 1st among 26 competing teams in the stance classification track of the NLPCC 2021 shared task Argumentative Text Understanding for AI Debater.
arXiv Detail & Related papers (2022-04-27T04:24:35Z)
- Fair Representation Learning using Interpolation Enabled Disentanglement [9.043741281011304]
We propose a novel method to address two key issues: (a) can we simultaneously learn fair disentangled representations while ensuring the utility of the learned representation for downstream tasks, and (b) can we provide theoretical insights into when the proposed approach will be both fair and accurate?
To address the former, we propose the method FRIED, Fair Representation learning using Interpolation Enabled Disentanglement.
arXiv Detail & Related papers (2021-07-31T17:32:12Z)
- Turning Tables: Generating Examples from Semi-structured Tables for
Endowing Language Models with Reasoning Skills [32.55545292360155]
We propose to leverage semi-structured tables, and automatically generate at scale question-paragraph pairs.
We add a pre-training step over this synthetic data, which includes examples that require 16 different reasoning skills.
We show that our model, PReasM, substantially outperforms T5, a popular pre-trained encoder-decoder model.
arXiv Detail & Related papers (2021-07-15T11:37:14Z)
- Function Contrastive Learning of Transferable Meta-Representations [38.31692245188669]
We study the implications of joint training on the transferability of the meta-representations.
We propose a decoupled encoder-decoder approach to supervised meta-learning.
arXiv Detail & Related papers (2020-10-14T13:50:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.