Towards Understanding Large-Scale Discourse Structures in Pre-Trained and Fine-Tuned Language Models
- URL: http://arxiv.org/abs/2204.04289v1
- Date: Fri, 8 Apr 2022 20:42:08 GMT
- Title: Towards Understanding Large-Scale Discourse Structures in Pre-Trained and Fine-Tuned Language Models
- Authors: Patrick Huber and Giuseppe Carenini
- Abstract summary: First, we describe a novel approach to infer discourse structures from arbitrarily long documents.
Second, we propose a new type of analysis to explore where and how accurately intrinsic discourse is captured in the BERT and BART models.
Finally, we assess how similar the generated structures are to a variety of baselines, as well as their distribution within and between models.
- Score: 30.615883375573432
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With a growing number of BERTology work analyzing different components of
pre-trained language models, we extend this line of research through an
in-depth analysis of discourse information in pre-trained and fine-tuned
language models. We move beyond prior work along three dimensions: First, we
describe a novel approach to infer discourse structures from arbitrarily long
documents. Second, we propose a new type of analysis to explore where and how
accurately intrinsic discourse is captured in the BERT and BART models.
Finally, we assess how similar the generated structures are to a variety of
baselines as well as their distribution within and between models.
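The abstract does not spell out the tree-inference procedure, so the following is only a rough, hypothetical sketch of the general recipe it gestures at: pool BERT self-attention into a unit-level affinity matrix, then build a binary tree over the units. The recursive weakest-link split below is an illustrative stand-in, not the authors' algorithm, and the model choice and pooling scheme are assumptions.

```python
# Hypothetical sketch: read a discourse-like binary tree out of BERT
# self-attention. NOT the paper's algorithm; model and pooling are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

def unit_affinity(units):
    """Average attention between the tokens of each pair of units (e.g. sentences)."""
    enc = tokenizer(" ".join(units), return_tensors="pt", truncation=True)
    with torch.no_grad():
        attn = model(**enc).attentions              # tuple of (1, heads, T, T), one per layer
    A = torch.stack(attn).mean(dim=(0, 2))[0]       # average over layers and heads -> (T, T)
    spans, start = [], 1                            # skip [CLS]; approximate unit -> token spans
    for u in units:
        n = len(tokenizer.tokenize(u))
        spans.append((start, start + n))
        start += n
    M = torch.zeros(len(units), len(units))
    for i, (a, b) in enumerate(spans):
        for j, (c, d) in enumerate(spans):
            M[i, j] = A[a:b, c:d].mean()
    return M

def split_tree(M, lo, hi):
    """Recursively split [lo, hi) at the boundary with the weakest cross-unit attention."""
    if hi - lo == 1:
        return lo
    scores = [float(M[lo:k, k:hi].mean()) for k in range(lo + 1, hi)]
    k = lo + 1 + min(range(len(scores)), key=scores.__getitem__)
    return (split_tree(M, lo, k), split_tree(M, k, hi))

sentences = ["The company reported record profits.",
             "However, its stock fell sharply.",
             "Analysts blamed weak guidance."]
print(split_tree(unit_affinity(sentences), 0, len(sentences)))
```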
Related papers
- A Comparative Analysis of Conversational Large Language Models in Knowledge-Based Text Generation [5.661396828160973]
We conduct an empirical analysis of conversational large language models in generating natural language text from semantic triples.
We compare four large language models of varying sizes with different prompting techniques.
Our findings show that the capabilities of large language models in triple verbalization can be significantly improved through few-shot prompting, post-processing, and efficient fine-tuning techniques.
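As a concrete illustration of the few-shot prompting and post-processing steps mentioned above, the sketch below assembles a prompt from hypothetical (triple, sentence) exemplars; the exemplars, prompt wording, and whichever LLM the prompt is sent to are all assumptions, not the paper's setup.

```python
# Illustrative sketch (not the paper's exact prompt): few-shot prompting a
# conversational LLM to verbalize a semantic triple, plus a tiny post-processing step.
EXEMPLARS = [
    (("Alan_Turing", "birthPlace", "London"),
     "Alan Turing was born in London."),
    (("Eiffel_Tower", "height", "300 metres"),
     "The Eiffel Tower is 300 metres tall."),
]

def build_prompt(triple):
    lines = ["Convert each triple into one fluent English sentence.", ""]
    for (s, p, o), text in EXEMPLARS:
        lines += [f"Triple: ({s}, {p}, {o})", f"Sentence: {text}", ""]
    s, p, o = triple
    lines += [f"Triple: ({s}, {p}, {o})", "Sentence:"]
    return "\n".join(lines)

def postprocess(generation: str) -> str:
    # Cleanup of the kind a post-processing step might do: keep the first
    # line of the generation and make sure it ends with a period.
    sent = generation.strip().splitlines()[0].strip()
    return sent if sent.endswith(".") else sent + "."

prompt = build_prompt(("Ada_Lovelace", "occupation", "mathematician"))
print(prompt)  # send `prompt` to an LLM of your choice, then postprocess() the output
print(postprocess("Ada Lovelace worked as a mathematician"))
```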
arXiv Detail & Related papers (2024-02-02T15:26:39Z)
- Mixture-of-Linguistic-Experts Adapters for Improving and Interpreting Pre-trained Language Models [22.977852629450346]
We propose a method that combines two popular research areas by injecting linguistic structures into pre-trained language models.
In our approach, parallel adapter modules encoding different linguistic structures are combined using a novel Mixture-of-Linguistic-Experts architecture.
Our experiment results show that our approach can outperform state-of-the-art PEFT methods with a comparable number of parameters.
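The abstract gives only the high-level design, so here is a minimal PyTorch sketch of a "mixture of parallel adapters" layer in that spirit: several bottleneck adapters, each nominally tied to one kind of linguistic structure, gated by a learned softmax and added residually to the frozen layer's hidden states. Hidden size, bottleneck size, and per-token gating are assumptions, not the paper's architecture.

```python
# Minimal sketch of a mixture of parallel adapters (not the authors' code).
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Standard bottleneck adapter: down-project, nonlinearity, up-project."""
    def __init__(self, hidden, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, x):
        return self.up(torch.relu(self.down(x)))

class MixtureOfLinguisticExperts(nn.Module):
    def __init__(self, hidden, num_experts=3):
        super().__init__()
        self.experts = nn.ModuleList([Adapter(hidden) for _ in range(num_experts)])
        self.gate = nn.Linear(hidden, num_experts)

    def forward(self, hidden_states):
        # gate per token, mix the expert outputs, add as a residual
        weights = torch.softmax(self.gate(hidden_states), dim=-1)            # (B, T, E)
        outputs = torch.stack([e(hidden_states) for e in self.experts], -1)  # (B, T, H, E)
        mixed = (outputs * weights.unsqueeze(2)).sum(-1)                     # (B, T, H)
        return hidden_states + mixed

x = torch.randn(2, 16, 768)
print(MixtureOfLinguisticExperts(768)(x).shape)  # torch.Size([2, 16, 768])
```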
arXiv Detail & Related papers (2023-10-24T23:29:06Z)
- Construction Grammar and Language Models [4.171555557592296]
Recent progress in deep learning has given rise to powerful models that are primarily trained on a cloze-like task.
This chapter aims to foster collaboration between researchers in the fields of natural language processing and Construction Grammar.
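For readers unfamiliar with the "cloze-like task" mentioned above, the snippet below probes a masked language model with a classic caused-motion construction from the Construction Grammar literature; the model choice is an assumption and the example is purely illustrative.

```python
# Quick illustration of the cloze-style (masked-language-model) objective.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
# "She sneezed the napkin off the table" is a textbook caused-motion construction.
for pred in fill("She sneezed the napkin [MASK] the table.")[:3]:
    print(f"{pred['token_str']:>10}  {pred['score']:.3f}")
```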
arXiv Detail & Related papers (2023-08-25T11:37:56Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- DeepStruct: Pretraining of Language Models for Structure Prediction [64.84144849119554]
We pretrain language models on a collection of task-agnostic corpora to generate structures from text.
Our structure pretraining enables zero-shot transfer of the learned knowledge that models have about the structure tasks.
We show that a 10B parameter language model transfers non-trivially to most tasks and obtains state-of-the-art performance on 21 of 28 datasets.
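The abstract frames structure prediction as text generation; the toy example below shows what such (text, linearized-structure) training pairs can look like. The bracketed linearization format is an assumption for illustration, not DeepStruct's published format.

```python
# Hedged sketch of structure-prediction-as-generation: plain text in,
# a linearized structure (here, entity-relation triples) out.
def linearize(triples):
    return " ".join(f"( {s} ; {r} ; {o} )" for s, r, o in triples)

example = {
    "input": "Marie Curie was born in Warsaw and won the Nobel Prize in Physics.",
    "target": linearize([
        ("Marie Curie", "place of birth", "Warsaw"),
        ("Marie Curie", "award received", "Nobel Prize in Physics"),
    ]),
}
print(example["target"])
# A seq2seq model pretrained on many such (input, target) pairs can then be
# prompted zero-shot to emit structures for unseen structure tasks.
```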
arXiv Detail & Related papers (2022-05-21T00:58:22Z)
- Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency [62.0887259003594]
This work investigates three aspects of structured pruning on multilingual pre-trained language models: settings, algorithms, and efficiency.
Experiments on nine downstream tasks show several counter-intuitive phenomena.
We present Dynamic Sparsification, a simple approach that allows training the model once and adapting to different model sizes at inference.
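Only the high-level idea of Dynamic Sparsification is given in the abstract; the toy sketch below illustrates the train-once, shrink-at-inference pattern by keeping the top-scoring attention heads at several target ratios. The importance scores and the head-level granularity are assumptions, not the paper's method.

```python
# Toy sketch of train-once / adapt-at-inference structured pruning (details assumed).
import torch

def select_heads(importance: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Return a 0/1 mask keeping the most important heads at the requested ratio."""
    k = max(1, int(round(keep_ratio * importance.numel())))
    mask = torch.zeros_like(importance)
    mask[importance.topk(k).indices] = 1.0
    return mask

# Hypothetical importance scores for 12 attention heads, learned during training.
importance = torch.tensor([0.9, 0.1, 0.7, 0.4, 0.8, 0.2, 0.6, 0.3,
                           0.5, 0.05, 0.65, 0.15])
for ratio in (1.0, 0.5, 0.25):   # one trained model, three inference-time sizes
    print(ratio, select_heads(importance, ratio).int().tolist())
```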
arXiv Detail & Related papers (2022-04-06T06:29:52Z)
- Interpreting Language Models Through Knowledge Graph Extraction [42.97929497661778]
We compare BERT-based language models through snapshots of acquired knowledge at sequential stages of the training process.
We present a methodology to unveil a knowledge acquisition timeline by generating knowledge graph extracts from cloze "fill-in-the-blank" statements.
We extend this analysis to a comparison of pretrained variations of BERT models (DistilBERT, BERT-base, RoBERTa).
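As a hedged illustration of the probing setup (not the paper's pipeline), the sketch below turns cloze predictions into graph edges; run over checkpoints from different training stages, the same loop would yield a rough knowledge-acquisition timeline. The templates and model are assumptions.

```python
# Sketch: build a tiny knowledge-graph extract from cloze ("fill-in-the-blank")
# predictions of a masked language model. Templates and model are illustrative.
from collections import defaultdict
from transformers import pipeline

fill = pipeline("fill-mask", model="distilbert-base-uncased")
templates = [
    ("Paris", "capital_of", "Paris is the capital of [MASK]."),
    ("Einstein", "field", "Einstein worked in the field of [MASK]."),
]

graph = defaultdict(list)
for subj, rel, cloze in templates:
    best = fill(cloze)[0]["token_str"]   # top prediction becomes the object node
    graph[subj].append((rel, best))

print(dict(graph))
```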
arXiv Detail & Related papers (2021-11-16T15:18:01Z)
- Pre-training Language Model Incorporating Domain-specific Heterogeneous Knowledge into A Unified Representation [49.89831914386982]
We propose a unified pre-trained language model (PLM) for all forms of text, including unstructured text, semi-structured text, and well-structured text.
Our approach outperforms pre-training on plain text while using only 1/4 of the data.
arXiv Detail & Related papers (2021-09-02T16:05:24Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
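The abstract only names Dynamic Blocking; the simplified, deterministic toy below shows the blocking rule in isolation (the real algorithm applies it probabilistically inside beam search): if the last generated token also appears in the source, forbid the source's following token as the next generation, pushing the decoder toward new phrasings.

```python
# Simplified sketch of the Dynamic Blocking rule (not the full decoding algorithm).
def blocked_next_tokens(source_tokens, generated_tokens):
    """Tokens that may not be generated next, given the current hypothesis."""
    if not generated_tokens:
        return set()
    last = generated_tokens[-1]
    blocked = set()
    for i, tok in enumerate(source_tokens[:-1]):
        if tok == last:                 # hypothesis just copied a source token...
            blocked.add(source_tokens[i + 1])  # ...so block its source successor
    return blocked

source = "the quick brown fox jumps over the lazy dog".split()
print(blocked_next_tokens(source, ["a", "quick"]))  # {'brown'}
```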
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
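The abstract only names the idea, so here is a minimal sketch of a vocabulary-size-independent output layer in that spirit: each word's output embedding is composed from hashed character n-gram embeddings, so the parameter count does not grow with the training vocabulary. Bucket count, n-gram sizes, and mean pooling are assumptions, not the paper's layer.

```python
# Sketch of a compositional (vocabulary-size-independent) output embedding layer.
import torch
import torch.nn as nn

class CompositionalOutputEmbedding(nn.Module):
    def __init__(self, dim=128, buckets=10_000, ngrams=(3, 4)):
        super().__init__()
        self.table = nn.Embedding(buckets, dim)   # fixed-size table of n-gram embeddings
        self.buckets, self.ngrams = buckets, ngrams

    def embed_word(self, word):
        padded = f"<{word}>"
        # Python's hash() is fine for a demo; a deterministic hash would be used in practice.
        ids = [hash(padded[i:i + n]) % self.buckets
               for n in self.ngrams
               for i in range(len(padded) - n + 1)]
        return self.table(torch.tensor(ids)).mean(dim=0)

    def forward(self, hidden, candidate_words):
        # score candidates against the hidden state; a softmax over these scores
        # replaces the usual fixed-vocabulary output layer
        E = torch.stack([self.embed_word(w) for w in candidate_words])
        return hidden @ E.T

layer = CompositionalOutputEmbedding()
scores = layer(torch.randn(1, 128), ["discourse", "sesquipedalian", "newword"])
print(scores.shape)  # torch.Size([1, 3])
```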
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences [18.19093600136057]
We propose a framework for extracting divergence patterns for any language pair from a parallel corpus.
We show that our framework provides a detailed picture of cross-language divergences, generalizes previous approaches, and lends itself to full automation.
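As a toy illustration of extracting divergence patterns from word-aligned bitext (the labels, alignment, and counting scheme here are assumptions, not the paper's framework), the snippet tallies part-of-speech mismatches across aligned word pairs.

```python
# Toy divergence-pattern extraction: count syntactic-label mismatches over alignments.
from collections import Counter

src_pos = {"likes": "VERB", "swimming": "VERB"}        # English: "likes swimming"
tgt_pos = {"aime": "VERB", "natation": "NOUN"}         # French:  "aime la natation"
alignment = [("likes", "aime"), ("swimming", "natation")]

divergences = Counter(
    (src_pos[s], tgt_pos[t]) for s, t in alignment if src_pos[s] != tgt_pos[t]
)
print(divergences)  # Counter({('VERB', 'NOUN'): 1})
```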
arXiv Detail & Related papers (2020-05-07T13:05:03Z)