BERT got a Date: Introducing Transformers to Temporal Tagging
- URL: http://arxiv.org/abs/2109.14927v2
- Date: Mon, 4 Oct 2021 08:25:36 GMT
- Title: BERT got a Date: Introducing Transformers to Temporal Tagging
- Authors: Satya Almasian, Dennis Aumiller, Michael Gertz
- Abstract summary: We present a transformer encoder-decoder model using the RoBERTa language model as our best performing system.
Our model surpasses previous works in temporal tagging and type classification, especially on rare classes.
- Score: 4.651578365545765
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Temporal expressions in text play a significant role in language
understanding and correctly identifying them is fundamental to various
retrieval and natural language processing systems. Previous works have slowly
shifted from rule-based to neural architectures, capable of tagging expressions
with higher accuracy. However, neural models cannot yet distinguish between
different expression types at the same level as their rule-based counterparts.
In this work, we aim to identify the most suitable transformer architecture for
joint temporal tagging and type classification, as well as to investigate the
effect of semi-supervised training on the performance of these systems. Based
on our study of token classification variants and encoder-decoder
architectures, we present a transformer encoder-decoder model using the RoBERTa
language model as our best performing system. By supplementing training
resources with weakly labeled data from rule-based systems, our model surpasses
previous works in temporal tagging and type classification, especially on rare
classes. Our code and pre-trained models are available at:
https://github.com/satya77/Transformer_Temporal_Tagger
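As a rough, hypothetical illustration of the token-classification variant studied in the paper (the repository above contains the authors' actual code and pre-trained models), the sketch below attaches a token-level classification head over BIO-style TIMEX3 labels to RoBERTa using the Hugging Face transformers library; the label set and setup are our assumptions, not the authors' exact configuration.
```python
# Minimal sketch of the token-classification variant: RoBERTa with a token-level
# classification head over BIO-style TIMEX3 labels (DATE, TIME, DURATION, SET).
# The label set and hyperparameters are illustrative, not the authors' exact setup.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O",
          "B-DATE", "I-DATE", "B-TIME", "I-TIME",
          "B-DURATION", "I-DURATION", "B-SET", "I-SET"]
id2label = dict(enumerate(labels))
label2id = {l: i for i, l in id2label.items()}

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "roberta-base", num_labels=len(labels), id2label=id2label, label2id=label2id
)

# Tag a sentence with the (still untrained) head; after fine-tuning on
# TIMEX3-annotated data, the argmax yields one temporal label per sub-word token.
text = "The meeting was moved from last Friday to June 3, 2021."
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits                      # (1, seq_len, num_labels)
pred_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
for tok, pid in zip(tokens, pred_ids):
    print(f"{tok:>12s} -> {id2label[pid]}")
```
After fine-tuning on TIMEX3-annotated corpora (optionally extended with weakly labeled data from a rule-based tagger, as the abstract describes), the per-token argmax yields both the expression span and its type.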
Related papers
- Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers, but it is common to instead use proxy tasks that are similar in only an informal sense.
We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
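As a toy illustration of the recognizer framing above, the following sketch trains a small LSTM as a binary accept/reject classifier of strings; the choice of language (an even number of a's), model size, and training loop is ours and is not taken from the paper.
```python
# Illustrative only: train a small LSTM as a binary recognizer of the regular
# language "strings over {a, b} with an even number of a's".
import random
import torch
import torch.nn as nn

VOCAB = {"a": 0, "b": 1}

def sample(max_len=20):
    s = "".join(random.choice("ab") for _ in range(random.randint(1, max_len)))
    label = 1.0 if s.count("a") % 2 == 0 else 0.0   # membership in the language
    return s, label

class Recognizer(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.emb = nn.Embedding(len(VOCAB), dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, 1)                 # accept / reject logit

    def forward(self, ids):
        h, _ = self.rnn(self.emb(ids))
        return self.out(h[:, -1, :]).squeeze(-1)     # read out the final state

model = Recognizer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    s, y = sample()
    ids = torch.tensor([[VOCAB[c] for c in s]])
    loss = loss_fn(model(ids), torch.tensor([y]))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step}: loss {loss.item():.3f}")
```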
- Encoding Agent Trajectories as Representations with Sequence Transformers [0.4999814847776097]
We propose a model for representing high-dimensional trajectories with a neural network architecture.
Similar to language models, our Transformer Sequence for Agent temporal Representations (STARE) model can learn representations and structure in trajectory data.
We present experimental results on various synthetic and real trajectory datasets and show that our proposed model can learn meaningful encodings.
arXiv Detail & Related papers (2024-10-11T19:18:47Z)
- Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized.
We find that these random transformers can perform a wide range of meaningful algorithmic tasks.
Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained.
arXiv Detail & Related papers (2024-10-06T06:04:23Z)
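A minimal sketch of the setup described above, under our own assumptions about sizes and task: the transformer blocks keep their random initialization and are frozen, and only the embedding (and unembedding) layers receive gradients.
```python
# Sketch of the "random transformer" setup: all transformer weights keep their
# random initialization and only the embeddings are optimized.
# Model sizes and the toy task are arbitrary, not the paper's configuration.
import torch
import torch.nn as nn

vocab_size, dim = 100, 64

embed = nn.Embedding(vocab_size, dim)
unembed = nn.Linear(dim, vocab_size)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)

# Freeze the randomly initialized transformer; leave the embeddings trainable.
for p in encoder.parameters():
    p.requires_grad = False

opt = torch.optim.Adam(list(embed.parameters()) + list(unembed.parameters()), lr=1e-3)

# One illustrative step on a toy copy task (predict the input token at each position).
tokens = torch.randint(0, vocab_size, (8, 16))          # (batch, seq_len)
logits = unembed(encoder(embed(tokens)))                # (batch, seq_len, vocab)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), tokens.reshape(-1))
loss.backward()
opt.step()
print("loss:", loss.item())
```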
- Pre-Training a Graph Recurrent Network for Language Representation [34.4554387894105]
We consider a graph recurrent network for language model pre-training, which builds a graph structure for each sequence with local token-level communications.
We find that our model can generate more diverse outputs with less contextualized feature redundancy than existing attention-based models.
arXiv Detail & Related papers (2022-09-08T14:12:15Z)
- Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model are of higher quality than those previous methods extract from pretrained transformers, on text similarity, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
arXiv Detail & Related papers (2021-08-31T19:39:55Z)
- Shifted Chunk Transformer for Spatio-Temporal Representational Learning [24.361059477031162]
We construct a shifted chunk Transformer with pure self-attention blocks.
This Transformer can learn hierarchical spatio-temporal features from a tiny patch to a global video clip.
It outperforms state-of-the-art approaches on Kinetics, Kinetics-600, UCF101, and HMDB51.
arXiv Detail & Related papers (2021-08-26T04:34:33Z)
- Learning to Look Inside: Augmenting Token-Based Encoders with Character-Level Information [29.633735942273997]
XRayEmb is a method for retrofitting existing token-based models with character-level information.
We show that incorporating XRayEmb's learned vectors into sequences of pre-trained token embeddings helps performance on both autoregressive and masked pre-trained transformer architectures.
arXiv Detail & Related papers (2021-08-01T08:09:26Z)
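The sketch below shows the general retrofitting idea in a simplified form that is our own construction, not XRayEmb's exact architecture: a small character-level GRU summarizes each token's spelling, and the result is added to the pretrained sub-word embedding before it enters the transformer.
```python
# Illustrative character-level retrofit (our simplification, not XRayEmb itself):
# a character GRU encodes each token's spelling and the vector is added to the
# pretrained sub-word embedding.
import torch
import torch.nn as nn

class CharAugmentedEmbedding(nn.Module):
    def __init__(self, token_emb: nn.Embedding, n_chars=256, char_dim=32):
        super().__init__()
        self.token_emb = token_emb                        # e.g. a pretrained embedding matrix
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_rnn = nn.GRU(char_dim, token_emb.embedding_dim, batch_first=True)

    def forward(self, token_ids, char_ids):
        # token_ids: (batch, seq_len); char_ids: (batch, seq_len, max_chars)
        b, s, c = char_ids.shape
        chars = self.char_emb(char_ids.reshape(b * s, c))  # (b*s, c, char_dim)
        _, h = self.char_rnn(chars)                        # h: (1, b*s, emb_dim)
        char_vec = h.squeeze(0).reshape(b, s, -1)
        return self.token_emb(token_ids) + char_vec        # fuse spelling + token identity

# Toy usage with random "pretrained" embeddings and byte-level character ids.
emb = CharAugmentedEmbedding(nn.Embedding(30522, 128))
tok = torch.randint(0, 30522, (2, 5))
chr_ = torch.randint(0, 256, (2, 5, 12))
print(emb(tok, chr_).shape)                                # torch.Size([2, 5, 128])
```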
- TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech [63.03318307254081]
TERA stands for Transformer Encoder Representations from Alteration.
We use alteration along three axes to pre-train Transformers on a large amount of unlabeled speech.
TERA can be used for speech representation extraction or fine-tuned with downstream models.
arXiv Detail & Related papers (2020-07-12T16:19:00Z)
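As a rough approximation of alteration-style pre-training (the masking widths, noise level, and reconstruction loss below are our assumptions rather than TERA's exact recipe), spectrogram-like frames are altered along time, frequency, and magnitude axes and a transformer encoder is trained to reconstruct the clean input.
```python
# Rough sketch of alteration-style self-supervision on spectrogram frames
# (time masking, frequency masking, magnitude noise) with a transformer encoder
# trained to reconstruct the clean frames. Details are our assumptions.
import torch
import torch.nn as nn

def alter(x, time_width=10, n_freq=8, noise_std=0.1):
    x = x.clone()
    b, t, f = x.shape
    t0 = torch.randint(0, max(t - time_width, 1), (1,)).item()
    x[:, t0:t0 + time_width, :] = 0.0                    # time alteration
    freq_idx = torch.randperm(f)[:n_freq]
    x[:, :, freq_idx] = 0.0                              # frequency alteration
    return x + noise_std * torch.randn_like(x)           # magnitude alteration

dim = 80                                                 # e.g. 80 mel bins per frame
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
    num_layers=3,
)
head = nn.Linear(dim, dim)                               # frame reconstruction head
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-4)

clean = torch.randn(4, 200, dim)                         # stand-in for log-mel features
pred = head(encoder(alter(clean)))
loss = nn.functional.l1_loss(pred, clean)                # reconstruct the clean frames
loss.backward()
opt.step()
print("L1 reconstruction loss:", loss.item())
```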
- Segatron: Segment-Aware Transformer for Language Modeling and Understanding [79.84562707201323]
We propose a segment-aware Transformer (Segatron) to generate better contextual representations from sequential tokens.
We first introduce the segment-aware mechanism to Transformer-XL, which is a popular Transformer-based language model.
We find that our method can further improve the Transformer-XL base model and large model, achieving 17.1 perplexity on the WikiText-103 dataset.
arXiv Detail & Related papers (2020-04-30T17:38:27Z)
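A minimal sketch of segment-aware position information, assuming paragraph, sentence, and token indices are available for every token (the index computation and sizes here are illustrative, not Segatron's exact implementation): three position embeddings are summed instead of one.
```python
# Minimal sketch of segment-aware positions: sum paragraph-, sentence-, and
# token-level position embeddings instead of a single token-position embedding.
# Index computation and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class SegmentAwarePositions(nn.Module):
    def __init__(self, dim=128, max_para=64, max_sent=128, max_tok=512):
        super().__init__()
        self.para = nn.Embedding(max_para, dim)   # paragraph index of the token
        self.sent = nn.Embedding(max_sent, dim)   # sentence index within the paragraph
        self.tok = nn.Embedding(max_tok, dim)     # token index within the sentence

    def forward(self, para_idx, sent_idx, tok_idx):
        return self.para(para_idx) + self.sent(sent_idx) + self.tok(tok_idx)

# Toy usage: a 6-token sequence spanning two sentences in one paragraph.
pos = SegmentAwarePositions()
para_idx = torch.tensor([[0, 0, 0, 0, 0, 0]])
sent_idx = torch.tensor([[0, 0, 0, 1, 1, 1]])
tok_idx  = torch.tensor([[0, 1, 2, 0, 1, 2]])
pos_emb = pos(para_idx, sent_idx, tok_idx)        # added to token embeddings upstream
print(pos_emb.shape)                              # torch.Size([1, 6, 128])
```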
- Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation [73.11214377092121]
We propose to replace all but one attention head of each encoder layer with simple fixed -- non-learnable -- attentive patterns.
Our experiments with different data sizes and multiple language pairs show that fixing the attention heads on the encoder side of the Transformer at training time does not impact the translation quality.
arXiv Detail & Related papers (2020-02-24T13:53:06Z)
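One simple example of a fixed, non-learnable attentive pattern is a head that always attends to the previous token; the sketch below builds such a constant attention matrix and applies it to value vectors, with the specific pattern chosen by us for illustration rather than taken from the paper.
```python
# Illustration of a fixed, non-learnable attention pattern: every position
# attends to the previous token. The choice of pattern is ours, for illustration.
import torch

def previous_token_pattern(seq_len: int) -> torch.Tensor:
    """Return a (seq_len, seq_len) attention matrix that puts all weight on i-1."""
    attn = torch.zeros(seq_len, seq_len)
    attn[0, 0] = 1.0                        # the first position can only attend to itself
    for i in range(1, seq_len):
        attn[i, i - 1] = 1.0
    return attn

# Apply the fixed pattern to value vectors in place of softmax(QK^T / sqrt(d)).
values = torch.randn(2, 6, 16)              # (batch, seq_len, head_dim)
attn = previous_token_pattern(6)             # constant, no learned parameters
out = torch.einsum("ij,bjd->bid", attn, values)
print(out.shape)                             # torch.Size([2, 6, 16])
# out[:, i] equals values[:, i-1]: the head's output is fully determined by
# position, so these attention weights never need to be learned.
```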
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.