BiTimeBERT: Extending Pre-Trained Language Representations with
Bi-Temporal Information
- URL: http://arxiv.org/abs/2204.13032v4
- Date: Thu, 27 Apr 2023 07:41:58 GMT
- Title: BiTimeBERT: Extending Pre-Trained Language Representations with
Bi-Temporal Information
- Authors: Jiexin Wang, Adam Jatowt, Masatoshi Yoshikawa, Yi Cai
- Abstract summary: We introduce BiTimeBERT, a novel language representation model trained on a temporal collection of news articles.
The experimental results show that BiTimeBERT consistently outperforms BERT and other existing pre-trained models.
- Score: 41.683057041628125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Time is an important aspect of documents and is used in a range of NLP and IR
tasks. In this work, we investigate methods for incorporating temporal
information during pre-training to further improve the performance on
time-related tasks. Compared with common pre-trained language models like BERT
which utilize synchronic document collections (e.g., BookCorpus and Wikipedia)
as training corpora, we use a long-span temporal news article collection for
building word representations. We introduce BiTimeBERT, a novel language
representation model trained on a temporal collection of news articles via two
new pre-training tasks, which harness two distinct temporal signals to
construct time-aware language representations. The experimental results show
that BiTimeBERT consistently outperforms BERT and other existing pre-trained
models with substantial gains on different downstream NLP tasks and
applications for which time is of importance (e.g., the accuracy improvement
over BERT is 155% on the event time estimation task).
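The abstract does not spell out the two pre-training tasks, so the sketch below is only an illustration of how bi-temporal signals could be extracted from a news article: it (hypothetically) masks temporal expressions mentioned in the text (content time) and keeps the publication year as a separate prediction target (document time). All names and the crude year matcher are invented for this sketch.
```python
# Illustrative sketch only, not the authors' code.
import re
from dataclasses import dataclass

# Crude temporal tagger for the sketch: 4-digit years only. A real pipeline
# would use a temporal tagger such as HeidelTime or SUTime.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

@dataclass
class BiTemporalExample:
    masked_text: str        # input for a time-aware masked-LM-style objective
    content_years: list     # years mentioned in the text (masking targets)
    publication_year: int   # document timestamp (a separate prediction target)

def build_example(text: str, publication_year: int) -> BiTemporalExample:
    """Turn one news article into a (hypothetical) bi-temporal training example."""
    content_years = YEAR.findall(text)
    masked_text = YEAR.sub("[MASK]", text)
    return BiTemporalExample(masked_text, content_years, publication_year)

if __name__ == "__main__":
    article = "The treaty signed in 1998 was revisited after the 2004 elections."
    print(build_example(article, publication_year=2005))
```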
Related papers
- Towards Effective Time-Aware Language Representation: Exploring Enhanced Temporal Understanding in Language Models [24.784375155633427]
BiTimeBERT 2.0 is a novel language model pre-trained on a temporal news article collection.
Each objective targets a unique aspect of temporal information.
Results consistently demonstrate that BiTimeBERT 2.0 outperforms BERT and other existing pre-trained models.
arXiv Detail & Related papers (2024-06-04T00:30:37Z) - A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision [74.972172804514]
We introduce a multi-task Transformer model, CSLR2, that ingests a signing sequence and produces outputs in a joint embedding space between signed language and spoken language text.
New dataset annotations provide continuous sign-level labels for six hours of test videos and will be made publicly available.
Our model significantly outperforms the previous state of the art on both tasks.
arXiv Detail & Related papers (2024-05-16T17:19:06Z) - Subspace Chronicles: How Linguistic Information Emerges, Shifts and
Interacts during Language Model Training [56.74440457571821]
We analyze tasks covering syntax, semantics and reasoning, across 2M pre-training steps and five seeds.
We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize.
Our findings have implications for model interpretability, multi-task learning, and learning from limited data.
arXiv Detail & Related papers (2023-10-25T09:09:55Z) - Pre-trained Language Model with Prompts for Temporal Knowledge Graph
Completion [30.50032335014021]
We propose a novel temporal knowledge graph completion (TKGC) model, namely Pre-trained Language Model with Prompts for TKGC (PPT).
We convert a series of sampled quadruples into pre-trained language model inputs and convert intervals between timestamps into different prompts to make coherent sentences with implicit semantic information.
Our model can effectively incorporate information from temporal knowledge graphs into the language models.
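The two conversion steps named above (quadruples to model inputs, timestamp intervals to prompts) can be pictured with a short sketch; the prompt wordings, interval buckets, and function names below are invented for illustration and are not taken from the PPT paper.
```python
# Rough illustration of verbalizing TKG quadruples with interval prompts.
from datetime import date

def interval_prompt(days: int) -> str:
    """Map the gap between two timestamps to a hypothetical textual prompt."""
    if days == 0:
        return "On the same day,"
    if days <= 31:
        return "Shortly afterwards,"
    if days <= 365:
        return "Months later,"
    return "Years later,"

def verbalize(quadruples):
    """quadruples: list of (subject, relation, object, date), sorted by date."""
    parts = []
    previous = None
    for subj, rel, obj, when in quadruples:
        if previous is not None:
            parts.append(interval_prompt((when - previous).days))
        parts.append(f"{subj} {rel} {obj}.")
        previous = when
    return " ".join(parts)

sample = [
    ("Barack Obama", "visited", "Germany", date(2013, 6, 18)),
    ("Barack Obama", "met with", "Angela Merkel", date(2013, 6, 19)),
]
print(verbalize(sample))
# -> Barack Obama visited Germany. Shortly afterwards, Barack Obama met with Angela Merkel.
```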
arXiv Detail & Related papers (2023-05-13T12:53:11Z) - Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study [68.75670223005716]
We find that pre-trained language models like BERT have the potential to learn sequentially, even without any sparse memory replay.
Our experiments reveal that BERT can generate high-quality representations for previously learned tasks over the long term, under extremely sparse replay or even no replay.
arXiv Detail & Related papers (2023-03-02T09:03:43Z) - ORCA: Interpreting Prompted Language Models via Locating Supporting Data
Evidence in the Ocean of Pretraining Data [38.20984369410193]
Large pretrained language models have been performing increasingly well in a variety of downstream tasks via prompting.
It remains unclear from where the model learns the task-specific knowledge, especially in a zero-shot setup.
In this work, we want to find evidence of the model's task-specific competence from pretraining and are specifically interested in locating a very small subset of pretraining data.
arXiv Detail & Related papers (2022-05-25T09:25:06Z) - Interpreting Language Models Through Knowledge Graph Extraction [42.97929497661778]
We compare BERT-based language models through snapshots of acquired knowledge at sequential stages of the training process.
We present a methodology to unveil a knowledge acquisition timeline by generating knowledge graph extracts from cloze "fill-in-the-blank" statements.
We extend this analysis to a comparison of pretrained variants of BERT (DistilBERT, BERT-base, RoBERTa).
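As a concrete picture of the cloze probing step (not the paper's code), the snippet below queries a masked language model with a fill-in-the-blank statement, assuming the Hugging Face transformers library; building a knowledge graph from such predictions is beyond this sketch.
```python
from transformers import pipeline

# Probe a pretrained BERT checkpoint with a cloze statement; swapping the model
# name (e.g., "distilbert-base-uncased", "roberta-base") applies the same probe
# across the variants compared above (note RoBERTa uses "<mask>" as its mask token).
fill = pipeline("fill-mask", model="bert-base-uncased")

cloze = "The capital of France is [MASK]."
for prediction in fill(cloze, top_k=3):
    # Each prediction holds the filled-in token and the model's confidence.
    print(f"{prediction['token_str']:>10}  {prediction['score']:.3f}")
```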
arXiv Detail & Related papers (2021-11-16T15:18:01Z) - InfoBERT: Improving Robustness of Language Models from An Information
Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z) - Temporally Correlated Task Scheduling for Sequence Learning [143.70523777803723]
In many applications, a sequence learning task is usually associated with multiple temporally correlated auxiliary tasks.
We introduce a learnable scheduler to sequence learning, which can adaptively select auxiliary tasks for training.
Our method significantly improves the performance of simultaneous machine translation and stock trend forecasting.
arXiv Detail & Related papers (2020-07-10T10:28:54Z) - Severing the Edge Between Before and After: Neural Architectures for
Temporal Ordering of Events [41.35277143634441]
We propose a neural architecture and a set of training methods for ordering events by predicting temporal relations.
Given that a key challenge with this task is the scarcity of annotated data, our models rely on either pretrained representations or transfer and multi-task learning.
Experiments on the MATRES dataset of English documents establish a new state-of-the-art on this task.
arXiv Detail & Related papers (2020-04-08T23:17:10Z)