Time is Encoded in the Weights of Finetuned Language Models
- URL: http://arxiv.org/abs/2312.13401v2
- Date: Sat, 30 Dec 2023 22:11:07 GMT
- Title: Time is Encoded in the Weights of Finetuned Language Models
- Authors: Kai Nylund, Suchin Gururangan, Noah A. Smith
- Abstract summary: We present time vectors, a simple tool to customize language models to new time periods.
Time vectors are created by finetuning a language model on data from a single time.
This vector specifies a direction in weight space that, as our experiments show, improves performance on text from that time period.
- Score: 65.71926562424795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present time vectors, a simple tool to customize language models to new
time periods. Time vectors are created by finetuning a language model on data
from a single time (e.g., a year or month), and then subtracting the weights of
the original pretrained model. This vector specifies a direction in weight
space that, as our experiments show, improves performance on text from that
time period. Time vectors specialized to adjacent time periods appear to be
positioned closer together in a manifold. Using this structure, we interpolate
between time vectors to induce new models that perform better on intervening
and future time periods, without any additional training. We demonstrate the
consistency of our findings across different tasks, domains, model sizes, and
time scales. Our results suggest that time is encoded in the weight space of
finetuned models.
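The weight arithmetic described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the abstract only says that time vectors are obtained by subtracting pretrained weights from finetuned weights and that they can be interpolated, so the linear interpolation with a single coefficient `alpha`, the helper names (`time_vector`, `interpolate`, `apply_vector`), the toy parameter name `layer.weight`, the years, and all numeric values are assumptions made for illustration.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flattened values.
# (Hypothetical representation; a real model would use tensors.)
Weights = Dict[str, List[float]]


def time_vector(finetuned: Weights, pretrained: Weights) -> Weights:
    """Time vector: finetuned weights minus pretrained weights, per parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def interpolate(tau_a: Weights, tau_b: Weights, alpha: float) -> Weights:
    """Assumed linear interpolation: alpha * tau_a + (1 - alpha) * tau_b."""
    return {
        name: [alpha * a + (1.0 - alpha) * b for a, b in zip(tau_a[name], tau_b[name])]
        for name in tau_a
    }


def apply_vector(pretrained: Weights, tau: Weights) -> Weights:
    """Add a time vector back onto the pretrained weights to induce a new model."""
    return {
        name: [p + t for p, t in zip(pretrained[name], tau[name])]
        for name in pretrained
    }


# Made-up weights for a pretrained model and two single-period finetunes.
pretrained = {"layer.weight": [1.0, 2.0, 3.0]}
finetuned_2012 = {"layer.weight": [1.5, 2.0, 2.0]}
finetuned_2016 = {"layer.weight": [2.5, 4.0, 3.0]}

tau_2012 = time_vector(finetuned_2012, pretrained)
tau_2016 = time_vector(finetuned_2016, pretrained)

# Weights for an intervening period, obtained without any additional training.
tau_mid = interpolate(tau_2012, tau_2016, alpha=0.5)
model_mid = apply_vector(pretrained, tau_mid)
```

With `alpha=0.5` the induced model sits halfway between the two finetuned models in weight space. How the coefficient should be chosen for a given intervening period is not stated in the abstract.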
Related papers
- Time Machine GPT [15.661920010658626]
Large language models (LLMs) are often trained on extensive, temporally indiscriminate text corpora.
This approach is not aligned with the evolving nature of language.
This paper presents a new approach: a series of point-in-time LLMs called Time Machine GPT (TiMaGPT).
arXiv Detail & Related papers (2024-04-29T09:34:25Z)
- Chronos: Learning the Language of Time Series [79.38691251254173]
Chronos is a framework for pretrained probabilistic time series models.
We show that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks.
arXiv Detail & Related papers (2024-03-12T16:53:54Z)
- PDETime: Rethinking Long-Term Multivariate Time Series Forecasting from the perspective of partial differential equations [49.80959046861793]
We present PDETime, a novel LMTF model inspired by the principles of Neural PDE solvers.
Our experiments across seven diverse temporal real-world LMTF datasets reveal that PDETime adapts effectively to the intrinsic nature of the data.
arXiv Detail & Related papers (2024-02-25T17:39:44Z)
- A decoder-only foundation model for time-series forecasting [23.824504640087753]
Our model is based on pretraining a patched-decoder style attention model on a large time-series corpus.
It can work well across different forecasting history lengths, prediction lengths and temporal granularities.
arXiv Detail & Related papers (2023-10-14T17:01:37Z)
- Time-LLM: Time Series Forecasting by Reprogramming Large Language Models [110.20279343734548]
Time series forecasting holds significant importance in many real-world dynamic systems.
We present Time-LLM, a reprogramming framework to repurpose large language models for time series forecasting.
Time-LLM is a powerful time series learner that outperforms state-of-the-art, specialized forecasting models.
arXiv Detail & Related papers (2023-10-03T01:31:25Z)
- Learning Gaussian Mixture Representations for Tensor Time Series Forecasting [8.31607451942671]
We develop a novel TTS forecasting framework, which seeks to individually model each heterogeneity component implied in the time, the location, and the source variables.
Experimental results on two real-world TTS datasets verify the superiority of our approach compared with the state-of-the-art baselines.
arXiv Detail & Related papers (2023-06-01T06:50:47Z)
- Extracting Latent Steering Vectors from Pretrained Language Models [14.77762401765532]
We show that latent vectors can be extracted directly from language model decoders without fine-tuning.
Experiments show that there exist steering vectors, which, when added to the hidden states of the language model, generate a target sentence nearly perfectly.
We find that distances between steering vectors reflect sentence similarity when evaluated on a textual similarity benchmark.
arXiv Detail & Related papers (2022-05-10T19:04:37Z)
- Temporal Attention for Language Models [24.34396762188068]
We extend the key component of the transformer architecture, i.e., the self-attention mechanism, and propose temporal attention.
Temporal attention can be applied to any transformer model and requires the input texts to be accompanied by their relevant time points.
We leverage these representations for the task of semantic change detection.
Our proposed model achieves state-of-the-art results on all the datasets.
arXiv Detail & Related papers (2022-02-04T11:55:34Z)
- Conditional Neural Relational Inference for Interacting Systems [58.141087282927415]
We learn to model the dynamics of similar yet distinct groups of interacting objects.
We develop a model that allows us to do conditional generation from any such group given its vectorial description.
We evaluate our model in the setting of modeling human gait and, in particular, pathological human gait.
arXiv Detail & Related papers (2021-06-21T13:05:48Z)
- Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long.
We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay.
Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.