Related papers: Time is Encoded in the Weights of Finetuned Language Models

URL: http://arxiv.org/abs/2312.13401v2
Date: Sat, 30 Dec 2023 22:11:07 GMT
Title: Time is Encoded in the Weights of Finetuned Language Models
Authors: Kai Nylund, Suchin Gururangan, Noah A. Smith
Abstract summary: We present time vectors, a simple tool to customize language models to new time periods. Time vectors are created by finetuning a language model on data from a single time. This vector specifies a direction in weight space that, as our experiments show, improves performance on text from that time period.
Score: 65.71926562424795
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present time vectors, a simple tool to customize language models to new time periods. Time vectors are created by finetuning a language model on data from a single time (e.g., a year or month), and then subtracting the weights of the original pretrained model. This vector specifies a direction in weight space that, as our experiments show, improves performance on text from that time period. Time vectors specialized to adjacent time periods appear to be positioned closer together in a manifold. Using this structure, we interpolate between time vectors to induce new models that perform better on intervening and future time periods, without any additional training. We demonstrate the consistency of our findings across different tasks, domains, model sizes, and time scales. Our results suggest that time is encoded in the weight space of finetuned models.
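The weight arithmetic described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes model weights can be treated as a plain mapping from parameter names to lists of floats, and that interpolation is a simple linear blend with a coefficient alpha. The parameter name "layer.weight", the years 2012 and 2016, the numeric values, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flattened values.
Weights = Dict[str, List[float]]


def time_vector(finetuned: Weights, pretrained: Weights) -> Weights:
    """Subtract pretrained weights from finetuned weights, per parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def interpolate(tv_a: Weights, tv_b: Weights, alpha: float) -> Weights:
    """Linear blend alpha * tv_a + (1 - alpha) * tv_b (assumed scheme)."""
    return {
        name: [alpha * a + (1 - alpha) * b for a, b in zip(tv_a[name], tv_b[name])]
        for name in tv_a
    }


def apply_vector(pretrained: Weights, tv: Weights) -> Weights:
    """Add a time vector back onto the pretrained weights."""
    return {
        name: [p + t for p, t in zip(pretrained[name], tv[name])]
        for name in pretrained
    }


# Hypothetical weights: one pretrained model and two single-year finetunes.
pretrained = {"layer.weight": [0.0, 1.0, 2.0]}
finetuned_2012 = {"layer.weight": [1.0, 1.0, 4.0]}
finetuned_2016 = {"layer.weight": [3.0, 3.0, 4.0]}

tv_2012 = time_vector(finetuned_2012, pretrained)
tv_2016 = time_vector(finetuned_2016, pretrained)

# Blend the two time vectors to target an intervening period, no training.
tv_mid = interpolate(tv_2012, tv_2016, alpha=0.5)
model_mid = apply_vector(pretrained, tv_mid)
print(model_mid)
```

In practice the same per-parameter arithmetic would be applied to the tensors of a real checkpoint. Whether an equal blend is the right choice for a given intervening period is something the sketch does not address.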

Related papers

Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative [65.84249211767921]
Texts as Time Series (TaTS) can be plugged into any existing numerical-only time series models. We show that TaTS can enhance predictive performance without modifying model architectures.
arXiv Detail & Related papers (2025-02-13T03:43:27Z)
Time Machine GPT [15.661920010658626]
Large language models (LLMs) are often trained on extensive, temporally indiscriminate text corpora. This approach is not aligned with the evolving nature of language. This paper presents a new approach: a series of point-in-time LLMs called Time Machine GPT (TiMaGPT).
arXiv Detail & Related papers (2024-04-29T09:34:25Z)
Chronos: Learning the Language of Time Series [79.38691251254173]
Chronos is a framework for pretrained probabilistic time series models. We show that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks.
arXiv Detail & Related papers (2024-03-12T16:53:54Z)
PDETime: Rethinking Long-Term Multivariate Time Series Forecasting from the perspective of partial differential equations [49.80959046861793]
We present PDETime, a novel LMTF model inspired by the principles of Neural PDE solvers. Our experimentation across seven diverse temporal real-world LMTF datasets reveals that PDETime adapts effectively to the intrinsic nature of the data.
arXiv Detail & Related papers (2024-02-25T17:39:44Z)
A decoder-only foundation model for time-series forecasting [23.824504640087753]
Our model is based on pretraining a patched-decoder style attention model on a large time-series corpus. It can work well across different forecasting history lengths, prediction lengths and temporal granularities.
arXiv Detail & Related papers (2023-10-14T17:01:37Z)
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models [110.20279343734548]
Time series forecasting holds significant importance in many real-world dynamic systems. We present Time-LLM, a reprogramming framework to repurpose large language models for time series forecasting. Time-LLM is a powerful time series learner that outperforms state-of-the-art, specialized forecasting models.
arXiv Detail & Related papers (2023-10-03T01:31:25Z)
Learning Gaussian Mixture Representations for Tensor Time Series Forecasting [8.31607451942671]
We develop a novel TTS forecasting framework, which seeks to individually model each heterogeneity component implied in the time, the location, and the source variables. Experiment results on two real-world TTS datasets verify the superiority of our approach compared with the state-of-the-art baselines.
arXiv Detail & Related papers (2023-06-01T06:50:47Z)
Extracting Latent Steering Vectors from Pretrained Language Models [14.77762401765532]
We show that latent vectors can be extracted directly from language model decoders without fine-tuning. Experiments show that there exist steering vectors, which, when added to the hidden states of the language model, generate a target sentence nearly perfectly. We find that distances between steering vectors reflect sentence similarity when evaluated on a textual similarity benchmark.
arXiv Detail & Related papers (2022-05-10T19:04:37Z)
Temporal Attention for Language Models [24.34396762188068]
We extend the key component of the transformer architecture, i.e., the self-attention mechanism, and propose temporal attention. Temporal attention can be applied to any transformer model and requires the input texts to be accompanied by their relevant time points. We leverage these representations for the task of semantic change detection. Our proposed model achieves state-of-the-art results on all the datasets.
arXiv Detail & Related papers (2022-02-04T11:55:34Z)
Conditional Neural Relational Inference for Interacting Systems [58.141087282927415]
We learn to model the dynamics of similar yet distinct groups of interacting objects. We develop a model that allows us to do conditional generation from any such group given its vectorial description. We evaluate our model in the setting of modeling human gait and, in particular, pathological human gait.
arXiv Detail & Related papers (2021-06-21T13:05:48Z)
Multi-timescale Representation Learning in LSTM Language Models [69.98840820213937]
Language models must capture statistical dependencies between words at timescales ranging from very short to very long. We derived a theory for how the memory gating mechanism in long short-term memory language models can capture power law decay. Experiments showed that LSTM language models trained on natural English text learn to approximate this theoretical distribution.
arXiv Detail & Related papers (2020-09-27T02:13:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.