Beyond Semantics: How Temporal Biases Shape Retrieval in Transformer and State-Space Models
- URL: http://arxiv.org/abs/2510.22752v1
- Date: Sun, 26 Oct 2025 17:01:41 GMT
- Title: Beyond Semantics: How Temporal Biases Shape Retrieval in Transformer and State-Space Models
- Authors: Anooshka Bajaj, Deven Mahesh Mistry, Sahaj Singh Maini, Yash Aggarwal, Zoran Tiganj
- Abstract summary: In-context learning is governed by both temporal and semantic relationships. This work probes the ability of various pretrained Large Language Models (LLMs) to differentiate and retrieve temporally separated events. Our findings deepen the understanding of temporal biases in in-context learning and offer an illustration of how these biases can enable temporal separation and episodic retrieval.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In-context learning is governed by both temporal and semantic relationships, shaping how Large Language Models (LLMs) retrieve contextual information. Analogous to human episodic memory, where the retrieval of specific events is enabled by separating events that happened at different times, this work probes the ability of various pretrained LLMs, including transformer and state-space models, to differentiate and retrieve temporally separated events. Specifically, we prompted models with sequences containing multiple presentations of the same token, which reappears at the sequence end. By fixing the positions of these repeated tokens and permuting all others, we removed semantic confounds and isolated temporal effects on next-token prediction. Across diverse sequences, models consistently placed the highest probabilities on tokens following a repeated token, but with a notable bias for those nearest the beginning or end of the input. An ablation experiment linked this phenomenon in transformers to induction heads. Extending the analysis to unique semantic contexts with partial overlap further demonstrated that memories embedded in the middle of a prompt are retrieved less reliably. Despite architectural differences, state-space and transformer models showed comparable temporal biases. Our findings deepen the understanding of temporal biases in in-context learning and offer an illustration of how these biases can enable temporal separation and episodic retrieval.
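The probing recipe above can be made concrete with a short sketch. This is not the paper's released code: it assumes a HuggingFace causal LM (gpt2 as an arbitrary stand-in), hypothetical filler words, and illustrative marker positions, but it follows the stated design: fix where a repeated token appears, permute everything else across trials, and read the next-token distribution when that token recurs at the end.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.manual_seed(0)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# Hypothetical vocabulary of filler words; the repeated "marker" token
# appears at fixed positions and again at the very end of the prompt.
fillers = ["apple", "river", "stone", "cloud", "piano", "tiger", "lamp", "storm"]
marker = "key"
positions = [2, 6]  # fixed slots for the repeated token (illustrative choice)

seq = fillers.copy()
for p in positions:
    seq.insert(p, marker)
prompt = " ".join(seq + [marker])  # the marker reappears at the sequence end

# Permuting the filler tokens across trials (while keeping the marker slots
# fixed) is what removes semantic confounds in the paper's design; a single
# arrangement is shown here for brevity.
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # next-token logits at the end
probs = torch.softmax(logits, dim=-1)

# Compare the probability mass assigned to the token that followed each
# earlier occurrence of the marker (an "episodic retrieval" readout).
for p in positions:
    successor = seq[p + 1]
    tok_id = tokenizer(" " + successor)["input_ids"][0]  # first subword id
    print(f"P({successor!r} | ...{marker}) = {probs[tok_id].item():.4f}")
```

Repeating this over many permutations of the fillers, and comparing the mass placed on each occurrence's successor, is what exposes the reported bias toward occurrences nearest the beginning or end of the input.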
Related papers
- Not in Sync: Unveiling Temporal Bias in Audio Chat Models [59.146710538620816]
Large Audio Language Models (LALMs) are increasingly applied to audio understanding and multimodal reasoning. We present the first systematic study of temporal bias in LALMs, revealing a key limitation in their timestamp prediction.
arXiv Detail & Related papers (2025-10-14T06:29:40Z)
- TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling [67.02157180089573]
Time series pre-training has recently garnered wide attention for its potential to reduce labeling expenses and benefit various downstream tasks.
This paper proposes TimeSiam, a simple but effective self-supervised pre-training framework for time series based on Siamese networks (a generic Siamese sketch follows this entry).
arXiv Detail & Related papers (2024-02-04T13:10:51Z)
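The entry above gives only the high-level design, so the following is a generic Siamese pre-training pattern for time series rather than TimeSiam's exact objective: two windows from the same series pass through a shared encoder, and the loss pulls their embeddings together. The encoder architecture, window sampling, and loss are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TSEncoder(nn.Module):
    """Toy 1-D convolutional encoder; stands in for any series encoder."""
    def __init__(self, channels: int = 1, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, dim, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )

    def forward(self, x):  # x: (batch, channels, time)
        return self.net(x).squeeze(-1)  # (batch, dim)

def siamese_loss(encoder, past, current):
    """Negative cosine similarity between the two views' embeddings."""
    z1, z2 = encoder(past), encoder(current)
    return -F.cosine_similarity(z1, z2, dim=-1).mean()

# Hypothetical sampling: "past" and "current" windows cut from one series.
series = torch.randn(8, 1, 256)           # batch of 8 synthetic series
past, current = series[..., :128], series[..., 128:]
encoder = TSEncoder()
loss = siamese_loss(encoder, past, current)
loss.backward()                            # an optimizer step would follow
print(f"siamese loss: {loss.item():.4f}")
```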
- Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
We evaluate how such a paradigm should be done in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
arXiv Detail & Related papers (2023-05-26T14:40:46Z)
- Generic Temporal Reasoning with Differential Analysis and Explanation [61.96034987217583]
We introduce a novel task named TODAY that bridges the gap with temporal differential analysis.
TODAY evaluates whether systems can correctly understand the effect of incremental changes.
We show that TODAY's supervision style and explanation annotations can be used in joint learning.
arXiv Detail & Related papers (2022-12-20T17:40:03Z)
- Learning Temporal Rules from Noisy Timeseries Data [72.93572292157593]
We focus on uncovering the underlying atomic events and their relations that lead to the composite events within a noisy temporal data setting.
We propose a Neural Temporal Logic Programming (Neural TLP) which first learns implicit temporal relations between atomic events and then lifts logic rules for supervision.
arXiv Detail & Related papers (2022-02-11T01:29:02Z)
- An Empirical Study: Extensive Deep Temporal Point Process [61.14164208094238]
We first review recent research emphasis and difficulties in modeling asynchronous event sequences with deep temporal point processes. We propose a Granger causality discovery framework for exploiting the relations among multiple types of events.
arXiv Detail & Related papers (2021-10-19T10:15:00Z)
- A new harmonium for pattern recognition in survival data [0.0]
Methods: An energy-based approach is taken, with a bipartite structure between latent and visible states, commonly known as harmoniums.
We demonstrate that discriminative predictions improve by leveraging an extra time-to-event variable.
arXiv Detail & Related papers (2021-10-05T11:42:36Z)
- Long-Range Transformers for Dynamic Spatiotemporal Forecasting [16.37467119526305]
Methods based on graph neural networks explicitly model variable relationships.
Long-Range Transformers can learn interactions between time, value, and information jointly along this extended sequence.
arXiv Detail & Related papers (2021-09-24T22:11:46Z)
- Extracting Event Temporal Relations via Hyperbolic Geometry [18.068466562913923]
We introduce two approaches to encode events and their temporal relations in hyperbolic spaces.
One approach leverages hyperbolic embeddings to infer event relations directly through simple geometrical operations (see the distance sketch after this entry).
In the second, we devise an end-to-end architecture composed of hyperbolic neural units tailored for the temporal relation extraction task.
arXiv Detail & Related papers (2021-09-12T14:40:13Z)
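As a concrete reading of "simple geometrical operations": a minimal sketch assuming Poincaré-ball embeddings (a standard hyperbolic model; the entry does not say which model the paper uses), where geodesic distance between event embeddings scores how related two events are. The embeddings and the decision rule are hypothetical.

```python
import torch

def poincare_distance(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Geodesic distance in the Poincare ball (curvature -1):
    d(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))."""
    sq = torch.sum((u - v) ** 2, dim=-1)
    denom = (1 - torch.sum(u ** 2, dim=-1)) * (1 - torch.sum(v ** 2, dim=-1))
    return torch.acosh(1 + 2 * sq / denom.clamp_min(1e-9))

# Hypothetical event embeddings inside the unit ball (norms < 1).
e_before = torch.tensor([0.10, 0.20])   # e.g., "boarded the train"
e_after  = torch.tensor([0.15, 0.25])   # e.g., "arrived at the station"
e_far    = torch.tensor([-0.60, 0.55])  # unrelated event

print(poincare_distance(e_before, e_after).item())  # small: candidate relation
print(poincare_distance(e_before, e_far).item())    # large: likely unrelated
```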
- Pay Attention to Evolution: Time Series Forecasting with Deep Graph-Evolution Learning [33.79957892029931]
This work presents a novel neural network architecture for time-series forecasting.
We named our method Recurrent Graph Evolution Neural Network (ReGENN).
An extensive set of experiments compares ReGENN with dozens of ensemble and classical statistical methods.
arXiv Detail & Related papers (2020-08-28T20:10:07Z)