When, How Long and How Much? Interpretable Neural Networks for Time Series Regression by Learning to Mask and Aggregate
- URL: http://arxiv.org/abs/2512.03578v1
- Date: Wed, 03 Dec 2025 09:01:41 GMT
- Title: When, How Long and How Much? Interpretable Neural Networks for Time Series Regression by Learning to Mask and Aggregate
- Authors: Florent Forest, Amaury Wei, Olga Fink
- Abstract summary: Time series extrinsic regression (TSER) refers to the task of predicting a continuous target variable from an input time series. The proposed approach learns a compact set of human-understandable concepts without requiring any annotations.
- Score: 16.533105886716804
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Time series extrinsic regression (TSER) refers to the task of predicting a continuous target variable from an input time series. It appears in many domains, including healthcare, finance, environmental monitoring, and engineering. In these settings, accurate predictions and trustworthy reasoning are both essential. Although state-of-the-art TSER models achieve strong predictive performance, they typically operate as black boxes, making it difficult to understand which temporal patterns drive their decisions. Post-hoc interpretability techniques, such as feature attribution, aim to explain how the model arrives at its predictions, but often produce coarse, noisy, or unstable explanations. Recently, inherently interpretable approaches based on concepts, additive decompositions, or symbolic regression have emerged as promising alternatives. However, these approaches remain limited: they require explicit supervision on the concepts themselves, often cannot capture interactions between time-series features, lack expressiveness for complex temporal patterns, and struggle to scale to high-dimensional multivariate data. To address these limitations, we propose MAGNETS (Mask-and-AGgregate NEtwork for Time Series), an inherently interpretable neural architecture for TSER. MAGNETS learns a compact set of human-understandable concepts without requiring any annotations. Each concept corresponds to a learned, mask-based aggregation over selected input features, explicitly revealing both which features drive predictions and when they matter in the sequence. Predictions are formed as combinations of these learned concepts through a transparent, additive structure, enabling clear insight into the model's decision process.
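The mask-and-aggregate recipe in the abstract lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch rendering, assuming sigmoid temporal masks (when and how long), softmax channel masks (which features), a masked mean as the aggregation (how much), and a linear additive read-out; the class name, parameterizations, and shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MagnetsSketch(nn.Module):
    """Minimal sketch of a mask-and-aggregate model in the spirit of MAGNETS.

    Each "concept" applies a learned soft mask over time and over input
    channels, then aggregates the masked values into a scalar. A linear
    layer combines the concept activations additively, so each prediction
    decomposes into per-concept contributions. Names and parameterizations
    are illustrative, not the paper's implementation.
    """

    def __init__(self, n_channels: int, seq_len: int, n_concepts: int):
        super().__init__()
        # Unnormalized mask logits: one (channel, time) mask per concept.
        self.time_logits = nn.Parameter(torch.randn(n_concepts, seq_len))
        self.chan_logits = nn.Parameter(torch.randn(n_concepts, n_channels))
        # Transparent additive read-out over concept activations.
        self.head = nn.Linear(n_concepts, 1)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # x: (batch, n_channels, seq_len)
        t_mask = torch.sigmoid(self.time_logits)         # (K, T): when / how long
        c_mask = torch.softmax(self.chan_logits, dim=-1) # (K, C): which features
        # Masked aggregation: weighted sum over channels and time per concept.
        masked = torch.einsum("bct,kc,kt->bk", x, c_mask, t_mask)
        concepts = masked / t_mask.sum(dim=-1)           # mean over the active window
        return self.head(concepts), concepts             # prediction + activations

# Usage: the concept activations and masks are directly inspectable.
model = MagnetsSketch(n_channels=4, seq_len=128, n_concepts=6)
y_hat, concepts = model(torch.randn(32, 4, 128))
print(y_hat.shape, concepts.shape)  # torch.Size([32, 1]) torch.Size([32, 6])
```

Because the read-out is linear, each prediction decomposes exactly into per-concept contributions (`model.head.weight[:, k] * concepts[:, k]`), which is the kind of transparency the abstract describes.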
Related papers
- Kelix Technical Report [86.64551727600104]
We present Kelix, a fully discrete autoregressive unified model that closes the understanding gap between discrete and continuous visual representations. Recent work has explored discrete visual tokenization to enable fully autoregressive multimodal modeling.
arXiv Detail & Related papers (2026-02-10T14:48:26Z) - Interpretability in Deep Time Series Models Demands Semantic Alignment [19.12673689717747]
We argue that interpretability in deep time series models should pursue semantic alignment. Once established, semantic alignment must be preserved under temporal evolution. We outline a blueprint for semantically aligned deep time series models, identify properties that support trust, and discuss implications for model design.
arXiv Detail & Related papers (2026-02-02T15:48:30Z) - Temporal Concept Dynamics in Diffusion Models via Prompt-Conditioned Interventions [70.87254264798341]
PCI is a training-free and model-agnostic framework for analyzing concept dynamics through diffusion time. It reveals diverse temporal behaviors across diffusion models, in which certain phases of the trajectory are more favorable to specific concepts, even within the same concept type.
arXiv Detail & Related papers (2025-12-09T11:05:08Z) - Priors in Time: Missing Inductive Biases for Language Model Interpretability [58.07412640266836]
We show that Sparse Autoencoders impose priors that assume independence of concepts across time, implying stationarity. We introduce a new interpretability objective -- Temporal Feature Analysis -- which possesses a temporal inductive bias to decompose representations at a given time into two parts. Our results underscore the need for inductive biases that match the data in designing robust interpretability tools.
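One plausible reading of "decompose representations at a given time into two parts" is a split into a component predictable from the recent past and a novel residual. The sketch below, with a linear predictor over a fixed context window, is an assumed illustration of that reading, not the paper's actual objective.

```python
import torch
import torch.nn as nn

class TemporalSplit(nn.Module):
    """Hypothetical decomposition: h_t = predicted-from-past + novel residual."""

    def __init__(self, d: int, context: int):
        super().__init__()
        self.context = context
        self.predictor = nn.Linear(d * context, d)  # past window -> present

    def forward(self, h: torch.Tensor):
        # h: (batch, time, d) hidden states from some sequence model
        past = torch.stack(
            [h[:, t - self.context:t].flatten(1) for t in range(self.context, h.size(1))],
            dim=1,
        )                                       # (batch, time - context, d * context)
        predicted = self.predictor(past)        # stationary / predictable part
        novel = h[:, self.context:] - predicted # residual "new information" part
        return predicted, novel
```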
arXiv Detail & Related papers (2025-11-03T18:43:48Z) - TimesBERT: A BERT-Style Foundation Model for Time Series Understanding [72.64824086839631]
GPT-style models have been positioned as foundation models for time series forecasting. The BERT-style architecture, however, has not been fully unlocked for time series understanding. We design TimesBERT to learn generic representations of time series. Our model is pre-trained on 260 billion time points across diverse domains.
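For context on what a BERT-style objective looks like on time series, here is a generic masked-patch pretraining sketch. The patch size, masking ratio, mask-token mechanism, and two-layer encoder are illustrative assumptions, not necessarily TimesBERT's actual design.

```python
import torch
import torch.nn as nn

class MaskedPatchModel(nn.Module):
    """Generic BERT-style masked-patch objective for a univariate series."""

    def __init__(self, patch: int = 16, d: int = 128):
        super().__init__()
        self.patch = patch
        self.embed = nn.Linear(patch, d)
        self.mask_token = nn.Parameter(torch.zeros(d))
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decode = nn.Linear(d, patch)

    def forward(self, x: torch.Tensor, mask_ratio: float = 0.3) -> torch.Tensor:
        # x: (batch, length); length must be divisible by the patch size
        patches = x.unfold(1, self.patch, self.patch)        # (B, N, patch)
        tokens = self.embed(patches)
        masked = torch.rand(tokens.shape[:2], device=x.device) < mask_ratio
        # Replace masked positions with a learned mask token.
        tokens = torch.where(masked.unsqueeze(-1), self.mask_token, tokens)
        recon = self.decode(self.encoder(tokens))
        # Reconstruction loss only on the masked positions.
        return nn.functional.mse_loss(recon[masked], patches[masked])
```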
arXiv Detail & Related papers (2025-02-28T17:14:44Z) - DCIts -- Deep Convolutional Interpreter for time series [0.0]
The model is designed so one can robustly determine the optimal window size that captures all necessary interactions within the smallest possible time frame. It effectively identifies the optimal model order, balancing complexity when incorporating higher-order terms. These advancements hold significant implications for modeling and understanding dynamic systems, making the model a valuable tool for applied and computational physicists.
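As a generic illustration of the window-size selection principle stated above (the smallest window that captures all necessary interactions), one could pick the smallest candidate whose validation error is within a tolerance of the best; `fit_and_score` below is a hypothetical placeholder for training the model at a given window size, not part of DCIts.

```python
def smallest_sufficient_window(fit_and_score, candidates, tol=0.05):
    """Pick the smallest window whose validation error is near-optimal."""
    errors = {w: fit_and_score(w) for w in candidates}  # train/evaluate per window
    best = min(errors.values())
    return min(w for w, e in errors.items() if e <= best * (1 + tol))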
arXiv Detail & Related papers (2025-01-08T08:21:58Z) - FocusLearn: Fully-Interpretable, High-Performance Modular Neural Networks for Time Series [0.3277163122167434]
This paper proposes a novel modular neural network model for time series prediction that is interpretable by construction.
A recurrent neural network learns the temporal dependencies in the data while an attention-based feature selection component selects the most relevant features.
A modular deep network is trained from the selected features independently to show the users how features influence outcomes, making the model interpretable.
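Read literally, the recipe above combines attention-based feature selection with small per-feature networks whose outputs are combined additively. A minimal sketch follows, omitting the recurrent component for brevity; all sizes and the scoring network are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ModularAdditiveNet(nn.Module):
    """Attention-weighted sum of independent per-feature networks."""

    def __init__(self, n_features: int, seq_len: int, hidden: int = 32):
        super().__init__()
        self.scorer = nn.Linear(seq_len, 1)  # one relevance score per feature
        self.per_feature = nn.ModuleList(
            nn.Sequential(nn.Linear(seq_len, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_features)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features, seq_len)
        weights = torch.softmax(self.scorer(x).squeeze(-1), dim=-1)  # (B, F)
        # Each feature's module yields an inspectable contribution.
        contribs = torch.cat(
            [m(x[:, i]) for i, m in enumerate(self.per_feature)], dim=-1
        )                                                            # (B, F)
        return (weights * contribs).sum(dim=-1, keepdim=True)        # additive
```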
arXiv Detail & Related papers (2023-11-28T14:51:06Z) - Contextualizing MLP-Mixers Spatiotemporally for Urban Data Forecast at Scale [54.15522908057831]
We propose an adapted version of the computationally efficient MLP-Mixer for STTD forecast at scale.
Our results surprisingly show that this simple-yet-effective solution can rival SOTA baselines when tested on several traffic benchmarks.
Our findings contribute to the exploration of simple-yet-effective models for real-world STTD forecasting.
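As background on the building block involved, a Mixer layer alternates MLPs along the spatial axis and the channel axis. The sketch below is a textbook Mixer block adapted to (location, feature) data, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class STMixerBlock(nn.Module):
    """Generic Mixer block over sensor locations and feature channels."""

    def __init__(self, n_nodes: int, d: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d)
        self.space_mix = nn.Sequential(
            nn.Linear(n_nodes, n_nodes * 2), nn.GELU(), nn.Linear(n_nodes * 2, n_nodes)
        )
        self.norm2 = nn.LayerNorm(d)
        self.chan_mix = nn.Sequential(
            nn.Linear(d, d * 2), nn.GELU(), nn.Linear(d * 2, d)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_nodes, d) features per location at one time step
        y = self.norm1(x).transpose(1, 2)         # mix across locations
        x = x + self.space_mix(y).transpose(1, 2)
        return x + self.chan_mix(self.norm2(x))   # mix across channels
```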
arXiv Detail & Related papers (2023-07-04T05:19:19Z) - Uncovering the Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction [60.60223171143206]
Trajectory prediction is a crucial undertaking in understanding entity movement or human behavior from observed sequences.
Current methods often assume that the observed sequences are complete while ignoring the potential for missing values.
This paper presents a unified framework, the Graph-based Conditional Variational Recurrent Neural Network (GC-VRNN), which can perform trajectory imputation and prediction simultaneously.
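The "impute while predicting" idea can be illustrated with a much simpler model than GC-VRNN (which is graph-based and variational): a single GRU that fills missing entries with its own one-step prediction before stepping forward. The following is purely a toy sketch of that idea.

```python
import torch
import torch.nn as nn

class ImputePredictRNN(nn.Module):
    """Toy joint imputation and prediction with one GRU cell."""

    def __init__(self, d_in: int, d_hid: int = 64):
        super().__init__()
        self.cell = nn.GRUCell(d_in, d_hid)
        self.readout = nn.Linear(d_hid, d_in)  # predicts the next observation

    def forward(self, x: torch.Tensor, observed: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_in); observed: boolean mask of the same shape
        h = x.new_zeros(x.size(0), self.cell.hidden_size)
        outputs = []
        for t in range(x.size(1)):
            pred = self.readout(h)                             # one-step prediction
            step = torch.where(observed[:, t], x[:, t], pred)  # impute the gaps
            h = self.cell(step, h)
            outputs.append(self.readout(h))                    # forecast next step
        return torch.stack(outputs, dim=1)
```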
arXiv Detail & Related papers (2023-03-28T14:27:27Z) - Interpretable Feature Engineering for Time Series Predictors using Attention Networks [6.617546606897785]
We use multi-head attention networks to develop interpretable features and use them to achieve good predictive performance.
The customized attention layer explicitly uses multiplicative interactions and builds feature-engineering heads that capture temporal dynamics in a parsimonious manner.
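One hypothetical reading of "explicitly uses multiplicative interactions": each feature-engineering head attention-pools the lag window twice and multiplies the two summaries, yielding an interpretable product-of-weighted-lags feature. The parameterization below is assumed, not taken from the paper.

```python
import torch
import torch.nn as nn

class MultiplicativeHead(nn.Module):
    """One feature-engineering head: product of two attention-pooled summaries."""

    def __init__(self, n_lags: int):
        super().__init__()
        self.w1 = nn.Parameter(torch.zeros(n_lags))  # attention logits, pool 1
        self.w2 = nn.Parameter(torch.zeros(n_lags))  # attention logits, pool 2

    def forward(self, lags: torch.Tensor) -> torch.Tensor:
        # lags: (batch, n_lags) window of past values
        a1 = torch.softmax(self.w1, dim=0)
        a2 = torch.softmax(self.w2, dim=0)
        # Multiplicative interaction between the two weighted lag summaries.
        return (lags @ a1) * (lags @ a2)
```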
arXiv Detail & Related papers (2022-05-23T20:13:08Z) - timeXplain -- A Framework for Explaining the Predictions of Time Series Classifiers [3.6433472230928428]
We present novel domain mappings for the time domain, frequency domain, and time series statistics.
We analyze their explicative power as well as their limits.
We employ a novel evaluation metric to experimentally compare timeXplain to several model-specific explanation approaches.
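A frequency-domain mapping can be illustrated by perturbing one frequency band at a time and measuring the resulting prediction shift. timeXplain couples such mappings with attribution machinery; the standalone function below, including its uniform band-splitting scheme, is only an assumed simplification of that idea.

```python
import numpy as np

def band_importance(model_fn, x: np.ndarray, n_bands: int = 8) -> np.ndarray:
    """model_fn maps a 1-D series to a scalar prediction."""
    spectrum = np.fft.rfft(x)
    edges = np.linspace(0, len(spectrum), n_bands + 1, dtype=int)
    base = model_fn(x)
    scores = np.empty(n_bands)
    for i in range(n_bands):
        perturbed = spectrum.copy()
        perturbed[edges[i]:edges[i + 1]] = 0.0      # silence one frequency band
        x_pert = np.fft.irfft(perturbed, n=len(x))  # back to the time domain
        scores[i] = abs(model_fn(x_pert) - base)    # prediction shift
    return scores

# Usage with a toy "model": sensitivity of the mean squared amplitude.
scores = band_importance(lambda s: float(np.mean(s ** 2)),
                         np.sin(np.linspace(0, 20, 256)))
```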
arXiv Detail & Related papers (2020-07-15T10:32:43Z) - Predicting Temporal Sets with Deep Neural Networks [50.53727580527024]
We propose an integrated solution based on the deep neural networks for temporal sets prediction.
A unique perspective is to learn element relationship by constructing set-level co-occurrence graph.
We design an attention-based module to adaptively learn the temporal dependency of elements and sets.
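The set-level co-occurrence graph mentioned above can be built directly from the historical sets: elements that appear in the same set get an edge weighted by their joint count. A minimal sketch of the construction (the paper then learns over such a graph; this only builds the adjacency matrix):

```python
import numpy as np

def cooccurrence_graph(sets: list[set[int]], n_elements: int) -> np.ndarray:
    """Adjacency matrix whose (i, j) entry counts joint appearances of i and j."""
    adj = np.zeros((n_elements, n_elements))
    for s in sets:
        for i in s:
            for j in s:
                if i != j:
                    adj[i, j] += 1.0  # count joint appearances in this set
    return adj

# Usage: three historical "baskets" over five items.
adj = cooccurrence_graph([{0, 1, 2}, {1, 2}, {2, 4}], n_elements=5)
```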
arXiv Detail & Related papers (2020-06-20T03:29:02Z)