MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation
- URL: http://arxiv.org/abs/2501.08837v2
- Date: Fri, 21 Mar 2025 17:04:07 GMT
- Title: MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation
- Authors: Olga Zatsarynna, Emad Bahrami, Yazan Abu Farha, Gianpiero Francesca, Juergen Gall
- Abstract summary: Long-term dense action anticipation is challenging since it requires predicting actions and their durations several minutes into the future. We propose a novel MANTA (MAmba for ANTicipation) network to enable effective long-term temporal modelling. Our approach achieves state-of-the-art results on three datasets - Breakfast, 50Salads, and Assembly101.
- Score: 17.4088244981231
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-term dense action anticipation is very challenging since it requires predicting actions and their durations several minutes into the future based on provided video observations. To model the uncertainty of future outcomes, stochastic models predict several potential future action sequences for the same observation. Recent work has further proposed to incorporate uncertainty modelling for observed frames by simultaneously predicting per-frame past and future actions in a unified manner. While such joint modelling of actions is beneficial, it requires long-range temporal capabilities to connect events across distant past and future time points. However, the previous work struggles to achieve such a long-range understanding due to its limited and/or sparse receptive field. To alleviate this issue, we propose a novel MANTA (MAmba for ANTicipation) network. Our model enables effective long-term temporal modelling even for very long sequences while maintaining linear complexity in sequence length. We demonstrate that our approach achieves state-of-the-art results on three datasets - Breakfast, 50Salads, and Assembly101 - while also significantly improving computational and memory efficiency. Our code is available at https://github.com/olga-zats/DIFF_MANTA.
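MANTA's efficiency claim rests on Mamba-style state-space sequence modelling. As a rough, hypothetical illustration (not the paper's code) of why a state-space recurrence is linear in sequence length, a minimal one-dimensional scan can be sketched in plain Python:

```python
def ssm_scan(xs, a, b, c):
    """Minimal 1-D linear state-space recurrence:
        h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t.

    A single left-to-right pass gives O(T) cost in sequence length T,
    versus the O(T^2) cost of pairwise attention. Mamba additionally
    makes the parameters input-dependent ("selective"); this sketch
    keeps them as fixed scalars for clarity.
    """
    h, ys = 0.0, []
    for x_t in xs:
        h = a * h + b * x_t   # state update
        ys.append(c * h)      # per-step readout
    return ys

# Toy usage: constant input, decaying state (|a| < 1 keeps it stable).
y = ssm_scan([1.0] * 8, a=0.5, b=1.0, c=1.0)
print(y[:3])  # [1.0, 1.5, 1.75]
```

The same recurrence can also be evaluated with an associative parallel scan, which is how state-space models stay fast on hardware while preserving this linear asymptotic cost.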
Related papers
- Breaking the Context Bottleneck on Long Time Series Forecasting [6.36010639533526]
Long-term time-series forecasting is essential for planning and decision-making in economics, energy, and transportation. Recent advancements have enhanced the efficiency of these models, but the challenge of effectively leveraging longer sequences persists. We propose the Logsparse Decomposable Multiscaling (LDM) framework for the efficient and effective processing of long sequences.
arXiv Detail & Related papers (2024-12-21T10:29:34Z)
- TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting [49.6208017412376]
TimeBridge is a novel framework designed to bridge the gap between non-stationarity and dependency modeling.
TimeBridge consistently achieves state-of-the-art performance in both short-term and long-term forecasting.
arXiv Detail & Related papers (2024-10-06T10:41:03Z)
- Oscillatory State-Space Models [61.923849241099184]
We propose Linear Oscillatory State-Space models (LinOSS) for efficiently learning on long sequences.
A stable discretization, integrated over time using fast associative parallel scans, yields the proposed state-space model.
We show that LinOSS is universal, i.e., it can approximate any continuous and causal operator mapping between time-varying functions.
arXiv Detail & Related papers (2024-10-04T22:00:13Z)
- Multiscale Representation Enhanced Temporal Flow Fusion Model for Long-Term Workload Forecasting [19.426131129034115]
This paper proposes a novel framework leveraging self-supervised multiscale representation learning to capture both long-term and near-term workload patterns.
The long-term history is encoded through multiscale representations while the near-term observations are modeled via temporal flow fusion.
arXiv Detail & Related papers (2024-07-29T04:42:18Z)
- Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation [17.4088244981231]
Long-term action anticipation has become an important task for many applications such as autonomous driving and human-robot interaction.
We propose a novel Gated Temporal Diffusion (GTD) network that models the uncertainty of both the observation and the future predictions.
Our model achieves state-of-the-art results on the Breakfast, Assembly101 and 50Salads datasets in both the deterministic and stochastic settings.
arXiv Detail & Related papers (2024-07-16T17:48:05Z)
- Self-Supervised Contrastive Learning for Long-term Forecasting [41.11757636744812]
Long-term forecasting presents unique challenges due to the time and memory complexity.
Existing methods, which rely on sliding windows to process long sequences, struggle to effectively capture long-term variations.
We introduce a novel approach that overcomes this limitation by employing contrastive learning and enhanced decomposition architecture.
arXiv Detail & Related papers (2024-02-03T04:32:34Z)
- Explainable Parallel RCNN with Novel Feature Representation for Time Series Forecasting [0.0]
Time series forecasting is a fundamental challenge in data science.
We develop a parallel deep learning framework composed of RNN and CNN.
Extensive experiments on three datasets reveal the effectiveness of our method.
arXiv Detail & Related papers (2023-05-08T17:20:13Z)
- Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement [51.55157852647306]
Time series forecasting has been a widely explored task of great importance in many applications.
It is common that real-world time series data are recorded in a short time period, which results in a big gap between the deep model and the limited and noisy time series.
We propose to address the time series forecasting problem with generative modeling and propose a bidirectional variational auto-encoder equipped with diffusion, denoise, and disentanglement.
arXiv Detail & Related papers (2023-01-08T12:20:46Z)
- FiLM: Frequency improved Legendre Memory Model for Long-term Time Series Forecasting [22.821606402558707]
We develop a Frequency improved Legendre Memory model, or FiLM, to handle the dilemma between accurately preserving historical information and reducing the impact of noisy signals in the past.
Our empirical studies show that the proposed FiLM improves the accuracy of state-of-the-art models by a significant margin.
arXiv Detail & Related papers (2022-05-18T12:37:54Z)
- Long Term Motion Prediction Using Keyposes [122.22758311506588]
We argue that, to achieve long term forecasting, predicting human pose at every time instant is unnecessary.
We call such poses "keyposes", and approximate complex motions by linearly interpolating between subsequent keyposes.
We show that learning the sequence of such keyposes allows us to predict very long term motion, up to 5 seconds in the future.
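The keypose idea, recovering dense motion from a sparse set of poses, can be sketched with a hypothetical linear-interpolation helper (names and data layout are illustrative, not from the paper's implementation):

```python
def interpolate_keyposes(keyposes, times, t):
    """Linearly interpolate a pose at time t from sparse keyposes.

    keyposes: list of poses, each a flat list of joint coordinates.
    times:    sorted timestamps, one per keypose.
    Outside the covered range, the nearest keypose is returned.
    """
    if t <= times[0]:
        return keyposes[0]
    if t >= times[-1]:
        return keyposes[-1]
    # Find the pair of keyposes bracketing t and blend joint-wise.
    for i in range(len(times) - 1):
        if times[i] <= t <= times[i + 1]:
            w = (t - times[i]) / (times[i + 1] - times[i])
            return [(1 - w) * p0 + w * p1
                    for p0, p1 in zip(keyposes[i], keyposes[i + 1])]

# Two keyposes of a 2-joint "skeleton"; query halfway between them.
poses = [[0.0, 0.0], [1.0, 2.0]]
print(interpolate_keyposes(poses, times=[0.0, 1.0], t=0.5))  # [0.5, 1.0]
```

Because only the keyposes need to be predicted, the forecasting model's output length is decoupled from the frame rate of the final motion.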
arXiv Detail & Related papers (2020-12-08T20:45:51Z)
- From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting [54.273455592965355]
Uncertainty in future trajectories stems from two sources: (a) sources known to the agent but unknown to the model, such as long term goals, and (b) sources that are unknown to both the agent & the model, such as the intent of other agents & irreducible randomness in decisions.
We model the epistemic uncertainty through multimodality in long term goals and the aleatoric uncertainty through multimodality in waypoints & paths.
To exemplify this dichotomy, we also propose a novel long term trajectory forecasting setting, with prediction horizons up to a minute, an order of magnitude longer than prior works.
arXiv Detail & Related papers (2020-12-02T21:01:29Z)
- History Repeats Itself: Human Motion Prediction via Motion Attention [81.94175022575966]
We introduce an attention-based feed-forward network that explicitly leverages the observation that human motion tends to repeat itself.
In particular, we propose to extract motion attention to capture the similarity between the current motion context and the historical motion sub-sequences.
Our experiments on Human3.6M, AMASS and 3DPW evidence the benefits of our approach for both periodical and non-periodical actions.
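As a toy illustration of the motion-attention idea (matching the current motion context against historical sub-sequences and aggregating what followed each match), assuming scalar per-frame features rather than real pose vectors:

```python
import math

def motion_attention(history, context_len):
    """Attend from the last `context_len` frames over all earlier
    sub-sequences of the same length. Returns a similarity-weighted
    average of the frame that followed each matched sub-sequence.
    Illustrative sketch of the attention idea, not the paper's model.
    """
    query = history[-context_len:]
    keys, values = [], []
    for i in range(len(history) - context_len):
        keys.append(history[i:i + context_len])   # past sub-sequence
        values.append(history[i + context_len])   # frame that followed it
    # Dot-product similarity, then a numerically stable softmax.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    exps = [math.exp(s - max(scores)) for s in scores]
    weights = [e / sum(exps) for e in exps]
    return sum(w * v for w, v in zip(weights, values))

# Periodic toy signal: the prediction leans toward what followed
# past occurrences of the current context [0.0, 1.0], namely 0.0.
signal = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
print(round(motion_attention(signal, context_len=2), 3))  # 0.269
```

For periodic motions the matched sub-sequences dominate the softmax, which is why attending over history helps the repetitive actions the abstract mentions.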
arXiv Detail & Related papers (2020-07-23T02:12:27Z)
- Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
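The winner-takes-all training scheme commonly used in MHP-style models can be sketched as follows; this is a scalar toy, not the paper's sequential formulation:

```python
def wta_loss(predictions, target):
    """Winner-takes-all loss for Multiple Hypothesis Prediction (MHP):
    only the hypothesis closest to the ground truth receives the error
    signal, which lets the K heads specialize on different plausible
    futures instead of collapsing to their mean.
    """
    errors = [(p - target) ** 2 for p in predictions]
    best = min(range(len(errors)), key=lambda i: errors[i])
    return best, errors[best]

# Three hypotheses for an ambiguous future; only the closest is penalized.
idx, loss = wta_loss([0.1, 0.9, 2.0], target=1.0)
print(idx, round(loss, 3))  # 1 0.01
```

Training with an ordinary mean-squared error over all heads would instead pull every hypothesis toward the average future, which is exactly the failure mode multiple-hypothesis models are designed to avoid.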
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.