MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation
- URL: http://arxiv.org/abs/2501.08837v1
- Date: Wed, 15 Jan 2025 14:46:44 GMT
- Title: MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation
- Authors: Olga Zatsarynna, Emad Bahrami, Yazan Abu Farha, Gianpiero Francesca, Juergen Gall
- Abstract summary: This paper addresses the problem of long-term dense anticipation.
The goal of this task is to predict actions and their durations several minutes into the future based on provided video observations.
Anticipating over such long horizons introduces high uncertainty, so models are designed to predict several potential future action sequences.
- Score: 17.4088244981231
- Abstract: Our work addresses the problem of stochastic long-term dense anticipation. The goal of this task is to predict actions and their durations several minutes into the future based on provided video observations. Anticipation over extended horizons introduces high uncertainty, as a single observation can lead to multiple plausible future outcomes. To address this uncertainty, stochastic models are designed to predict several potential future action sequences. Recent work has further proposed to incorporate uncertainty modelling for observed frames by simultaneously predicting per-frame past and future actions in a unified manner. While such joint modelling of actions is beneficial, it requires long-range temporal capabilities to connect events across distant past and future time points. However, the previous work struggles to achieve such a long-range understanding due to its limited and/or sparse receptive field. To alleviate this issue, we propose a novel MANTA (MAmba for ANTicipation) network. Our model enables effective long-term temporal modelling even for very long sequences while maintaining linear complexity in sequence length. We demonstrate that our approach achieves state-of-the-art results on three datasets - Breakfast, 50Salads, and Assembly101 - while also significantly improving computational and memory efficiency.
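The abstract's key efficiency claim is linear complexity in sequence length, which is characteristic of state-space models such as Mamba. MANTA's actual architecture is not reproduced here; as a rough intuition only, the toy scan below shows a minimal (non-selective, single-layer) state-space recurrence that processes a sequence in O(T) time, in contrast to the O(T^2) cost of full self-attention. The function name `ssm_scan` and all shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy linear state-space scan: h_t = A h_{t-1} + B x_t, y_t = C h_t.

    x: (T, d_in) per-frame features; returns (T, d_out) per-frame outputs.
    The loop touches each time step once, so cost grows linearly with T.
    """
    T, _ = x.shape
    d_state = A.shape[0]
    h = np.zeros(d_state)
    ys = np.empty((T, C.shape[0]))
    for t in range(T):
        h = A @ h + B @ x[t]   # state update: compresses all past frames into h
        ys[t] = C @ h          # per-frame readout (e.g. action logits)
    return ys

rng = np.random.default_rng(0)
T, d_in, d_state, d_out = 8, 4, 6, 3
x = rng.normal(size=(T, d_in))
A = 0.9 * np.eye(d_state)              # stable recurrence
B = rng.normal(size=(d_state, d_in))
C = rng.normal(size=(d_out, d_state))
y = ssm_scan(x, A, B, C)
print(y.shape)  # (8, 3)
```

Because the hidden state carries a running summary of the entire past, the readout at every step can depend on arbitrarily distant frames, which is the long-range property the abstract argues is needed to connect past and future actions.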
Related papers
- Breaking the Context Bottleneck on Long Time Series Forecasting [6.36010639533526]
Long-term time-series forecasting is essential for planning and decision-making in economics, energy, and transportation.
Recent advancements have enhanced the efficiency of these models, but the challenge of effectively leveraging longer sequences persists.
We propose the Logsparse Decomposable Multiscaling (LDM) framework for the efficient and effective processing of long sequences.
arXiv Detail & Related papers (2024-12-21T10:29:34Z) - TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting [49.6208017412376]
TimeBridge is a novel framework designed to bridge the gap between non-stationarity and dependency modeling.
TimeBridge consistently achieves state-of-the-art performance in both short-term and long-term forecasting.
arXiv Detail & Related papers (2024-10-06T10:41:03Z) - Multiscale Representation Enhanced Temporal Flow Fusion Model for Long-Term Workload Forecasting [19.426131129034115]
This paper proposes a novel framework leveraging self-supervised multiscale representation learning to capture both long-term and near-term workload patterns.
The long-term history is encoded through multiscale representations while the near-term observations are modeled via temporal flow fusion.
arXiv Detail & Related papers (2024-07-29T04:42:18Z) - Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation [17.4088244981231]
Long-term action anticipation has become an important task for many applications such as autonomous driving and human-robot interaction.
We propose a novel Gated Temporal Diffusion (GTD) network that models the uncertainty of both the observation and the future predictions.
Our model achieves state-of-the-art results on the Breakfast, Assembly101 and 50Salads datasets in both stochastic and deterministic settings.
arXiv Detail & Related papers (2024-07-16T17:48:05Z) - Self-Supervised Contrastive Learning for Long-term Forecasting [41.11757636744812]
Long-term forecasting presents unique challenges due to the time and memory complexity.
Existing methods, which rely on sliding windows to process long sequences, struggle to effectively capture long-term variations.
We introduce a novel approach that overcomes this limitation by employing contrastive learning and enhanced decomposition architecture.
arXiv Detail & Related papers (2024-02-03T04:32:34Z) - Performative Time-Series Forecasting [71.18553214204978]
We formalize performative time-series forecasting (PeTS) from a machine-learning perspective.
We propose a novel approach, Feature Performative-Shifting (FPS), which leverages the concept of delayed response to anticipate distribution shifts.
We conduct comprehensive experiments using multiple time-series models on COVID-19 and traffic forecasting tasks.
arXiv Detail & Related papers (2023-10-09T18:34:29Z) - Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement [51.55157852647306]
Time series forecasting has been a widely explored task of great importance in many applications.
It is common that real-world time series data are recorded in a short time period, which results in a big gap between the deep model and the limited and noisy time series.
We propose to address the time series forecasting problem with generative modeling and propose a bidirectional variational auto-encoder equipped with diffusion, denoise, and disentanglement.
arXiv Detail & Related papers (2023-01-08T12:20:46Z) - Finding Islands of Predictability in Action Forecasting [7.215559809521136]
We show that future action sequences are more accurately modeled with variable, rather than one, levels of abstraction.
We propose a combination of a Bayesian neural network and a hierarchical convolutional segmentation model to both accurately predict future actions and optimally select abstraction levels.
arXiv Detail & Related papers (2022-10-13T21:01:16Z) - Long Term Motion Prediction Using Keyposes [122.22758311506588]
We argue that, to achieve long term forecasting, predicting human pose at every time instant is unnecessary.
We call such poses "keyposes", and approximate complex motions by linearly interpolating between subsequent keyposes.
We show that learning the sequence of such keyposes allows us to predict very long term motion, up to 5 seconds in the future.
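The keypose summary above describes approximating complex motion by linearly interpolating between sparse keyposes. As a minimal sketch of that idea (the function name, array shapes, and joint layout are my own illustrative choices, not taken from the paper):

```python
import numpy as np

def interpolate_keyposes(keyposes, keytimes, query_times):
    """Linearly interpolate full-body poses between sparse keyposes.

    keyposes: (K, J, 3) array of K keyposes over J 3D joints.
    keytimes: (K,) increasing timestamps of the keyposes.
    query_times: (Q,) timestamps at which to reconstruct poses.
    Returns a (Q, J, 3) array of interpolated poses.
    """
    keyposes = np.asarray(keyposes, dtype=float)
    K, J, D = keyposes.shape
    flat = keyposes.reshape(K, J * D)
    # Interpolate each coordinate channel independently over time.
    out = np.stack(
        [np.interp(query_times, keytimes, flat[:, c]) for c in range(J * D)],
        axis=-1,
    )
    return out.reshape(len(query_times), J, D)

# Two keyposes for a 2-joint skeleton: all zeros at t=0, all twos at t=2.
kp = np.stack([np.zeros((2, 3)), np.full((2, 3), 2.0)])
pose = interpolate_keyposes(kp, np.array([0.0, 2.0]), np.array([1.0]))
print(pose.shape)  # (1, 2, 3)
```

Predicting only the keyposes and filling the gaps by interpolation is what lets the approach extend to very long horizons: the model's output length depends on the number of keyposes, not on the number of frames.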
arXiv Detail & Related papers (2020-12-08T20:45:51Z) - From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting [54.273455592965355]
Uncertainty in future trajectories stems from two sources: (a) sources known to the agent but unknown to the model, such as long-term goals, and (b) sources that are unknown to both the agent and the model, such as the intent of other agents and irreducible randomness in decisions.
We model the epistemic uncertainty through multimodality in long-term goals and the aleatoric uncertainty through multimodality in waypoints and paths.
To exemplify this dichotomy, we also propose a novel long-term trajectory forecasting setting, with prediction horizons up to a minute, an order of magnitude longer than prior works.
arXiv Detail & Related papers (2020-12-02T21:01:29Z) - Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.