MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD
- URL: http://arxiv.org/abs/2406.07191v1
- Date: Tue, 11 Jun 2024 12:03:57 GMT
- Title: MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD
- Authors: Ioanna Ntinou, Enrique Sanchez, Georgios Tzimiropoulos,
- Abstract summary: This paper is on long-term video understanding where the goal is to recognise human actions over long temporal windows (up to minutes long)
We propose an alternative to attention-based schemes which is based on a low-rank approximation of the memory obtained using Singular Value Decomposition.
Our scheme has two advantages: (a) it reduces complexity by more than an order of magnitude, and (b) it is amenable to an efficient implementation for the calculation of the memory bases.
- Score: 27.472705540825316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper is on long-term video understanding where the goal is to recognise human actions over long temporal windows (up to minutes long). In prior work, long temporal context is captured by constructing a long-term memory bank consisting of past and future video features which are then integrated into standard (short-term) video recognition backbones through the use of attention mechanisms. Two well-known problems related to this approach are the quadratic complexity of the attention operation and the fact that the whole feature bank must be stored in memory for inference. To address both issues, we propose an alternative to attention-based schemes which is based on a low-rank approximation of the memory obtained using Singular Value Decomposition. Our scheme has two advantages: (a) it reduces complexity by more than an order of magnitude, and (b) it is amenable to an efficient implementation for the calculation of the memory bases in an incremental fashion which does not require the storage of the whole feature bank in memory. The proposed scheme matches or surpasses the accuracy achieved by attention-based mechanisms while being memory-efficient. Through extensive experiments, we demonstrate that our framework generalises to different architectures and tasks, outperforming the state-of-the-art in three datasets.
Related papers
- Cottention: Linear Transformers With Cosine Attention [2.762180345826837]
We introduce Cottention, a novel attention mechanism that replaces the softmax operation with cosine similarity.
Cottention achieves native linear memory complexity with respect to sequence length, making it inherently more memory-efficient than softmax attention.
arXiv Detail & Related papers (2024-09-27T13:38:36Z) - TF-SASM: Training-free Spatial-aware Sparse Memory for Multi-object Tracking [6.91631684487121]
Multi-object tracking (MOT) in computer vision remains a significant challenge, requiring precise localization and continuous tracking of multiple objects in video sequences.
We propose a novel memory-based approach that selectively stores critical features based on object motion and overlapping awareness.
Our approach significantly improves over MOTRv2 in the DanceTrack test set, demonstrating a gain of 2.0% AssA score and 2.1% in IDF1 score.
arXiv Detail & Related papers (2024-07-05T07:55:19Z) - MAMBA: Multi-level Aggregation via Memory Bank for Video Object
Detection [35.16197118579414]
We propose a multi-level aggregation architecture via memory bank called MAMBA.
Specifically, our memory bank employs two novel operations to eliminate the disadvantages of existing methods.
Compared with existing state-of-the-art methods, our method achieves superior performance in terms of both speed and accuracy.
arXiv Detail & Related papers (2024-01-18T12:13:06Z) - Constant Memory Attention Block [74.38724530521277]
Constant Memory Attention Block (CMAB) is a novel general-purpose attention block that computes its output in constant memory and performs updates in constant computation.
We show our proposed methods achieve results competitive with state-of-the-art while being significantly more memory efficient.
arXiv Detail & Related papers (2023-06-21T22:41:58Z) - Memory-Guided Semantic Learning Network for Temporal Sentence Grounding [55.31041933103645]
We propose a memory-augmented network that learns and memorizes the rarely appeared content in TSG tasks.
MGSL-Net consists of three main parts: a cross-modal inter-action module, a memory augmentation module, and a heterogeneous attention module.
arXiv Detail & Related papers (2022-01-03T02:32:06Z) - Efficient Global-Local Memory for Real-time Instrument Segmentation of
Robotic Surgical Video [53.14186293442669]
We identify two important clues for surgical instrument perception, including local temporal dependency from adjacent frames and global semantic correlation in long-range duration.
We propose a novel dual-memory network (DMNet) to relate both global and local-temporal knowledge.
Our method largely outperforms the state-of-the-art works on segmentation accuracy while maintaining a real-time speed.
arXiv Detail & Related papers (2021-09-28T10:10:14Z) - Temporal Memory Relation Network for Workflow Recognition from Surgical
Video [53.20825496640025]
We propose a novel end-to-end temporal memory relation network (TMNet) for relating long-range and multi-scale temporal patterns.
We have extensively validated our approach on two benchmark surgical video datasets.
arXiv Detail & Related papers (2021-03-30T13:20:26Z) - Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z) - Sequential Recommender via Time-aware Attentive Memory Network [67.26862011527986]
We propose a temporal gating methodology to improve attention mechanism and recurrent units.
We also propose a Multi-hop Time-aware Attentive Memory network to integrate long-term and short-term preferences.
Our approach is scalable for candidate retrieval tasks and can be viewed as a non-linear generalization of latent factorization for dot-product based Top-K recommendation.
arXiv Detail & Related papers (2020-05-18T11:29:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.