Enhancing Masked Time-Series Modeling via Dropping Patches
- URL: http://arxiv.org/abs/2412.15315v1
- Date: Thu, 19 Dec 2024 17:21:34 GMT
- Title: Enhancing Masked Time-Series Modeling via Dropping Patches
- Authors: Tianyu Qiu, Yi Xie, Yun Xiong, Hao Niu, Xiaofeng Gao
- Abstract summary: This paper explores how to enhance existing masked time-series modeling by randomly dropping sub-sequence level patches of time series.
The proposed method, named DropPatch, improves pre-training efficiency by a quadratic (square-level) factor.
It also benefits modeling in scenarios such as in-domain, cross-domain, few-shot learning, and cold start.
- Score: 10.715930488118582
- Abstract: This paper explores how to enhance existing masked time-series modeling by randomly dropping sub-sequence-level patches of the time series. On this basis, a simple yet effective method named DropPatch is proposed, which has two remarkable advantages: 1) it improves pre-training efficiency by a quadratic (square-level) factor; 2) it provides additional advantages for modeling in scenarios such as in-domain, cross-domain, few-shot learning, and cold start. The paper conducts comprehensive experiments to verify the effectiveness of the method and to analyze its internal mechanism. Empirically, DropPatch strengthens the attention mechanism, reduces information redundancy, and serves as an efficient means of data augmentation. Theoretically, it is proved that randomly dropping patches slows the rate at which Transformer representations collapse into a rank-1 linear subspace, thereby improving the quality of the learned representations.
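To make the drop-then-mask idea concrete, the sketch below shows one plausible reading of the pipeline described in the abstract: a series is split into sub-sequence patches, a fraction of patches is dropped entirely (never fed to the encoder, which is where the efficiency gain comes from), and a fraction of the remaining patches is masked for reconstruction. This is a hypothetical illustration, not the authors' implementation; the function name, ratios, and the use of zeros as a stand-in for a learned mask token are all assumptions.

```python
import numpy as np

def drop_then_mask(series, patch_len=16, drop_ratio=0.6, mask_ratio=0.4, seed=0):
    """Illustrative sketch of a DropPatch-style preprocessing step:
    1. split the series into non-overlapping patches,
    2. randomly DROP a fraction of patches (excluded from encoding),
    3. randomly MASK a fraction of the kept patches for reconstruction."""
    rng = np.random.default_rng(seed)
    n_patches = len(series) // patch_len
    patches = series[: n_patches * patch_len].reshape(n_patches, patch_len)

    # Drop: these patches are removed before the encoder ever sees them,
    # shrinking the attention sequence length.
    keep = rng.permutation(n_patches)[: int(n_patches * (1 - drop_ratio))]
    keep.sort()
    kept = patches[keep]

    # Mask: among the kept patches, hide a subset; the model would be
    # trained to reconstruct exactly these.
    n_mask = int(len(kept) * mask_ratio)
    mask = np.zeros(len(kept), dtype=bool)
    mask[rng.permutation(len(kept))[:n_mask]] = True
    visible = kept.copy()
    visible[mask] = 0.0  # placeholder for a learned mask token
    return visible, kept, mask, keep

# Example: 512 points, 16-point patches -> 32 patches; drop 60%, mask 40% of the rest.
series = np.sin(np.linspace(0, 8 * np.pi, 512))
visible, kept, mask, keep = drop_then_mask(series)
```

Because dropping happens before encoding, a Transformer encoder runs on only `(1 - drop_ratio)` of the patches, and since self-attention cost is quadratic in sequence length, this is one way to read the "square-level" efficiency claim.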
Related papers
- xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition [21.919661430250798]
We develop a novel dual-stream architecture that utilizes exponential decomposition.
We develop a robust arctangent loss function and a sigmoid learning rate adjustment scheme, which prevent overfitting and boost forecasting performance.
arXiv Detail & Related papers (2024-12-23T06:32:59Z)
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
- Inducing Semi-Structured Sparsity by Masking for Efficient Model Inference in Convolutional Networks [0.0]
This paper proposes a novel method to learn semi-structured sparsity patterns for convolution kernels in the form of maskings.
The approach accelerates convolutional models more than two-fold during inference without decreasing model performance.
arXiv Detail & Related papers (2024-11-01T00:53:33Z)
- Efficient Diffusion as Low Light Enhancer [63.789138528062225]
Reflectance-Aware Trajectory Refinement (RATR) is a simple yet effective module to refine the teacher trajectory using the reflectance component of images.
Reflectance-aware Diffusion with Distilled Trajectory (ReDDiT) is an efficient and flexible distillation framework tailored for Low-Light Image Enhancement (LLIE).
arXiv Detail & Related papers (2024-10-16T08:07:18Z)
- TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation [47.58016750718323]
We propose TimeDART, a novel self-supervised time series pre-training framework.
TimeDART unifies two powerful generative paradigms to learn more transferable representations.
We conduct extensive experiments on public datasets for time series forecasting and classification.
arXiv Detail & Related papers (2024-10-08T06:08:33Z)
- QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning [52.157939524815866]
In this paper, we empirically unravel three properties in quantized diffusion models that compromise the efficacy of current methods.
We identify two critical types of quantized layers: those holding vital temporal information and those sensitive to reduced bit-width.
Our method is evaluated over three high-resolution image generation tasks and achieves state-of-the-art performance under various bit-width settings.
arXiv Detail & Related papers (2024-02-06T03:39:44Z)
- Learning to Embed Time Series Patches Independently [5.752266579415516]
Masked time series modeling has recently gained much attention as a self-supervised representation learning strategy for time series.
We argue that capturing such patch dependencies might not be an optimal strategy for time series representation learning.
We propose to use 1) a simple patch reconstruction task, which autoencodes each patch without looking at other patches, and 2) simple patch-wise reconstruction that embeds each patch independently.
arXiv Detail & Related papers (2023-12-27T06:23:29Z)
- Adaptive Cross Batch Normalization for Metric Learning [75.91093210956116]
Metric learning is a fundamental problem in computer vision.
We show that it is equally important to ensure that the accumulated embeddings are up to date.
In particular, it is necessary to circumvent the representational drift between the accumulated embeddings and the feature embeddings at the current training iteration.
arXiv Detail & Related papers (2023-03-30T03:22:52Z)
- RealPatch: A Statistical Matching Framework for Model Patching with Real Samples [6.245453620070586]
RealPatch is a framework for simpler, faster, and more data-efficient data augmentation based on statistical matching.
We show that RealPatch can successfully eliminate dataset leakage while reducing model leakage and maintaining high utility.
arXiv Detail & Related papers (2022-08-03T16:22:30Z)
- Patch Slimming for Efficient Vision Transformers [107.21146699082819]
We study the efficiency problem for visual transformers by excavating redundant calculation in given networks.
We present a novel patch slimming approach that discards useless patches in a top-down paradigm.
Experimental results on benchmark datasets demonstrate that the proposed method can significantly reduce the computational costs of vision transformers.
arXiv Detail & Related papers (2021-06-05T09:46:00Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.