Related papers: Optimizing a Transformer-based network for a deep learning seismic processing workflow

Optimizing a Transformer-based network for a deep learning seismic processing workflow

URL: http://arxiv.org/abs/2308.04739v1
Date: Wed, 9 Aug 2023 07:11:42 GMT
Title: Optimizing a Transformer-based network for a deep learning seismic processing workflow
Authors: Randy Harsuko and Tariq Alkhalifah
Abstract summary: StorSeismic is a recently introduced model based on the Transformer to adapt to various seismic processing tasks. We observe faster pretraining and competitive results on the fine-tuning tasks and, additionally, fewer parameters to train compared to the vanilla model.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: StorSeismic is a recently introduced model based on the Transformer to adapt to various seismic processing tasks through its pretraining and fine-tuning training strategy. In the original implementation, StorSeismic utilized a sinusoidal positional encoding and a conventional self-attention mechanism, both borrowed from the natural language processing (NLP) applications. For seismic processing they admitted good results, but also hinted to limitations in efficiency and expressiveness. We propose modifications to these two key components, by utilizing relative positional encoding and low-rank attention matrices as replacements to the vanilla ones. The proposed changes are tested on processing tasks applied to a realistic Marmousi and offshore field data as a sequential strategy, starting from denoising, direct arrival removal, multiple attenuation, and finally root-mean-squared velocity ($V_{RMS}$) prediction for normal moveout (NMO) correction. We observe faster pretraining and competitive results on the fine-tuning tasks and, additionally, fewer parameters to train compared to the vanilla model.

Related papers

LARGO: Low-Rank Regulated Gradient Projection for Robust Parameter Efficient Fine-Tuning [39.56217775141507]
Low-rAnk Regulated Gradient Projection (LARGO) algorithm integrates dynamic constraints into low-rank adaptation methods.<n>LARGO achieves state-of-the-art performance across in-domain and out-of-distribution scenarios.
arXiv Detail & Related papers (2025-06-14T08:19:11Z)
Transformer Meets Twicing: Harnessing Unattended Residual Information [2.1605931466490795]
Transformer-based deep learning models have achieved state-of-the-art performance across numerous language and vision tasks. While the self-attention mechanism has proven capable of handling complex data patterns, it has been observed that the representational capacity of the attention matrix degrades significantly across transformer layers. We propose the Twicing Attention, a novel attention mechanism that uses kernel twicing procedure in nonparametric regression to alleviate the low-pass behavior of associated NLM smoothing.
arXiv Detail & Related papers (2025-03-02T01:56:35Z)
FiRST: Finetuning Router-Selective Transformers for Input-Adaptive Latency Reduction [11.146015814220858]
FIRST is an algorithm that reduces inference latency by using layer-specific routers to select a subset of transformer layers adaptively for each input sequence. Our approach reveals that input adaptivity is critical - indeed, different task-specific middle layers play a crucial role in evolving hidden representations depending on task.
arXiv Detail & Related papers (2024-10-16T12:45:35Z)
A convolutional neural network approach to deblending seismic data [1.5488464287814563]
We present a data-driven deep learning-based method for fast and efficient seismic deblending. A convolutional neural network (CNN) is designed according to the special character of seismic data. After training and validation of the network, seismic deblending can be performed in near real time.
arXiv Detail & Related papers (2024-09-12T10:54:35Z)
Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization. A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR. For the first time, we revisit the CLIP and CoOp with our method to effectively improve the model on few-shot image classficiation scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so. We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed. Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z)
End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures. We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures. This work investigates the potential of network pruning for super-resolution iteration to take advantage of off-the-shelf network designs and reduce the underlying computational overhead. We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method by optimizing the sparse structure of a randomly network at each and tweaking unimportant weights with a small amount proportional to the magnitude scale on-the-fly.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
Deep Preconditioners and their application to seismic wavefield processing [0.0]
Sparsity-promoting inversion, coupled with fixed-basis sparsifying transforms, represent the go-to approach for many processing tasks. We propose to train an AutoEncoder network to learn a direct mapping between the input seismic data and a representative latent manifold. The trained decoder is subsequently used as a nonlinear preconditioner for the physics-driven inverse problem at hand.
arXiv Detail & Related papers (2022-07-20T14:25:32Z)
StorSeismic: A new paradigm in deep learning for seismic processing [0.0]
StorSeismic is a framework for seismic data processing. We pre-train seismic data, along with synthetically generated ones, in the self-supervised step. Then, we use the labeled synthetic data to fine-tune the pre-trained network in a supervised fashion to perform various seismic processing tasks.
arXiv Detail & Related papers (2022-04-30T09:55:00Z)
Finetuning Pretrained Transformers into RNNs [81.72974646901136]
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation. A linear-complexity recurrent variant has proven well suited for autoregressive generation. This work aims to convert a pretrained transformer into its efficient recurrent counterpart.
arXiv Detail & Related papers (2021-03-24T10:50:43Z)
Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate scale variation challenge in object detection. Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling. It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.