Optimizing a Transformer-based network for a deep learning seismic
processing workflow
- URL: http://arxiv.org/abs/2308.04739v1
- Date: Wed, 9 Aug 2023 07:11:42 GMT
- Title: Optimizing a Transformer-based network for a deep learning seismic
processing workflow
- Authors: Randy Harsuko and Tariq Alkhalifah
- Abstract summary: StorSeismic is a recently introduced model based on the Transformer to adapt to various seismic processing tasks.
We observe faster pretraining and competitive results on the fine-tuning tasks and, additionally, fewer parameters to train compared to the vanilla model.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: StorSeismic is a recently introduced model based on the Transformer to adapt
to various seismic processing tasks through its pretraining and fine-tuning
training strategy. In the original implementation, StorSeismic utilized a
sinusoidal positional encoding and a conventional self-attention mechanism,
both borrowed from the natural language processing (NLP) applications. For
seismic processing they yielded good results, but also hinted at limitations
in efficiency and expressiveness. We propose modifications to these two key
components, by utilizing relative positional encoding and low-rank attention
matrices as replacements for the vanilla ones. The proposed changes are tested
on processing tasks applied to realistic Marmousi and offshore field data as
a sequential strategy, starting from denoising, direct arrival removal,
multiple attenuation, and finally root-mean-squared velocity ($V_{RMS}$)
prediction for normal moveout (NMO) correction. We observe faster pretraining
and competitive results on the fine-tuning tasks and, additionally, fewer
parameters to train compared to the vanilla model.
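The abstract does not say how the low-rank attention matrices are constructed. As a minimal sketch only, the block below uses one well-known construction (Linformer-style projection of keys and values along the sequence axis), which is an assumption here and not necessarily the authors' choice. The names `low_rank_attention`, `e_proj`, and `f_proj` are hypothetical, and the projections are random rather than learned.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def low_rank_attention(q, k, v, e_proj, f_proj):
    """Single-head attention with keys/values compressed to r positions.

    q, k, v        : arrays of shape (n, d), n = sequence length
    e_proj, f_proj : arrays of shape (r, n) with r << n (assumed, Linformer-style)
    Returns the output (n, d) and the attention weights (n, r).
    """
    d = q.shape[-1]
    k_low = e_proj @ k                    # (r, d) compressed keys
    v_low = f_proj @ v                    # (r, d) compressed values
    scores = q @ k_low.T / np.sqrt(d)     # (n, r) instead of (n, n)
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ v_low, weights


# Toy sizes chosen for illustration; they are not taken from the paper.
rng = np.random.default_rng(0)
n, d, r = 64, 16, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
e_proj = rng.standard_normal((r, n)) / np.sqrt(n)
f_proj = rng.standard_normal((r, n)) / np.sqrt(n)

out, weights = low_rank_attention(q, k, v, e_proj, f_proj)
print(out.shape, weights.shape)
```

With these toy sizes the attention map has n x r = 512 entries instead of the n x n = 4096 of conventional self-attention, which illustrates the kind of efficiency gain a low-rank formulation targets.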
Related papers
- In-Context Learning for MIMO Equalization Using Transformer-Based
Sequence Models [44.161789477821536]
Large pre-trained sequence models have the capacity to carry out in-context learning (ICL).
In ICL, a decision on a new input is made via a direct mapping of the input and of a few examples from the given task.
We demonstrate via numerical results that transformer-based ICL has a threshold behavior.
arXiv Detail & Related papers (2023-11-10T15:09:04Z)
- Partial Tensorized Transformers for Natural Language Processing [0.0]
We study the use of tensor-train decomposition to compress vision-language neural networks, namely BERT and ViT, and to improve their accuracy.
Our novel PTNN approach significantly improves the accuracy of existing models by up to 5%, all without the need for post-training adjustments.
arXiv Detail & Related papers (2023-10-30T23:19:06Z)
- End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
- Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on-the-fly by a small amount proportional to the magnitude scale.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Deep Preconditioners and their application to seismic wavefield
processing [0.0]
Sparsity-promoting inversion, coupled with fixed-basis sparsifying transforms, represents the go-to approach for many processing tasks.
We propose to train an AutoEncoder network to learn a direct mapping between the input seismic data and a representative latent manifold.
The trained decoder is subsequently used as a nonlinear preconditioner for the physics-driven inverse problem at hand.
arXiv Detail & Related papers (2022-07-20T14:25:32Z)
- StorSeismic: A new paradigm in deep learning for seismic processing [0.0]
StorSeismic is a framework for seismic data processing.
We pre-train the network on seismic data, along with synthetically generated data, in the self-supervised step.
Then, we use the labeled synthetic data to fine-tune the pre-trained network in a supervised fashion to perform various seismic processing tasks.
arXiv Detail & Related papers (2022-04-30T09:55:00Z)
- Finetuning Pretrained Transformers into RNNs [81.72974646901136]
Transformers have outperformed recurrent neural networks (RNNs) in natural language generation.
A linear-complexity recurrent variant has proven well suited for autoregressive generation.
This work aims to convert a pretrained transformer into its efficient recurrent counterpart.
arXiv Detail & Related papers (2021-03-24T10:50:43Z)
- Training Transformers for Information Security Tasks: A Case Study on
Malicious URL Prediction [3.660098145214466]
We implement a malicious/benign URL predictor based on a transformer architecture that is trained from scratch.
We show that in contrast to conventional natural language processing (NLP) transformers, this model requires a different training approach to work well.
arXiv Detail & Related papers (2020-11-05T18:58:51Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection.
Experimental results demonstrate the efficacy of our proposed DST in handling scale variation.
It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z)