xPerT: Extended Persistence Transformer
- URL: http://arxiv.org/abs/2410.14193v1
- Date: Fri, 18 Oct 2024 06:07:22 GMT
- Title: xPerT: Extended Persistence Transformer
- Authors: Sehun Kim
- Abstract summary: A persistence diagram provides a compact summary of persistent homology, which captures the topological features of a space at different scales.
We propose a novel transformer architecture called the Extended Persistence Transformer (xPerT), which is highly scalable.
xPerT reduces GPU memory usage by over 90% and improves accuracy on multiple datasets.
- Abstract: A persistence diagram provides a compact summary of persistent homology, which captures the topological features of a space at different scales. However, due to its nature as a set, incorporating it as a feature into a machine learning framework is challenging. Several methods have been proposed to use persistence diagrams as input for machine learning models, but they often require complex preprocessing steps and extensive hyperparameter tuning. In this paper, we propose a novel transformer architecture called the Extended Persistence Transformer (xPerT), which is far more scalable than Persformer, an existing transformer for persistence diagrams. xPerT reduces GPU memory usage by over 90% and improves accuracy on multiple datasets. Additionally, xPerT does not require complex preprocessing steps or extensive hyperparameter tuning, making it easy to use in practice. Our code is available at https://github.com/sehunfromdaegu/ECG_JEPA.
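To make the set-valued input concrete, here is a minimal PyTorch sketch of one way to feed a persistence diagram (a variable-sized set of (birth, death) pairs) into a transformer encoder with padding and masking. It is an illustrative baseline, not the xPerT architecture; the class name PersistenceSetClassifier and all hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class PersistenceSetClassifier(nn.Module):
    """Illustrative only: embeds each (birth, death) pair as a token,
    masks padded entries, and pools the encoder output for classification."""
    def __init__(self, d_model=64, n_heads=4, n_layers=2, n_classes=5):
        super().__init__()
        self.point_embed = nn.Linear(2, d_model)          # (birth, death) -> token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, points, pad_mask):
        # points: (batch, max_points, 2); pad_mask: (batch, max_points), True = padding
        tokens = self.point_embed(points)
        encoded = self.encoder(tokens, src_key_padding_mask=pad_mask)
        encoded = encoded.masked_fill(pad_mask.unsqueeze(-1), 0.0)
        pooled = encoded.sum(1) / (~pad_mask).sum(1, keepdim=True).clamp(min=1)
        return self.head(pooled)

# Toy usage: two diagrams with different numbers of points, zero-padded.
diagrams = torch.zeros(2, 6, 2)
diagrams[0, :4] = torch.rand(4, 2)
diagrams[1, :2] = torch.rand(2, 2)
mask = torch.tensor([[False]*4 + [True]*2, [False]*2 + [True]*4])
logits = PersistenceSetClassifier()(diagrams, mask)       # (2, n_classes)
```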
Related papers
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters [102.1116808722299]
We introduce TokenFormer, a natively scalable architecture for Transformers.
By treating model parameters as tokens, we replace all the linear projections in Transformers with a token-parameter attention layer.
Our model scales from 124M to 1.4B parameters by incrementally adding new key-value parameter pairs.
arXiv Detail & Related papers (2024-10-30T16:19:00Z)
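A rough sketch of the "parameters as tokens" idea summarized above: a projection layer whose input attends over a learnable set of key/value parameter tokens, so capacity can grow by appending more parameter tokens. This simplifies TokenFormer's actual layer (which uses a modified normalization); names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenParameterAttention(nn.Module):
    """Simplified 'parameters as tokens' projection: the input attends over
    learnable key/value parameter tokens instead of multiplying by a fixed
    weight matrix. Appending more parameter tokens grows capacity without
    changing the interface. (Illustrative; TokenFormer's layer differs in
    its normalization.)"""
    def __init__(self, dim_in, dim_out, num_param_tokens=64):
        super().__init__()
        self.param_keys = nn.Parameter(torch.randn(num_param_tokens, dim_in) * 0.02)
        self.param_values = nn.Parameter(torch.randn(num_param_tokens, dim_out) * 0.02)

    def forward(self, x):                                # x: (batch, seq, dim_in)
        scores = x @ self.param_keys.t() / x.shape[-1] ** 0.5
        weights = F.softmax(scores, dim=-1)              # attend over parameter tokens
        return weights @ self.param_values               # (batch, seq, dim_out)

y = TokenParameterAttention(32, 64)(torch.randn(2, 10, 32))   # -> (2, 10, 64)
```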
- Divide et Impera: Multi-Transformer Architectures for Complex NLP-Tasks [44.99833362998488]
We present an approach in which complex tasks are divided into simpler subtasks.
Multiple transformer models are fine-tuned to one subtask each, and lined up to accomplish the complex task.
This simplifies the compilation of fine-tuning datasets and increases overall controllability.
arXiv Detail & Related papers (2023-10-25T18:00:15Z)
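As an illustration of chaining subtask-specific models, the sketch below pipes a summarization model into a sentiment classifier with the Hugging Face pipeline API. The task split and model choices are illustrative assumptions, not taken from the paper.

```python
# One model handles each subtask; the outputs are chained to solve the full task.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
classifier = pipeline("sentiment-analysis",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

def analyze(long_text: str) -> dict:
    # Subtask 1: compress the input so the classifier sees a short, focused text.
    summary = summarizer(long_text, max_length=60, min_length=10)[0]["summary_text"]
    # Subtask 2: classify the compressed text.
    sentiment = classifier(summary)[0]
    return {"summary": summary, "sentiment": sentiment}
```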
- Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation [59.91357714415056]
We propose two Transformer variants: Context-Sharing Transformer (CST) and Semantic Gathering-Scattering Transformer (SGST).
CST learns the global-shared contextual information within image frames with a lightweight computation; SGST models the semantic correlation separately for the foreground and background.
Compared with the baseline that uses vanilla Transformers for multi-stage fusion, ours significantly increases the speed by 13 times and achieves new state-of-the-art ZVOS performance.
arXiv Detail & Related papers (2023-08-13T06:12:00Z)
- Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator [24.690247474891958]
The Fourier Transformer is able to significantly reduce computational costs while retaining the ability to inherit from various large pretrained models.
Our model achieves state-of-the-art performance among transformer-based models on the long-range modeling benchmark LRA.
On generative seq-to-seq tasks, including CNN/DailyMail and ELI5, our model outperforms standard BART by inheriting the BART weights.
arXiv Detail & Related papers (2023-05-24T12:33:06Z)
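The sketch below illustrates the general idea of shortening a hidden-state sequence with an FFT by keeping only low-frequency components; it is a simplified stand-in, not the paper's exact operator.

```python
import torch

def fft_downsample(hidden, keep_ratio=0.5):
    """Illustrative sequence downsampling with an FFT: keep only the
    lowest-frequency components along the sequence axis, then invert to a
    shorter sequence. hidden: (batch, seq_len, dim)."""
    batch, seq_len, dim = hidden.shape
    freq = torch.fft.rfft(hidden, dim=1)               # (batch, seq_len//2 + 1, dim)
    new_len = max(2, int(seq_len * keep_ratio))
    kept = freq[:, : new_len // 2 + 1, :]              # low frequencies only
    return torch.fft.irfft(kept, n=new_len, dim=1)     # (batch, new_len, dim)

shorter = fft_downsample(torch.randn(2, 128, 64))      # -> (2, 64, 64)
```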
- Consolidator: Mergeable Adapter with Grouped Connections for Visual Adaptation [53.835365470800916]
We show how to efficiently and effectively transfer knowledge in a vision transformer.
We propose consolidator to modify the pre-trained model with the addition of a small set of tunable parameters.
Our consolidator can reach up to 7.56 better accuracy than full fine-tuning with merely 0.35% of the parameters.
arXiv Detail & Related papers (2023-04-30T23:59:02Z)
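The paper's grouped-connection design is not detailed in the summary above, so the sketch below shows a generic mergeable adapter instead: a frozen linear layer plus a low-rank update that can be folded back into the weight after fine-tuning. All names are illustrative.

```python
import torch
import torch.nn as nn

class MergeableAdapter(nn.Module):
    """Generic mergeable adapter sketch (not Consolidator's grouped design):
    a frozen base linear layer plus a small low-rank update that can be
    folded into the base weight after tuning, so inference adds no cost."""
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                     # only the adapter is tuned
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)                  # start as an identity update

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        # Fold the low-rank update into the base weight: W' = W + up @ down.
        merged = nn.Linear(self.base.in_features, self.base.out_features,
                           bias=self.base.bias is not None)
        merged.weight.copy_(self.base.weight + self.up.weight @ self.down.weight)
        if self.base.bias is not None:
            merged.bias.copy_(self.base.bias)
        return merged
```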
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
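As a generic illustration of self-distillation (not MISSU's exact losses), the sketch below trains an auxiliary "student" head on the labels while matching it to the network's own final "teacher" head with a temperature-scaled KL term.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits, targets,
                           temperature=2.0, alpha=0.5):
    """Generic self-distillation objective (not MISSU's exact formulation):
    a shallow auxiliary head (student) is trained on the labels and also
    matched to the network's own final head (teacher) via soft targets."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * hard + (1 - alpha) * soft

# Toy usage with per-pixel logits flattened to (N, num_classes).
student = torch.randn(8, 3, requires_grad=True)
teacher = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
self_distillation_loss(student, teacher, labels).backward()
```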
- What Dense Graph Do You Need for Self-Attention? [73.82686008622596]
We present Hypercube Transformer, a sparse Transformer that models token interactions on a hypercube and achieves comparable or even better results than the vanilla Transformer.
Experiments on tasks with various sequence lengths validate the effectiveness of our graph function.
arXiv Detail & Related papers (2022-05-27T14:36:55Z)
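The sketch below builds a sparse attention mask over a hypercube graph, where each token attends to positions whose binary indices differ in at most one bit; whether this matches the paper's exact graph construction is an assumption.

```python
import torch

def hypercube_attention_mask(seq_len: int) -> torch.Tensor:
    """Sparse attention mask over a hypercube graph (illustrative): token i may
    attend to token j when their binary indices differ in at most one bit,
    i.e. they are hypercube neighbors, plus self-connections."""
    idx = torch.arange(seq_len)
    xor = idx.unsqueeze(0) ^ idx.unsqueeze(1)           # pairwise XOR of indices
    # Allowed if the XOR is 0 (self) or a power of two (one differing bit).
    return (xor & (xor - 1)) == 0                       # (seq_len, seq_len) bool

mask = hypercube_attention_mask(8)
print(mask.sum(dim=1))    # each token sees log2(8) neighbors plus itself = 4
```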
- Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z)
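Since VST takes image patches as input tokens, here is a minimal ViT-style patch-embedding sketch (a strided convolution that projects non-overlapping patches to tokens); it is illustrative and not VST's exact tokenizer.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """ViT-style patch embedding sketch (illustrative, not VST's tokenizer):
    a strided convolution splits the image into non-overlapping patches and
    projects each patch to a token embedding."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, dim=384):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)
        self.num_patches = (img_size // patch_size) ** 2

    def forward(self, x):                         # x: (batch, 3, H, W)
        tokens = self.proj(x)                     # (batch, dim, H/16, W/16)
        return tokens.flatten(2).transpose(1, 2)  # (batch, num_patches, dim)

tokens = PatchEmbed()(torch.randn(1, 3, 224, 224))
print(tokens.shape)                               # torch.Size([1, 196, 384])
```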
- ENCONTER: Entity Constrained Progressive Sequence Generation via Insertion-based Transformer [11.310502327308575]
Autoregressive language models do not perform well under hard lexical constraints.
Progressive insertion-based transformers can overcome this limitation.
The paper proposes the Entity-constrained insertion transformer (ENCONTER).
Our experiments show that ENCONTER outperforms other baseline models in several performance metrics.
arXiv Detail & Related papers (2021-03-17T10:24:10Z)
- RealFormer: Transformer Likes Residual Attention [5.841046725396454]
RealFormer is a simple Residual Attention Layer Transformer architecture.
It significantly outperforms canonical Transformers on a spectrum of tasks including Masked Language Modeling, GLUE, and SQuAD.
arXiv Detail & Related papers (2020-12-21T23:30:04Z)
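RealFormer's residual attention adds the previous layer's pre-softmax attention scores to the current layer's scores. Below is a minimal single-head sketch of that score-residual path, with toy shapes.

```python
import torch
import torch.nn.functional as F

def residual_attention(q, k, v, prev_scores=None):
    """Single-head attention with RealFormer-style residual scores: the raw
    (pre-softmax) attention scores from the previous layer are added to the
    current layer's scores before the softmax."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (batch, seq, seq)
    if prev_scores is not None:
        scores = scores + prev_scores                       # residual score path
    out = F.softmax(scores, dim=-1) @ v
    return out, scores                                      # scores feed the next layer

# Toy usage: chain two "layers" by threading the score residual through.
q = k = v = torch.randn(2, 5, 16)
out1, s1 = residual_attention(q, k, v)
out2, s2 = residual_attention(out1, out1, out1, prev_scores=s1)
```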