General Transform: A Unified Framework for Adaptive Transform to Enhance Representations
- URL: http://arxiv.org/abs/2505.04969v1
- Date: Thu, 08 May 2025 06:01:11 GMT
- Title: General Transform: A Unified Framework for Adaptive Transform to Enhance Representations
- Authors: Gekko Budiutama, Shunsuke Daimon, Hirofumi Nishi, Yu-ichiro Matsushita
- Abstract summary: General Transform (GT) is an adaptive transform-based representation designed for machine learning applications. GT learns a data-driven mapping tailored to the dataset and task of interest. Models incorporating GT outperform conventional transform-based approaches across computer vision and natural language processing tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discrete transforms, such as the discrete Fourier transform, are widely used in machine learning to improve model performance by extracting meaningful features. However, with numerous transforms available, selecting an appropriate one often depends on understanding the dataset's properties, making the approach less effective when such knowledge is unavailable. In this work, we propose General Transform (GT), an adaptive transform-based representation designed for machine learning applications. Unlike conventional transforms, GT learns a data-driven mapping tailored to the dataset and task of interest. Here, we demonstrate that models incorporating GT outperform conventional transform-based approaches across computer vision and natural language processing tasks, highlighting its effectiveness in diverse learning scenarios.
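To make the idea of an adaptive transform concrete, below is a minimal sketch of a learnable transform layer, assuming GT can be viewed as a learnable mixture of fixed orthogonal bases (here identity and DCT-II). The module name `AdaptiveTransform`, the choice of candidate bases, and the softmax mixing are illustrative assumptions, not the paper's actual construction.

```python
import math
import torch
import torch.nn as nn


def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II basis as an n x n matrix (rows indexed by frequency)."""
    k = torch.arange(n).unsqueeze(1).float()   # frequency index, shape (n, 1)
    i = torch.arange(n).unsqueeze(0).float()   # position index, shape (1, n)
    basis = torch.cos(math.pi / n * (i + 0.5) * k)
    basis[0] *= 1.0 / math.sqrt(2.0)           # orthonormalize the DC row
    return basis * math.sqrt(2.0 / n)


class AdaptiveTransform(nn.Module):
    """Hypothetical adaptive transform: a softmax-weighted, learnable mixture of
    fixed bases (identity and DCT-II). Illustrative sketch only."""

    def __init__(self, dim: int):
        super().__init__()
        bases = torch.stack([torch.eye(dim), dct_matrix(dim)])   # (2, dim, dim)
        self.register_buffer("bases", bases)
        self.logits = nn.Parameter(torch.zeros(bases.shape[0]))  # learnable mixing weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., dim); blend the candidate bases and apply as a linear map.
        w = torch.softmax(self.logits, dim=0)
        mixed = torch.einsum("k,kij->ij", w, self.bases)
        return x @ mixed.T


if __name__ == "__main__":
    layer = AdaptiveTransform(dim=16)
    feats = torch.randn(4, 16)
    print(layer(feats).shape)  # torch.Size([4, 16]), same shape as the input
```

In a larger model, a layer like this would stand in for a fixed DFT/DCT feature-extraction step, with the mixing weights trained jointly with the rest of the network so that the transform adapts to the dataset and task.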
Related papers
- Self-supervised Transformation Learning for Equivariant Representations [26.207358743969277]
Unsupervised representation learning has significantly advanced various machine learning tasks. We propose Self-supervised Transformation Learning (STL), replacing transformation labels with transformation representations derived from image pairs. We demonstrate the approach's effectiveness across diverse classification and detection tasks, outperforming existing methods in 7 out of 11 benchmarks.
arXiv Detail & Related papers (2025-01-15T10:54:21Z)
- Survey: Transformer-based Models in Data Modality Conversion [0.8136541584281987]
Modality Conversion involves the transformation of data from one form of representation to another, mimicking the way humans integrate and interpret sensory information.
This paper provides a comprehensive review of transformer-based models applied to the primary modalities of text, vision, and speech, discussing their architectures, conversion methodologies, and applications.
arXiv Detail & Related papers (2024-08-08T18:39:14Z) - Learning Explicitly Conditioned Sparsifying Transforms [7.335712499936904]
We consider a new sparsifying transform model that enforces explicit control over the data representation quality and the condition number of the learned transforms.
We confirm through numerical experiments that our model presents better numerical behavior than the state-of-the-art.
arXiv Detail & Related papers (2024-03-05T18:03:51Z)
- ParGAN: Learning Real Parametrizable Transformations [50.51405390150066]
We propose ParGAN, a generalization of the cycle-consistent GAN framework to learn image transformations.
The proposed generator takes as input both an image and a parametrization of the transformation.
We show how, with disjoint image domains and no annotated parametrization, our framework can create smooth interpolations as well as learn multiple transformations simultaneously.
arXiv Detail & Related papers (2022-11-09T16:16:06Z)
- Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks [126.33843752332139]
We introduce Group-wise Transformation to build a universal yet lightweight Transformer for vision-and-language tasks, termed LW-Transformer.
We apply LW-Transformer to a set of Transformer-based networks, and quantitatively measure them on three vision-and-language tasks and six benchmark datasets.
Experimental results show that while saving a large number of parameters and computations, LW-Transformer achieves very competitive performance against the original Transformer networks for vision-and-language tasks.
arXiv Detail & Related papers (2022-04-16T11:30:26Z)
- A Generic Self-Supervised Framework of Learning Invariant Discriminative Features [9.614694312155798]
This paper proposes a generic SSL framework based on a constrained self-labelling assignment process.
The proposed training strategy outperforms a majority of state-of-the-art representation learning methods based on AE structures.
arXiv Detail & Related papers (2022-02-14T18:09:43Z)
- Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from "Vision-friendly Transformer".
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
- Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning [96.75889543560497]
In many real-world problems, collecting a large number of labeled samples is infeasible.
Few-shot learning is the dominant approach to address this issue, where the objective is to quickly adapt to novel categories in the presence of a limited number of samples.
We propose a novel training mechanism that simultaneously enforces equivariance and invariance to a general set of geometric transformations.
arXiv Detail & Related papers (2021-03-01T21:14:33Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
- On Compositions of Transformations in Contrastive Self-Supervised Learning [66.15514035861048]
In this paper, we generalize contrastive learning to a wider set of transformations.
We find that being invariant to certain transformations and distinctive to others is critical to learning effective video representations.
arXiv Detail & Related papers (2020-03-09T17:56:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.