The Backpropagation of the Wave Network
- URL: http://arxiv.org/abs/2411.06989v2
- Date: Sat, 11 Jan 2025 03:06:07 GMT
- Title: The Backpropagation of the Wave Network
- Authors: Xin Zhang, Victor S. Sheng
- Abstract summary: This paper provides an in-depth analysis of Token2Wave, a novel token representation method derived from the Wave Network. In complex vector token representation, each token is represented with a magnitude component, capturing the global semantics of the entire input text. A detailed computational complexity analysis shows that Token2Wave can significantly reduce video memory usage and training time.
- Score: 26.656105779121308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper provides an in-depth analysis of Token2Wave, a novel token representation method derived from the Wave Network, designed to capture both global and local semantics of input text through wave-inspired complex vectors. In complex vector token representation, each token is represented with a magnitude component, capturing the global semantics of the entire input text, and a phase component, encoding the relationships between individual tokens and the global semantics. Building on prior research that demonstrated the effectiveness of wave-like operations, such as interference and modulation, during forward propagation, this study investigates the convergence behavior, backpropagation characteristics, and embedding independence within the Token2Wave framework. A detailed computational complexity analysis shows that Token2Wave can significantly reduce video memory usage and training time compared to BERT. Gradient comparisons for the [CLS] token, total input text, and classifier parameters further highlight Token2Wave's unique characteristics. This research offers new insights into wave-based token representations, demonstrating their potential to enable efficient and computationally friendly language model architectures.
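To make the magnitude/phase decomposition above concrete, here is a minimal sketch, assuming ordinary real-valued token embeddings as input; the function name and the specific formulas (an L2-norm magnitude shared across the sequence, an arccos phase per token) are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def token2wave_representation(token_embeddings):
    """Sketch of a wave-style complex token representation (hypothetical).

    token_embeddings: (seq_len, dim) array of real-valued token embeddings.
    Returns a complex (seq_len, dim) array whose magnitude carries global
    semantics of the whole input and whose phase encodes each token's
    relationship to that global signal.
    """
    # Global semantics: one magnitude vector shared by all tokens,
    # approximated here by the per-dimension L2 norm over the sequence.
    global_magnitude = np.linalg.norm(token_embeddings, axis=0)  # (dim,)

    # Phase: each token's value relative to the global magnitude,
    # clipped so arccos stays defined.
    ratio = np.clip(token_embeddings / (global_magnitude + 1e-9), -1.0, 1.0)
    phase = np.arccos(ratio)  # (seq_len, dim), values in [0, pi]

    # Complex wave representation: magnitude * exp(i * phase).
    return global_magnitude * np.exp(1j * phase)
```

A representation of this shape is what wave-like operations such as interference and modulation act on.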
Related papers
- "Principal Components" Enable A New Language of Images [79.45806370905775]
We introduce a novel visual tokenization framework that embeds a provable PCA-like structure into the latent token space.
Our approach achieves state-of-the-art reconstruction performance and enables better interpretability to align with the human vision system.
arXiv Detail & Related papers (2025-03-11T17:59:41Z) - Single-Channel EEG Tokenization Through Time-Frequency Modeling [27.109527161042813]
We introduce TFM-Tokenizer, a novel tokenization framework tailored for EEG analysis.
By learning tokens that encapsulate intrinsic patterns within a single channel, our approach yields a scalable tokenizer across diverse EEG settings.
arXiv Detail & Related papers (2025-02-22T03:32:36Z) - SigWavNet: Learning Multiresolution Signal Wavelet Network for Speech Emotion Recognition [17.568724398229232]
Speech emotion recognition (SER) plays an important role in deciphering emotional states from speech signals.
This paper introduces a new end-to-end (E2E) deep learning multi-resolution framework for SER.
It exploits the capabilities of wavelets for effective localization in both time and frequency domains.
arXiv Detail & Related papers (2025-02-01T04:18:06Z) - Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization [74.3339999119713]
We develop a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time-localized frequencies.
Our method first scales and decomposes the input time series, then thresholds and quantizes the wavelet coefficients, and finally pre-trains an autoregressive model to forecast coefficients for the forecast horizon.
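As a rough sketch of the scale, decompose, threshold, and quantize pipeline described in this entry (using PyWavelets; the wavelet family, threshold, and quantization grid below are placeholder choices, not the paper's settings):

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_tokenize(series, wavelet="db4", level=3, threshold=0.1, n_bins=256):
    """Illustrative wavelet-based tokenizer for a 1-D time series."""
    # 1. Scale the input series (simple standardization, assumed here).
    series = (series - series.mean()) / (series.std() + 1e-9)

    # 2. Decompose into time-localized frequency bands.
    coeffs = pywt.wavedec(series, wavelet, level=level)

    # 3. Threshold small coefficients, then 4. quantize into discrete token ids.
    edges = np.linspace(-4.0, 4.0, n_bins - 1)  # fixed quantization grid
    tokens = []
    for band in coeffs:
        band = pywt.threshold(band, threshold, mode="hard")
        tokens.append(np.digitize(band, edges))
    return tokens  # one integer token sequence per frequency band

# Example: tokenize a noisy sine wave before feeding an autoregressive model.
t = np.linspace(0, 10, 512)
token_ids = wavelet_tokenize(np.sin(2 * np.pi * t) + 0.1 * np.random.randn(512))
```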
arXiv Detail & Related papers (2024-12-06T18:22:59Z) - Wave Network: An Ultra-Small Language Model [26.656105779121308]
We propose an innovative token representation and update method in a new ultra-small language model: the Wave network.
Specifically, we use a complex vector to represent each token, encoding both global and local semantics of the input text.
Experiments on the AG News text classification task demonstrate that, when generating complex vectors from randomly initialized token embeddings, our single-layer Wave Network achieves 90.91% accuracy with wave interference and 91.66% with wave modulation.
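One plausible reading of the interference and modulation operations mentioned above is element-wise complex arithmetic; the sketch below assumes tokens are already encoded as complex vectors, and the Wave Network's actual update rules may differ.

```python
import numpy as np

def wave_interference(z1, z2):
    # Interference: complex addition, so magnitudes and phases combine
    # constructively or destructively.
    return z1 + z2

def wave_modulation(z1, z2):
    # Modulation: complex multiplication, so magnitudes multiply and phases add.
    return z1 * z2

rng = np.random.default_rng(0)
dim = 8
z_token = np.abs(rng.normal(size=dim)) * np.exp(1j * rng.uniform(0, np.pi, dim))
z_global = np.abs(rng.normal(size=dim)) * np.exp(1j * rng.uniform(0, np.pi, dim))
print(wave_interference(z_token, z_global))
print(wave_modulation(z_token, z_global))
```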
arXiv Detail & Related papers (2024-11-04T23:21:12Z) - CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens [49.569695524535454]
We propose to represent speech with supervised semantic tokens, which are derived from a multilingual speech recognition model by inserting vector quantization into the encoder.
Based on the tokens, we further propose a scalable zero-shot TTS synthesizer, CosyVoice, which consists of an LLM for text-to-token generation and a conditional flow matching model for token-to-speech synthesis.
arXiv Detail & Related papers (2024-07-07T15:16:19Z) - Toward end-to-end interpretable convolutional neural networks for waveform signals [0.7499722271664147]
This paper introduces a novel convolutional neural network (CNN) framework tailored for end-to-end audio deep learning models.
By benchmarking experiments on three standard speech emotion recognition datasets with five-fold cross-validation, our framework outperforms Mel spectrogram features by up to seven percent.
arXiv Detail & Related papers (2024-05-03T02:24:27Z) - Exploring the Role of Token in Transformer-based Time Series Forecasting [10.081240480138487]
Transformer-based methods are a mainstream approach to time series forecasting (TSF).
Most focus on optimizing the model structure, with few studies paying attention to the role of tokens for predictions.
We find that the gradients mainly depend on tokens that contribute to the predicted series, called positive tokens.
To utilize T-PE and V-PE, we propose T2B-PE, a Transformer-based dual-branch framework.
arXiv Detail & Related papers (2024-04-16T07:21:39Z) - Subobject-level Image Tokenization [60.80949852899857]
Patch-based image tokenization ignores the morphology of the visual world.
Inspired by subword tokenization, we introduce subobject-level adaptive token segmentation.
We show that subobject tokenization enables faster convergence and better generalization while using fewer visual tokens.
arXiv Detail & Related papers (2024-02-22T06:47:44Z) - Object Recognition as Next Token Prediction [99.40793702627396]
We present an approach to pose object recognition as next token prediction.
The idea is to apply a language decoder that auto-regressively predicts the text tokens from image embeddings to form labels.
arXiv Detail & Related papers (2023-12-04T18:58:40Z) - A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity [71.11795737362459]
ViTs with self-attention modules have recently achieved great empirical success in many tasks.
However, the theoretical analysis of their learning and generalization is mostly noisy and elusive.
This paper provides the first theoretical analysis of a shallow ViT for a classification task.
arXiv Detail & Related papers (2023-02-12T22:12:35Z) - UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units [64.61596752343837]
We present a novel two-pass direct S2ST architecture, UnitY, which first generates textual representations and predicts discrete acoustic units.
We enhance the model performance by subword prediction in the first-pass decoder.
We show that the proposed methods boost the performance even when predicting a spectrogram in the second pass.
arXiv Detail & Related papers (2022-12-15T18:58:28Z) - Opening the Black Box of wav2vec Feature Encoder [2.1219431687928525]
We focus on the convolutional feature encoder where its latent space is often speculated to represent discrete acoustic units.
To analyze the embedding space in a reductive manner, we feed in synthesized audio signals, each a summation of simple sine waves.
We conclude that diverse information is embedded inside the feature encoder representations: (1) fundamental frequency, (2) formants, and (3) amplitude, packed with (4) sufficient temporal detail.
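A minimal sketch of the kind of probe signal described here, assuming a fundamental plus a few formant-like partials (the frequencies and weights are placeholders); the resulting waveform would then be passed through the pretrained feature encoder for inspection.

```python
import numpy as np

def sine_probe(f0=120.0, formants=(500.0, 1500.0, 2500.0), sr=16000, dur=1.0):
    """Synthesize a vowel-like probe signal as a sum of simple sine waves."""
    t = np.arange(int(sr * dur)) / sr
    signal = np.sin(2 * np.pi * f0 * t)                 # fundamental frequency
    for k, f in enumerate(formants, start=1):
        signal += 0.5 ** k * np.sin(2 * np.pi * f * t)  # weaker formant partials
    return signal / np.max(np.abs(signal))              # amplitude-normalized

probe = sine_probe()  # shape (16000,), ready to feed to a feature encoder
```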
arXiv Detail & Related papers (2022-10-27T12:47:35Z) - Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT [72.93855288283059]
We propose a non-autoregressive speech recognition model called LASO (Listen Attentively, and Spell Once).
The model consists of an encoder, a decoder, and a position-dependent summarizer (PDS).
arXiv Detail & Related papers (2021-02-15T15:18:59Z) - Neural BRDF Representation and Importance Sampling [79.84316447473873]
We present a compact neural network-based representation of reflectance BRDF data.
We encode BRDFs as lightweight networks, and propose a training scheme with adaptive angular sampling.
We evaluate encoding results on isotropic and anisotropic BRDFs from multiple real-world datasets.
arXiv Detail & Related papers (2021-02-11T12:00:24Z) - Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis [25.234945748885348]
We describe a sequence-to-sequence neural network which directly generates speech waveforms from text inputs.
The architecture extends the Tacotron model by incorporating a normalizing flow into the autoregressive decoder loop.
Experiments show that the proposed model generates speech with quality approaching a state-of-the-art neural TTS system.
arXiv Detail & Related papers (2020-11-06T19:30:07Z) - Understanding Neural Abstractive Summarization Models via Uncertainty [54.37665950633147]
Seq2seq abstractive summarization models generate text in a free-form manner.
We study the entropy, or uncertainty, of the model's token-level predictions.
We show that uncertainty is a useful perspective for analyzing summarization and text generation models more broadly.
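For reference, token-level uncertainty of this kind reduces to the entropy of the model's per-step output distribution; a small sketch, assuming `probs` holds the softmax distribution over the vocabulary at each decoding step:

```python
import numpy as np

def token_entropy(probs, eps=1e-12):
    """Entropy (in nats) of each token-level predictive distribution.

    probs: (num_steps, vocab_size) array of softmax probabilities.
    """
    probs = np.clip(probs, eps, 1.0)
    return -np.sum(probs * np.log(probs), axis=-1)  # one value per decoding step
```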
arXiv Detail & Related papers (2020-10-15T16:57:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.