Representing Unordered Data Using Complex-Weighted Multiset Automata
- URL: http://arxiv.org/abs/2001.00610v3
- Date: Fri, 28 Aug 2020 14:11:26 GMT
- Title: Representing Unordered Data Using Complex-Weighted Multiset Automata
- Authors: Justin DeBenedetto, David Chiang
- Abstract summary: We show how the multiset representations of certain existing neural architectures can be viewed as special cases of ours.
Namely, we provide a new theoretical and intuitive justification for the Transformer model's representation of positions using sinusoidal functions.
We extend the DeepSets model to use complex numbers, enabling it to outperform the existing model on an extension of one of their tasks.
- Score: 23.68657135308002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unordered, variable-sized inputs arise in many settings across multiple
fields. The ability for set- and multiset-oriented neural networks to handle
this type of input has been the focus of much work in recent years. We propose
to represent multisets using complex-weighted multiset automata and show how
the multiset representations of certain existing neural architectures can be
viewed as special cases of ours. Namely, (1) we provide a new theoretical and
intuitive justification for the Transformer model's representation of positions
using sinusoidal functions, and (2) we extend the DeepSets model to use complex
numbers, enabling it to outperform the existing model on an extension of one of
their tasks.
Related papers
- Multiset Transformer: Advancing Representation Learning in Persistence Diagrams [11.512742322405906]
Multiset Transformer is a neural network that utilizes attention mechanisms specifically designed for multisets as inputs.
The architecture integrates multiset-enhanced attentions with a pool-decomposition scheme, allowing multiplicities to be preserved across equivariant layers.
Experimental results demonstrate that the Multiset Transformer outperforms existing neural network methods in the realm of persistence diagram representation learning.
arXiv Detail & Related papers (2024-11-22T01:38:47Z) - U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantics.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z) - EulerFormer: Sequential User Behavior Modeling with Complex Vector Attention [88.45459681677369]
We propose a novel transformer variant with complex vector attention, named EulerFormer.
It provides a unified theoretical framework to formulate both semantic difference and positional difference.
It is more robust to semantic variations and possesses moresuperior theoretical properties in principle.
arXiv Detail & Related papers (2024-03-26T14:18:43Z) - Generative Multimodal Models are In-Context Learners [60.50927925426832]
We introduce Emu2, a generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences.
Emu2 exhibits strong multimodal in-context learning abilities, even emerging to solve tasks that require on-the-fly reasoning.
arXiv Detail & Related papers (2023-12-20T18:59:58Z) - Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective light feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
arXiv Detail & Related papers (2023-12-17T05:27:31Z) - Modular Blended Attention Network for Video Question Answering [1.131316248570352]
We present an approach to facilitate the question with a reusable and composable neural unit.
We have conducted experiments on three commonly used datasets.
arXiv Detail & Related papers (2023-11-02T14:22:17Z) - MulT: An End-to-End Multitask Learning Transformer [66.52419626048115]
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks.
Our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads.
arXiv Detail & Related papers (2022-05-17T13:03:18Z) - Unsupervised Multimodal Language Representations using Convolutional
Autoencoders [5.464072883537924]
We propose extracting unsupervised Multimodal Language representations that are universal and can be applied to different tasks.
We map the word-level aligned multimodal sequences to 2-D matrices and then use Convolutional Autoencoders to learn embeddings by combining multiple datasets.
It is also shown that our method is extremely lightweight and can be easily generalized to other tasks and unseen data with small performance drop and almost the same number of parameters.
arXiv Detail & Related papers (2021-10-06T18:28:07Z) - Abelian Neural Networks [48.52497085313911]
We first construct a neural network architecture for Abelian group operations and derive a universal approximation property.
We extend it to Abelian semigroup operations using the characterization of associative symmetrics.
We train our models over fixed word embeddings and demonstrate improved performance over the original word2vec.
arXiv Detail & Related papers (2021-02-24T11:52:21Z) - DynE: Dynamic Ensemble Decoding for Multi-Document Summarization [5.197307534263253]
We propose a simple decoding methodology which ensembles the output of multiple instances of the same model on different inputs.
We obtain state-of-the-art results on several multi-document summarization datasets.
arXiv Detail & Related papers (2020-06-15T20:40:06Z) - Deep Multi-Modal Sets [29.983311598563542]
Deep Multi-Modal Sets is a technique that represents a collection of features as an unordered set rather than one long ever-growing fixed-size vector.
We demonstrate a scalable, multi-modal framework that reasons over different modalities to learn various types of tasks.
arXiv Detail & Related papers (2020-03-03T15:48:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.