Representing Unordered Data Using Complex-Weighted Multiset Automata
- URL: http://arxiv.org/abs/2001.00610v3
- Date: Fri, 28 Aug 2020 14:11:26 GMT
- Title: Representing Unordered Data Using Complex-Weighted Multiset Automata
- Authors: Justin DeBenedetto, David Chiang
- Abstract summary: We show how the multiset representations of certain existing neural architectures can be viewed as special cases of ours.
Namely, we provide a new theoretical and intuitive justification for the Transformer model's representation of positions using sinusoidal functions.
We extend the DeepSets model to use complex numbers, enabling it to outperform the existing model on an extension of one of their tasks.
- Score: 23.68657135308002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unordered, variable-sized inputs arise in many settings across multiple
fields. The ability for set- and multiset-oriented neural networks to handle
this type of input has been the focus of much work in recent years. We propose
to represent multisets using complex-weighted multiset automata and show how
the multiset representations of certain existing neural architectures can be
viewed as special cases of ours. Namely, (1) we provide a new theoretical and
intuitive justification for the Transformer model's representation of positions
using sinusoidal functions, and (2) we extend the DeepSets model to use complex
numbers, enabling it to outperform the existing model on an extension of one of
their tasks.
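Claim (1) can be made concrete: the Transformer's sinusoidal position encoding is exactly what a diagonal complex-weighted automaton computes, since reaching position t multiplies unit complex weights t times, and the real/imaginary parts of the resulting state are the cos/sin components. The sketch below is illustrative (not code from the paper; function names and the d_model=16 choice are our own):

```python
import numpy as np

def sinusoidal_encoding(pos, d_model):
    """Standard Transformer position encoding (sin/cos at geometric frequencies)."""
    i = np.arange(d_model // 2)
    freqs = 1.0 / (10000 ** (2 * i / d_model))
    enc = np.empty(d_model)
    enc[0::2] = np.sin(pos * freqs)
    enc[1::2] = np.cos(pos * freqs)
    return enc

def complex_automaton_encoding(pos, d_model):
    """Same values viewed as a diagonal complex-weighted automaton:
    position pos is reached by multiplying unit weights e^{i*freq} pos times."""
    i = np.arange(d_model // 2)
    freqs = 1.0 / (10000 ** (2 * i / d_model))
    weights = np.exp(1j * freqs)   # diagonal transition weights on the unit circle
    state = weights ** pos         # pos-fold multiplication starting from state 1
    enc = np.empty(d_model)
    enc[0::2] = state.imag         # sin component
    enc[1::2] = state.real         # cos component
    return enc

# The two views agree at every position.
assert np.allclose(sinusoidal_encoding(7, 16), complex_automaton_encoding(7, 16))
```

Because the automaton's weights are unit complex numbers, advancing one position is a fixed rotation of the state, which is the standard intuition for why relative positions are linearly recoverable from sinusoidal encodings.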
Related papers
- SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection [73.49799596304418]
This paper introduces a new task, Multi-Modal Datasets and Multi-Task Object Detection (M2Det), for remote sensing.
It is designed to accurately detect horizontal or oriented objects from any sensor modality.
This task poses challenges due to 1) the trade-offs involved in managing multi-modal modelling and 2) the complexities of multi-task optimization.
arXiv Detail & Related papers (2024-12-30T02:47:51Z)
- MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt [60.10555128510744]
Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary image information from different modalities.
Recently, large-scale pre-trained models like CLIP have demonstrated impressive performance in traditional single-modal object ReID tasks.
We introduce a novel framework called MambaPro for multi-modal object ReID.
arXiv Detail & Related papers (2024-12-14T06:33:53Z)
- Multiset Transformer: Advancing Representation Learning in Persistence Diagrams [11.512742322405906]
Multiset Transformer is a neural network that utilizes attention mechanisms specifically designed for multisets as inputs.
The architecture integrates multiset-enhanced attentions with a pool-decomposition scheme, allowing multiplicities to be preserved across equivariant layers.
Experimental results demonstrate that the Multiset Transformer outperforms existing neural network methods in the realm of persistence diagram representation learning.
arXiv Detail & Related papers (2024-11-22T01:38:47Z)
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantics.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
- Generative Multimodal Models are In-Context Learners [60.50927925426832]
We introduce Emu2, a generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences.
Emu2 exhibits strong multimodal in-context learning abilities, even solving emergent tasks that require on-the-fly reasoning.
arXiv Detail & Related papers (2023-12-20T18:59:58Z)
- Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective light feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
arXiv Detail & Related papers (2023-12-17T05:27:31Z)
- Modular Blended Attention Network for Video Question Answering [1.131316248570352]
We present an approach that processes the question with a reusable and composable neural unit.
We have conducted experiments on three commonly used datasets.
arXiv Detail & Related papers (2023-11-02T14:22:17Z)
- Unsupervised Multimodal Language Representations using Convolutional Autoencoders [5.464072883537924]
We propose extracting unsupervised Multimodal Language representations that are universal and can be applied to different tasks.
We map the word-level aligned multimodal sequences to 2-D matrices and then use Convolutional Autoencoders to learn embeddings by combining multiple datasets.
It is also shown that our method is extremely lightweight and can be easily generalized to other tasks and unseen data with a small performance drop and almost the same number of parameters.
arXiv Detail & Related papers (2021-10-06T18:28:07Z)
- Abelian Neural Networks [48.52497085313911]
We first construct a neural network architecture for Abelian group operations and derive a universal approximation property.
We extend it to Abelian semigroup operations using the characterization of associative symmetric polynomials.
We train our models over fixed word embeddings and demonstrate improved performance over the original word2vec.
arXiv Detail & Related papers (2021-02-24T11:52:21Z)
- DynE: Dynamic Ensemble Decoding for Multi-Document Summarization [5.197307534263253]
We propose a simple decoding methodology which ensembles the output of multiple instances of the same model on different inputs.
We obtain state-of-the-art results on several multi-document summarization datasets.
arXiv Detail & Related papers (2020-06-15T20:40:06Z)
- Deep Multi-Modal Sets [29.983311598563542]
Deep Multi-Modal Sets is a technique that represents a collection of features as an unordered set rather than one long ever-growing fixed-size vector.
We demonstrate a scalable, multi-modal framework that reasons over different modalities to learn various types of tasks.
arXiv Detail & Related papers (2020-03-03T15:48:44Z)
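Several of the works above (Deep Multi-Modal Sets, and the DeepSets model discussed in the main abstract) rest on the same primitive: pooling an unordered collection of feature vectors with a permutation-invariant operation, so the representation does not depend on input order. A minimal, generic sketch (not code from any of the listed papers; names are illustrative):

```python
import numpy as np

def set_pool(features, op="sum"):
    """Permutation-invariant pooling over an unordered collection of
    feature vectors: the result is identical under any reordering."""
    stack = np.stack(features)
    return stack.sum(axis=0) if op == "sum" else stack.max(axis=0)

feats = [np.array([1.0, 2.0]), np.array([3.0, 0.0]), np.array([0.5, 0.5])]
shuffled = [feats[2], feats[0], feats[1]]
assert np.allclose(set_pool(feats), set_pool(shuffled))  # order does not matter
```

The main paper's contribution (2) can be read as replacing the real-valued vectors fed into such a pool with complex-valued ones, which the authors report lets the model track information that real-valued sums discard.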
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.