Residual Attention Net for Superior Cross-Domain Time Sequence Modeling
- URL: http://arxiv.org/abs/2001.04077v1
- Date: Mon, 13 Jan 2020 06:14:04 GMT
- Title: Residual Attention Net for Superior Cross-Domain Time Sequence Modeling
- Authors: Seth H. Huang, Xu Lingjie, Jiang Congwei
- Abstract summary: This paper serves as a proof-of-concept for a new architecture, with RAN aiming to provide the model with a higher-level understanding of sequence patterns.
Without further model fine-tuning, we achieved 35 state-of-the-art results, with 10 results matching the current state of the art.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel architecture, the residual attention net (RAN), which merges a
sequence architecture, the universal transformer, with a computer vision
architecture, the residual net, through a highway architecture for cross-domain
sequence modeling. The architecture aims to address the long-range dependency
issue often faced by recurrent-neural-net-based structures. This paper serves
as a proof-of-concept for the new architecture, with RAN aiming to provide the
model with a higher-level understanding of sequence patterns. To the best of our
knowledge, we are the first to propose such an architecture. Out of the 85
standard UCR datasets, we achieved 35 state-of-the-art results, with 10 results
matching the current state of the art, without further model fine-tuning.
The results indicate that such an architecture is promising for complex,
long-sequence modeling and may have broad cross-domain applications.
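To make the idea concrete, here is a minimal, hypothetical sketch of one residual attention block in the spirit of the abstract: self-attention (as in the universal transformer) combined with a highway-style gate that mixes the attended signal with the untouched input. Module names, dimensions, and the gating form are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a residual attention block with a highway gate.
# Dimensions and layer choices are assumptions for illustration only.
import torch
import torch.nn as nn

class ResidualAttentionBlock(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        # Highway gate: learns how much of the input to carry through unchanged.
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention over the time dimension; x is (batch, time, features).
        attn_out, _ = self.attn(x, x, x)
        g = torch.sigmoid(self.gate(x))
        # Highway-style mix of the transformed signal and the identity path.
        return self.norm(g * attn_out + (1.0 - g) * x)

if __name__ == "__main__":
    block = ResidualAttentionBlock()
    series = torch.randn(8, 128, 64)   # 8 sequences of length 128
    print(block(series).shape)         # torch.Size([8, 128, 64])
```

Stacking several such blocks gives the kind of residual path that lets gradients bypass the attention layers, which is the usual remedy for the long-range dependency problem the abstract describes.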
Related papers
- Multi-conditioned Graph Diffusion for Neural Architecture Search [8.290336491323796]
We present a graph diffusion-based NAS approach that uses discrete conditional graph diffusion processes to generate high-performing neural network architectures.
We show promising results on six standard benchmarks, yielding novel and unique architectures at a fast speed.
arXiv Detail & Related papers (2024-03-09T21:45:31Z)
- Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling [4.190836962132713]
This paper introduces Orchid, a novel architecture designed to address the quadratic complexity of traditional attention mechanisms.
At the core of this architecture lies a new data-dependent global convolution layer, which contextually adapts its kernel conditioned on the input sequence.
We evaluate the proposed model across multiple domains, including language modeling and image classification, to highlight its performance and generality.
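As a rough illustration of the general idea (not Orchid's actual layer), a data-dependent global convolution can be sketched as a kernel produced from the input itself and applied via an FFT-based circular convolution, which costs O(L log L) rather than the O(L^2) of attention. All names and shapes below are assumptions.

```python
# Hedged sketch of a data-dependent global convolution (illustrative only).
import torch
import torch.nn as nn

class DataDependentGlobalConv(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        # Produces kernel values from the input, making the filter data-dependent.
        self.kernel_net = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, channels)
        length = x.shape[1]
        kernel = torch.tanh(self.kernel_net(x))       # conditioned on the input
        # Circular convolution via FFT: O(L log L) instead of O(L^2).
        x_f = torch.fft.rfft(x, dim=1)
        k_f = torch.fft.rfft(kernel, dim=1)
        return torch.fft.irfft(x_f * k_f, n=length, dim=1)

if __name__ == "__main__":
    layer = DataDependentGlobalConv()
    print(layer(torch.randn(4, 256, 32)).shape)        # torch.Size([4, 256, 32])
```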
arXiv Detail & Related papers (2024-02-28T17:36:45Z)
- The autoregressive neural network architecture of the Boltzmann distribution of pairwise interacting spins systems [0.0]
Generative Autoregressive Neural Networks (ARNNs) have recently demonstrated exceptional results in image and language generation tasks.
This work presents an exact mapping of the Boltzmann distribution of binary pairwise interacting systems into autoregressive form.
The resulting ARNN architecture has weights and biases of its first layer corresponding to the Hamiltonian's couplings and external fields.
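For reference, the standard autoregressive factorization that such a mapping relies on (stated here generically, not as the paper's exact derivation) writes the Boltzmann distribution of $N$ binary spins as a product of conditionals:

\[
P(\mathbf{s}) = \frac{e^{-\beta H(\mathbf{s})}}{Z} = \prod_{i=1}^{N} P(s_i \mid s_1,\dots,s_{i-1}),
\qquad
H(\mathbf{s}) = -\sum_{i<j} J_{ij}\, s_i s_j - \sum_i h_i s_i .
\]

Each conditional is represented by one output of the ARNN; the summary's point is that, for pairwise interactions, the first-layer weights and biases can be written directly in terms of the couplings $J_{ij}$ and external fields $h_i$.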
arXiv Detail & Related papers (2023-02-16T15:05:37Z)
- DepGraph: Towards Any Structural Pruning [68.40343338847664]
We study general structural pruning of arbitrary architecture like CNNs, RNNs, GNNs and Transformers.
We propose a general and fully automatic method, Dependency Graph (DepGraph), to explicitly model the dependency between layers and comprehensively group parameters for pruning.
In this work, we extensively evaluate our method on several architectures and tasks, including ResNe(X)t, DenseNet, MobileNet and Vision Transformer for images, GAT for graphs, and DGCNN for 3D point clouds, alongside LSTM for language, and demonstrate that, even with a
arXiv Detail & Related papers (2023-01-30T14:02:33Z)
- Rethinking Architecture Selection in Differentiable NAS [74.61723678821049]
Differentiable Neural Architecture Search is one of the most popular NAS methods for its search efficiency and simplicity.
We propose an alternative perturbation-based architecture selection that directly measures each operation's influence on the supernet.
We find that several failure modes of DARTS can be greatly alleviated with the proposed selection method.
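In sketch form, perturbation-based selection keeps, on each edge, the operation whose removal hurts the supernet's validation accuracy the most, rather than the operation with the largest architecture weight. The supernet interface and evaluate function below are hypothetical placeholders, not a real library API.

```python
# Hypothetical sketch of perturbation-based architecture selection.
# `supernet` and `evaluate` are placeholder interfaces for illustration.
def select_operations(supernet, edges, evaluate):
    """For each edge, keep the op whose removal degrades the supernet most."""
    chosen = {}
    for edge in edges:
        base_acc = evaluate(supernet)               # validation accuracy with all ops
        influence = {}
        for op in supernet.candidate_ops(edge):
            supernet.mask_op(edge, op)              # temporarily remove this op
            influence[op] = base_acc - evaluate(supernet)
            supernet.unmask_op(edge, op)            # restore it
        chosen[edge] = max(influence, key=influence.get)
    return chosen
```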
arXiv Detail & Related papers (2021-08-10T00:53:39Z)
- Operation Embeddings for Neural Architecture Search [15.033712726016255]
We propose the replacement of fixed operator encoding with learnable representations in the optimization process.
Our method produces top-performing architectures that share similar operation and graph patterns.
arXiv Detail & Related papers (2021-05-11T09:17:10Z)
- Inter-layer Transition in Neural Architecture Search [89.00449751022771]
The dependency between the architecture weights of connected edges is explicitly modeled in this paper.
Experiments on five benchmarks confirm the value of modeling inter-layer dependency and demonstrate the proposed method outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-11-30T03:33:52Z)
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
- Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications.
These networks consist of stages, which are sets of layers that operate on representations in the same resolution.
It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network.
However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
arXiv Detail & Related papers (2020-04-23T14:16:39Z)
- Hierarchical Conditional Relation Networks for Video Question Answering [62.1146543269993]
We introduce a general-purpose reusable neural unit called the Conditional Relation Network (CRN).
CRN serves as a building block to construct more sophisticated structures for representation and reasoning over video.
Our evaluations on well-known datasets achieved new SoTA results, demonstrating the impact of building a general-purpose reasoning unit on complex domains such as VideoQA.
arXiv Detail & Related papers (2020-02-25T07:00:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.