Rethinking skip connection model as a learnable Markov chain
- URL: http://arxiv.org/abs/2209.15278v1
- Date: Fri, 30 Sep 2022 07:31:49 GMT
- Title: Rethinking skip connection model as a learnable Markov chain
- Authors: Dengsheng Chen, Jie Hu, Wenwen Qiang, Xiaoming Wei, Enhua Wu
- Abstract summary: We take a deep dive into the behavior of models with skip connections, which can be formulated as a learnable Markov chain.
An efficient Markov chain is preferred, as each of its transitions maps the input data toward the target domain more effectively.
We propose a simple penal-connection routine that turns any residual-like model into a learnable Markov chain.
- Score: 12.135167279383815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the years since the birth of ResNet, the skip connection has
become the de facto standard for the design of modern architectures due to its
widespread adoption, easy optimization, and proven performance. Prior work has
explained the effectiveness of the skip connection mechanism from different
perspectives. In this work, we take a deep dive into the behavior of models
with skip connections, which can be formulated as a learnable Markov chain. An
efficient Markov chain is preferred, as each of its transitions maps the input
data toward the target domain more effectively. However, even when a model is
interpreted as a Markov chain, existing SGD-based optimizers, which are prone
to getting trapped in local optima, do not guarantee that it is optimized as an
efficient chain. To move toward a more efficient Markov chain, we propose a
simple penal-connection routine that turns any residual-like model into a
learnable Markov chain. Aside from that, the penal connection can also be
viewed as a particular form of
model regularization and can be easily implemented with one line of code in the
most popular deep learning frameworks (source code:
https://github.com/densechen/penal-connection). The encouraging
experimental results in multi-modal translation and image recognition
empirically confirm our conjecture of the learnable Markov chain view and
demonstrate the superiority of the proposed penal connection.
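The abstract does not spell out the penal connection itself, so the following is only a minimal PyTorch sketch of the learnable-Markov-chain reading: each residual block computes one chain transition x_{t+1} = x_t + f(x_t), and a hypothetical squared-norm penalty on the residual updates stands in for the regularization term the paper says can be added with one line of code. The names ResidualBlock and forward_with_penalty, and the penalty form and weight, are illustrative assumptions, not the authors' implementation; see the linked repository for their code.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """One transition of the chain: x_{t+1} = x_t + f(x_t)."""

    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.f(x)  # identity path plus learned residual update


def forward_with_penalty(blocks: nn.ModuleList, x: torch.Tensor, weight: float = 1e-2):
    """Run the chain, accumulating an assumed squared-norm penalty on each
    residual update so that every transition is pushed to move the state
    efficiently rather than relying on a few large jumps."""
    penalty = x.new_zeros(())
    for block in blocks:
        update = block.f(x)
        penalty = penalty + update.pow(2).mean()  # hypothetical penal term
        x = x + update
    return x, weight * penalty


# Usage: the regularizer is the single extra line added to the training step.
blocks = nn.ModuleList([ResidualBlock(64) for _ in range(4)])
x = torch.randn(8, 64)
out, reg = forward_with_penalty(blocks, x)
loss = out.pow(2).mean() + reg  # placeholder task loss + penal regularization
loss.backward()
```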
Related papers
- CONVERT:Contrastive Graph Clustering with Reliable Augmentation [110.46658439733106]
We propose a novel CONtrastiVe Graph ClustEring network with Reliable AugmenTation (CONVERT)
In our method, the data augmentations are processed by the proposed reversible perturb-recover network.
To further guarantee the reliability of semantics, a novel semantic loss is presented to constrain the network.
arXiv Detail & Related papers (2023-08-17T13:07:09Z)
- Generative Flow Networks: a Markov Chain Perspective [93.9910025411313]
We propose a new perspective for GFlowNets using Markov chains, showing a unifying view for GFlowNets regardless of the nature of the state space.
Positioning GFlowNets under the same theoretical framework as MCMC methods also allows us to identify the similarities between both frameworks.
arXiv Detail & Related papers (2023-07-04T01:28:02Z)
- Stochastic Gradient Descent under Markovian Sampling Schemes [3.04585143845864]
We study a variation of vanilla stochastic gradient descent where the optimizer only has access to a Markovian sampling scheme.
We focus on obtaining rates of convergence under the least restrictive assumptions possible on the underlying Markov chain.
arXiv Detail & Related papers (2023-02-28T09:18:00Z)
- Learning Mixtures of Markov Chains with Quality Guarantees [8.528384027684192]
A large number of modern applications generate a plethora of user trails.
One approach to modeling this problem mathematically is as a mixture of Markov chains.
Recently, Gupta, Kumar, and Vassilvitskii [GKV16] introduced an algorithm that can perfectly recover a mixture of L chains on n states.
arXiv Detail & Related papers (2023-02-09T14:55:17Z)
- Contrastive Self-supervised Sequential Recommendation with Robust Augmentation [101.25762166231904]
Sequential recommendation describes a set of techniques that model dynamic user behavior in order to predict future interactions in sequential user data.
Old and new issues remain, including data sparsity and noisy data.
We propose Contrastive Self-Supervised Learning for sequential Recommendation (CoSeRec)
arXiv Detail & Related papers (2021-08-14T07:15:25Z)
- BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised Named Entity Recognition [57.2201011783393]
The conditional hidden Markov model (CHMM) predicts token-wise transition and emission probabilities from the BERT embeddings of the input tokens.
It fine-tunes a BERT-based NER model with the labels inferred by CHMM.
arXiv Detail & Related papers (2021-05-26T21:18:48Z)
- ResNeSt: Split-Attention Networks [86.25490825631763]
We present a modularized architecture that applies channel-wise attention to different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations.
Our model, named ResNeSt, outperforms EfficientNet in accuracy and latency trade-off on image classification.
arXiv Detail & Related papers (2020-04-19T20:40:31Z)
- Semi-supervised Learning Meets Factorization: Learning to Recommend with Chain Graph Model [16.007141894770054]
The latent factor model (LFM) has been drawing much attention in recommender systems due to its good performance and scalability.
Semi-supervised learning (SSL) provides an effective way to alleviate the label (i.e., rating) sparsity problem.
We propose a novel probabilistic chain graph model (CGM) to marry SSL with LFM.
arXiv Detail & Related papers (2020-03-05T06:34:53Z)
- Learning Scalable Multi-Agent Coordination by Spatial Differentiation for Traffic Signal Control [8.380832628205372]
We design a multi-agent coordination framework based on Deep Reinforcement Learning methods for traffic signal control.
Specifically, we propose the Spatial Differentiation method for coordination which uses the temporal-spatial information in the replay buffer to amend the reward of each action.
arXiv Detail & Related papers (2020-02-27T02:16:00Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.