Rethinking skip connection model as a learnable Markov chain
- URL: http://arxiv.org/abs/2209.15278v1
- Date: Fri, 30 Sep 2022 07:31:49 GMT
- Title: Rethinking skip connection model as a learnable Markov chain
- Authors: Dengsheng Chen, Jie Hu, Wenwen Qiang, Xiaoming Wei, Enhua Wu
- Abstract summary: We take a deep dive into the behavior of models with skip connections, which can be formulated as a learnable Markov chain.
An efficient Markov chain is preferred, as each of its transitions maps the input data toward the target domain more effectively.
We propose a simple penal-connection routine that turns any residual-like model into a learnable Markov chain.
- Score: 12.135167279383815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the years since the birth of ResNet, the skip connection has
become the de facto standard for the design of modern architectures due to its
widespread adoption, easy optimization, and proven performance. Prior work has
explained the effectiveness of the skip connection mechanism from different
perspectives. In this work, we take a deep dive into the behavior of models
with skip connections, which can be formulated as a learnable Markov chain. An
efficient Markov chain is preferred, as each of its transitions maps the input
data toward the target domain more effectively. However, even when a model is
interpreted as a Markov chain, existing SGD-based optimizers, which are prone
to getting trapped in local optima, do not guarantee that it is optimized as an
efficient chain. To move toward a more efficient Markov chain, we propose a
simple penal-connection routine that turns any residual-like model into a
learnable Markov chain. Aside from that, the penal connection can also be
viewed as a particular form of
model regularization and can be easily implemented with one line of code in the
most popular deep learning frameworks (source code:
https://github.com/densechen/penal-connection). The encouraging
experimental results in multi-modal translation and image recognition
empirically confirm our conjecture of the learnable Markov chain view and
demonstrate the superiority of the proposed penal connection.
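The abstract does not spell out the penal connection itself, so the following is only a minimal PyTorch sketch of the learnable-Markov-chain reading: each residual block computes one chain transition x_{t+1} = x_t + f(x_t), and a hypothetical squared-norm penalty on the residual updates stands in for the regularization term the paper says can be added with one line of code. The names ResidualBlock and forward_with_penalty, and the penalty form and weight, are illustrative assumptions, not the authors' implementation; see the linked repository for their code.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """One transition of the chain: x_{t+1} = x_t + f(x_t)."""

    def __init__(self, dim: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.f(x)  # identity path plus learned residual update


def forward_with_penalty(blocks: nn.ModuleList, x: torch.Tensor, weight: float = 1e-2):
    """Run the chain, accumulating an assumed squared-norm penalty on each
    residual update so that every transition is pushed to move the state
    efficiently rather than relying on a few large jumps."""
    penalty = x.new_zeros(())
    for block in blocks:
        update = block.f(x)
        penalty = penalty + update.pow(2).mean()  # hypothetical penal term
        x = x + update
    return x, weight * penalty


# Usage: the regularizer is the single extra line added to the training step.
blocks = nn.ModuleList([ResidualBlock(64) for _ in range(4)])
x = torch.randn(8, 64)
out, reg = forward_with_penalty(blocks, x)
loss = out.pow(2).mean() + reg  # placeholder task loss + penal regularization
loss.backward()
```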
Related papers
- CONVERT:Contrastive Graph Clustering with Reliable Augmentation [110.46658439733106]
We propose a novel CONtrastiVe Graph ClustEring network with Reliable AugmenTation (CONVERT)
In our method, the data augmentations are processed by the proposed reversible perturb-recover network.
To further guarantee the reliability of semantics, a novel semantic loss is presented to constrain the network.
arXiv Detail & Related papers (2023-08-17T13:07:09Z)
- Generative Flow Networks: a Markov Chain Perspective [93.9910025411313]
We propose a new perspective for GFlowNets using Markov chains, showing a unifying view for GFlowNets regardless of the nature of the state space.
Positioning GFlowNets under the same theoretical framework as MCMC methods also allows us to identify the similarities between both frameworks.
arXiv Detail & Related papers (2023-07-04T01:28:02Z)
- Stochastic Gradient Descent under Markovian Sampling Schemes [3.04585143845864]
We study a variation of vanilla stochastic gradient descent where the optimizer only has access to a Markovian sampling scheme.
We focus on obtaining rates of convergence under the least restrictive assumptions possible on the underlying Markov chain.
arXiv Detail & Related papers (2023-02-28T09:18:00Z)
- Learning Mixtures of Markov Chains with Quality Guarantees [8.528384027684192]
A large number of modern applications generate a plethora of user trails.
One approach to modeling this problem mathematically is as a mixture of Markov chains.
Recently, Gupta, Kumar, and Vassilvitskii [GKV16] introduced an algorithm that can perfectly recover a mixture of L chains on n states.
arXiv Detail & Related papers (2023-02-09T14:55:17Z)
- Contrastive Self-supervised Sequential Recommendation with Robust Augmentation [101.25762166231904]
Sequential recommendation describes a set of techniques that model dynamic user behavior in order to predict future interactions in sequential user data.
Old and new issues remain, including data sparsity and noisy data.
We propose Contrastive Self-Supervised Learning for sequential Recommendation (CoSeRec)
arXiv Detail & Related papers (2021-08-14T07:15:25Z)
- BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised Named Entity Recognition [57.2201011783393]
The conditional hidden Markov model (CHMM) predicts token-wise transition and emission probabilities from the BERT embeddings of the input tokens.
It fine-tunes a BERT-based NER model with the labels inferred by CHMM.
arXiv Detail & Related papers (2021-05-26T21:18:48Z)
- ResNeSt: Split-Attention Networks [86.25490825631763]
We present a modularized architecture that applies channel-wise attention to different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations.
Our model, named ResNeSt, outperforms EfficientNet in accuracy and latency trade-off on image classification.
arXiv Detail & Related papers (2020-04-19T20:40:31Z)
- Semi-supervised Learning Meets Factorization: Learning to Recommend with Chain Graph Model [16.007141894770054]
The latent factor model (LFM) has been drawing much attention in recommender systems due to its good performance and scalability.
Semi-supervised learning (SSL) provides an effective way to alleviate the label (i.e., rating) sparsity problem.
We propose a novel probabilistic chain graph model (CGM) to marry SSL with LFM.
arXiv Detail & Related papers (2020-03-05T06:34:53Z)
- Learning Scalable Multi-Agent Coordination by Spatial Differentiation for Traffic Signal Control [8.380832628205372]
We design a multi-agent coordination framework based on Deep Reinforcement Learning methods for traffic signal control.
Specifically, we propose the Spatial Differentiation method for coordination which uses the temporal-spatial information in the replay buffer to amend the reward of each action.
arXiv Detail & Related papers (2020-02-27T02:16:00Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.