Refined Gate: A Simple and Effective Gating Mechanism for Recurrent
Units
- URL: http://arxiv.org/abs/2002.11338v2
- Date: Tue, 26 May 2020 13:59:48 GMT
- Title: Refined Gate: A Simple and Effective Gating Mechanism for Recurrent
Units
- Authors: Zhanzhan Cheng, Yunlu Xu, Mingjian Cheng, Yu Qiao, Shiliang Pu, Yi Niu
and Fei Wu
- Abstract summary: We propose a new gating mechanism within general gated recurrent neural networks to handle the problem of gate undertraining.
The proposed gates directly short-connect the extracted input features to the outputs of the vanilla gates.
We verify the proposed gating mechanism on three popular types of gated RNNs including LSTM, GRU and MGU.
- Score: 68.30422112784355
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recurrent neural networks (RNNs) have been widely studied for sequence learning tasks, and the mainstream models (e.g., LSTM and GRU) rely on a gating mechanism to control how information flows between hidden states. However, the vanilla gates in RNNs (e.g., the input gate in LSTM) suffer from gate undertraining, which can be caused by various factors such as saturating activation functions, the gate layout (e.g., the number of gates and the gating functions), or even a suboptimal memory state. These factors can prevent the gates from learning their switching roles and thus weaken performance. In this paper, we propose a new gating mechanism for general gated recurrent neural networks to address this issue. Specifically, the proposed gates, denoted refined gates, directly short-connect the extracted input features to the outputs of the vanilla gates. This refining mechanism strengthens gradient back-propagation and extends the activation scope of the gates, which can guide the RNN toward deeper minima. We verify the proposed gating mechanism on three popular types of gated RNNs: LSTM, GRU and MGU. Extensive experiments on 3 synthetic tasks, 3 language modeling tasks and 5 scene text recognition benchmarks demonstrate the effectiveness of our method.
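To make the short-connection idea concrete, here is a minimal PyTorch-style sketch of an LSTM cell with "refined" gates. It is only an illustration under assumptions: the class name RefinedGateLSTMCell, the additive tanh shortcut, and the re-squashing step are hypothetical choices, not the paper's exact published formulation.

import torch
import torch.nn as nn

class RefinedGateLSTMCell(nn.Module):
    """LSTM-style cell whose gates receive an additive shortcut from the
    extracted input features (illustrative sketch, not the paper's exact form)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # One linear map produces pre-activations for the i, f, o gates and the candidate g.
        self.ih = nn.Linear(input_size, 4 * hidden_size)
        self.hh = nn.Linear(hidden_size, 4 * hidden_size)

    def forward(self, x, state):
        h, c = state
        pre = self.ih(x) + self.hh(h)                      # extracted input features (pre-activations)
        i_pre, f_pre, o_pre, g_pre = pre.chunk(4, dim=-1)

        # Vanilla gates (sigmoid-squashed, prone to saturation).
        i, f, o = torch.sigmoid(i_pre), torch.sigmoid(f_pre), torch.sigmoid(o_pre)

        # Refined gates: short-connect the input features to the vanilla gate
        # outputs, then re-squash (assumed refinement function).
        i_ref = torch.sigmoid(i + torch.tanh(i_pre))
        f_ref = torch.sigmoid(f + torch.tanh(f_pre))
        o_ref = torch.sigmoid(o + torch.tanh(o_pre))

        g = torch.tanh(g_pre)
        c_new = f_ref * c + i_ref * g
        h_new = o_ref * torch.tanh(c_new)
        return h_new, (h_new, c_new)

A cell like this would be used as a drop-in replacement for torch.nn.LSTMCell inside a manual unrolling loop; the shortcut gives each gate a second path from its pre-activation to its output, which is the kind of gradient and activation-range benefit the abstract attributes to refined gates.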
Related papers
- DeepGate3: Towards Scalable Circuit Representation Learning [9.910071321534682]
Circuit representation learning has shown promising results in advancing the field of Electronic Design Automation (EDA).
Existing models, such as DeepGate Family, primarily utilize Graph Neural Networks (GNNs) to encode circuit netlists into gate-level embeddings.
We introduce DeepGate3, an enhanced architecture that integrates Transformer modules following the initial GNN processing.
arXiv Detail & Related papers (2024-07-15T02:44:21Z)
- NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions [2.7086888205833968]
Field-Programmable Gate Array (FPGA) accelerators have proven successful in handling latency- and resource-critical deep neural network (DNN) inference tasks.
We propose relaxing the boundaries of neurons and mapping entire sub-networks to a single LUT.
We validate our proposed method on a known latency-critical task, jet substructure tagging, and on the classical computer vision task, digit classification using MNIST.
arXiv Detail & Related papers (2024-02-29T16:10:21Z)
- Securing Graph Neural Networks in MLaaS: A Comprehensive Realization of Query-based Integrity Verification [68.86863899919358]
We introduce a groundbreaking approach to protect GNN models in Machine Learning as a Service (MLaaS) from model-centric attacks.
Our approach includes a comprehensive verification schema for GNN's integrity, taking into account both transductive and inductive GNNs.
We propose a query-based verification technique, fortified with innovative node fingerprint generation algorithms.
arXiv Detail & Related papers (2023-12-13T03:17:05Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
- Hybrid Graph Neural Networks for Few-Shot Learning [85.93495480949079]
Graph neural networks (GNNs) have been used to tackle the few-shot learning problem.
Under the inductive setting, existing GNN-based methods are less competitive.
We propose a novel hybrid GNN model consisting of two GNNs, an instance GNN and a prototype GNN.
arXiv Detail & Related papers (2021-12-13T10:20:15Z)
- GDP: Stabilized Neural Network Pruning via Gates with Differentiable Polarization [84.57695474130273]
Gate-based or importance-based pruning methods aim to remove channels whose importance is smallest.
GDP can be plugged before convolutional layers without bells and whistles, to control the on-and-off of each channel.
Experiments conducted on the CIFAR-10 and ImageNet datasets show that the proposed GDP achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-09-06T03:17:10Z)
- Working Memory Connections for LSTM [51.742526187978726]
We show that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks.
Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
arXiv Detail & Related papers (2021-08-31T18:01:30Z)
- Gates Are Not What You Need in RNNs [2.6199029802346754]
We propose a new recurrent cell called the Residual Recurrent Unit (RRU), which beats traditional cells while not employing a single gate.
It is based on a residual shortcut connection, linear transformations, ReLU, and normalization.
Our experiments show that the RRU outperforms traditional gated units on most of the evaluated tasks.
arXiv Detail & Related papers (2021-08-01T19:20:34Z)
- GateNet: Gating-Enhanced Deep Network for Click-Through Rate Prediction [3.201333208812837]
In recent years, many neural network based CTR models have been proposed and achieved success.
We propose a novel model named GateNet, which introduces a feature embedding gate at the embedding layer or a hidden gate at the hidden layers of CTR models.
arXiv Detail & Related papers (2020-07-06T12:45:46Z)
- Gating creates slow modes and controls phase-space complexity in GRUs and LSTMs [5.672132510411465]
We study how the addition of gates influences the dynamics and trainability of GRUs and LSTMs.
We show that the update gate in the GRU and the forget gate in the LSTM can lead to an accumulation of slow modes in the dynamics.
arXiv Detail & Related papers (2020-01-31T19:09:37Z)
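As a toy illustration of the slow-modes point above (a sketch under stated assumptions, not code from that paper): when a forget or update gate saturates near 1, the corresponding state component decays very slowly, and the number of steps needed for a perturbation to die out grows roughly like 1/(1 - f).

def decay_steps(f: float, threshold: float = 0.01) -> int:
    # Steps for a unit perturbation of the cell state to fall below `threshold`
    # when the forget gate is held fixed at f (c_t = f * c_{t-1}, no new input).
    c, steps = 1.0, 0
    while c > threshold:
        c *= f
        steps += 1
    return steps

for f in (0.5, 0.9, 0.99, 0.999):
    print(f"forget gate f={f}: ~{decay_steps(f)} steps to decay below 1%, 1/(1-f) = {1/(1-f):.0f}")

Running this shows the decay time growing by roughly an order of magnitude each time f moves a decade closer to 1, which is the qualitative sense in which a near-saturated gate creates a slow mode in the recurrent dynamics.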
This list is automatically generated from the titles and abstracts of the papers on this site.