Refined Gate: A Simple and Effective Gating Mechanism for Recurrent
Units
- URL: http://arxiv.org/abs/2002.11338v2
- Date: Tue, 26 May 2020 13:59:48 GMT
- Title: Refined Gate: A Simple and Effective Gating Mechanism for Recurrent
Units
- Authors: Zhanzhan Cheng, Yunlu Xu, Mingjian Cheng, Yu Qiao, Shiliang Pu, Yi Niu
and Fei Wu
- Abstract summary: We propose a new gating mechanism within general gated recurrent neural networks to handle the problem of gate undertraining.
The proposed gates directly short-connect the extracted input features to the outputs of the vanilla gates.
We verify the proposed gating mechanism on three popular types of gated RNNs including LSTM, GRU and MGU.
- Score: 68.30422112784355
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recurrent neural networks (RNNs) have been widely studied for sequence learning tasks, and the mainstream models (e.g., LSTM and GRU) rely on a gating mechanism to control how information flows between hidden states. However, the vanilla gates in RNNs (e.g., the input gate in LSTM) suffer from gate undertraining, which can be caused by various factors such as saturating activation functions, the gate layout (e.g., the number of gates and the gating functions), or even a suboptimal memory state. These factors can prevent the gates from learning their switching roles and thus weaken performance. In this paper, we propose a new gating mechanism for general gated recurrent neural networks to address this issue. Specifically, the proposed gates, denoted refined gates, directly short-connect the extracted input features to the outputs of the vanilla gates. This refining mechanism strengthens gradient back-propagation and extends the activation scope of the gates, which can guide the RNN toward deeper minima. We verify the proposed gating mechanism on three popular types of gated RNNs: LSTM, GRU and MGU. Extensive experiments on 3 synthetic tasks, 3 language modeling tasks and 5 scene text recognition benchmarks demonstrate the effectiveness of our method.
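To make the short-connection idea concrete, here is a minimal PyTorch-style sketch of an LSTM cell with "refined" gates. It is only an illustration under assumptions: the class name RefinedGateLSTMCell, the additive tanh shortcut, and the re-squashing step are hypothetical choices, not the paper's exact published formulation.

import torch
import torch.nn as nn

class RefinedGateLSTMCell(nn.Module):
    """LSTM-style cell whose gates receive an additive shortcut from the
    extracted input features (illustrative sketch, not the paper's exact form)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # One linear map produces pre-activations for the i, f, o gates and the candidate g.
        self.ih = nn.Linear(input_size, 4 * hidden_size)
        self.hh = nn.Linear(hidden_size, 4 * hidden_size)

    def forward(self, x, state):
        h, c = state
        pre = self.ih(x) + self.hh(h)                      # extracted input features (pre-activations)
        i_pre, f_pre, o_pre, g_pre = pre.chunk(4, dim=-1)

        # Vanilla gates (sigmoid-squashed, prone to saturation).
        i, f, o = torch.sigmoid(i_pre), torch.sigmoid(f_pre), torch.sigmoid(o_pre)

        # Refined gates: short-connect the input features to the vanilla gate
        # outputs, then re-squash (assumed refinement function).
        i_ref = torch.sigmoid(i + torch.tanh(i_pre))
        f_ref = torch.sigmoid(f + torch.tanh(f_pre))
        o_ref = torch.sigmoid(o + torch.tanh(o_pre))

        g = torch.tanh(g_pre)
        c_new = f_ref * c + i_ref * g
        h_new = o_ref * torch.tanh(c_new)
        return h_new, (h_new, c_new)

A cell like this would be used as a drop-in replacement for torch.nn.LSTMCell inside a manual unrolling loop; the shortcut gives each gate a second path from its pre-activation to its output, which is the kind of gradient and activation-range benefit the abstract attributes to refined gates.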
Related papers
- DeepGate3: Towards Scalable Circuit Representation Learning [9.910071321534682]
Circuit representation learning has shown promising results in advancing the field of Electronic Design Automation (EDA).
Existing models, such as DeepGate Family, primarily utilize Graph Neural Networks (GNNs) to encode circuit netlists into gate-level embeddings.
We introduce DeepGate3, an enhanced architecture that integrates Transformer modules following the initial GNN processing.
arXiv Detail & Related papers (2024-07-15T02:44:21Z)
- NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions [2.7086888205833968]
Field-Programmable Gate Array (FPGA) accelerators have proven successful in handling latency- and resource-critical deep neural network (DNN) inference tasks.
We propose relaxing the boundaries of neurons and mapping entire sub-networks to a single LUT.
We validate our proposed method on a known latency-critical task, jet substructure tagging, and on the classical computer vision task, digit classification using MNIST.
arXiv Detail & Related papers (2024-02-29T16:10:21Z)
- Securing Graph Neural Networks in MLaaS: A Comprehensive Realization of Query-based Integrity Verification [68.86863899919358]
We introduce a groundbreaking approach to protect GNN models in Machine Learning as a Service (MLaaS) from model-centric attacks.
Our approach includes a comprehensive verification schema for GNN's integrity, taking into account both transductive and inductive GNNs.
We propose a query-based verification technique, fortified with innovative node fingerprint generation algorithms.
arXiv Detail & Related papers (2023-12-13T03:17:05Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
- Hybrid Graph Neural Networks for Few-Shot Learning [85.93495480949079]
Graph neural networks (GNNs) have been used to tackle the few-shot learning problem.
Under the inductive setting, existing GNN-based methods are less competitive.
We propose a novel hybrid GNN model consisting of two GNNs, an instance GNN and a prototype GNN.
arXiv Detail & Related papers (2021-12-13T10:20:15Z)
- GDP: Stabilized Neural Network Pruning via Gates with Differentiable Polarization [84.57695474130273]
Gate-based or importance-based pruning methods aim to remove channels whose importance is smallest.
GDP can be plugged before convolutional layers without bells and whistles, to control the on-and-off of each channel.
Experiments conducted on the CIFAR-10 and ImageNet datasets show that the proposed GDP achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-09-06T03:17:10Z)
- Working Memory Connections for LSTM [51.742526187978726]
We show that Working Memory Connections consistently improve the performance of LSTMs on a variety of tasks.
Numerical results suggest that the cell state contains useful information that is worth including in the gate structure.
arXiv Detail & Related papers (2021-08-31T18:01:30Z)
- Gates Are Not What You Need in RNNs [2.6199029802346754]
We propose a new recurrent cell called the Residual Recurrent Unit (RRU), which beats traditional cells while not employing a single gate.
It is based on a residual shortcut connection, linear transformations, ReLU, and normalization.
Our experiments show that the RRU outperforms traditional gated units on most of the evaluated tasks.
arXiv Detail & Related papers (2021-08-01T19:20:34Z)
- GateNet: Gating-Enhanced Deep Network for Click-Through Rate Prediction [3.201333208812837]
In recent years, many neural network based CTR models have been proposed and achieved success.
We propose a novel model named GateNet, which introduces a feature embedding gate at the embedding layer or a hidden gate at the hidden layers of CTR models.
arXiv Detail & Related papers (2020-07-06T12:45:46Z)
- Gating creates slow modes and controls phase-space complexity in GRUs and LSTMs [5.672132510411465]
We study how the addition of gates influences the dynamics and trainability of GRUs and LSTMs.
We show that the update gate in the GRU and the forget gate in the LSTM can lead to an accumulation of slow modes in the dynamics.
arXiv Detail & Related papers (2020-01-31T19:09:37Z)
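As a toy illustration of the slow-modes point above (a sketch under stated assumptions, not code from that paper): when a forget or update gate saturates near 1, the corresponding state component decays very slowly, and the number of steps needed for a perturbation to die out grows roughly like 1/(1 - f).

def decay_steps(f: float, threshold: float = 0.01) -> int:
    # Steps for a unit perturbation of the cell state to fall below `threshold`
    # when the forget gate is held fixed at f (c_t = f * c_{t-1}, no new input).
    c, steps = 1.0, 0
    while c > threshold:
        c *= f
        steps += 1
    return steps

for f in (0.5, 0.9, 0.99, 0.999):
    print(f"forget gate f={f}: ~{decay_steps(f)} steps to decay below 1%, 1/(1-f) = {1/(1-f):.0f}")

Running this shows the decay time growing by roughly an order of magnitude each time f moves a decade closer to 1, which is the qualitative sense in which a near-saturated gate creates a slow mode in the recurrent dynamics.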
This list is automatically generated from the titles and abstracts of the papers on this site.