A constrained recursion algorithm for batch normalization of
tree-structured LSTM
- URL: http://arxiv.org/abs/2008.09409v1
- Date: Fri, 21 Aug 2020 10:31:45 GMT
- Title: A constrained recursion algorithm for batch normalization of
tree-structured LSTM
- Authors: Ruo Ando, Yoshiyasu Takefuji
- Abstract summary: Tree-structured LSTM is a promising way to model long-distance interactions over hierarchies.
The proposed method is effective for hyperparameter tuning such as the number of batches.
- Score: 0.29008108937701327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tree-structured LSTM is a promising way to model long-distance
interactions over hierarchies. However, there has been little research on
hyperparameter tuning for the construction and traversal of tree-structured
LSTM. For example, hyperparameters such as the interval of state
initialization and the number of batches for normalization have been left
unexplored, particularly when batch normalization is applied to reduce
training cost and enable parallelization. In this paper, we propose a novel
recursive algorithm for traversing a batch-normalized tree-structured LSTM.
In the proposed method, we impose a constraint on the recursion algorithm
for the depth-first search of the binary tree representation of the LSTM to
which batch normalization is applied. With this constrained recursion, we
can control the hyperparameters in the traversal of the several
tree-structured LSTMs that are generated in the process of batch
normalization. The traversal is divided into two stages. In the first stage,
a breadth-first search over models discovers the start point of the latest
tree-structured LSTM block. Then, a depth-first search traverses that
tree-structured LSTM. The proposed method enables us to explore the optimal
selection of hyperparameters of a recursive neural network implementation by
changing the constraints of the recursion algorithm. In experiments, we
measure and plot the validation loss and computing time while varying the
length of the interval of state initialization of the tree-structured LSTM.
The results show that the proposed method is effective for tuning
hyperparameters such as the number of batches and the interval of state
initialization of the tree-structured LSTM.
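To make the two-stage traversal concrete, below is a minimal Python sketch
assuming a toy binary-tree node class. The names (Node, find_latest_block,
constrained_dfs, init_interval) and the state-initialization rule are
illustrative assumptions, not the paper's published implementation.

```python
from collections import deque


class Node:
    """Binary-tree node standing in for one tree-structured LSTM cell."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def find_latest_block(model_roots):
    """Stage 1: breadth-first pass over the models generated during batch
    normalization; returns the root of the most recently generated block
    (here, simply the last root visited)."""
    latest = None
    queue = deque(model_roots)
    while queue:
        latest = queue.popleft()
    return latest


def constrained_dfs(node, init_interval, visited=0):
    """Stage 2: depth-first traversal of the binary tree, with the recursion
    constrained so that the LSTM state would be re-initialized every
    `init_interval` visited nodes (represented here by a print)."""
    if node is None:
        return visited
    visited += 1
    if visited % init_interval == 0:
        print(f"re-initialize state at node {node.key}")
    visited = constrained_dfs(node.left, init_interval, visited)
    visited = constrained_dfs(node.right, init_interval, visited)
    return visited


if __name__ == "__main__":
    # Three toy "models"; the third is the latest tree-structured LSTM block.
    models = [Node("a"), Node("b"),
              Node("c", Node("d"), Node("e", Node("f"), None))]
    root = find_latest_block(models)
    constrained_dfs(root, init_interval=2)
```

Changing `init_interval` here mirrors the hyperparameter that the paper
reports sweeping when plotting validation loss and computing time.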
Related papers
- LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning method for Large Language Models.
We learn the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model.
Our method operates for 2.7 hours with around 35GB memory for the 13B models on a single A100 GPU.
arXiv Detail & Related papers (2024-06-15T09:31:03Z) - Latent Logic Tree Extraction for Event Sequence Explanation from LLMs [19.90330712436838]
Modern high-stakes systems, such as healthcare or robotics, often generate vast streaming event sequences.
Our goal is to design an efficient, plug-and-play tool to elicit logic tree-based explanations from Large Language Models (LLMs) to provide customized insights into each observed event sequence.
In the online setting, our locally built, lightweight model will iteratively extract the most relevant rules from LLMs for each sequence using only a few iterations.
arXiv Detail & Related papers (2024-06-03T09:10:42Z) - Des-q: a quantum algorithm to provably speedup retraining of decision trees [2.7262923206583136]
We introduce Des-q, a novel quantum algorithm to construct and retrain decision trees for regression and binary classification tasks.
We benchmark the simulated version of Des-q against the state-of-the-art classical methods on multiple data sets.
Our algorithm exhibits similar performance to the state-of-the-art decision trees while significantly speeding up the periodic tree retraining.
arXiv Detail & Related papers (2023-09-18T17:56:08Z) - Revisiting Recursive Least Squares for Training Deep Neural Networks [10.44340837533087]
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence.
Previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they have high computational complexity and too many preconditions.
We propose three novel RLS optimization algorithms for training feedforward neural networks, convolutional neural networks and recurrent neural networks.
arXiv Detail & Related papers (2021-09-07T17:43:51Z) - Robustifying Algorithms of Learning Latent Trees with Vector Variables [92.18777020401484]
We present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG)
We robustify RG, CLRG, Neighbor Joining (NJ) and Spectral NJ (SNJ) by using the truncated inner product.
We derive the first known instance-dependent impossibility result for structure learning of latent trees.
arXiv Detail & Related papers (2021-06-02T01:37:52Z) - Spectral Top-Down Recovery of Latent Tree Models [13.681975313065477]
Spectral Top-Down Recovery (STDR) is a divide-and-conquer approach for inference of large latent tree models.
STDR's partitioning step is non-random. Instead, it is based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes.
We prove that STDR is statistically consistent, and bound the number of samples required to accurately recover the tree with high probability.
arXiv Detail & Related papers (2021-02-26T02:47:42Z) - Trilevel Neural Architecture Search for Efficient Single Image
Super-Resolution [127.92235484598811]
This paper proposes a trilevel neural architecture search (NAS) method for efficient single image super-resolution (SR)
For modeling the discrete search space, we apply a new continuous relaxation on the discrete search spaces to build a hierarchical mixture of network-path, cell-operations, and kernel-width.
An efficient search algorithm is proposed to perform optimization in a hierarchical supernet manner.
arXiv Detail & Related papers (2021-01-17T12:19:49Z) - Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding
Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv Detail & Related papers (2020-06-15T02:57:57Z) - A Tree Architecture of LSTM Networks for Sequential Regression with
Missing Data [0.0]
We introduce a novel tree architecture based on the Long Short-Term Memory (LSTM) networks.
In our architecture, we employ a variable number of LSTM networks, which use only the existing inputs in the sequence.
We achieve significant performance improvements with respect to the state-of-the-art methods for the well-known financial and real life datasets.
arXiv Detail & Related papers (2020-05-22T18:57:47Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale learning with a deep neural network as the predictive model.
Our method requires far fewer communication rounds in theory.
Experiments on several benchmark datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)