A constrained recursion algorithm for batch normalization of
tree-structured LSTM
- URL: http://arxiv.org/abs/2008.09409v1
- Date: Fri, 21 Aug 2020 10:31:45 GMT
- Title: A constrained recursion algorithm for batch normalization of
tree-structured LSTM
- Authors: Ruo Ando, Yoshiyasu Takefuji
- Abstract summary: Tree-structured LSTM is a promising way to model long-distance interactions over hierarchies.
The proposed method is effective for hyperparameter tuning such as the number of batches.
- Score: 0.29008108937701327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tree-structured LSTM is a promising way to model long-distance
interactions over hierarchies. However, there has been little research on
hyperparameter tuning for the construction and traversal of tree-structured
LSTM. For example, hyperparameters such as the interval of state
initialization and the number of batches for normalization have been left
unexplored, particularly when batch normalization is applied to reduce
training cost and enable parallelization. In this paper, we propose a novel
recursive algorithm for traversing a batch-normalized tree-structured LSTM.
In the proposed method, we impose a constraint on the recursion algorithm
for the depth-first search of the binary tree representation of the LSTM to
which batch normalization is applied. With this constrained recursion, we
can control the hyperparameters in the traversal of the several
tree-structured LSTMs that are generated in the process of batch
normalization. The traversal is divided into two stages. In the first stage,
a breadth-first search over models discovers the start point of the latest
tree-structured LSTM block. Then, a depth-first search traverses that
tree-structured LSTM. The proposed method enables us to explore the optimal
selection of hyperparameters of a recursive neural network implementation by
changing the constraints of the recursion algorithm. In experiments, we
measure and plot the validation loss and computing time while varying the
length of the interval of state initialization of the tree-structured LSTM.
The results show that the proposed method is effective for tuning
hyperparameters such as the number of batches and the interval of state
initialization of the tree-structured LSTM.
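To make the two-stage traversal concrete, below is a minimal Python sketch
assuming a toy binary-tree node class. The names (Node, find_latest_block,
constrained_dfs, init_interval) and the state-initialization rule are
illustrative assumptions, not the paper's published implementation.

```python
from collections import deque


class Node:
    """Binary-tree node standing in for one tree-structured LSTM cell."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def find_latest_block(model_roots):
    """Stage 1: breadth-first pass over the models generated during batch
    normalization; returns the root of the most recently generated block
    (here, simply the last root visited)."""
    latest = None
    queue = deque(model_roots)
    while queue:
        latest = queue.popleft()
    return latest


def constrained_dfs(node, init_interval, visited=0):
    """Stage 2: depth-first traversal of the binary tree, with the recursion
    constrained so that the LSTM state would be re-initialized every
    `init_interval` visited nodes (represented here by a print)."""
    if node is None:
        return visited
    visited += 1
    if visited % init_interval == 0:
        print(f"re-initialize state at node {node.key}")
    visited = constrained_dfs(node.left, init_interval, visited)
    visited = constrained_dfs(node.right, init_interval, visited)
    return visited


if __name__ == "__main__":
    # Three toy "models"; the third is the latest tree-structured LSTM block.
    models = [Node("a"), Node("b"),
              Node("c", Node("d"), Node("e", Node("f"), None))]
    root = find_latest_block(models)
    constrained_dfs(root, init_interval=2)
```

Changing `init_interval` here mirrors the hyperparameter that the paper
reports sweeping when plotting validation loss and computing time.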
Related papers
- LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning method for Large Language Models.
We learn the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model.
Our method operates for 2.7 hours with around 35GB memory for the 13B models on a single A100 GPU.
arXiv Detail & Related papers (2024-06-15T09:31:03Z) - Latent Logic Tree Extraction for Event Sequence Explanation from LLMs [19.90330712436838]
Modern high-stakes systems, such as healthcare or robotics, often generate vast streaming event sequences.
Our goal is to design an efficient, plug-and-play tool to elicit logic tree-based explanations from Large Language Models (LLMs) to provide customized insights into each observed event sequence.
In the online setting, our locally built, lightweight model will iteratively extract the most relevant rules from LLMs for each sequence using only a few iterations.
arXiv Detail & Related papers (2024-06-03T09:10:42Z) - Des-q: a quantum algorithm to provably speedup retraining of decision trees [2.7262923206583136]
We introduce Des-q, a novel quantum algorithm to construct and retrain decision trees for regression and binary classification tasks.
We benchmark the simulated version of Des-q against the state-of-the-art classical methods on multiple data sets.
Our algorithm exhibits similar performance to the state-of-the-art decision trees while significantly speeding up the periodic tree retraining.
arXiv Detail & Related papers (2023-09-18T17:56:08Z) - Revisiting Recursive Least Squares for Training Deep Neural Networks [10.44340837533087]
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence.
Previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they have high computational complexity and too many preconditions.
We propose three novel RLS optimization algorithms for training feedforward neural networks, convolutional neural networks and recurrent neural networks.
arXiv Detail & Related papers (2021-09-07T17:43:51Z) - Robustifying Algorithms of Learning Latent Trees with Vector Variables [92.18777020401484]
We present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG)
We robustify RG, CLRG, Neighbor Joining (NJ) and Spectral NJ (SNJ) by using the truncated inner product.
We derive the first known instance-dependent impossibility result for structure learning of latent trees.
arXiv Detail & Related papers (2021-06-02T01:37:52Z) - Spectral Top-Down Recovery of Latent Tree Models [13.681975313065477]
Spectral Top-Down Recovery (STDR) is a divide-and-conquer approach for inference of large latent tree models.
STDR's partitioning step is non-random. Instead, it is based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes.
We prove that STDR is statistically consistent, and bound the number of samples required to accurately recover the tree with high probability.
arXiv Detail & Related papers (2021-02-26T02:47:42Z) - Trilevel Neural Architecture Search for Efficient Single Image
Super-Resolution [127.92235484598811]
This paper proposes a trilevel neural architecture search (NAS) method for efficient single image super-resolution (SR)
For modeling the discrete search space, we apply a new continuous relaxation on the discrete search spaces to build a hierarchical mixture of network-path, cell-operations, and kernel-width.
An efficient search algorithm is proposed to perform optimization in a hierarchical supernet manner.
arXiv Detail & Related papers (2021-01-17T12:19:49Z) - Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding
Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv Detail & Related papers (2020-06-15T02:57:57Z) - A Tree Architecture of LSTM Networks for Sequential Regression with
Missing Data [0.0]
We introduce a novel tree architecture based on the Long Short-Term Memory (LSTM) networks.
In our architecture, we employ a variable number of LSTM networks, which use only the existing inputs in the sequence.
We achieve significant performance improvements with respect to the state-of-the-art methods for the well-known financial and real life datasets.
arXiv Detail & Related papers (2020-05-22T18:57:47Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for large-scale learning with a deep neural network as the predictive model.
Our method requires far fewer communication rounds in theory.
Experiments on several benchmark datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)